Latest AI News & Updates

#programming #data science #editors pick #python #python for beginners #randomization

Let's generate randomness in our code’s outputs
The post How to Implement Randomization with the Python Random Module appeared first on Towards Data Science.
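As a rough illustration of the territory the post covers, here is a minimal sketch using Python's standard-library random module; the specific calls shown are my own picks, not necessarily the ones the article walks through.

```python
import random

random.seed(42)  # fix the seed so the "random" output is reproducible across runs

print(random.random())                          # float in [0.0, 1.0)
print(random.randint(1, 6))                     # simulate one die roll
print(random.choice(["red", "green", "blue"]))  # pick a single element

deck = list(range(10))
random.shuffle(deck)                            # shuffle the list in place
print(random.sample(deck, 3))                   # draw 3 distinct elements
```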

#advanced (300) #amazon sagemaker hyperpod #technical how-to

Amazon SageMaker HyperPod clusters with Amazon Elastic Kubernetes Service (EKS) orchestration now support creating and managing interactive development environments such as JupyterLab and open source Visual Studio Code, streamlining the ML development lifecycle by providing managed environments for familiar tools to data scientists. This post shows how HyperPod administrators can configure Spaces for their clusters, and how data scientists can create and connect to these Spaces.

#data science #career advice #math #data science careers #data science projects

Avoid these mistakes to fast-track your data science career.
The post Data Science Mistakes That Could Ruin Your Learning Path, and How to Avoid Them appeared first on Towards Data Science.

#llm applications #artificial intelligence #json #large language models #programming #python

A developer’s guide to perfect JSON and typed outputs from Claude Sonnet 4.5 and Opus 4.1
The post A Hands-On Guide to Anthropic’s New Structured Output Capabilities appeared first on Towards Data Science.
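The post is about Anthropic's new structured-output capabilities; as a hedged approximation of the workflow, the sketch below forces JSON through the long-standing tool-use route of the Anthropic Python SDK and validates the result with Pydantic. The model ID, tool name, and schema are assumptions for illustration, not the article's exact setup.

```python
# Hedged sketch: force JSON via Anthropic tool use, then validate it with Pydantic.
import anthropic
from pydantic import BaseModel

class Invoice(BaseModel):
    vendor: str
    total_usd: float
    paid: bool

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",  # assumed model name; check the current model list
    max_tokens=512,
    tools=[{
        "name": "record_invoice",
        "description": "Record one parsed invoice.",
        "input_schema": Invoice.model_json_schema(),
    }],
    tool_choice={"type": "tool", "name": "record_invoice"},  # force the structured path
    messages=[{"role": "user", "content": "ACME Corp billed $1,299.50, not yet paid."}],
)

tool_use = next(block for block in response.content if block.type == "tool_use")
invoice = Invoice.model_validate(tool_use.input)  # raises if the JSON doesn't match the schema
print(invoice)
```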

#large language models #artificial intelligence #editors pick #llm evaluation #natural language processing #python

A step-by-step guide to building AI quality control using large language models
The post LLM-as-a-Judge: What It Is, Why It Works, and How to Use It to Evaluate AI Models appeared first on Towards Data Science.
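To make the pattern concrete, here is a minimal LLM-as-a-judge sketch: one model grades another model's answer against a rubric and returns a score with a short justification. The rubric wording, prompt, and judge model are assumptions, not the article's setup.

```python
# Hedged sketch of an LLM-as-a-judge call: grade an answer 1-5 and explain why.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are grading an answer to a user question.
Question: {question}
Answer: {answer}
Score the answer from 1 (unusable) to 5 (excellent) for factual accuracy and relevance.
Reply with JSON: {{"score": <int>, "reason": "<one sentence>"}}"""

def judge(question: str, answer: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
        response_format={"type": "json_object"},  # ask for well-formed JSON
        temperature=0,                            # keep grading repeatable
    )
    return json.loads(response.choices[0].message.content)

print(judge("What is the capital of France?", "Paris is the capital of France."))
```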

#amazon bedrock #announcements #artificial intelligence

Anthropic's newest foundation model, Claude Opus 4.5, is now available in Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models from leading AI companies. In this post, I'll show you what makes this model different, walk through key business applications, and demonstrate how to use Opus 4.5's new tool use capabilities on Amazon Bedrock.
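As a hedged sketch of what tool use on Bedrock looks like in code, the snippet below calls the Converse API with boto3 and a single tool definition. The model ID is a placeholder assumption (check the Bedrock console for the actual Opus 4.5 identifier in your region), and the weather tool is invented for illustration.

```python
# Hedged sketch: Claude tool use through the Amazon Bedrock Converse API.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-opus-4-5-v1:0",  # placeholder, not a verified model ID
    messages=[{"role": "user", "content": [{"text": "What's the weather in Seattle?"}]}],
    toolConfig={
        "tools": [{
            "toolSpec": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "inputSchema": {"json": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                }},
            }
        }]
    },
)

# If the model decided to call the tool, its arguments arrive as a toolUse block.
for block in response["output"]["message"]["content"]:
    if "toolUse" in block:
        print(block["toolUse"]["name"], block["toolUse"]["input"])
```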

Abacus AI offers the world’s first professional and enterprise AI Super Assistant. It’s an all-in-one AI platform for the top language, image, voice, and video models, along with all the tooling and infrastructure to support them. Abacus can connect to all YOUR data and apply AI to automate work.

#amazon bedrock #amazon machine learning #artificial intelligence

In this post, we show how to deploy the GPT-OSS-20B model on Amazon Bedrock using Custom Model Import while maintaining complete API compatibility with your current applications.

Why write 10 lines of matplotlib code when Lux can show you what you need in one click?
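For context, here is a minimal sketch of how Lux is typically used alongside pandas in a Jupyter notebook; the CSV path and column name are made up for illustration.

```python
# Hedged sketch: Lux hooks into pandas so displaying a DataFrame in Jupyter
# surfaces recommended visualizations instead of requiring manual matplotlib code.
import lux       # noqa: F401  (registers itself with pandas on import)
import pandas as pd

df = pd.read_csv("sales.csv")   # hypothetical dataset
df.intent = ["revenue"]         # optional hint about which column you care about
df                              # in a notebook, renders a toggle between the table and suggested charts
```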

#research #sustainable computing #energy #chemical engineering #mechanical engineering #mit energy initiative #renewable energy #energy storage #energy efficiency #electric grid #sustainability #infrastructure #cleaner industry #electric vehicles #artificial intelligence #machine learning #climate #school of engineering #weather modeling

AI supports the clean energy transition as it manages power grid operations, helps plan infrastructure investments, guides development of novel materials, and more.

#artificial intelligence #app #the state of ai #why it matters

Welcome back to The State of AI, a new collaboration between the Financial Times and MIT Technology Review. Every Monday, writers from both publications debate one aspect of the generative AI revolution reshaping global power. In this week’s conversation, MIT Technology Review’s senior reporter for features and investigations, Eileen Guo, and FT tech correspondent Melissa…

#artificial intelligence #app #summary #what's next in tech #why it matters

In 2017, fresh off a PhD in theoretical chemistry, John Jumper heard rumors that Google DeepMind had moved on from building AI that played games with superhuman skill and was starting up a secret project to predict the structures of proteins. He applied for a job. Just three years later, Jumper celebrated a stunning win…

Make.com enables data professionals to automate tedious tasks, such as data collection and reporting, without coding, saving hours weekly and enhancing accuracy.

#security #security / cyberattacks and hacks #security / security news #business / artificial intelligence

Born out of an internal hackathon, Amazon’s Autonomous Threat Analysis system uses a variety of specialized AI agents to detect weaknesses and propose fixes to the company’s platforms.

This article walks through five container setups that consistently help developers move from idea to experiment to deployment without fighting their own toolchains.

#ai & ml #commentary

The following article originally appeared on Medium and is being republished here with the author’s permission. Don’t get me wrong, I’m up all night using these tools. But I also sense we’re heading for an expensive hangover. The other day, a colleague told me about a new proposal to route a million documents a day […]

Machine learning models often behave differently across environments.

#business #business / artificial intelligence

The model policy team leads core parts of AI safety research, including how ChatGPT responds to users in crisis.

#business #business / artificial intelligence

Azalia King was the last holdout preventing the construction of a Micron "megafab." Onondaga County authorities threatened to use eminent domain to take her home away by force.

#business / regulation

The Trump administration’s pressure on European regulators is having an impact, with fewer restrictions on Big Tech and canceled measures.

This article is divided into four parts; they are: • Preparing Documents • Creating Sentence Pairs from Document • Masking Tokens • Saving the Training Data for Reuse Unlike decoder-only models, BERT's pretraining is more complex.
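As a hedged sketch of the "Masking Tokens" step, the snippet below implements the classic BERT masking recipe: select roughly 15% of positions, and of those replace 80% with [MASK], 10% with a random token, and leave 10% unchanged. The token IDs mirror the usual bert-base-uncased values but are assumptions here, not the article's code.

```python
# Hedged sketch of BERT-style masked-language-model input preparation.
import random

MASK_ID, VOCAB_SIZE, SPECIAL_IDS = 103, 30522, {0, 101, 102}  # assumed BERT-like values

def mask_tokens(token_ids, mask_prob=0.15):
    inputs, labels = list(token_ids), [-100] * len(token_ids)  # -100 = position ignored by the loss
    for i, tok in enumerate(token_ids):
        if tok in SPECIAL_IDS or random.random() >= mask_prob:
            continue
        labels[i] = tok                                   # the model must predict the original token here
        roll = random.random()
        if roll < 0.8:
            inputs[i] = MASK_ID                           # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = random.randrange(VOCAB_SIZE)      # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels

print(mask_tokens([101, 7592, 2088, 2003, 2307, 102]))
```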

#ai #datadecisionmakers

Remember the first time you heard your company was going AI-first? Maybe it came through an all-hands that felt different from the others. The CEO said, “By Q3, every team should have integrated AI into their core workflows,” and the energy in the room (or on the Zoom) shifted. You saw a mix of excitement and anxiety ripple through the crowd.

Maybe you were one of the curious ones. Maybe you’d already built a Python script that summarized customer feedback, saving your team three hours every week. Or maybe you’d stayed late one night just to see what would happen if you combined a dataset with a large language model (LLM) prompt. Maybe you’re one of those who’d already let curiosity lead you somewhere unexpected.

But this announcement felt different because suddenly, what had been a quiet act of curiosity was now a line in a corporate OKR. Maybe you didn’t know it yet, but something fundamental had shifted in how innovation would happen inside your company.

How innovation happens

Real transformation rarely looks like the PowerPoint version, and almost never follows the org chart.

Think about the last time something genuinely useful spread at work. It wasn't because of a vendor pitch or a strategic initiative, was it? More likely, someone stayed late one night, when no one was watching, found something that cut hours of busywork, and mentioned it at lunch the next day. “Hey, try this.” They shared it in a Slack thread and, in a week, half the team was using it.

The developer who used GPT to debug code wasn’t trying to make a strategic impact. She just needed to get home earlier to her kids. The ops manager who automated his spreadsheet didn’t need permission. He just needed more sleep.

This is the invisible architecture of progress — these informal networks where curiosity flows like water through concrete… finding every crack, every opening.

But watch what happens when leadership notices. What used to be effortless and organic becomes mandated. And the thing that once worked because it was free suddenly stops being as effective the moment it’s measured.

The great reversal

It usually begins quietly. Often when a competitor announces new AI features — like AI-powered onboarding or end-to-end support automation — claiming 40% efficiency gains.

The next morning, your CEO calls an emergency meeting. The room gets still. Someone clears their throat. And you can feel everyone doing mental math about their job security. “If they’re that far ahead, what does that mean for us?”

That afternoon, your company has a new priority. Your CEO says, “We need an AI strategy. Yesterday.”

Here's how that message usually ripples down the org chart:

At the C-suite: “We need an AI strategy to stay competitive.”
At the VP level: “Every team needs an AI initiative.”
At the manager level: “We need a plan by Friday.”
At your level: “I just need to find something that looks like AI.”

Each translation adds pressure while subtracting understanding. Everyone still cares, but that translation changes intent. What begins as a question worth asking becomes a script everyone follows blindly.

Eventually, the performance of innovation replaces the thing itself. There’s a strange pressure to look like you’re moving fast, even when you’re not sure where you’re actually going.

This repeats across industries

A competitor declares they’re going AI-first. Another publishes a case study about replacing support with LLMs. And a third shares a graph showing productivity gains. Within days, boardrooms everywhere start echoing the same message: “We should be doing this. Everyone else already is, and we can’t fall behind.”

So the work begins. Then come the task forces, the town halls, the strategy docs and the targets. Teams are asked to contribute initiatives.

But if you’ve been through this before, you know there’s often a difference between what companies announce and what they actually do. Because press releases don’t mention the pilots that stall, or the teams that quietly revert to the old way, or even the tools that get used once and abandoned. You might know someone who was on one of those teams, or you might’ve even been on one yourself.

These aren’t failures of technology or intent. ChatGPT works fine. And teams want to automate their tasks. These failures are organizational, and they happen when we try to imitate outcomes without understanding what created them in the first place.

And so when everyone performs innovation, it becomes almost impossible to tell who’s actually doing it.

Two kinds of leaders

You’ve probably seen both, and it’s very easy to tell which kind you’re working with.

One spends an entire weekend prototyping. They try something new, fail at half of it, and still show up Monday saying, “I built this thing with Claude. It crashed after two hours, but I learned a lot. Wanna see? It's very basic, but it might solve that thing we talked about.”

They try to build understanding. You can tell they’ve actually spent time with AI, and struggled with prompts and hallucinations. Instead of trying to sound certain, they talk about what broke, what almost worked and what they’re still figuring out. They invite you to try something new, because it feels like there’s room to learn. That’s what leading by participation looks like.

The other sends you a directive in Slack: “Leadership wants every team using AI by the end of the quarter. Plans are due by Friday.” They enforce compliance with a decision that's already been made. You can even hear it in their language, and how certain they sound.

The curious leader builds momentum. The performative one builds resentment.

What actually works

You probably don’t need someone to tell you where AI works. You already know because you’ve seen it.

Customer support: LLMs genuinely help with Tier 1 tickets. They understand intent, draft simple responses and route complexity. Not perfectly, of course — I’m sure you've seen the failures — but well enough to matter.

Code assistance: At 2 a.m., when you’re half-delirious and your AI assistant suggests exactly what you need, it feels like having an over-caffeinated junior programmer who never judges your forgotten semicolons. You save minutes at first, then hours, then days.

These small, cumulative wins compound over time. They aren't the impressive transformations promised in decks, but the kind of improvements you can rely on.

But outside these zones, things get murky. AI-driven revops? Fully automated forecasting? You've sat through those demos, and you’ve also seen the enthusiasm fade once the pilot actually begins.

Have the builders of these AI tools failed? Hardly. The technology is evolving, and the products built on top of it are still learning how to walk.

So how can you tell if your company's AI adoption is real? Simple. Just ask someone in finance or ops. Ask what AI tools they use daily. You might get a slight pause or an apologetic smile. “Honestly? Just ChatGPT.” That’s it. Not the $50k enterprise-grade platform from last quarter’s demo or the expensive software suite in the board deck. Just a browser tab, same as any college student writing an essay.

You might make this same confession yourself. Despite all the mandates and initiatives, your most powerful AI tool is probably the same one everyone else uses. So what does this tell us about the gap between what we're supposed to be doing and what we're actually doing?

How to drive change at your company

You've probably discovered this yourself, even if no one's ever put it into words:

Model what you mean: Remember that engineering director who screen-shared her messy, live coding session with Cursor? You learned more from watching her debug in real time than from any polished presentation, because vulnerability travels farther than directives.

Listen to the edges: You know who's actually using AI effectively in your organization, and they're not always the ones with “AI” in their title. They're the curious ones who've been quietly experimenting, finding what works through trial and error. And that knowledge is worth more than any analyst report.

Create permission (not pressure): The people inclined to experiment will always find a way, and the rest won’t be moved by force. The best thing you can do is make the curious feel safe to stay curious.

We're living in this strange moment, caught between the AI that vendors promise and the AI that actually exists on our screens, and it's deeply uncomfortable. The gap between product and promise is wide.

But what I've learned from sitting in that discomfort is that the companies that will thrive aren’t the ones that adopted AI first, but the ones that learned through trial and error. They stayed with the discomfort long enough for it to teach them something.

Where will you be six months from now?

By then, your company’s AI-first mandate will have set into motion departmental initiatives, vendor contracts and maybe even some new hires with “AI” in their titles. The dashboards will be green, and the board deck will have a whole slide on AI.

But in the quiet spaces where your actual work happens, what will have meaningfully changed?

Maybe you'll be like the teams that never stopped their quiet experiments. Your customer feedback system might catch the patterns humans miss. Your documentation might update itself. Chances are, if you were building before the mandate, you’ll be building after it fades.

That’s the invisible architecture of genuine progress: patient, and completely uninterested in performance. It doesn't make for great LinkedIn posts, and it resists grand narratives. But it transforms companies in ways that truly last.

Every organization is standing at the same crossroads right now: Look like you’re innovating, or create a culture that fosters real innovation.

The pressure to perform innovation is real, and it’s growing. Most companies will give in and join the theater. But some understand that curiosity can’t be forced, and progress can’t be performed. Because real transformation happens when no one’s watching, in the hands of the people still experimenting, still learning. That’s where the future begins.

Siqi Chen is co-founder and CEO of Runway.

Self-aware AI, $46M romance scam, Nvidia smuggling, Google fights back, and more...

#ai

Microsoft has introduced Fara-7B, a new 7-billion parameter model designed to act as a Computer Use Agent (CUA) capable of performing complex tasks directly on a user’s device. Fara-7B sets new state-of-the-art results for its size, providing a way to build AI agents that don’t rely on massive, cloud-dependent models and can run on compact systems with lower latency and enhanced privacy.

While the model is an experimental release, its architecture addresses a primary barrier to enterprise adoption: data security. Because Fara-7B is small enough to run locally, it allows users to automate sensitive workflows, such as managing internal accounts or processing sensitive company data, without that information ever leaving the device.

How Fara-7B sees the web

Fara-7B is designed to navigate user interfaces using the same tools a human does: a mouse and keyboard. The model operates by visually perceiving a web page through screenshots and predicting specific coordinates for actions like clicking, typing, and scrolling.

Crucially, Fara-7B does not rely on "accessibility trees,” the underlying code structure that browsers use to describe web pages to screen readers. Instead, it relies solely on pixel-level visual data. This approach allows the agent to interact with websites even when the underlying code is obfuscated or complex.

According to Yash Lara, Senior PM Lead at Microsoft Research, processing all visual input on-device creates true "pixel sovereignty," since screenshots and the reasoning needed for automation remain on the user’s device. "This approach helps organizations meet strict requirements in regulated sectors, including HIPAA and GLBA," he told VentureBeat in written comments.

In benchmarking tests, this visual-first approach has yielded strong results. On WebVoyager, a standard benchmark for web agents, Fara-7B achieved a task success rate of 73.5%. This outperforms larger, more resource-intensive systems, including GPT-4o when prompted to act as a computer use agent (65.1%) and the native UI-TARS-1.5-7B model (66.4%).

Efficiency is another key differentiator. In comparative tests, Fara-7B completed tasks in approximately 16 steps on average, compared to roughly 41 steps for the UI-TARS-1.5-7B model.

Handling risks

The transition to autonomous agents is not without risks, however. Microsoft notes that Fara-7B shares limitations common to other AI models, including potential hallucinations, mistakes in following complex instructions, and accuracy degradation on intricate tasks.

To mitigate these risks, the model was trained to recognize "Critical Points." A Critical Point is defined as any situation requiring a user's personal data or consent before an irreversible action occurs, such as sending an email or completing a financial transaction. Upon reaching such a juncture, Fara-7B is designed to pause and explicitly request user approval before proceeding.

Managing this interaction without frustrating the user is a key design challenge. "Balancing robust safeguards such as Critical Points with seamless user journeys is key," Lara said. "Having a UI, like Microsoft Research’s Magentic-UI, is vital for giving users opportunities to intervene when necessary, while also helping to avoid approval fatigue." Magentic-UI is a research prototype designed specifically to facilitate these human-agent interactions. Fara-7B is designed to run in Magentic-UI.

Distilling complexity into a single model

The development of Fara-7B highlights a growing trend in knowledge distillation, where the capabilities of a complex system are compressed into a smaller, more efficient model.

Creating a CUA usually requires massive amounts of training data showing how to navigate the web. Collecting this data via human annotation is prohibitively expensive. To solve this, Microsoft used a synthetic data pipeline built on Magentic-One, a multi-agent framework. In this setup, an "Orchestrator" agent created plans and directed a "WebSurfer" agent to browse the web, generating 145,000 successful task trajectories.

The researchers then "distilled" this complex interaction data into Fara-7B, which is built on Qwen2.5-VL-7B, a base model chosen for its long context window (up to 128,000 tokens) and its strong ability to connect text instructions to visual elements on a screen. While the data generation required a heavy multi-agent system, Fara-7B itself is a single model, showing that a small model can effectively learn advanced behaviors without needing complex scaffolding at runtime.

The training process relied on supervised fine-tuning, where the model learns by mimicking the successful examples generated by the synthetic pipeline.

Looking forward

While the current version was trained on static datasets, future iterations will focus on making the model smarter, not necessarily bigger. "Moving forward, we’ll strive to maintain the small size of our models," Lara said. "Our ongoing research is focused on making agentic models smarter and safer, not just larger." This includes exploring techniques like reinforcement learning (RL) in live, sandboxed environments, which would allow the model to learn from trial and error in real time.

Microsoft has made the model available on Hugging Face and Microsoft Foundry under an MIT license. However, Lara cautions that while the license allows for commercial use, the model is not yet production-ready. "You can freely experiment and prototype with Fara‑7B under the MIT license," he says, "but it’s best suited for pilots and proofs‑of‑concept rather than mission‑critical deployments."

#machine learning #editors pick #math #neural network #triton #softmax

All you need to know about a fast, readable and PyTorch-ready softmax kernel!
The post Learning Triton One Kernel at a Time: Softmax appeared first on Towards Data Science.
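For readers who want a concrete starting point, here is a standard row-wise softmax kernel in Triton, broadly the shape such a kernel takes; it is a sketch, not necessarily the exact kernel from the post, and it requires a CUDA-capable GPU.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def softmax_kernel(out_ptr, in_ptr, n_cols, in_row_stride, out_row_stride,
                   BLOCK_SIZE: tl.constexpr):
    row = tl.program_id(0)                       # one program instance per row
    offsets = tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_cols                      # guard against padding past the row end
    x = tl.load(in_ptr + row * in_row_stride + offsets, mask=mask, other=-float("inf"))
    x = x - tl.max(x, axis=0)                    # subtract the row max for numerical stability
    num = tl.exp(x)
    out = num / tl.sum(num, axis=0)
    tl.store(out_ptr + row * out_row_stride + offsets, out, mask=mask)

def softmax(x: torch.Tensor) -> torch.Tensor:
    n_rows, n_cols = x.shape
    out = torch.empty_like(x)
    BLOCK_SIZE = triton.next_power_of_2(n_cols)  # whole row handled by one block
    softmax_kernel[(n_rows,)](out, x, n_cols, x.stride(0), out.stride(0),
                              BLOCK_SIZE=BLOCK_SIZE)
    return out

x = torch.randn(8, 1000, device="cuda")
print(torch.allclose(softmax(x), torch.softmax(x, dim=1), atol=1e-6))
```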

#artificial intelligence #deep dives #deep learning #llm #reasoning #hrm

A 27M-parameter model just outperformed giants like DeepSeek R1, o3-mini, and Claude 3.7 on reasoning tasks
The post Your Next ‘Large’ Language Model Might Not Be Large After All appeared first on Towards Data Science.

#science #science / physics and math

Recent findings reveal that even simple pricing algorithms can make things more expensive.

Louvre heist = AI's weakness, hidden lion roar, voice AI NPCs, and more...

#ai #datadecisionmakers

Large language models (LLMs) have astounded the world with their capabilities, yet they remain plagued by unpredictability and hallucinations – confidently outputting incorrect information. In high-stakes domains like finance, medicine or autonomous systems, such unreliability is unacceptable. Enter Lean4, an open-source programming language and interactive theorem prover that is becoming a key tool for injecting rigor and certainty into AI systems. By leveraging formal verification, Lean4 promises to make AI safer, more secure and deterministic in its functionality. Let's explore how Lean4 is being adopted by AI leaders and why it could become foundational for building trustworthy AI.

What is Lean4 and why it matters

Lean4 is both a programming language and a proof assistant designed for formal verification. Every theorem or program written in Lean4 must pass strict type-checking by Lean’s trusted kernel, yielding a binary verdict: A statement either checks out as correct or it doesn’t. This all-or-nothing verification means there’s no room for ambiguity – a property or result is proven true or it fails. Such rigorous checking “dramatically increases the reliability” of anything formalized in Lean4. In other words, Lean4 provides a framework where correctness is mathematically guaranteed, not just hoped for.

This level of certainty is precisely what today’s AI systems lack. Modern AI outputs are generated by complex neural networks with probabilistic behavior. Ask the same question twice and you might get different answers. By contrast, a Lean4 proof or program will behave deterministically – given the same input, it produces the same verified result every time. This determinism and transparency (every inference step can be audited) make Lean4 an appealing antidote to AI’s unpredictability.

Key advantages of Lean4’s formal verification:

Precision and reliability: Formal proofs avoid ambiguity through strict logic, ensuring each reasoning step is valid and results are correct.
Systematic verification: Lean4 can formally verify that a solution meets all specified conditions or axioms, acting as an objective referee for correctness.
Transparency and reproducibility: Anyone can independently check a Lean4 proof, and the outcome will be the same – a stark contrast to the opaque reasoning of neural networks.

In essence, Lean4 brings the gold standard of mathematical rigor to computing and AI. It enables us to turn an AI’s claim (“I found a solution”) into a formally checkable proof that is indeed correct. This capability is proving to be a game-changer in several aspects of AI development.

Lean4 as a safety net for LLMs

One of the most exciting intersections of Lean4 and AI is in improving LLM accuracy and safety. Research groups and startups are now combining LLMs’ natural language prowess with Lean4’s formal checks to create AI systems that reason correctly by construction.

Consider the problem of AI hallucinations, when an AI confidently asserts false information. Instead of adding more opaque patches (like heuristic penalties or reinforcement tweaks), why not prevent hallucinations by having the AI prove its statements? That’s exactly what some recent efforts do. For example, a 2025 research framework called Safe uses Lean4 to verify each step of an LLM’s reasoning. The idea is simple but powerful: for each step in the AI’s chain of thought (CoT), the claim is translated into Lean4’s formal language and the AI (or a proof assistant) provides a proof. If the proof fails, the system knows the reasoning was flawed – a clear indicator of a hallucination. This step-by-step formal audit trail dramatically improves reliability, catching mistakes as they happen and providing checkable evidence for every conclusion. The approach has shown “significant performance improvement while offering interpretable and verifiable evidence” of correctness.

Another prominent example is Harmonic AI, a startup co-founded by Vlad Tenev (of Robinhood fame) that tackles hallucinations in AI. Harmonic’s system, Aristotle, solves math problems by generating Lean4 proofs for its answers and formally verifying them before responding to the user. “[Aristotle] formally verifies the output… we actually do guarantee that there’s no hallucinations,” Harmonic’s CEO explains. In practical terms, Aristotle writes a solution in Lean4’s language and runs the Lean4 checker. Only if the proof checks out as correct does it present the answer. This yields a “hallucination-free” math chatbot – a bold claim, but one backed by Lean4’s deterministic proof checking.

Crucially, this method isn’t limited to toy problems. Harmonic reports that Aristotle achieved gold-medal-level performance on the 2025 International Math Olympiad problems, the key difference being that its solutions were formally verified, unlike other AI models that merely gave answers in English. In other words, where tech giants Google and OpenAI also reached human-champion level on math questions, Aristotle did so with a proof in hand. The takeaway for AI safety is compelling: When an answer comes with a Lean4 proof, you don’t have to trust the AI – you can check it.

This approach could be extended to many domains. We could imagine an LLM assistant for finance that provides an answer only if it can generate a formal proof that it adheres to accounting rules or legal constraints. Or an AI scientific adviser that outputs a hypothesis alongside a Lean4 proof of consistency with known physics laws. The pattern is the same – Lean4 acts as a rigorous safety net, filtering out incorrect or unverified results. As one AI researcher from Safe put it, “the gold standard for supporting a claim is to provide a proof,” and now AI can attempt exactly that.

Building secure and reliable systems with Lean4

Lean4’s value isn’t confined to pure reasoning tasks; it’s also poised to revolutionize software security and reliability in the age of AI. Bugs and vulnerabilities in software are essentially small logic errors that slip through human testing. What if AI-assisted programming could eliminate those by using Lean4 to verify code correctness?

In formal methods circles, it’s well known that provably correct code can “eliminate entire classes of vulnerabilities [and] mitigate critical system failures.” Lean4 enables writing programs with proofs of properties like “this code never crashes or exposes data.” However, historically, writing such verified code has been labor-intensive and required specialized expertise. Now, with LLMs, there’s an opportunity to automate and scale this process. Researchers have begun creating benchmarks like VeriBench to push LLMs to generate Lean4-verified programs from ordinary code. Early results show today’s models are not yet up to the task for arbitrary software – in one evaluation, a state-of-the-art model could fully verify only ~12% of given programming challenges in Lean4. Yet an experimental AI “agent” approach (iteratively self-correcting with Lean feedback) raised that success rate to nearly 60%. This is a promising leap, hinting that future AI coding assistants might routinely produce machine-checkable, bug-free code.

The strategic significance for enterprises is huge. Imagine being able to ask an AI to write a piece of software and receiving not just the code, but a proof that it is secure and correct by design. Such proofs could guarantee no buffer overflows, no race conditions and compliance with security policies. In sectors like banking, healthcare or critical infrastructure, this could drastically reduce risks. It’s telling that formal verification is already standard in high-stakes fields (that is, verifying the firmware of medical devices or avionics systems). Harmonic’s CEO explicitly notes that similar verification technology is used in “medical devices and aviation” for safety – Lean4 is bringing that level of rigor into the AI toolkit.

Beyond software bugs, Lean4 can encode and verify domain-specific safety rules. For instance, consider AI systems that design engineering projects. A LessWrong forum discussion on AI safety gives the example of bridge design: An AI could propose a bridge structure, and formal systems like Lean can certify that the design obeys all the mechanical engineering safety criteria. The bridge’s compliance with load tolerances, material strength and design codes becomes a theorem in Lean, which, once proved, serves as an unimpeachable safety certificate. The broader vision is that any AI decision impacting the physical world – from circuit layouts to aerospace trajectories – could be accompanied by a Lean4 proof that it meets specified safety constraints. In effect, Lean4 adds a layer of trust on top of AI outputs: If the AI can’t prove it’s safe or correct, it doesn’t get deployed.

From big tech to startups: A growing movement

What started in academia as a niche tool for mathematicians is rapidly becoming a mainstream pursuit in AI. Over the last few years, major AI labs and startups alike have embraced Lean4 to push the frontier of reliable AI:

OpenAI and Meta (2022): Both organizations independently trained AI models to solve high-school olympiad math problems by generating formal proofs in Lean. This was a landmark moment, demonstrating that large models can interface with formal theorem provers and achieve non-trivial results. Meta even made their Lean-enabled model publicly available for researchers. These projects showed that Lean4 can work hand-in-hand with LLMs to tackle problems that demand step-by-step logical rigor.

Google DeepMind (2024): DeepMind’s AlphaProof system proved mathematical statements in Lean4 at roughly the level of an International Math Olympiad silver medalist. It was the first AI to reach “medal-worthy” performance on formal math competition problems – essentially confirming that AI can achieve top-tier reasoning skills when aligned with a proof assistant. AlphaProof’s success underscored that Lean4 isn’t just a debugging tool; it’s enabling new heights of automated reasoning.

Startup ecosystem: The aforementioned Harmonic AI is a leading example, raising significant funding ($100M in 2025) to build “hallucination-free” AI by using Lean4 as its backbone. Another effort, DeepSeek, has been releasing open-source Lean4 prover models aimed at democratizing this technology. We’re also seeing academic startups and tools – for example, Lean-based verifiers being integrated into coding assistants, and new benchmarks like FormalStep and VeriBench guiding the research community.

Community and education: A vibrant community has grown around Lean (the Lean Prover forum, the mathlib library), and even famous mathematicians like Terence Tao have started using Lean4 with AI assistance to formalize cutting-edge math results. This melding of human expertise, community knowledge and AI hints at the collaborative future of formal methods in practice.

All these developments point to a convergence: AI and formal verification are no longer separate worlds. The techniques and learnings are cross-pollinating. Each success – whether it’s solving a math theorem or catching a software bug – builds confidence that Lean4 can handle more complex, real-world problems in AI safety and reliability.

Challenges and the road ahead

It’s important to temper excitement with a dose of reality. Lean4’s integration into AI workflows is still in its early days, and there are hurdles to overcome:

Scalability: Formalizing real-world knowledge or large codebases in Lean4 can be labor-intensive. Lean requires precise specification of problems, which isn’t always straightforward for messy, real-world scenarios. Efforts like auto-formalization (where AI converts informal specs into Lean code) are underway, but more progress is needed to make this seamless for everyday use.

Model limitations: Current LLMs, even cutting-edge ones, struggle to produce correct Lean4 proofs or programs without guidance. The failure rate on benchmarks like VeriBench shows that generating fully verified solutions is a difficult challenge. Advancing AI’s capabilities to understand and generate formal logic is an active area of research – and success isn’t guaranteed to be quick. However, every improvement in AI reasoning (like better chain-of-thought or specialized training on formal tasks) is likely to boost performance here.

User expertise: Utilizing Lean4 verification requires a new mindset for developers and decision-makers. Organizations may need to invest in training or new hires who understand formal methods. The cultural shift to insisting on proofs might take time, much like the adoption of automated testing or static analysis did in the past. Early adopters will need to showcase wins to convince the broader industry of the ROI.

Despite these challenges, the trajectory is set. As one commentator observed, we are in a race between AI’s expanding capabilities and our ability to harness those capabilities safely. Formal verification tools like Lean4 are among the most promising means to tilt the balance toward safety. They provide a principled way to ensure AI systems do exactly what we intend, no more and no less, with proofs to show it.

Toward provably safe AI

In an era when AI systems are increasingly making decisions that affect lives and critical infrastructure, trust is the scarcest resource. Lean4 offers a path to earn that trust not through promises, but through proof. By bringing formal mathematical certainty into AI development, we can build systems that are verifiably correct, secure, and aligned with our objectives.

From enabling LLMs to solve problems with guaranteed accuracy, to generating software free of exploitable bugs, Lean4’s role in AI is expanding from a research curiosity to a strategic necessity. Tech giants and startups alike are investing in this approach, pointing to a future where saying “the AI seems to be correct” is not enough – we will demand “the AI can show it’s correct.”

For enterprise decision-makers, the message is clear: It’s time to watch this space closely. Incorporating formal verification via Lean4 could become a competitive advantage in delivering AI products that customers and regulators trust. We are witnessing the early steps of AI’s evolution from an intuitive apprentice to a formally validated expert. Lean4 is not a magic bullet for all AI safety concerns, but it is a powerful ingredient in the recipe for safe, deterministic AI that actually does what it’s supposed to do – nothing more, nothing less, nothing incorrect.

As AI continues to advance, those who combine its power with the rigor of formal proof will lead the way in deploying systems that are not only intelligent, but provably reliable.

Dhyey Mavani is accelerating generative AI at LinkedIn.
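To make the article's "binary verdict" point concrete, here is a tiny Lean 4 sketch, unrelated to any specific system mentioned above: the kernel either accepts these statements as fully proved or rejects the file, with no middle ground.

```lean
-- Two toy statements the Lean 4 kernel must completely check before accepting them.
theorem n_plus_zero (n : Nat) : n + 0 = n := rfl         -- holds by definitional reduction
example (a b : Nat) : a + b = b + a := Nat.add_comm a b  -- reuses a standard library lemma
```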

This article is divided into two parts; they are: • Architecture and Training of BERT • Variations of BERT BERT is an encoder-only model.
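As a minimal sketch of what "encoder-only" means in practice, the snippet below runs the public bert-base-uncased checkpoint (my choice of checkpoint, not necessarily the article's) and shows that the model returns contextual embeddings rather than generated text.

```python
# Minimal sketch: an encoder-only model maps tokens to contextual embeddings.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT is an encoder-only model.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch=1, num_tokens, hidden_size=768)
```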
