Latest AI News & Updates

#data engineering #data science #deep dives #json #programming #python

Benchmarking JSON libraries for large payloads
The post JSON Parsing for Large Payloads: Balancing Speed, Memory, and Scalability appeared first on Towards Data Science.
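The excerpt above only names the topic, not the libraries the article actually benchmarks, so treat the following as a minimal, hypothetical sketch: it times the standard library's json module against orjson (a common third-party contender) on a synthetic payload. The payload shape, record count and library choice are assumptions for illustration, not the article's own benchmark.

```python
import json
import time

import orjson  # third-party: pip install orjson

# Synthetic ~100k-record payload (illustrative only; real benchmarks should
# mirror the shape and size of your production payloads).
records = [{"id": i, "name": f"user_{i}", "scores": [i, i + 1.5, i * 2]}
           for i in range(100_000)]
payload = json.dumps(records).encode("utf-8")

def best_of(fn, repeats=5):
    """Return the fastest wall-clock time (in seconds) over several runs."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)

print(f"stdlib json.loads: {best_of(lambda: json.loads(payload)):.3f} s")
print(f"orjson.loads:      {best_of(lambda: orjson.loads(payload)):.3f} s")
```

For memory-sensitive workloads, the same harness can be pointed at streaming parsers instead of whole-payload loaders; the timing helper stays identical.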

What if I told you that a powerful vibe coding workflow on par with Claude Code can cost you less than $10? Let me prove it.

#ai

Mistral AI, Europe's most prominent artificial intelligence startup, is releasing its most ambitious product suite to date: a family of 10 open-source models designed to run everywhere from smartphones and autonomous drones to enterprise cloud systems, marking a major escalation in the company's challenge to both U.S. tech giants and surging Chinese competitors.

The Mistral 3 family, launching today, includes a new flagship model called Mistral Large 3 and a suite of smaller "Ministral 3" models optimized for edge computing applications. All models will be released under the permissive Apache 2.0 license, allowing unrestricted commercial use — a sharp contrast to the closed systems offered by OpenAI, Google, and Anthropic.

The release is a pointed bet by Mistral that the future of artificial intelligence lies not in building ever-larger proprietary systems, but in offering businesses maximum flexibility to customize and deploy AI tailored to their specific needs, often using smaller models that can run without cloud connectivity.

"The gap between closed and open source is getting smaller, because more and more people are contributing to open source, which is great," Guillaume Lample, Mistral's chief scientist and co-founder, said in an exclusive interview with VentureBeat. "We are catching up fast."

Why Mistral is choosing flexibility over frontier performance in the AI race

The strategic calculus behind Mistral 3 diverges sharply from recent model releases by industry leaders. While OpenAI, Google, and Anthropic have focused recent launches on increasingly capable "agentic" systems — AI that can autonomously execute complex multi-step tasks — Mistral is prioritizing breadth, efficiency, and what Lample calls "distributed intelligence."

Mistral Large 3, the flagship model, employs a Mixture of Experts architecture with 41 billion active parameters drawn from a total pool of 675 billion parameters. The model can process both text and images, handles context windows up to 256,000 tokens, and was trained with particular emphasis on non-English languages — a rarity among frontier AI systems.

"Most AI labs focus on their native language, but Mistral Large 3 was trained on a wide variety of languages, making advanced AI useful for billions who speak different native languages," the company said in a statement reviewed ahead of the announcement.

But the more significant departure lies in the Ministral 3 lineup: nine compact models across three sizes (14 billion, 8 billion, and 3 billion parameters) and three variants tailored for different use cases. Each variant serves a distinct purpose: base models for extensive customization, instruction-tuned models for general chat and task completion, and reasoning-optimized models for complex logic requiring step-by-step deliberation.

The smallest Ministral 3 models can run on devices with as little as 4 gigabytes of video memory using 4-bit quantization — making frontier AI capabilities accessible on standard laptops, smartphones, and embedded systems without requiring expensive cloud infrastructure or even internet connectivity. This approach reflects Mistral's belief that AI's next evolution will be defined not by sheer scale, but by ubiquity: models small enough to run on drones, in vehicles, in robots, and on consumer devices.

How fine-tuned small models beat expensive large models for enterprise customers

Lample's comments reveal a business model fundamentally different from that of closed-source competitors.
Rather than competing primarily on benchmark performance, Mistral is targeting enterprise customers frustrated by the cost and inflexibility of proprietary systems.

"Sometimes customers say, 'Is there a use case where the best closed-source model isn't working?' If that's the case, then they're essentially stuck," Lample explained. "There's nothing they can do. It's the best model available, and it's not working out of the box."

This is where Mistral's approach diverges. When a generic model fails, the company deploys engineering teams to work directly with customers, analyzing specific problems, creating synthetic training data, and fine-tuning smaller models to outperform larger general-purpose systems on narrow tasks.

"In more than 90% of cases, a small model can do the job, especially if it's fine-tuned. It doesn't have to be a model with hundreds of billions of parameters, just a 14-billion or 24-billion parameter model," Lample said. "So it's not only much cheaper, but also faster, plus you have all the benefits: you don't need to worry about privacy, latency, reliability, and so on."

The economic argument is compelling. Multiple enterprise customers have approached Mistral after building prototypes with expensive closed-source models, only to find deployment costs prohibitive at scale, according to Lample.

"They come back to us a couple of months later because they realize, 'We built this prototype, but it's way too slow and way too expensive,'" he said.

Where Mistral 3 fits in the increasingly crowded open-source AI market

Mistral's release comes amid fierce competition on multiple fronts. OpenAI recently released GPT-5.1 with enhanced agentic capabilities. Google launched Gemini 3 with improved multimodal understanding. Anthropic released Opus 4.5 on the same day as this interview, with similar agent-focused features.

But Lample argues those comparisons miss the point. "It's a little bit behind. But I think what matters is that we are catching up fast," he acknowledged regarding performance against closed models. "I think we are maybe playing a strategic long game."

That long game involves a different competitive set: primarily open-source models from Chinese companies like DeepSeek and Alibaba's Qwen series, which have made remarkable strides in recent months.

Mistral differentiates itself through multilingual capabilities that extend far beyond English or Chinese, multimodal integration handling both text and images in a unified model, and what the company characterizes as superior customization through easier fine-tuning.

"One key difference with the models themselves is that we focused much more on multilinguality," Lample said. "If you look at all the top models from [Chinese competitors], they're all text-only. They have visual models as well, but as separate systems. We wanted to integrate everything into a single model."

The multilingual emphasis aligns with Mistral's broader positioning as a European AI champion focused on digital sovereignty — the principle that organizations and nations should maintain control over their AI infrastructure and data.

Building beyond models: Mistral's full-stack enterprise AI platform strategy

Mistral 3's release builds on an increasingly comprehensive enterprise AI platform that extends well beyond model development.
The company has assembled a full-stack offering that differentiates it from pure model providers.

Recent product launches include Mistral Agents API, which combines language models with built-in connectors for code execution, web search, image generation, and persistent memory across conversations; Magistral, the company's reasoning model designed for domain-specific, transparent, and multilingual reasoning; and Mistral Code, an AI-powered coding assistant bundling models, an in-IDE assistant, and local deployment options with enterprise tooling.

The consumer-facing Le Chat assistant has been enhanced with Deep Research mode for structured research reports, voice capabilities, and Projects for organizing conversations into context-rich folders. More recently, Le Chat gained a connector directory with 20+ enterprise integrations powered by the Model Context Protocol (MCP), spanning tools like Databricks, Snowflake, GitHub, Atlassian, Asana, and Stripe.

In October, Mistral unveiled AI Studio, a production AI platform providing observability, agent runtime, and AI registry capabilities to help enterprises track output changes, monitor usage, run evaluations, and fine-tune models using proprietary data.

Mistral now positions itself as a full-stack, global enterprise AI company, offering not just models but an application-building layer through AI Studio, compute infrastructure, and forward-deployed engineers to help businesses realize return on investment.

Why open source AI matters for customization, transparency and sovereignty

Mistral's commitment to open-source development under permissive licenses is both an ideological stance and a competitive strategy in an AI landscape increasingly dominated by closed systems.

Lample elaborated on the practical benefits: "I think something that people don't realize — but our customers know this very well — is how much better any model can actually improve if you fine tune it on the task of interest. There's a huge gap between a base model and one that's fine-tuned for a specific task, and in many cases, it outperforms the closed-source model."

The approach enables capabilities impossible with closed systems: organizations can fine-tune models on proprietary data that never leaves their infrastructure, customize architectures for specific workflows, and maintain complete transparency into how AI systems make decisions — critical for regulated industries like finance, healthcare, and defense.

This positioning has attracted government and public sector partnerships. The company launched "AI for Citizens" in July 2025, an initiative to "help States and public institutions strategically harness AI for their people by transforming public services," and has secured strategic partnerships with France's army and job agency, Luxembourg's government, and various European public sector organizations.

Mistral's transatlantic AI collaboration goes beyond European borders

While Mistral is frequently characterized as Europe's answer to OpenAI, the company views itself as a transatlantic collaboration rather than a purely European venture. The company has teams across both continents, with co-founders spending significant time with customers and partners in the United States, and these models are being trained in partnership with U.S.-based teams and infrastructure providers.

This transatlantic positioning may prove strategically important as geopolitical tensions around AI development intensify.
The recent ASML investment, a €1.7 billion ($1.5 billion) funding round led by the Dutch semiconductor equipment manufacturer, signals deepening collaboration across the Western semiconductor and AI value chain at a moment when both Europe and the United States are seeking to reduce dependence on Chinese technology.

Mistral's investor base reflects this dynamic: the Series C round included participation from U.S. firms Andreessen Horowitz, General Catalyst, Lightspeed, and Index Ventures alongside European investors like France's state-backed Bpifrance and global players like DST Global and Nvidia.

Founded in May 2023 by former Google DeepMind and Meta researchers, Mistral has raised roughly $1.05 billion (€1 billion) in funding. The company was valued at $6 billion in a June 2024 Series B, then more than doubled its valuation in a September Series C.

Can customization and efficiency beat raw performance in enterprise AI?

The Mistral 3 release crystallizes a fundamental question facing the AI industry: Will enterprises ultimately prioritize the absolute cutting-edge capabilities of proprietary systems, or will they choose open, customizable alternatives that offer greater control, lower costs, and independence from big tech platforms?

Mistral's answer is unambiguous. The company is betting that as AI moves from prototype to production, the factors that matter most shift dramatically. Raw benchmark scores matter less than total cost of ownership. Slight performance edges matter less than the ability to fine-tune for specific workflows. Cloud-based convenience matters less than data sovereignty and edge deployment.

It's a wager with significant risks. Despite Lample's optimism about closing the performance gap, Mistral's models still trail the absolute frontier. The company's revenue, while growing, reportedly remains modest relative to its nearly $14 billion valuation. And competition intensifies from both well-funded Chinese rivals making remarkable open-source progress and U.S. tech giants increasingly offering their own smaller, more efficient models.

But if Mistral is right — if the future of AI looks less like a handful of cloud-based oracles and more like millions of specialized systems running everywhere from factory floors to smartphones — then the company has positioned itself at the center of that transformation.

The release of Mistral 3 is the most comprehensive expression yet of that vision: 10 models, spanning every size category, optimized for every deployment scenario, available to anyone who wants to build with them.

Whether "distributed intelligence" becomes the industry's dominant paradigm or remains a compelling alternative serving a narrower market will determine not just Mistral's fate, but the broader question of who controls the AI future — and whether that future will be open.

For now, the race is on. And Mistral is betting it can win not by building the biggest model, but by building everywhere else.

The value of a modern company isn't in its firewalls; it's in its terabytes of proprietary, labeled data and the predictive models built upon them.

#school of engineering #mit schwarzman college of computing #civil and environmental engineering #electrical engineering and computer science (eecs) #mechanical engineering #computer science and artificial intelligence laboratory (csail) #idss #laboratory for information and decision systems (lids) #computer science and technology #robotics #artificial intelligence #machine learning #safety #research

MIT CSAIL and LIDS researchers developed a mathematically grounded system that lets soft robots deform, adapt, and interact with people and objects, without violating safety limits.

#ai

While artificial intelligence has stormed into law firms and accounting practices with billion-dollar startups like Harvey leading the charge, the global consulting industry—a $250 billion behemoth—has remained stubbornly analog. A London-based startup founded by former McKinsey consultants is betting $2 million that it can crack open this resistant market, one Excel spreadsheet at a time.

Ascentra Labs announced Tuesday that it has closed a $2 million seed round led by NAP, a Berlin-based venture capital firm formerly known as Cavalry Ventures. The funding comes with participation from notable founder-angels including Alan Chang, chief executive of Fuse and former chief revenue officer at Revolut, and Fredrik Hjelm, chief executive of European e-scooter company Voi.

The investment is modest by the standards of enterprise AI — a sector that has seen funding rounds routinely reach into the hundreds of millions. But Ascentra's founders argue that their focused approach to a narrow but painful problem could give them an edge in a market where broad AI solutions have repeatedly failed to gain traction.

Consultants spend countless hours on Excel survey analysis that even top firms haven't automated

Paritosh Devbhandari, Ascentra's co-founder and chief executive, spent years at McKinsey & Company, including a stint at QuantumBlack, the firm's AI and advanced analytics division. He knows intimately the late nights consultants spend wrestling with survey data—the kind of quantitative research that forms the backbone of private equity due diligence.

"Before starting the company, I was working at McKinsey, specifically on the private equity team," Devbhandari explained in an exclusive interview with VentureBeat. The work, he said, involves analyzing encoded survey responses from customers, suppliers, and market participants during potential acquisitions.

"Consultants typically spend a lot of time doing this in Excel," he said. "One of the things that surprised me, having worked at a couple of different places, is that the workflow — even at the best firms — really isn't that different from some of the boutiques. I always expected there would be some smarter way of doing things, and often there just isn't."

That gap between expectation and reality became the foundation for Ascentra. The company's platform ingests raw survey data files and outputs formatted Excel workbooks complete with traceable formulas — the kind of deliverable a junior associate would spend hours constructing manually.

AI has transformed legal work but consulting presents unique technical challenges that have blocked adoption

The disparity between AI adoption in law versus consulting raises an obvious question: if the consulting market is so large and the workflows so manual, why hasn't venture capital flooded the space the way it has legal tech?

Devbhandari offered a frank assessment. "It's not like people haven't tried," he said. "The top of the funnel in our space is crowded. When we speak to our consulting clients, the partners say they get another pitch deck in their LinkedIn inbox or email every week—sometimes several. There are plenty of people trying."

The barriers, he argued, are structural. Professional services firms move slowly on technology adoption, demanding extensive security credentials and customer references before granting even a pilot opportunity. "I think that's where 90% of startups in professional services, writ large, fall down," he said.

But consulting presents unique technical challenges beyond the sales cycle.
Unlike legal work, which largely involves text documents that modern large language models handle well, consulting spans multiple data modalities — PowerPoint presentations, Excel spreadsheets, Word documents — with information that can be tabular, graphical, or textual.

"You can have multiple formats of Excel in itself," Devbhandari noted. "And that's a big contrast to the legal space, where you could have a multi-purpose AI agent, or collection of agents, which can actually do a lot of the tasks that lawyers do day to day. Consulting is the opposite of that."

Ascentra's private equity focus reflects a calculated bet on repeatable workflows

Ascentra's strategy hinges on extreme specificity. Rather than attempting to automate the full spectrum of consulting work, the company focuses exclusively on survey analysis within private equity due diligence — a niche within a niche.

The logic is both technical and commercial. Private equity work tends to be more standardized than other consulting engagements, with similar analyses recurring across deals. That repeatability makes automation feasible. It also positions Ascentra against a less formidable competitive set: even the largest consulting firms, Devbhandari claimed, lack dedicated internal tools for this particular workflow.

"Survey analysis automation is so specific that even the biggest and best firms haven't developed anything in-house for it," he said.

The company claims that three of the world's top five consulting firms now use its platform, with early adopters reporting time savings of 60 to 80 percent on active due diligence projects. But there's a notable caveat: Ascentra cannot publicly name any of these clients.

"It's a very private industry, so at the moment, we can't announce any clients publicly," Devbhandari acknowledged. "What I can say is that we're working with three of the top five consulting firms. We've passed pilots at multiple organizations and have submitted business cases for enterprise rollouts."

Eliminating AI hallucinations becomes critical when billion-dollar deals hang in the balance

For an AI company selling into quantitative workflows, accuracy is existential. Consultants delivering analysis to private equity clients face enormous pressure to be precise—a single error in a financial model can undermine credibility and, potentially, billion-dollar investment decisions.

Devbhandari described this as Ascentra's central design challenge. "Consultants require a very, very high degree of fidelity when they're doing their analysis," he said. "So with quantitative data, even if it's 95% accurate, they will revert to Excel because they know it, they trust it, and they don't want there to be any margin for error."

Ascentra's technical approach attempts to address this by limiting where AI models operate within the workflow. The company uses GPT-based models from OpenAI to interpret and ingest incoming data, but the actual analysis relies on deterministic Python scripts that produce consistent, verifiable outputs.

"What's different is the steps that follow are deterministic," Devbhandari explained. "There's no room for error. There's no hallucinations, and the Excel writer that we've connected to the product on the back end converts this analysis into Excel formula, which are live and traceable, so consultants can get that assurance that they can follow along with the maths."

Whether this hybrid approach delivers on its promise of eliminating hallucinations while maintaining useful AI capabilities will be tested as the platform scales across more complex use cases and client environments.

Enterprise security certifications give Ascentra an edge over less prepared competitors

Selling software to major consulting firms requires clearing an unusually high security bar. These organizations handle sensitive client data across industries, and their vendor security assessments can take months to complete.

Ascentra invested early in obtaining enterprise-grade certifications, a strategic choice that Devbhandari framed as essential table stakes. The company has achieved SOC 2 Type II and ISO 27001 certifications and claims to be under audit for ISO 42001, an emerging standard for AI management systems.

Data handling policies also reflect the sensitivity of the target market. Client data is deleted within 30 to 45 days, depending on contractual terms, and Ascentra does not use customer data to train its models.

There's also an argument that survey data carries somewhat lower sensitivity than other consulting materials. "Survey data is unique in consulting data because it's collected during the course of a project, and it is market data," Devbhandari noted. "You interview people in the market, and you collect a bunch of data in an Excel, as opposed to—you look at Rogo or some of the other finance AI startups—they use client data, so financials, which is confidential and strictly non-public."

Per-project pricing aligns with how consulting firms actually spend money

Ascentra's pricing model departs from the subscription-based approach that dominates enterprise software. The company charges on a per-project basis, a structure Devbhandari said aligns with how consulting firms allocate budgets.

"Project budgets are in consulting set on a per project basis," he explained. "You'll have central budgets which are for things like Microsoft, right, very central things that every team will use all of the time. And then you have project budgets which are for the teams that are using specific resources, teams or products nowadays."

This approach may ease initial adoption by avoiding the need for central IT procurement approval, but it also introduces revenue unpredictability. The company's success will depend on converting project-level usage into broader enterprise relationships—a path Devbhandari suggested is already underway through submitted business cases for enterprise rollouts.

AI may not eliminate consulting jobs, but it will fundamentally transform what consultants do

Perhaps the most interesting tension in Devbhandari's vision concerns what AI ultimately means for consulting employment. He pushed back on predictions that AI will eliminate consulting jobs while simultaneously describing an industry on the cusp of fundamental transformation.

"People love to talk about how AI is going to remove the need for consultants, and I disagree," he said. "Yes, the role will change, but I don't think the industry goes away. I think the best solutions will come from people within the industry building products around the work they know."

Yet he also painted a picture of dramatic change.
"At the moment, you have a big intake of graduates who just do—for the most part, you know, they have the strategic work as part of what they do, but they also have a lot of work in Excel and PowerPoint. I think in a few years' time, we'll look back at these times and think, you know, very, very different."The honest answer, he acknowledged, is that no one truly knows how this plays out. "I don't think even AI leaders truly know what that looks like yet," he said of whether productivity gains will translate to more work or fewer workers.Ascentra plans to use seed funding to expand its U.S. presence and go-to-market teamThe $2 million will primarily fund Ascentra's expansion into the United States, where more than 80 percent of its customers are already based. Devbhandari plans to relocate there personally as the company builds out go-to-market capabilities."One of the things that we've really noticed is that with consulting being an American industry, and I think America being a great place for innovation and trying new things, we've definitely drawn ourselves to the U.S.," he said. "American hires are very expensive, and I'm sure that a lot of the raise will go towards that."The seed round represents a bet by NAP on what its co-founder Stefan Walter called an overdue disruption. "While most knowledge work has been reshaped by new technology, consulting has remained stubbornly manual," Walter said. "AI won't replace consultants, but consultants using Ascentra might."The startup now faces the hard work of converting pilot wins into lasting enterprise contractsAscentra enters 2026 with momentum but no guarantee of success. The company must transform pilot programs at elite firms into sticky enterprise contracts — all while fending off the inevitable well-funded competitors who will flood into the space once the opportunity becomes undeniable. Its deliberately narrow focus on survey analysis provides a defensible beachhead, but expanding into adjacent workflows will require building entirely new products without sacrificing the domain expertise that Devbhandari argues is the company's core advantage.Oliver Thurston, Ascentra's co-founder and chief technology officer, who previously led machine learning at Mathison AI, offered a clear-eyed assessment of the challenge. "Consulting workflows are uniquely complex and difficult to build products around," he said in a statement. "It's not surprising the space hasn't changed yet. This will change though, and there's no doubt that the industry is going to look completely different in five years' time."For now, Ascentra is placing a focused wager: that the consultants who once spent their nights formatting spreadsheets will be the ones who finally bring AI into an industry that has long resisted it. The irony is hard to miss. After years of advising Fortune 500 companies on digital transformation, consulting may finally have to take its own medicine.

#data science #data governance #data pipeline #data quality #python #data contract

Stop your pipelines from breaking on Friday afternoons using simple, open-source validation with Pandera.
The post How to Use Simple Data Contracts in Python for Data Scientists appeared first on Towards Data Science.
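The excerpt doesn't include the article's own code, so here is a minimal, hypothetical sketch of the kind of Pandera-based "data contract" it alludes to: a schema that fails loudly when a DataFrame drifts from the agreed shape. The column names and checks below are invented for illustration, not taken from the article.

```python
import pandas as pd
import pandera as pa

# A small "data contract": downstream steps run only if the frame conforms.
contract = pa.DataFrameSchema(
    {
        "order_id": pa.Column(int, unique=True),
        "amount": pa.Column(float, pa.Check.ge(0)),
        "status": pa.Column(str, pa.Check.isin(["open", "shipped", "cancelled"])),
    },
    strict=True,  # reject unexpected columns
)

df = pd.DataFrame(
    {"order_id": [1, 2], "amount": [9.99, 24.50], "status": ["open", "shipped"]}
)

validated = contract.validate(df)  # raises pandera.errors.SchemaError on violation
print(validated.shape)
```

Calling contract.validate() at the boundary between pipeline stages is what turns the schema into a contract: the producer and consumer agree on it, and violations surface at load time rather than on Friday afternoon.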

This article explores how to transform ChatGPT from a chatbot into a powerful data assistant that streamlines the repetitive, the tedious, and the complex.

#programming #coding #editors pick #python #qr code

A beginner-friendly tutorial exploring the Python "qrcode" package
The post How to Generate QR Codes in Python appeared first on Towards Data Science.
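The tutorial itself isn't included in this excerpt, but the basic usage of the qrcode package it covers looks roughly like the sketch below; the URLs and filenames are placeholders, not the article's examples.

```python
import qrcode  # pip install "qrcode[pil]"

# One-liner: build a QR code for a URL and save it as a PNG.
img = qrcode.make("https://towardsdatascience.com")
img.save("example_qr.png")

# For more control, configure error correction, module size and border explicitly.
qr = qrcode.QRCode(
    error_correction=qrcode.constants.ERROR_CORRECT_H,  # highest redundancy
    box_size=10,   # pixels per module
    border=4,      # quiet-zone width in modules
)
qr.add_data("Hello from Python!")
qr.make(fit=True)
qr.make_image(fill_color="black", back_color="white").save("styled_qr.png")
```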

#ai

Researchers at MiroMind AI and several Chinese universities have released OpenMMReasoner, a new training framework that improves the capabilities of language models in multimodal reasoning.

The framework uses a two-stage process. It first refines a base model with a curated dataset in a supervised fine-tuning (SFT) stage. Then, a reinforcement learning (RL) stage guides the model to reason more effectively in tasks that involve both text and visual data. Experiments show that models trained with OpenMMReasoner outperform other leading visual reasoning models, often while being trained on a smaller, higher-quality dataset. The framework and all its assets, including a trained 7B model, are fully open source, providing a reliable foundation for building applications that require traceability and robustness.

According to Kaichen Zhang, co-author of a research paper that outlines the new method, OpenMMReasoner offers significant benefits for businesses looking beyond large, closed systems. "A smaller open-source reasoning model has practical advantages: Enterprises can deploy it locally, reduce latency, lower token costs associated with long chains of thought, maintain full control over their data and [it is] fine-tunable to adapt to their specific downstream task," he told VentureBeat.

The challenge of transparent multimodal reasoning

Recent advances in reinforcement learning with verifiable rewards (RLVR) have significantly improved the reasoning abilities of large language models (LLMs). RLVR trains LLMs to generate chain-of-thought (CoT) tokens (which mimic the reasoning processes humans use) before generating the final answer. This improves the model's capability to solve complex reasoning tasks such as math and coding. Motivated by this success, researchers have applied similar RL-based methods to large multimodal models (LMMs), showing that the benefits can extend beyond text to improve visual understanding and problem-solving across different modalities.

However, a lack of transparency in the training pipeline has been a major barrier. Many studies on multimodal reasoning do not provide detailed information about their data curation and training processes, making it difficult to reproduce their results or understand what makes these models work.

"This lack of openness restricts reproducibility and obscures a deeper understanding of how reasoning-capable LMMs are actually built and how their training dynamics evolve," the researchers note.

The OpenMMReasoner recipe

OpenMMReasoner addresses this gap with a fully transparent and scalable training recipe built on open-source LMMs. The researchers found it was critical to curate high-quality datasets by scaling data diversity. Although using diverse data sources is important, increasing the diversity of correct answers for the same question was an essential axis for improvement.

The first stage of the recipe is a three-step supervised fine-tuning (SFT) pipeline. It begins with data sourcing, where the team collected approximately 103,000 raw question-answer pairs from public datasets covering general visual Q&A and reasoning tasks. Next, they added a data distillation step, using a powerful model (Qwen3-VL-235B-Instruct) to generate new, high-quality reasoning traces for selected questions. (The data will then be used to train a smaller model.)

To increase answer diversity, the team generated multiple verified reasoning traces for each question. This expanded the dataset to 583,000 samples.
Finally, they implemented a "domain mixing" phase, adding data from mathematical reasoning domains to further generalize the model's capabilities, resulting in a final SFT dataset of 874,000 examples.

The second stage is an RL recipe that uses a smaller, 74,000-sample dataset curated from domains like science, math and puzzles. The model is trained with a composite reward function that considers both the correctness of the final answer and the consistency of the output format. To improve efficiency, the process includes a penalty for "overthinking," discouraging the model from generating excessively long answers (a problem with many reasoning models trained through RL, which mistakenly learn to generate overly long reasoning sequences, resulting in excess cost and slower answers).

This recipe can provide a blueprint for enterprises training their own models. "For companies with limited domain-specific data, a feasible strategy is to first increase answer diversity for their existing dataset, then use domain mixing to integrate this domain data into a general reasoning recipe like ours," Zhang explained. "This allows the model to acquire strong general-purpose reasoning skills while also adapting to industry-specific tasks, without needing millions of samples."

A more efficient and capable reasoning model

According to Zhang, the step-by-step process fundamentally changes the reliability of the model's outputs. "Traditional models often 'jump' directly to an answer, which means they explore only a narrow portion of the reasoning space," he said. "In contrast, a reasoning-first approach forces the model to explicitly examine multiple intermediate steps... [allowing it] to traverse much deeper paths and arrive at answers with far more internal consistency."

The researchers used the OpenMMReasoner recipe to generate data to fine-tune the Qwen2.5-VL-7B-Instruct open-source vision-language model. The result is a highly capable LMM that consistently outperforms state-of-the-art methods, such as Open Vision Reasoner (OVR), across a wide range of multimodal reasoning benchmarks. The SFT stage alone creates a strong baseline model that achieves superior performance and data efficiency compared to other SFT approaches, despite using a significantly smaller training dataset.

The subsequent RL phase further sharpens and stabilizes these abilities, leading to more consistent and improved performance. After RL, the final model achieves state-of-the-art results on several benchmarks, including WeMath, MathVerse and MathVista.

One of the key findings was that, as the model improved at multimodal reasoning, it also showed a "gradual emergence of textual reasoning behaviors, suggesting a transfer of reasoning competence from multimodal to purely linguistic domains," the researchers note. This indicates that skills learned in one modality can strengthen performance in another. "Our results show that strengthening multimodal reasoning can even improve text-only mathematical skills—evidence that core logical abilities can transfer across modalities," Zhang said. "Looking ahead, we do expect these methods to extend to video and audio."

The researchers also found that token efficiency is crucial. While allowing a model to generate longer reasoning steps can improve performance, excessive tokens reduce efficiency.
Their results show that setting a smaller "reasoning budget" can achieve comparable or even better accuracy, an important consideration for deploying cost-effective enterprise applications.

By open-sourcing all components of their workflow, the researchers provide a reproducible view of the entire process. For enterprise teams, this transparency is invaluable. "For business leaders concerned about vendor lock-in, hidden biases or opaque data sources, this level of transparency is essential," Zhang stated. "It empowers teams to validate the data, customize the pipeline for new domains and maintain long-term independence from any single provider."
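The article describes the RL stage's reward as a mix of answer correctness, output-format consistency and an "overthinking" penalty. The toy sketch below illustrates that general shape of a composite reward; the tags, weights, token budget and answer-parsing rules are assumptions for illustration, not the actual OpenMMReasoner implementation.

```python
import re

def composite_reward(response: str, gold_answer: str,
                     max_tokens: int = 2048, length_penalty: float = 0.2) -> float:
    """Toy composite reward in the spirit of the article's description:
    correctness + format consistency - an 'overthinking' penalty.
    Weights, tags and thresholds here are illustrative assumptions."""
    # Format consistency: expect reasoning wrapped in <think>...</think> and a
    # final answer wrapped in \boxed{...} (an assumed output convention).
    match = re.search(r"\\boxed\{(.+?)\}", response)
    format_ok = "<think>" in response and "</think>" in response and match is not None

    # Correctness: exact match on the extracted final answer.
    correct = match is not None and match.group(1).strip() == gold_answer.strip()

    # Overthinking penalty: discourage responses beyond a token budget.
    n_tokens = len(response.split())  # crude whitespace tokenization
    overthinking = max(0.0, (n_tokens - max_tokens) / max_tokens)

    return 1.0 * correct + 0.5 * format_ok - length_penalty * overthinking

# A well-formatted, correct, reasonably short response scores highest.
resp = "<think>2 + 2 is 4.</think> The answer is \\boxed{4}"
print(composite_reward(resp, "4"))  # 1.5
```

In an RLVR loop, a scalar like this is all the optimizer sees, which is why the article stresses that both the verifier and the length budget shape what the model learns to produce.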

#research #ai & ml #commentary

The debate about open source AI has largely featured open weight models. But that’s a bit like arguing that in the PC era, the most important goal would have been to have Intel open source its chip designs. That might have been useful to some people, but it wouldn’t have created Linux, Apache, or the […]

Agentic AI is changing how we interact with machines.

#ai #data infrastructure

Every data engineering team right now is being asked the same question: "How do we build a chatbot that talks to our data?"

The prototypes are deceptively simple. A developer connects GPT-5.1 to a Snowflake schema, asks "What is our revenue?", and watches as the model generates a syntactically perfect SQL query. It feels like magic. But when these systems move from a sandbox to production, the magic collapses. The bot reports $12 million revenue on Monday and $9.5 million on Tuesday, despite the underlying data remaining unchanged.

The failure isn't a lack of model intelligence; it is an architectural "context gap." Gen AI models are probabilistic engines trying to interpret rigid, deterministic business logic from raw database schemas. Without a mediation layer to define what "revenue" actually means, the model guesses.

Why direct text-to-SQL agents fail

To understand why a semantic layer is non-negotiable for gen AI, one must dissect the anatomy of a text-to-SQL failure. The issue is rarely invalid syntax; it is semantic ambiguity. When a large language model (LLM) scans a raw database schema, it lacks the "tribal knowledge" inherent to the business, leading to mathematically correct but functionally false results.

For example, consider a common scenario at a global logistics retailer. Their business intelligence (BI) dashboard shows 98% on-time delivery. However, their new AI agent querying raw shipping tables reports 92%. The difference? The AI failed to exclude "customer-waived delays" — a filter that exists only in the BI tool, not the database. This 6% gap didn't just break the bot; it broke trust in the data team.

The solution: Build a semantic layer

Recent empirical evidence reveals the scale of this problem. A 2024 study by semantic data vendor data.world found that, when tasked with generating SQL from raw schemas, GPT-4 achieved a success rate of just 16.7%. When the same model was grounded with a semantic layer — a "Rosetta Stone" defining business logic — accuracy tripled to 54.2%. AtScale, another semantic layer vendor, reported even higher figures — 92.5% accuracy on the TPC-DS benchmark — by enforcing valid join paths and pre-defined metrics.

The enterprise semantic layer has evolved from a tool for dashboards into a critical requirement for AI. It is effectively the "metrics API" that stops AI from guessing your business rules. Currently, vendors are racing to standardize this layer. Snowflake, Salesforce, dbt Labs and partners launched the Open Semantic Interchange (OSI), a vendor-neutral spec aimed at making metric/semantic definitions portable across tools and clouds. If OSI sticks, portability becomes the real moat.

In the meantime, the big question for data leaders is where to implement this logic. The market has split into two architectural philosophies: building it close to the database (embedded natively in Snowflake, Databricks or Microsoft Fabric) for simplicity, or using a platform-agnostic layer (like dbt MetricFlow or Cube) for independence.

Architecture A: The "headless" strategy

The "headless" (or platform-agnostic) philosophy is built on a single, uncompromising premise: decoupling. Instead of locking metric definitions inside a specific dashboard or database, this architecture forces you to define them in a neutral middle layer.
The goal is simple — define "revenue" once in code, and serve that exact number to Tableau, Excel and AI agents simultaneously.

How it works: Functionally, these systems act as a universal translation engine sitting between your storage and your consumption tools. When an AI agent requests get_metric(churn), the headless layer reads the definition from a YAML configuration, compiles the necessary SQL (automatically handling complex joins, filters and fan-outs) and executes it against your data warehouse.

The key players:
- dbt: dbt Labs has positioned MetricFlow as the industry's query transpiler. It generates optimized SQL at runtime and pushes it down to Snowflake or BigQuery.
- Cube: Cube ships a semantic layer and also has support for an MCP server so agents can call governed metrics as tools instead of guessing SQL.

Interestingly, both dbt Labs and Cube have joined OSI, the vendor-neutral standard launched in 2025 that makes these definitions portable across any tool.

Architecture B: The "platform-native" strategy

Platform-native architecture (often called the "walled garden") flips the script by embedding semantic definitions directly into the storage or compute engine. The philosophy here is integration over independence. By keeping the logic next to the data, these platforms offer superior performance and zero-copy access, removing the need for a separate server.

How it works: Native execution; instead of a separate translation layer, the database engine itself understands metrics. When you define a semantic model here, it compiles into native database objects. This unlocks high-performance access — where the semantic engine reads directly from storage memory, bypassing standard SQL overhead. It also allows the platform's native AI assistants to "read" the metadata instantly without external connectors.

The key players:
- Microsoft Fabric (Semantic Link): For teams already standardized on Power BI/Fabric, semantic link minimizes integration overhead for notebooks and copilots. Microsoft's semantic link (SemPy) feature allows Python notebooks to "mount" Power BI datasets as if they were pandas DataFrames, letting data scientists reuse executive dashboard logic instantly. While historically closed, Microsoft is responding to the agent wave: in November 2025, they released a public preview of a Power BI Modeling MCP Server, signaling a move to open up their "walled garden" to external agents.
- Snowflake and Databricks: Both vendors have aggressively closed the gap. Snowflake (Cortex AI) and Databricks (Unity Catalog) now support governed, YAML-based metric views. Unlike early iterations that relied on AI inference, these are deterministic definitions that power their internal AI chatbots, ensuring a "single source of truth" within their respective lakehouses.

The engineering reality: Why you can't just "move"

A common question from leadership is, "We already have business logic in Looker or Power BI. Can't we just export it to a headless layer?" The answer is rarely yes. Migrating business logic is not a copy-paste operation; it is a fundamental data modelling exercise. The logic embedded in these tools relies on proprietary "magic" that standard SQL — and by extension, headless semantic layers — does not perform automatically.

Engineers attempting to "lift and shift" usually hit specific architectural walls.
For instance, Looker uses a feature called "symmetric aggregates" to automatically prevent revenue from being double-counted when joining multiple tables — a safeguard that vanishes in raw SQL unless you manually re-engineer the join logic. Similarly, Power BI's DAX language performs dynamic calculations based on the specific "context" of a visual (like a pivot table filter). Recreating this dynamic behavior in static SQL requires writing verbose, complex code to achieve what Power BI does in a single line.

This migration friction is the technical debt that must be paid to enable the AI era. Organizations that leave their logic locked in proprietary BI formats effectively wall off their "single source of truth" from their AI agents, forcing developers to duplicate logic in Python and reintroducing the risk of hallucination.

Which architecture wins?

There is no single "winner" in the battle for the semantic layer. While both approaches solve the accuracy problem, they impose drastically different constraints on your infrastructure. The choice comes down to a trade-off between integration speed and architectural independence.

| Feature | Headless / decoupled (dbt, Cube) | Platform-native (Snowflake, Fabric, Databricks) |
| --- | --- | --- |
| Philosophy | Define once, serve everywhere | Unified lakehouse / direct integration |
| AI interface | API / tools (REST, GraphQL, MCP) | SQL and notebooks (SemPy, Cortex) |
| Lock-in | Lower (code/YAML portability) | Higher (platform objects) |
| Best fit | Multi-cloud agents, external apps | Internal copilots, single ecosystem |

The verdict: Which architecture should you choose?

If you are 90%-plus standardized on a single platform (Power BI/Fabric or Snowflake):
- Default to platform-native for internal copilots and employee-facing agents
- Accept the lock-in trade-off in exchange for zero-integration overhead
- Design an escape hatch: keep one "golden metric set" in portable YAML alongside native definitions

If you are building customer-facing agents or multi-cloud/multi-source systems:
- Start with headless architecture (dbt MetricFlow or Cube)
- Treat the semantic layer as your "metrics API" — agents call get_metric(), not raw SQL
- Budget for a caching layer (Cube Store or similar) to prevent agent query storms

If your metrics are trapped in Looker/Power BI/Tableau:
- Accept this as technical debt that must be paid before agents can safely use your data
- Start with 10–20 "tier-0" metrics (revenue, churn, CAC) and manually re-engineer their logic in SQL/YAML
- Do NOT try to auto-migrate — symmetric aggregates and DAX context require explicit redesign

The launch of OSI signals a future where this trade-off may diminish. If the industry converges on a truly portable standard, metric definitions could theoretically move from Snowflake to dbt to Tableau without friction. But until that standard matures, the headless layer offers the most explicit "API-first" contract for agents that span multiple systems or serve external users, though platform-native layers are rapidly closing this gap with their own agent-oriented tooling.

The era of the "dashboard" is yielding to the era of the "agent." To survive the transition, your data stack needs more than just a faster database; it needs explicit, governed business logic that LLMs can consume without guessing.
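To make the headless "define once, serve everywhere" idea concrete, here is a minimal sketch of a governed metric defined in YAML and compiled to SQL by a get_metric() tool. The YAML shape, metric name, table and compile logic are illustrative assumptions, not dbt MetricFlow's or Cube's actual schema or API.

```python
from typing import Optional

import yaml  # pip install pyyaml

# Illustrative metric definition; the YAML shape is an assumption, not an
# actual dbt MetricFlow or Cube schema.
METRICS_YAML = """
metrics:
  revenue:
    table: analytics.orders
    expression: SUM(amount)
    filters:
      - status != 'cancelled'
      - is_test_order = FALSE
"""

METRICS = yaml.safe_load(METRICS_YAML)["metrics"]

def get_metric(name: str, group_by: Optional[str] = None) -> str:
    """Compile a governed metric definition into SQL, so an agent never has
    to guess joins, filters or business rules from the raw schema."""
    metric = METRICS[name]
    where = " AND ".join(metric.get("filters", [])) or "TRUE"
    select = f"{metric['expression']} AS {name}"
    if group_by:
        return (f"SELECT {group_by}, {select} FROM {metric['table']} "
                f"WHERE {where} GROUP BY {group_by}")
    return f"SELECT {select} FROM {metric['table']} WHERE {where}"

# An agent calls get_metric('revenue') as a tool and gets deterministic SQL back.
print(get_metric("revenue", group_by="order_month"))
```

The point of the pattern is that the filters encoding "tribal knowledge" (cancelled orders, test orders) live in one reviewed definition rather than being re-guessed by the model on every prompt.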

#ai

AWS is leveraging automated reasoning, which uses math-based verification, to build out new capabilities in its Amazon Bedrock AgentCore platform as the company digs deeper into the agentic AI ecosystem.

Announced during its annual re:Invent conference in Las Vegas, AWS is adding three new capabilities to AgentCore: "policy," "evaluations" and "episodic memory." The new features aim to give enterprises more control over agent behavior and performance. AWS also revealed what it calls "a new class of agents," or "frontier agents," that are autonomous, scalable and independent.

Swami Sivasubramanian, AWS VP for Agentic AI, told VentureBeat that many of AWS's new features represent a shift in who becomes a builder. "We are actually on the cusp of a major tectonic transformation with AI, but agentic AI is truly starting to transform what is the art of the possible, and it is going to make this one of the most truly transforming technologies," Sivasubramanian said.

Policy agents

The new policy capability helps enterprises reinforce guidelines even after the agent has already reasoned its response. AWS VP for AgentCore David Richardson told VentureBeat that the policy tool sits between the agent and the tools it calls, rather than being baked into the agent, as fine-tuning often is. The idea is to prevent an agent from violating enterprise rules and redirect it to re-evaluate its reasoning.

Richardson gave the example of a customer service agent: A company would write a policy stating that the agent can grant a refund of up to $100, but for anything higher, the agent would need to bounce the customer to a human.

He noted that it remains easy to subvert an agent's reasoning loop through, for instance, prompt injection or poisoned data, leading agents to ignore guardrails. "There are always these prompt injection attacks where people try to subvert the reasoning of the agent to get the agent to do things it shouldn't do," Richardson said. "That's why we implemented the policy outside of the agent, and it works using the automated reasoning capabilities that we've spent years building up to help customers define their capabilities."

AWS unveiled Automated Reasoning Checks on Bedrock at last year's re:Invent. These use neurosymbolic AI, or math-based validation, to prove correctness, applying mathematical proofs to confirm that a model hasn't hallucinated. AWS has been leaning heavily into neurosymbolic AI and automated reasoning, pushing for enterprise-grade security and safety in ways that differ from other AI model providers.

Episodic memories and evaluations

The two other new updates to AgentCore, "evaluations" and "episodic memory," give enterprises a better view of agent performance and give agents episodic memory.

An enhancement of AgentCore memory, episodic memory refers to knowledge that agents tap into only occasionally, unlike longer-running preferences, which they have to refer back to constantly. Context window limits hamper some agents, so they sometimes forget information or conversations they haven't tapped into for a while.

"The idea is to help capture information that a user really would wish the agent remembered when they came back," said Richardson. "For example, 'what is their preferred seat on an airplane for family trips?' Or 'what is the sort of price range they're looking for?'"

Episodic memory differs from the previously shipped AgentCore memory because, instead of relying on maintaining short- and long-term memory, agents built on AgentCore can recall certain information based on triggers. This can eliminate the need for custom instructions.

With AgentCore evaluations, organizations can use 13 pre-built evaluators or write their own. Developers can set alerts to warn them if agents begin to fail quality monitoring.

Frontier agents

But perhaps AWS's strongest push into enterprise agentic AI is the release of frontier agents, or fully automated and independent agents that the company says can act as teammates with little direction. The concept is similar, if not identical, to the more asynchronous agents from competitors like Google and OpenAI. However, AWS seems to be releasing more than just autonomous coding agents. Sivasubramanian called them a "new class" of agents, "not only a step function change in what you can do today; they move from assisting with individual tasks to complex projects."

The first is Kiro, an autonomous coding agent that has been in public preview since July. At the time, Kiro was billed as an alternative to vibe coding platforms like OpenAI's Codex or Windsurf. Similar to Codex and Google's myriad asynchronous coding agents, including Jules, Kiro can code, undertake reviews, fix bugs independently and determine the tasks it needs to accomplish.

The AWS security agent, meanwhile, embeds deep security expertise into applications from the start. The company said in a press release that users "define security standards once and AWS security agent automatically validates them across your applications during its review — helping teams address the risks that matter to their business, not generic checklists."

The AWS DevOps agent will help developers, especially those on call, proactively find system breaks or bugs. It can respond to incidents using its knowledge of the application or service. It also understands the relationships between the application and the tools it taps, such as Amazon CloudWatch, Datadog and Splunk, to trace the root cause of the issue.

Enterprises are interested in deploying agents and, eventually, bringing more autonomous agents into their workflows. And, while companies like AWS continue to bolster these agents with security and control, organizations are slowly figuring out how to connect them all.
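The article does not show AgentCore's actual policy API, so the sketch below is only a generic illustration of the pattern it describes: a rule check that sits outside the agent's reasoning loop and intercepts tool calls, with the refund threshold mirroring Richardson's example. Function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class PolicyDecision:
    allowed: bool
    reason: str

def refund_policy(tool_name: str, args: dict) -> PolicyDecision:
    """Enterprise rule from the article's example: agents may issue refunds
    up to $100; anything larger must be escalated to a human."""
    if tool_name == "issue_refund" and args.get("amount", 0) > 100:
        return PolicyDecision(False, "Refunds over $100 require human approval.")
    return PolicyDecision(True, "ok")

def guarded_tool_call(policy: Callable[[str, dict], PolicyDecision],
                      tool_name: str, tool_fn: Callable[..., Any], **args: Any) -> Any:
    """The policy check happens OUTSIDE the agent's reasoning loop, so a prompt
    injection that subverts the agent still cannot bypass the rule."""
    decision = policy(tool_name, args)
    if not decision.allowed:
        # Bounce back to the agent (or a human) instead of executing the tool.
        return {"status": "rejected", "reason": decision.reason}
    return tool_fn(**args)

def issue_refund(order_id: str, amount: float) -> dict:
    return {"status": "refunded", "order_id": order_id, "amount": amount}

print(guarded_tool_call(refund_policy, "issue_refund", issue_refund,
                        order_id="A-123", amount=250.0))
```

In AWS's version, the rule evaluation is backed by automated reasoning rather than a hand-written check, but the placement is the same: the gate lives between the agent and its tools, not inside the prompt.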

#ai

The AI browser wars are heating up. OpenAI and other AI companies like Perplexity have gotten a lot of attention with their new AI-first and agentic browsers. They're being positioned as direct competition to Google, which currently holds a 70% share of the market with its Chrome browser. As the incumbent, Google has been slower to respond to the shift toward AI search; its integration of Gemini into Chrome is widely seen as playing catch-up to competitors that were AI-first from day one.

It's understandable, as a $100 billion business is an enormous, unwieldy beast to pivot. That leaves space for the new players to maneuver; they are essentially starting with blank slates and have free rein to innovate.

Enter Neo, released for worldwide general availability today — the next step in Norton's AI innovation journey, building on its leadership in cyber safety and its bid to deliver the world's first safe, zero-prompt AI browser. From the beginning, the minds behind Neo made a deliberate choice to focus on a proactive AI assistant rather than chase today's agentic trends. Even enthusiasts willing to tolerate the risks face too much unpredictability, along with new safety and privacy concerns.

Howie Xu, chief AI & innovation officer at Gen, describes Neo as a browser built to help before you ask — delivering on-page, in-flow support through summaries, reminders, and context-aware suggestions without prompts or extra steps.

"It's like having a highly intelligent assistant sitting next to me, helping me absorb and process information much more broadly, much faster, much deeper," Xu says. "That assistant is there when you're reading, when you're researching, when you're working on an online project. And based on your interests and browsing, your assistant can help you at every step."

Borrowing from Norton's unique consumer security expertise, privacy and safety have also been integrated from the ground up.

"What makes us unique is that we're giving people both peace of mind and AI functionality at the same time," Xu explains. "Norton's roots are in security. We're the only game in town that built an AI-native browser from the ground up with safety and privacy at its core — one that won't exploit or use your data for training."

The zero-prompt difference

Comet (Perplexity) and Atlas (OpenAI) were built by chat-first companies that assume users will actively ask questions. But getting value from AI takes cognitive effort: you need to know what to ask, shift into "question mode," and understand what the model can actually do. Asking a question isn't the hard part; realizing what to ask requires meta-cognition — awareness of what you don't know — which makes turning to ChatGPT in the middle of browsing feel harder than it should.

Neo takes the opposite approach. Instead of waiting for you to prompt it, it acts first — offering summaries, reminders, relevant news, and even questions you're likely to explore.

"Based on my browsing interests, Neo reminds me of events I might want to attend, surfaces personalized news, and presents pre-generated questions that I actually want to explore," Xu explains. "In other words, I've never had to formulate a single prompt — I'm simply clicking on insights the AI has already anticipated for me as if I had been prompting."

Because most people don't know the boundaries of AI technology or how to phrase effective prompts, expecting them to drive the interaction is unrealistic.

"We decided to shift the burden away from people. You can still ask questions, of course, but we're designing for those who want less cognitive load and prefer AI to take the first step," he says.

Much like the recommendations that surface on any news or retail site, Neo leverages browsing context to surface the right content at the right moment. Neo can summarize a page and anticipate questions based on your interests and behaviors. With permission, it can also create detailed reminders — for example, noticing repeat visits to Formula 1 websites and prompting you about upcoming races. Control stays with the person using Neo: if an interest fades, they can remove it from Neo's Configurable Memory.

Because Neo's browsing history and preferences are stored locally and securely, it can customize prompts, insights, and suggestions — from calendar nudges to news recommendations to suggested questions in the Neo Chat interface. The result is an AI-powered browser that gives people the benefits of AI without typing prompts. Inline actions like "Summary," "Add to calendar?," "Resume where you left off," and "Price dropped" make browsing feel faster and lighter, without extra steps.

A calm-by-design experience grounded in security

"Calm by design" has guided Neo's development, and for Xu that comes down to three things: control, privacy, and security, all within a clean, streamlined experience that makes browsing faster and easier.

Rooted in Norton's decades of security expertise, Neo's calm experience starts with privacy and protection. Xu views it as the bedrock of Neo's approach: the company never knows what you're doing, because all personal data stays on the device unless explicitly permitted otherwise. Norton-backed security practices suppress prompt-injection risks common in other AI browsers, local processing keeps sensitive information contained, and scoped sync ensures only user-approved context carries across devices.

Norton also brings deep web intelligence: decades of scanning the vast majority of the internet and evolving antivirus capabilities that now understand both static and runtime web content. That real-time insight allows Neo's built-in antivirus, anti-phishing, and anti-scam technology to detect and shut down malicious behavior and content the moment it appears.

"When we think about calm, what we really mean is delivering value in a consistent way, in a reliable way, in a way that people can predict, so people have peace of mind," Xu says. "This is very different from the design of the agentic browsers out there where the result is simply unpredictable, not to mention the associated latency and overhead. I believe consistency is a necessity for us to push an AI browser to a mass population. We have some flashy capabilities too, but our primary goal is that people can just use it in their daily lives without ever having to worry about all the vulnerabilities that most agentic browsers introduce. Since we're calm, reliable and safe by design, we believe we'll win the hearts of a mass audience."

For anyone watching the rapid shift toward AI-powered browsing, Neo shows how Norton is fusing assistance, security, and zero-prompt design into a single experience. See it in action at neobrowser.ai.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they're always clearly marked. For more information, contact sales@venturebeat.com.

V3.2 drops, newcomer beats giants, AI training underground, job study results, and more...

#ai

Amazon Web Services (AWS) is leaning into the growing trend toward custom models with a new service that it says will let enterprises bring more personalization and internal knowledge into their models. The move comes alongside the release of AWS's new models as part of its Nova family, which expands the capabilities of its reasoning models. Nova 2 Lite, Nova 2 Pro, Nova 2 Sonic and Nova 2 Omni update the first Nova models AWS announced last year. Nova 2 Lite is a fast, cost-effective reasoning model optimized for everyday tasks that can process text, images and videos to generate text. Nova 2 Pro, which AWS said is its most intelligent reasoning model, can handle complex tasks such as coding agents, long-range planning and problem-solving. It can act as a “teacher” model for distillation projects. Nova 2 Sonic is a speech-to-speech model, while Nova 2 Omni enables organizations to generate both text and images from text, image and video inputs. Nova Act, AWS's browser agent — announced as an experimental development kit in April — is also powered by the Nova 2 models and now available to customers. However, it is the custom model service, Nova Forge, that AWS is most excited about. The service gives customers the ability to introduce proprietary data to a pre-trained model without fear that the model will forget its previous training. Nova Forge allows enterprises to create custom, optimized versions of Nova models, which it calls “Novellas,” and bring them directly to its Amazon Bedrock platform.
Custom model creation
Enterprises are increasingly turning to model distillation or custom models, especially with many industries choosing to create foundation models with domain-specific knowledge. But these can often be out of reach for many companies, as not everyone can afford several Nvidia H100 GPUs to build models from scratch. As a result, they turn to heavily fine-tuned open-source off-the-shelf models. “You just don't have a great way to get a frontier model that deeply understands your data and your domain," AWS CEO Matt Garman said during his keynote speech at AWS’s annual re:Invent conference. "But what if it was possible? What if you could integrate your data at the right time during the training of a frontier model, then create a proprietary model that was just for you?” Nova Forge employs what AWS calls “open training,” which allows developers to blend their proprietary data with an Amazon-curated dataset at every step of model development, with checkpoints during training. AWS said this means models will not regress on foundational capabilities, such as instruction following, while learning company-specific knowledge and instructions. Each “Novella” could be a custom version of Nova 2 Lite, with Nova’s full knowledge and reasoning power, but with domain-specificity. Right now, enterprises can only make Novellas from Nova 2 Lite, but AWS plans to expand the service to other Nova 2 models soon. Nova Forge also offers enterprises “reinforcement learning gyms," which let them train AI systems in their own simulated environments to create smaller, faster models, and give them access to responsible AI toolkits. Once companies create their Novellas, they can bring them to Bedrock to build more applications and agents. One customer currently using Nova Forge is Reddit, which integrated its own data and community-specific knowledge into a model to build a moderation program. 
Nova Forge only works with Nova models, and AWS does not plan to bring in third-party open-source models hosted on Bedrock (for now).
Nova 2 models in detail
AWS said tens of thousands of companies now use its Nova models and the company expects the Nova 2 models to see the same adoption. “Nova 2 Lite delivers incredible price performance for many workloads that we actually see our customers wanting to deliver in production,” Garman said. “We think Nova 2 Lite will be the workhorse for many companies, while Pro will be for more complex tasks and for when you need your agents to be great.” In a press release, AWS said evaluations showed Nova 2 Lite performed “equal or better on 13 out of 15 benchmarks compared to Claude Haiku 4.5, equal or better on 11 out of 17 benchmarks compared to GPT-5 Mini and equal or better on 14 out of 18 benchmarks compared to Gemini Flash 2.5.” Users can adjust how much Nova 2 Lite shows its step-by-step thinking to balance costs with depth. Nova 2 Pro also performed well in benchmark testing compared to Claude Sonnet 4.5, GPT-5.1 and Gemini 2.5 Pro. This model works best for multi-document analysis, video reasoning, advanced math and agentic engineering tasks. AWS said in its press release that both Nova 2 Lite and Pro “have built-in grounding and code execution capabilities.” Nova 2 Sonic, the speech-to-speech model, generates human-like conversations and now supports multiple languages. The updated model has a 1-million-token context window, with more expressive voices and higher accuracy. The company said Sonic can even switch topics mid-conversation. Nova 2 Omni handles “up to 750,000 words, hours of audio, long videos and hundred-page documents, simultaneously analyzing entire product catalogs, testimonials, brand guidelines and video libraries at once.” “While there are no comparable models in the industry to Nova 2 Omni, it demonstrates strengths in public benchmarks of multimodal reasoning on documents, images, videos and audio, and can generate high-quality images similar to other leading image-generation models,” AWS said in its release.
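For teams that want to see what calling a Nova model looks like in practice, here is a minimal sketch using the Amazon Bedrock Converse API via boto3. The model ID shown is the first-generation Nova Lite identifier used as a placeholder; the exact Nova 2 (or custom "Novella") model IDs, region, prompt, and inference settings are assumptions for illustration only.

```python
# Minimal sketch: invoking a Nova model through Amazon Bedrock's Converse API.
# The modelId below is a placeholder; substitute the Nova 2 (or Novella) ID
# listed in your own Bedrock model catalog.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="amazon.nova-lite-v1:0",  # placeholder ID, not the Nova 2 identifier
    messages=[
        {"role": "user", "content": [{"text": "Summarize our Q3 incident reports in three bullets."}]}
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

# The Converse API returns the assistant message as a list of content blocks.
print(response["output"]["message"]["content"][0]["text"])
```

Custom Novellas created through Nova Forge would, per AWS, be reachable through the same Bedrock interface once deployed, which is what makes the "train once, serve like any other Bedrock model" pitch attractive.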

#ai

For much of 2025, the frontier of open-weight language models has been defined not in Silicon Valley or New York City, but in Beijing and Hangzhou. Chinese research labs including Alibaba's Qwen, DeepSeek, Moonshot and Baidu have rapidly set the pace in developing large-scale, open Mixture-of-Experts (MoE) models — often with permissive licenses and leading benchmark performance. While OpenAI fielded its own open-source, general-purpose LLMs this summer as well — gpt-oss-20B and gpt-oss-120B — uptake has been slowed by the many equally capable or better-performing alternatives. Now, one small U.S. company is pushing back. Today, Arcee AI announced the release of Trinity Mini and Trinity Nano Preview, the first two models in its new “Trinity” family—an open-weight MoE model suite fully trained in the United States. Users can try the former directly for themselves in a chatbot format on Arcee's new website, chat.arcee.ai, and developers can download both models from Hugging Face, run them locally, and modify or fine-tune them to their liking — all for free under an enterprise-friendly Apache 2.0 license. While small compared to the largest frontier models, these releases represent a rare attempt by a U.S. startup to build end-to-end open-weight models at scale—trained from scratch, on American infrastructure, using a U.S.-curated dataset pipeline. "I'm experiencing a combination of extreme pride in my team and crippling exhaustion, so I'm struggling to put into words just how excited I am to have these models out," wrote Arcee Chief Technology Officer (CTO) Lucas Atkins in a post on the social network X (formerly Twitter). "Especially Mini." A third model, Trinity Large, is already in training: a 420B parameter model with 13B active parameters per token, scheduled to launch in January 2026. “We want to add something that has been missing in that picture,” Atkins wrote in the Trinity launch manifesto published on Arcee's website. “A serious open weight model family trained end to end in America… that businesses and developers can actually own.”
From Small Models to Scaled Ambition
The Trinity project marks a turning point for Arcee AI, which until now has been known for its compact, enterprise-focused models. The company has raised $29.5 million in funding to date, including a $24 million Series A in 2024 led by Emergence Capital, and its previous releases include AFM-4.5B, a compact instruct-tuned model released in mid-2025, and SuperNova, an earlier 70B-parameter instruction-following model designed for in-VPC enterprise deployment. Both were aimed at solving regulatory and cost issues plaguing proprietary LLM adoption in the enterprise. With Trinity, Arcee is aiming higher: not just instruction tuning or post-training, but full-stack pretraining of open-weight foundation models—built for long-context reasoning, synthetic data adaptation, and future integration with live retraining systems. Originally conceived as a stepping stone to Trinity Large, both Mini and Nano emerged from early experimentation with sparse modeling and quickly became production targets themselves.
Technical Highlights
Trinity Mini is a 26B parameter model with 3B active per token, designed for high-throughput reasoning, function calling, and tool use. Trinity Nano Preview is a 6B parameter model with roughly 800M active non-embedding parameters—a more experimental, chat-focused model with a stronger personality, but lower reasoning robustness. 
Both models use Arcee’s new Attention-First Mixture-of-Experts (AFMoE) architecture, a custom MoE design blending global sparsity, local/global attention, and gated attention techniques. Inspired by recent advances from DeepSeek and Qwen, AFMoE departs from traditional MoE by tightly integrating sparse expert routing with an enhanced attention stack — including grouped-query attention, gated attention, and a local/global pattern that improves long-context reasoning. Think of a typical MoE model like a call center with 128 specialized agents (called “experts”) — but only a few are consulted for each call, depending on the question. This saves time and energy, since not every expert needs to weigh in. What makes AFMoE different is how it decides which agents to call and how it blends their answers. Most MoE models use a standard approach that picks experts based on a simple ranking. AFMoE, by contrast, uses a smoother method (called sigmoid routing) that’s more like adjusting a volume dial than flipping a switch — letting the model blend multiple perspectives more gracefully. The “attention-first” part means the model focuses heavily on how it pays attention to different parts of the conversation. Imagine reading a novel and remembering some parts more clearly than others based on importance, recency, or emotional impact — that’s attention. AFMoE improves this by combining local attention (focusing on what was just said) with global attention (remembering key points from earlier), using a rhythm that keeps things balanced. Finally, AFMoE introduces something called gated attention, which acts like a volume control on each attention output — helping the model emphasize or dampen different pieces of information as needed, like adjusting how much you care about each voice in a group discussion. All of this is designed to make the model more stable during training and more efficient at scale — so it can understand longer conversations, reason more clearly, and run faster without needing massive computing resources. Unlike many existing MoE implementations, AFMoE emphasizes stability at depth and training efficiency, using techniques like sigmoid-based routing without auxiliary loss, and depth-scaled normalization to support scaling without divergence.
Model Capabilities
Trinity Mini adopts an MoE architecture with 128 experts, 8 active per token, and 1 always-on shared expert. Context windows reach up to 131,072 tokens, depending on provider. Benchmarks show Trinity Mini performing competitively with larger models across reasoning tasks, including outperforming gpt-oss on the SimpleQA benchmark (tests factual recall and whether the model admits uncertainty), MMLU (zero-shot, measuring broad academic knowledge and reasoning across many subjects without examples), and BFCL V3 (evaluates multi-step function calling and real-world tool use):
MMLU (zero-shot): 84.95
Math-500: 92.10
GPQA-Diamond: 58.55
BFCL V3: 59.67
Latency and throughput numbers across providers like Together and Clarifai show 200+ tokens per second throughput with sub-three-second E2E latency—making Trinity Mini viable for interactive applications and agent pipelines. Trinity Nano, while smaller and not as stable on edge cases, demonstrates sparse MoE architecture viability at under 1B active parameters per token.
Access, Pricing, and Ecosystem Integration
Both Trinity models are released under the permissive, enterprise-friendly Apache 2.0 license, allowing unrestricted commercial and research use. 
Trinity Mini is available via:
Hugging Face
OpenRouter
chat.arcee.ai
API pricing for Trinity Mini via OpenRouter:
$0.045 per million input tokens
$0.15 per million output tokens
A free tier is available for a limited time on OpenRouter.
The model is already integrated into apps including Benchable.ai, Open WebUI, and SillyTavern. It's supported in Hugging Face Transformers, vLLM, LM Studio, and llama.cpp.
Data Without Compromise: DatologyAI’s Role
Central to Arcee’s approach is control over training data—a sharp contrast to many open models trained on web-scraped or legally ambiguous datasets. That’s where DatologyAI, a data curation startup co-founded by former Meta and DeepMind researcher Ari Morcos, plays a critical role. DatologyAI’s platform automates data filtering, deduplication, and quality enhancement across modalities, ensuring Arcee’s training corpus avoids the pitfalls of noisy, biased, or copyright-risk content. For Trinity, DatologyAI helped construct a 10 trillion token curriculum organized into three phases: 7T general data, 1.8T high-quality text, and 1.2T STEM-heavy material, including math and code. This is the same partnership that powered Arcee’s AFM-4.5B—but scaled significantly in both size and complexity. According to Arcee, it was Datology’s filtering and data-ranking tools that allowed Trinity to scale cleanly while improving performance on tasks like mathematics, QA, and agent tool use. Datology’s contribution also extends into synthetic data generation. For Trinity Large, the company has produced over 10 trillion synthetic tokens—paired with 10T curated web tokens—to form a 20T-token training corpus for the full-scale model now in progress.
Building the Infrastructure to Compete: Prime Intellect
Arcee’s ability to execute full-scale training in the U.S. is also thanks to its infrastructure partner, Prime Intellect. The startup, founded in early 2024, began with a mission to democratize access to AI compute by building a decentralized GPU marketplace and training stack. While Prime Intellect made headlines with its distributed training of INTELLECT-1—a 10B parameter model trained across contributors in five countries—its more recent work, including the 106B INTELLECT-3, acknowledges the tradeoffs of scale: distributed training works, but for 100B+ models, centralized infrastructure is still more efficient. For Trinity Mini and Nano, Prime Intellect supplied the orchestration stack, modified TorchTitan runtime, and physical compute environment: 512 H200 GPUs in a custom bf16 pipeline, running high-efficiency HSDP parallelism. It is also hosting the 2048 B300 GPU cluster used to train Trinity Large. The collaboration shows the difference between branding and execution. While Prime Intellect’s long-term goal remains decentralized compute, its short-term value for Arcee lies in efficient, transparent training infrastructure—infrastructure that remains under U.S. jurisdiction, with known provenance and security controls.
A Strategic Bet on Model Sovereignty
Arcee's push into full pretraining reflects a broader thesis: that the future of enterprise AI will depend on owning the training loop—not just fine-tuning. As systems evolve to adapt from live usage and interact with tools autonomously, compliance and control over training objectives will matter as much as performance. “As applications get more ambitious, the boundary between ‘model’ and ‘product’ keeps moving,” Atkins noted in Arcee's Trinity manifesto. 
“To build that kind of software you need to control the weights and the training pipeline, not only the instruction layer.” This framing sets Trinity apart from other open-weight efforts. Rather than patching someone else’s base model, Arcee has built its own—from data to deployment, infrastructure to optimizer—alongside partners who share that vision of openness and sovereignty.
Looking Ahead: Trinity Large
Training is currently underway for Trinity Large, Arcee’s 420B parameter MoE model, using the same AFMoE architecture scaled to a larger expert set. The dataset includes 20T tokens, split evenly between synthetic data from DatologyAI and curated web data. The model is expected to launch in January 2026, with a full technical report to follow shortly thereafter. If successful, it would make Trinity Large one of the only fully open-weight, U.S.-trained frontier-scale models—positioning Arcee as a serious player in the open ecosystem at a time when most American LLM efforts are either closed or based on non-U.S. foundations.
A recommitment to U.S. open source
In a landscape where the most ambitious open-weight models are increasingly shaped by Chinese research labs, Arcee’s Trinity launch signals a rare shift in direction: an attempt to reclaim ground for transparent, U.S.-controlled model development. Backed by specialized partners in data and infrastructure, and built from scratch for long-term adaptability, Trinity is a bold statement about the future of U.S. AI development, showing that small, lesser-known companies can still push the boundaries and innovate in an open fashion even as the industry is increasingly productized and commoditized. What remains to be seen is whether Trinity Large can match the capabilities of its better-funded peers. But with Mini and Nano already in use, and a strong architectural foundation in place, Arcee may already be proving its central thesis: that model sovereignty, not just model size, will define the next era of AI.
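To make the routing idea from the architecture section concrete, here is a toy sketch of sigmoid-gated top-k expert selection in NumPy. It illustrates the general mechanism (a per-expert sigmoid "volume dial" instead of a softmax ranking, with only the top few experts kept per token) under made-up dimensions; it is not Arcee's actual AFMoE router.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def route_token(token, router_weights, top_k=8):
    # Score every expert with an independent sigmoid gate (a "volume dial"),
    # rather than ranking experts against each other with a softmax.
    gates = sigmoid(router_weights @ token)          # shape: (n_experts,)
    chosen = np.argsort(gates)[-top_k:]              # keep the k highest-scoring experts
    weights = gates[chosen] / gates[chosen].sum()    # renormalize over the chosen experts only
    return chosen, weights

rng = np.random.default_rng(0)
d_model, n_experts = 512, 128                        # 128 experts, as in Trinity Mini
router_weights = rng.normal(size=(n_experts, d_model)) / np.sqrt(d_model)
token = rng.normal(size=d_model)

experts, weights = route_token(token, router_weights, top_k=8)
print(experts)           # indices of the 8 selected experts
print(weights.round(3))  # mixing weights that sum to 1
```

In a real model, each selected expert's output would be combined using these weights, and a shared always-on expert (as Trinity Mini uses) would be added on top; those pieces are omitted here for brevity.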

#machine learning #ai coding assistants #data science #github copilot #learning rate

Christmas connections, Copilot's costs, careful (no-)choices
The post The Machine Learning Lessons I’ve Learned This Month appeared first on Towards Data Science.

#search #gemini models #ai

We're bringing our most intelligent model yet, Gemini 3 Pro, to Google Search in more countries around the world.

#machine learning #algorithms #data science #deep dives #excel #k nearest neighbors

This first day of the Advent Calendar introduces the k-NN regressor, the simplest distance-based model. Using Excel, we explore how predictions rely entirely on the closest observations, why feature scaling matters, and how heterogeneous variables can make distances meaningless. Through examples with continuous and categorical features, including the California Housing and Diamonds datasets, we see the strengths and limitations of k-NN, and why defining the right distance is essential to reflect real-world structure.
The post The Machine Learning “Advent Calendar” Day 1: k-NN Regressor in Excel appeared first on Towards Data Science.
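For readers who prefer to sanity-check the lesson outside of a spreadsheet, here is a minimal Python sketch: the same k-NN regressor fit with and without feature scaling on the California Housing data (downloaded on first use by scikit-learn). The choice of k and the train/test split are illustrative, not the article's exact setup.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Features such as median income and population live on very different scales,
# so raw Euclidean distances are dominated by the large-valued features.
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

raw = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
scaled = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)).fit(X_train, y_train)

print("R^2 without scaling:", round(raw.score(X_test, y_test), 3))
print("R^2 with scaling:   ", round(scaled.score(X_test, y_test), 3))
```

The gap between the two scores is the whole point: the prediction is only as good as the notion of "closest observations", and that notion changes drastically once features are put on comparable scales.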

#ai

Chinese artificial intelligence startup DeepSeek released two powerful new AI models on Sunday that the company claims match or exceed the capabilities of OpenAI's GPT-5 and Google's Gemini-3.0-Pro — a development that could reshape the competitive landscape between American tech giants and their Chinese challengers.The Hangzhou-based company launched DeepSeek-V3.2, designed as an everyday reasoning assistant, alongside DeepSeek-V3.2-Speciale, a high-powered variant that achieved gold-medal performance in four elite international competitions: the 2025 International Mathematical Olympiad, the International Olympiad in Informatics, the ICPC World Finals, and the China Mathematical Olympiad.The release carries profound implications for American technology leadership. DeepSeek has once again demonstrated that it can produce frontier AI systems despite U.S. export controls that restrict China's access to advanced Nvidia chips — and it has done so while making its models freely available under an open-source MIT license."People thought DeepSeek gave a one-time breakthrough but we came back much bigger," wrote Chen Fang, who identified himself as a contributor to the project, on X (formerly Twitter). The release drew swift reactions online, with one user declaring: "Rest in peace, ChatGPT."How DeepSeek's sparse attention breakthrough slashes computing costsAt the heart of the new release lies DeepSeek Sparse Attention, or DSA — a novel architectural innovation that dramatically reduces the computational burden of running AI models on long documents and complex tasks.Traditional AI attention mechanisms, the core technology allowing language models to understand context, scale poorly as input length increases. Processing a document twice as long typically requires four times the computation. DeepSeek's approach breaks this constraint using what the company calls a "lightning indexer" that identifies only the most relevant portions of context for each query, ignoring the rest.According to DeepSeek's technical report, DSA reduces inference costs by roughly half compared to previous models when processing long sequences. The architecture "substantially reduces computational complexity while preserving model performance," the report states.Processing 128,000 tokens — roughly equivalent to a 300-page book — now costs approximately $0.70 per million tokens for decoding, compared to $2.40 for the previous V3.1-Terminus model. That represents a 70% reduction in inference costs.The 685-billion-parameter models support context windows of 128,000 tokens, making them suitable for analyzing lengthy documents, codebases, and research papers. DeepSeek's technical report notes that independent evaluations on long-context benchmarks show V3.2 performing on par with or better than its predecessor "despite incorporating a sparse attention mechanism."The benchmark results that put DeepSeek in the same league as GPT-5DeepSeek's claims of parity with America's leading AI systems rest on extensive testing across mathematics, coding, and reasoning tasks — and the numbers are striking.On AIME 2025, a prestigious American mathematics competition, DeepSeek-V3.2-Speciale achieved a 96.0% pass rate, compared to 94.6% for GPT-5-High and 95.0% for Gemini-3.0-Pro. 
On the Harvard-MIT Mathematics Tournament, the Speciale variant scored 99.2%, surpassing Gemini's 97.5%.The standard V3.2 model, optimized for everyday use, scored 93.1% on AIME and 92.5% on HMMT — marginally below frontier models but achieved with substantially fewer computational resources.Most striking are the competition results. DeepSeek-V3.2-Speciale scored 35 out of 42 points on the 2025 International Mathematical Olympiad, earning gold-medal status. At the International Olympiad in Informatics, it scored 492 out of 600 points — also gold, ranking 10th overall. The model solved 10 of 12 problems at the ICPC World Finals, placing second.These results came without internet access or tools during testing. DeepSeek's report states that "testing strictly adheres to the contest's time and attempt limits."On coding benchmarks, DeepSeek-V3.2 resolved 73.1% of real-world software bugs on SWE-Verified, competitive with GPT-5-High at 74.9%. On Terminal Bench 2.0, measuring complex coding workflows, DeepSeek scored 46.4%—well above GPT-5-High's 35.2%.The company acknowledges limitations. "Token efficiency remains a challenge," the technical report states, noting that DeepSeek "typically requires longer generation trajectories" to match Gemini-3.0-Pro's output quality.Why teaching AI to think while using tools changes everythingBeyond raw reasoning, DeepSeek-V3.2 introduces "thinking in tool-use" — the ability to reason through problems while simultaneously executing code, searching the web, and manipulating files.Previous AI models faced a frustrating limitation: each time they called an external tool, they lost their train of thought and had to restart reasoning from scratch. DeepSeek's architecture preserves the reasoning trace across multiple tool calls, enabling fluid multi-step problem solving.To train this capability, the company built a massive synthetic data pipeline generating over 1,800 distinct task environments and 85,000 complex instructions. These included challenges like multi-day trip planning with budget constraints, software bug fixes across eight programming languages, and web-based research requiring dozens of searches.The technical report describes one example: planning a three-day trip from Hangzhou with constraints on hotel prices, restaurant ratings, and attraction costs that vary based on accommodation choices. Such tasks are "hard to solve but easy to verify," making them ideal for training AI agents.DeepSeek employed real-world tools during training — actual web search APIs, coding environments, and Jupyter notebooks — while generating synthetic prompts to ensure diversity. The result is a model that generalizes to unseen tools and environments, a critical capability for real-world deployment.DeepSeek's open-source gambit could upend the AI industry's business modelUnlike OpenAI and Anthropic, which guard their most powerful models as proprietary assets, DeepSeek has released both V3.2 and V3.2-Speciale under the MIT license — one of the most permissive open-source frameworks available.Any developer, researcher, or company can download, modify, and deploy the 685-billion-parameter models without restriction. Full model weights, training code, and documentation are available on Hugging Face, the leading platform for AI model sharing.The strategic implications are significant. By making frontier-capable models freely available, DeepSeek undermines competitors charging premium API prices. 
The Hugging Face model card notes that DeepSeek has provided Python scripts and test cases "demonstrating how to encode messages in OpenAI-compatible format" — making migration from competing services straightforward.For enterprise customers, the value proposition is compelling: frontier performance at dramatically lower cost, with deployment flexibility. But data residency concerns and regulatory uncertainty may limit adoption in sensitive applications — particularly given DeepSeek's Chinese origins.Regulatory walls are rising against DeepSeek in Europe and AmericaDeepSeek's global expansion faces mounting resistance. In June, Berlin's data protection commissioner Meike Kamp declared that DeepSeek's transfer of German user data to China is "unlawful" under EU rules, asking Apple and Google to consider blocking the app.The German authority expressed concern that "Chinese authorities have extensive access rights to personal data within the sphere of influence of Chinese companies." Italy ordered DeepSeek to block its app in February. U.S. lawmakers have moved to ban the service from government devices, citing national security concerns.Questions also persist about U.S. export controls designed to limit China's AI capabilities. In August, DeepSeek hinted that China would soon have "next generation" domestically built chips to support its models. The company indicated its systems work with Chinese-made chips from Huawei and Cambricon without additional setup.DeepSeek's original V3 model was reportedly trained on roughly 2,000 older Nvidia H800 chips — hardware since restricted for China export. The company has not disclosed what powered V3.2 training, but its continued advancement suggests export controls alone cannot halt Chinese AI progress.What DeepSeek's release means for the future of AI competitionThe release arrives at a pivotal moment. After years of massive investment, some analysts question whether an AI bubble is forming. DeepSeek's ability to match American frontier models at a fraction of the cost challenges assumptions that AI leadership requires enormous capital expenditure.The company's technical report reveals that post-training investment now exceeds 10% of pre-training costs — a substantial allocation credited for reasoning improvements. But DeepSeek acknowledges gaps: "The breadth of world knowledge in DeepSeek-V3.2 still lags behind leading proprietary models," the report states. The company plans to address this by scaling pre-training compute.DeepSeek-V3.2-Speciale remains available through a temporary API until December 15, when its capabilities will merge into the standard release. The Speciale variant is designed exclusively for deep reasoning and does not support tool calling — a limitation the standard model addresses.For now, the AI race between the United States and China has entered a new phase. DeepSeek's release demonstrates that open-source models can achieve frontier performance, that efficiency innovations can slash costs dramatically, and that the most powerful AI systems may soon be freely available to anyone with an internet connection.As one commenter on X observed: "Deepseek just casually breaking those historic benchmarks set by Gemini is bonkers."The question is no longer whether Chinese AI can compete with Silicon Valley. It's whether American companies can maintain their lead when their Chinese rival gives comparable technology away for free.
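DeepSeek's report describes the sparse-attention mechanism at a high level rather than as reference code, but the lightning-indexer idea discussed above (cheaply score every context position, then run full attention only over the top-scoring ones) can be illustrated with a toy NumPy sketch. The shapes, the scoring function, and the selection budget below are assumptions for illustration, not DSA's actual implementation.

```python
# Toy illustration of indexer-guided sparse attention: a small projection scores
# all context positions, only the top-scoring ones are kept, and ordinary
# attention runs over that subset. All dimensions are made up.
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def sparse_attention(q, K, V, idx_q, idx_K, budget=2048):
    scores = idx_K @ idx_q                        # cheap relevance score per position
    keep = np.argsort(scores)[-budget:]           # keep only the top-scoring positions
    attn = softmax((K[keep] @ q) / np.sqrt(q.shape[-1]))
    return attn @ V[keep]                         # cost scales with `budget`, not context length

rng = np.random.default_rng(0)
n_ctx, d_head, d_idx = 128_000, 64, 16            # a 128k-token context, small head dimensions
q, idx_q = rng.normal(size=d_head), rng.normal(size=d_idx)
K, V = rng.normal(size=(n_ctx, d_head)), rng.normal(size=(n_ctx, d_head))
idx_K = rng.normal(size=(n_ctx, d_idx))

out = sparse_attention(q, K, V, idx_q, idx_K)
print(out.shape)  # one attended vector of size d_head
```

The economics follow from the same structure: if full attention over the selected positions is the expensive part, decode cost grows with the selection budget rather than with the full 128,000-token window, which is consistent with the reported drop from roughly $2.40 to $0.70 per million tokens.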

#artificial intelligence #ai safety #machine learning #openai #security #browsers

How Atlas and most current AI-powered browsers fail on three aspects: privacy, security, and censorship
The post The Problem with AI Browsers: Security Flaws and the End of Privacy appeared first on Towards Data Science.

#ai

When Liquid AI, a startup founded by MIT computer scientists back in 2023, introduced its Liquid Foundation Models series 2 (LFM2) in July 2025, the pitch was straightforward: deliver the fastest on-device foundation models on the market using the new "liquid" architecture, with training and inference efficiency that made small models a serious alternative to cloud-only large language models (LLMs) such as OpenAI's GPT series and Google's Gemini. The initial release shipped dense checkpoints at 350M, 700M, and 1.2B parameters, a hybrid architecture heavily weighted toward gated short convolutions, and benchmark numbers that placed LFM2 ahead of similarly sized competitors like Qwen3, Llama 3.2, and Gemma 3 on both quality and CPU throughput. The message to enterprises was clear: real-time, privacy-preserving AI on phones, laptops, and vehicles no longer required sacrificing capability for latency.In the months since that launch, Liquid has expanded LFM2 into a broader product line — adding task-and-domain-specialized variants, a small video ingestion and analysis model, and an edge-focused deployment stack called LEAP — and positioned the models as the control layer for on-device and on-prem agentic systems. Now, with the publication of the detailed, 51-page LFM2 technical report on arXiv, the company is going a step further: making public the architecture search process, training data mixture, distillation objective, curriculum strategy, and post-training pipeline behind those models. And unlike earlier open models, LFM2 is built around a repeatable recipe: a hardware-in-the-loop search process, a training curriculum that compensates for smaller parameter budgets, and a post-training pipeline tuned for instruction following and tool use. Rather than just offering weights and an API, Liquid is effectively publishing a detailed blueprint that other organizations can use as a reference for training their own small, efficient models from scratch, tuned to their own hardware and deployment constraints.A model family designed around real constraints, not GPU labsThe technical report begins with a premise enterprises are intimately familiar with: real AI systems hit limits long before benchmarks do. Latency budgets, peak memory ceilings, and thermal throttling define what can actually run in production—especially on laptops, tablets, commodity servers, and mobile devices.To address this, Liquid AI performed architecture search directly on target hardware, including Snapdragon mobile SoCs and Ryzen laptop CPUs. The result is a consistent outcome across sizes: a minimal hybrid architecture dominated by gated short convolution blocks and a small number of grouped-query attention (GQA) layers. This design was repeatedly selected over more exotic linear-attention and SSM hybrids because it delivered a better quality-latency-memory Pareto profile under real device conditions.This matters for enterprise teams in three ways:Predictability. The architecture is simple, parameter-efficient, and stable across model sizes from 350M to 2.6B.Operational portability. Dense and MoE variants share the same structural backbone, simplifying deployment across mixed hardware fleets.On-device feasibility. 
Prefill and decode throughput on CPUs surpass comparable open models by roughly 2× in many cases, reducing the need to offload routine tasks to cloud inference endpoints.Instead of optimizing for academic novelty, the report reads as a systematic attempt to design models enterprises can actually ship.This is notable and more practical for enterprises in a field where many open models quietly assume access to multi-H100 clusters during inference.A training pipeline tuned for enterprise-relevant behaviorLFM2 adopts a training approach that compensates for the smaller scale of its models with structure rather than brute force. Key elements include:10–12T token pre-training and an additional 32K-context mid-training phase, which extends the model’s useful context window without exploding compute costs.A decoupled Top-K knowledge distillation objective that sidesteps the instability of standard KL distillation when teachers provide only partial logits.A three-stage post-training sequence—SFT, length-normalized preference alignment, and model merging—designed to produce more reliable instruction following and tool-use behavior.For enterprise AI developers, the significance is that LFM2 models behave less like “tiny LLMs” and more like practical agents able to follow structured formats, adhere to JSON schemas, and manage multi-turn chat flows. Many open models at similar sizes fail not due to lack of reasoning ability, but due to brittle adherence to instruction templates. The LFM2 post-training recipe directly targets these rough edges.In other words: Liquid AI optimized small models for operational reliability, not just scoreboards.Multimodality designed for device constraints, not lab demosThe LFM2-VL and LFM2-Audio variants reflect another shift: multimodality built around token efficiency.Rather than embedding a massive vision transformer directly into an LLM, LFM2-VL attaches a SigLIP2 encoder through a connector that aggressively reduces visual token count via PixelUnshuffle. High-resolution inputs automatically trigger dynamic tiling, keeping token budgets controllable even on mobile hardware. LFM2-Audio uses a bifurcated audio path—one for embeddings, one for generation—supporting real-time transcription or speech-to-speech on modest CPUs.For enterprise platform architects, this design points toward a practical future where:document understanding happens directly on endpoints such as field devices;audio transcription and speech agents run locally for privacy compliance;multimodal agents operate within fixed latency envelopes without streaming data off-device.The through-line is the same: multimodal capability without requiring a GPU farm.Retrieval models built for agent systems, not legacy searchLFM2-ColBERT extends late-interaction retrieval into a footprint small enough for enterprise deployments that need multilingual RAG without the overhead of specialized vector DB accelerators.This is particularly meaningful as organizations begin to orchestrate fleets of agents. 
Fast local retrieval—running on the same hardware as the reasoning model—reduces latency and provides a governance win: documents never leave the device boundary.Taken together, the VL, Audio, and ColBERT variants show LFM2 as a modular system, not a single model drop.The emerging blueprint for hybrid enterprise AI architecturesAcross all variants, the LFM2 report implicitly sketches what tomorrow’s enterprise AI stack will look like: hybrid local-cloud orchestration, where small, fast models operating on devices handle time-critical perception, formatting, tool invocation, and judgment tasks, while larger models in the cloud offer heavyweight reasoning when needed.Several trends converge here:Cost control. Running routine inference locally avoids unpredictable cloud billing.Latency determinism. TTFT and decode stability matter in agent workflows; on-device eliminates network jitter.Governance and compliance. Local execution simplifies PII handling, data residency, and auditability.Resilience. Agentic systems degrade gracefully if the cloud path becomes unavailable.Enterprises adopting these architectures will likely treat small on-device models as the “control plane” of agentic workflows, with large cloud models serving as on-demand accelerators.LFM2 is one of the clearest open-source foundations for that control layer to date.The strategic takeaway: on-device AI is now a design choice, not a compromiseFor years, organizations building AI features have accepted that “real AI” requires cloud inference. LFM2 challenges that assumption. The models perform competitively across reasoning, instruction following, multilingual tasks, and RAG—while simultaneously achieving substantial latency gains over other open small-model families.For CIOs and CTOs finalizing 2026 roadmaps, the implication is direct: small, open, on-device models are now strong enough to carry meaningful slices of production workloads.LFM2 will not replace frontier cloud models for frontier-scale reasoning. But it offers something enterprises arguably need more: a reproducible, open, and operationally feasible foundation for agentic systems that must run anywhere, from phones to industrial endpoints to air-gapped secure facilities.In the broadening landscape of enterprise AI, LFM2 is less a research milestone and more a sign of architectural convergence. The future is not cloud or edge—it’s both, operating in concert. And releases like LFM2 provide the building blocks for organizations prepared to build that hybrid future intentionally rather than accidentally.
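The decoupled Top-K distillation objective mentioned in the training-pipeline section addresses a practical issue: when the teacher exposes only its top-K logits, a standard full-vocabulary KL term is not well defined. Below is a rough sketch of one way to distill from partial logits, renormalizing both teacher and student over the teacher's top-K token ids. It illustrates the general shape of such a loss and is not Liquid AI's exact objective.

```python
import numpy as np

def log_softmax(x):
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

def topk_distill_loss(student_logits, teacher_topk_logits, teacher_topk_ids):
    # The teacher only ships its top-K logits and the corresponding token ids.
    teacher_probs = np.exp(log_softmax(teacher_topk_logits))
    # Restrict the student to the same K tokens and renormalize over them,
    # so the objective never needs probability mass outside the top-K.
    student_logprobs = log_softmax(student_logits[teacher_topk_ids])
    return -(teacher_probs * student_logprobs).sum()   # cross-entropy over the K entries

rng = np.random.default_rng(0)
vocab, K = 32_000, 64
student_logits = rng.normal(size=vocab)
teacher_logits = rng.normal(size=vocab)
topk_ids = np.argsort(teacher_logits)[-K:]

loss = topk_distill_loss(student_logits, teacher_logits[topk_ids], topk_ids)
print(round(float(loss), 3))
```

In practice this term would be computed per position over a batch and combined with the ordinary next-token loss; the sketch keeps a single position to show only the partial-logit handling.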

Build a lightweight Python DSL to define and check data quality rules in a clear, expressive way. Turn complex validation logic into simple, reusable configurations that anyone on your data team can understand.
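As a flavor of what such a DSL can look like, here is a minimal sketch built on pandas. The Rule class, field names, and checks are illustrative assumptions rather than the post's actual API.

```python
# Minimal sketch of a declarative data-quality rule DSL (illustrative only).
from dataclasses import dataclass
from typing import Callable
import pandas as pd

@dataclass
class Rule:
    name: str
    check: Callable[[pd.DataFrame], pd.Series]  # returns a boolean mask of valid rows

    def run(self, df: pd.DataFrame) -> dict:
        ok = self.check(df)
        return {"rule": self.name, "passed": int(ok.sum()), "failed": int((~ok).sum())}

rules = [
    Rule("price is positive", lambda df: df["price"] > 0),
    Rule("country code has 2 letters", lambda df: df["country"].str.len() == 2),
]

df = pd.DataFrame({"price": [10.0, -1.0, 3.5], "country": ["US", "DE", "FRA"]})
for rule in rules:
    print(rule.run(df))
```

The appeal of this style is that the rule list reads like configuration: analysts can add or review checks without touching the validation engine itself.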

#artificial intelligence #app #the algorithm #the state of ai #why it matters

Welcome back to The State of AI, a new collaboration between the Financial Times and MIT Technology Review. Every Monday for the next two weeks, writers from both publications will debate one aspect of the generative AI revolution reshaping global power. You can read the rest of the series here. This week, Richard Waters, FT…

#classes and programs #international initiatives #collaboration #mechanical engineering #misti #mit sea grant #artificial intelligence #machine learning #software #data #robotics #environment #oceanography and ocean engineering #water #global #europe #food #industry #students #undergraduate #undergraduate research opportunities program (urop) #school of engineering #mit schwarzman college of computing #school of humanities arts and social sciences #center for international studies #uav #sensors #internships

AquaCulture Shock program, in collaboration with MIT-Scandinavia MISTI, offers international internships for AI and autonomy in aquaculture

#special events and guest speakers #energy #batteries #transportation #energy storage #electric vehicles #cleaner industry #supply chains #artificial intelligence #industry #automobiles #mit energy initiative

At MITEI’s Fall Colloquium, General Motors’ battery development expert emphasized how affordability, accessibility, and commercialization can position the US as a leader in battery tech.

#artificial intelligence #author spotlights #data science #machine learning #python

Vyacheslav Efimov on AI hackathons, data science roadmaps, and how AI meaningfully changed day-to-day ML Engineer work
The post Learning, Hacking, and Shipping ML appeared first on Towards Data Science.

It’s not about clever wording anymore. It’s about designing environments where AI can think with depth, consistency, and purpose.
