Latest AI News & Updates

#ai & ml #research

We’ve been watching enterprises struggle with the same customer service paradox for years: They have all the technology in the world, yet a simple address change still takes three days. The problem isn’t what you think—and neither is the solution. Last month, I watched a colleague try to update their address with their bank. It […]

When large language models first came out, most of us were just thinking about what they could do, what problems they could solve, and how far they might go.

#learning & education #google deepmind #ai

A six-month long pilot program with the Northern Ireland Education Authority’s C2k initiative found that integrating Gemini and other generative AI tools saved participa…

#ai

Presented by Celonis

After a year of boardroom declarations about “AI transformation,” this was the week where enterprise leaders came together to talk about what actually works. Speaking from the stage at Celosphere in Munich, Celonis co-founder and co-CEO Alexander Rinke set the tone early in his keynote: “Only 11% of companies are seeing measurable benefits from AI projects today,” he said. “That’s not an adoption problem. That’s a context problem.”

It’s a sentiment familiar to anyone who’s tried to deploy AI inside a large enterprise. You can’t automate what you don’t understand — and most organizations still lack a unified picture of how work in their companies really gets done.

Celonis’ answer, showcased across three days at the company’s annual event, was less about new tech acronyms and more about connective tissue: how to make AI fit within the messy, living processes that drive business. The company framed it as achieving a real “Return on AI (ROAI)” — measurable impact that comes only when intelligence is grounded in process context.

A living model of how the enterprise works

At the heart of the keynote was what Rinke called a “living digital twin of your operations.” Celonis has been building toward this moment for years — but this was the first time the company made clear how far that concept has evolved.

“We start by freeing the process,” said Rinke. “Freeing it from the restrictions of your current legacy systems.” Data Core, Celonis’ data infrastructure, extracts raw data from source systems. It’s capable of querying billions of records in near real time with sub-minute refresh — extending visibility beyond traditional systems of record.

Built on this foundation, the Process Intelligence Graph sits at the center of the Celonis Platform. It’s a system-agnostic, graph-based model that unifies data across systems, apps, and even devices, including task-mining data that captures clicks, spreadsheets, and browser activity. It combines this data with business context — business rules, KPIs, benchmarks, and exceptions. Every transaction, rule, and process interaction becomes part of a continuously updated replica that reflects how the organization actually operates.

On top of the Graph, the company’s new Build Experience allows organizations to analyze, design, and operate AI-driven, composable processes — integrating AI where it delivers business impact, not just technical demos:

Analyze where processes stall or repeat
Design the future state, setting outcomes, guardrails, and AI touchpoints
Operate with humans, systems, and AI agents working in sync — now orchestrated through a generally available Orchestration Engine that can trigger and monitor every step in one flow

It’s a deliberate shift from discovery-driven AI pilots to outcome-driven AI operations — and a blueprint for orchestrating agentic AI, where human teams, systems, and autonomous agents work together through shared process context rather than in silos.

Real-world proof: Mercedes-Benz, Vinmar, and Uniper

The Celosphere stage offered real proof of the Celonis Platform in action, through live stories from customers already building on it.

Mercedes-Benz shared how process intelligence became their “connective tissue” during the semiconductor crisis. “We had data everywhere — plants, suppliers, logistics,” recalled Dr. Jörg Burzer, Member of the Board of Management of Mercedes-Benz Group AG. “What we didn’t have was a way to see it together. Celonis helped us connect those dots fast enough to act.”
The partnership has since expanded across eight of the company’s ten most critical processes, from supply chain to quality to after-sales. But what impressed the audience wasn’t just the scale — it was the cultural shift. “If you show data in context, and let teams visualize processes, you also change the culture,” Burzer said. “It’s not just process transformation — it’s people transformation.”

At Vinmar, CEO Vishal Baid described Celonis as “the foundation of our automation and AI strategy.” His global plastics distribution business has already automated its entire order-to-cash process for a $3B unit, achieving a 40% productivity lift. But Baid wasn’t there to just celebrate finished work — he was looking ahead. “Now we’re tackling the non-algorithmic stuff,” he said. “Matching purchase and sales orders sounds simple until you have thousands of edge cases. We’re building an AI agent that can do that allocation intelligently. That’s the next frontier.”

And in the energy sector, Uniper, with partner Microsoft, demonstrated how process-aware AI copilots are already reshaping operations. Using Celonis and Microsoft’s AI stack, Uniper can predict when hydropower plants will need maintenance — and cluster those jobs to reduce downtime and emissions. “Each technician, each part, each system plays a role in a living process,” said Hans Berg, Uniper’s CIO. “The human can’t see all of it. But process intelligence can — and it can nudge the system toward the best outcome.”

Agnes Heftberger, CVP & CEO, Microsoft Germany & Austria, who joined Berg on stage, summed it up crisply: “The hard part isn’t building AI features — it’s scaling them responsibly,” she explained. “You need to marry intelligence with the beating heart of the company: its processes.”

Across the global community, Celonis reports more than $8 billion in realized business value and over 120 certified value champions — proof that process intelligence is driving measurable impact far beyond pilots. Rinke called it “the early proof points of a true return on AI.”

From closed systems to composable intelligence

Celosphere 2025 marked a shift from architecture to interoperability — from defining enterprise AI to making it work across boundaries.

Rinke’s vision for the future is unapologetically open: “Good things grow from open ecosystems,” he said. That philosophy is taking shape through deeper platform integrations — including Microsoft Fabric, Databricks, and Bloomfilter — with zero-copy, bidirectional lakehouse access that lets customers query process data in place with minimal latency. The company also announced MCP Server support for embedding the Process Intelligence Graph directly into agentic AI platforms like Amazon Bedrock and Microsoft Copilot Studio.

These updates make “composable enterprise AI” tangible — organizations can now assemble and govern AI solutions across ecosystems rather than being locked into any single vendor. Rather than competing on who has the “best agent,” the message was that enterprise AI will thrive when agents work together through shared context and models that mirror how businesses actually run.

“Every vendor is bringing out their own agent,” Rinke said. “But each one is limited to that vendor’s world. If they can’t work together, they can’t work for you. That’s what process intelligence fixes.”

The idea drew sustained applause.
For companies juggling multiple cloud platforms, ERPs, and data tools, composability isn’t just elegant; it’s survival.

Beyond operations: data, democracy, and direction

The closing moments of the keynote took an unexpected turn — from enterprise architecture to human courage. Venezuelan opposition leader and Nobel Peace Prize winner María Corina Machado joined live via satellite to share how her movement used data, encrypted apps, and civic coordination to expose election fraud and mobilize millions.

It was a powerful contrast: the same principles — transparency, accountability, context — at work in both business and democracy. “Technology can be a weapon or a liberator,” Machado said. “It depends on who holds the context.” Her words landed with weight in a room full of people used to talking about data, systems, and governance — a reminder that context isn’t just technical, it’s human.

Why this year mattered

Celosphere 2025 marked a shift in how enterprises approach AI — from experimentation to results grounded in process intelligence. The shift was evident in both tone and technology, with a more powerful Data Core, enhanced Process Intelligence Graph, and new Build Experience. But the deeper takeaway was philosophical: AI only scales when it’s grounded in how people and systems actually work together.

Celonis president Carsten Thoma was candid in acknowledging that early process-mining projects often “stormed in with discovery” before understanding organizational value — a lesson that now defines the company’s measured, pragmatic approach to enterprise AI.

Rinke put it best near the end of his keynote: “We’re not just automating steps,” he said. “We’re building enterprises that can adapt instantly, innovate freely, and improve continuously.”

Missed it? Catch up with all the highlights from Celosphere 2025 here.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

Cut open to prove it's not human, Comet vs Atlas, AI NPCs with memory, and more...

#ai #datadecisionmakers

Companies hate to admit it, but the road to production-level AI deployment is littered with proofs of concept (PoCs) that go nowhere, or failed projects that never deliver on their goals. In some domains there is little tolerance for iteration, especially in life sciences, where an AI application may be helping bring new treatments to market or diagnosing diseases. Even slightly inaccurate analyses and assumptions made early on can create sizable, worrying drift downstream.

In analyzing dozens of AI PoCs that made it through to full production use — or didn’t — six common pitfalls emerge. Interestingly, it is usually not the quality of the technology but misaligned goals, poor planning or unrealistic expectations that cause failure.

Here’s a summary of what went wrong in real-world examples and practical guidance on how to get it right.

Lesson 1: A vague vision spells disaster

Every AI project needs a clear, measurable goal. Without it, developers are building a solution in search of a problem. For example, in developing an AI system for a pharmaceutical manufacturer’s clinical trials, the team aimed to “optimize the trial process,” but didn’t define what that meant. Did they need to accelerate patient recruitment, reduce participant dropout rates or lower the overall trial cost? The lack of focus led to a model that was technically sound but irrelevant to the client’s most pressing operational needs.

Takeaway: Define specific, measurable objectives upfront. Use SMART criteria (Specific, Measurable, Achievable, Relevant, Time-bound). For example, aim for “reduce equipment downtime by 15% within six months” rather than a vague “make things better.” Document these goals and align stakeholders early to avoid scope creep.

Lesson 2: Data quality beats quantity

Data is the lifeblood of AI, but poor-quality data is poison. In one project, a retail client began with years of sales data to predict inventory needs. The catch? The dataset was riddled with inconsistencies, including missing entries, duplicate records and outdated product codes. The model performed well in testing but failed in production because it learned from noisy, unreliable data.

Takeaway: Invest in data quality over volume. Use tools like Pandas for preprocessing and Great Expectations for data validation to catch issues early. Conduct exploratory data analysis (EDA) with visualizations (like Seaborn) to spot outliers or inconsistencies. Clean data is worth more than terabytes of garbage.

Lesson 3: Overcomplicating the model backfires

Chasing technical complexity doesn't always lead to better outcomes. For example, on a healthcare project, development initially began by creating a sophisticated convolutional neural network (CNN) to identify anomalies in medical images. While the model was state-of-the-art, its high computational cost meant weeks of training, and its "black box" nature made it difficult for clinicians to trust. The application was revised to implement a simpler random forest model that not only matched the CNN's predictive accuracy but was faster to train and far easier to interpret — a critical factor for clinical adoption.

Takeaway: Start simple. Use straightforward algorithms like random forest or XGBoost from scikit-learn to establish a baseline. Only scale to complex models — such as TensorFlow-based long short-term memory (LSTM) networks — if the problem demands it. Prioritize explainability with tools like SHAP (SHapley Additive exPlanations) to build trust with stakeholders.

Lesson 4: Ignoring deployment realities

A model that shines in a Jupyter Notebook can crash in the real world. For example, a company’s initial deployment of a recommendation engine for its e-commerce platform couldn’t handle peak traffic. The model was built without scalability in mind and choked under load, causing delays and frustrated users. The oversight cost weeks of rework.

Takeaway: Plan for production from day one. Package models in Docker containers and deploy with Kubernetes for scalability. Use TensorFlow Serving or FastAPI for efficient inference. Monitor performance with Prometheus and Grafana to catch bottlenecks early. Test under realistic conditions to ensure reliability.
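As a rough illustration of the “plan for production from day one” takeaway, here is a minimal FastAPI inference endpoint of the kind the lesson describes. The model artifact path, feature list and payload shape are hypothetical placeholders, not any real project’s service.

```python
# Minimal sketch of a production-style inference endpoint (hypothetical model
# path and payload schema; illustrative only).
from contextlib import asynccontextmanager

import joblib
from fastapi import FastAPI
from pydantic import BaseModel


class PredictRequest(BaseModel):
    # Hypothetical feature vector; a real service would also validate domain ranges.
    features: list[float]


class PredictResponse(BaseModel):
    prediction: float


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the model once at startup rather than on every request.
    app.state.model = joblib.load("model.joblib")  # hypothetical artifact path
    yield


app = FastAPI(lifespan=lifespan)


@app.get("/healthz")
def health() -> dict:
    # Liveness probe for container orchestration (Kubernetes, etc.).
    return {"status": "ok"}


@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    model = app.state.model
    pred = model.predict([req.features])[0]
    return PredictResponse(prediction=float(pred))
```

Serving this behind uvicorn in a container and load-testing the /predict route under peak-like traffic is the kind of rehearsal that would have surfaced the recommendation-engine failure described above before cutover.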
Lesson 5: Neglecting model maintenance

AI models aren’t set-and-forget. In a financial forecasting project, the model performed well for months until market conditions shifted. Unmonitored data drift caused predictions to degrade, and the lack of a retraining pipeline meant manual fixes were needed. The project lost credibility before developers could recover.

Takeaway: Build for the long haul. Implement monitoring for data drift using tools like Alibi Detect. Automate retraining with Apache Airflow and track experiments with MLflow. Incorporate active learning to prioritize labeling for uncertain predictions, keeping models relevant.

Lesson 6: Underestimating stakeholder buy-in

Technology doesn’t exist in a vacuum. A fraud detection model was technically flawless but flopped because end-users — bank employees — didn’t trust it. Without clear explanations or training, they ignored the model’s alerts, rendering it useless.

Takeaway: Prioritize human-centric design. Use explainability tools like SHAP to make model decisions transparent. Engage stakeholders early with demos and feedback loops. Train users on how to interpret and act on AI outputs. Trust is as critical as accuracy.

Best practices for success in AI projects

Drawing from these failures, here’s the roadmap to get it right:

Set clear goals: Use SMART criteria to align teams and stakeholders.
Prioritize data quality: Invest in cleaning, validation and EDA before modeling.
Start simple: Build baselines with simple algorithms before scaling complexity.
Design for production: Plan for scalability, monitoring and real-world conditions.
Maintain models: Automate retraining and monitor for drift to stay relevant.
Engage stakeholders: Foster trust with explainability and user training.

Building resilient AI

AI’s potential is intoxicating, yet failed AI projects teach us that success isn’t just about algorithms. It’s about discipline, planning and adaptability. As AI evolves, emerging trends like federated learning for privacy-preserving models and edge AI for real-time insights will raise the bar. By learning from past mistakes, teams can build scale-out production systems that are robust, accurate and trusted.

Kavin Xavier is VP of AI solutions at CapeStart. Read more from our guest writers. Or, consider submitting a post of your own! See our guidelines here.

26 languages tested, OpenAI's crisis, build LLMs and apps, and more...

#ai #datadecisionmakers #programming & development

AI coding, vibe coding and agentic swarms have made a dramatic and astonishing market entrance, with the AI Code Tools market valued at $4.8 billion and expected to grow at a 23% annual rate. Enterprises are grappling with AI coding agents and with what to do about expensive human coders. They don’t lack for advice. OpenAI’s CEO estimates that AI can perform over 50% of what human engineers can do. Six months ago, Anthropic’s CEO said that AI would write 90% of code in six months. Meta’s CEO said he believes AI will replace mid-level engineers “soon.” Judging by recent tech layoffs, it seems many executives are embracing that advice.

Software engineers and data scientists are among the most expensive salary lines at many companies, and business and technology leaders may be tempted to replace them with AI. However, recent high-profile failures demonstrate that engineers and their expertise remain valuable, even as AI continues to make impressive advances.

SaaStr disaster

Jason Lemkin, a tech entrepreneur and founder of the SaaS community SaaStr, has been vibe coding a SaaS networking app and live-tweeting his experience. About a week into his adventure, he admitted to his audience that something was going very wrong. The AI deleted his production database despite his request for a “code and action freeze.” This is the kind of mistake no experienced (or even semi-experienced) engineer would make.

If you have ever worked in a professional coding environment, you know to split your development environment from production. Junior engineers are given full access to the development environment (it’s crucial for productivity), but access to production is granted on a limited, need-to-have basis to a few of the most trusted senior engineers. The reason for restricted access is precisely this use case: to prevent a junior engineer from accidentally taking down production. In fact, Lemkin made two mistakes. First, for something as critical as production, access is simply never granted to unreliable actors (we don’t rely on asking a junior engineer or an AI nicely). Second, he never separated development from production. In a subsequent public conversation on LinkedIn, Lemkin, who holds a Stanford Executive MBA and a Berkeley JD, admitted that he was not aware of the best practice of splitting development and production databases.

The takeaway for business leaders is that standard software engineering best practices still apply. We should apply at least the same safety constraints to AI as we do to junior engineers. Arguably, we should go beyond that and treat AI slightly adversarially: there are reports that, like HAL in Stanley Kubrick's 2001: A Space Odyssey, an AI might try to break out of its sandbox environment to accomplish a task. As vibe coding spreads, experienced engineers who understand how complex software systems work, and who can build the proper guardrails into development processes, will become increasingly necessary.

Tea hack

Sean Cook is the founder and CEO of Tea, a mobile application launched in 2023 and designed to help women date safely. In the summer of 2025, the company was “hacked": 72,000 images, including 13,000 verification photos and images of government IDs, were leaked onto the public discussion forum 4chan.
Worse, Tea’s own privacy policy promised that these images would be "deleted immediately" after users were authenticated, meaning the company potentially violated its own privacy policy.

I use “hacked” in air quotes because the incident stems less from the cleverness of the attackers than from the ineptitude of the defenders. In addition to violating its own data policies, the app left a Firebase storage bucket unsecured, exposing sensitive user data to the public internet. It’s the digital equivalent of locking your front door but leaving your back door open with your family jewelry ostentatiously hanging on the doorknob.

While we don’t know whether the root cause was vibe coding, the Tea hack highlights how catastrophic breaches can stem from basic, preventable security errors rooted in poor development processes. It is the kind of vulnerability that a disciplined, thoughtful engineering process addresses. Unfortunately, relentless financial pressure pushes teams toward a “lean,” “move fast and break things” culture that is the polar opposite of such discipline, and vibe coding only exacerbates the problem.

How to safely adopt AI coding agents?

So how should enterprise and technology leaders think about AI? First, this is not a call to abandon AI for coding. An MIT Sloan study estimated that AI leads to productivity gains of between 8% and 39%, while a McKinsey study found a 10% to 50% reduction in time to task completion with the use of AI. However, we should be aware of the risks. The old lessons of software engineering don’t go away. These include many tried-and-true best practices, such as version control, automated unit and integration tests, safety checks like SAST/DAST, separating development and production environments, code review and secrets management. If anything, they become more salient.

AI can generate code 100 times faster than humans can type, fostering an illusion of productivity that is a tempting siren call for many executives. However, the quality of the rapidly generated AI slop is still up for debate. To develop complex production systems, enterprises need the thoughtful, seasoned experience of human engineers.

Tianhui Michael Li is president at Pragmatic Institute and the founder and president of The Data Incubator. Read more from our guest writers. Or, consider submitting a post of your own! See our guidelines here.
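The “treat AI like a junior engineer” guardrail from the piece above can be made concrete with a tiny environment check. This is an illustrative sketch only: the APP_ENV convention and the destructive-statement filter are hypothetical, not drawn from either incident or from any particular agent framework.

```python
# Illustrative guardrail: refuse destructive database statements unless the
# process is explicitly running against a development environment.
# APP_ENV and the statement pattern are hypothetical conventions.
import os
import re


class ProductionSafetyError(RuntimeError):
    """Raised when a destructive statement targets a non-development environment."""


DESTRUCTIVE = re.compile(r"^\s*(drop|truncate|delete)\b", re.IGNORECASE)


def guard_statement(sql: str) -> str:
    env = os.environ.get("APP_ENV", "production")  # fail closed by default
    if env != "development" and DESTRUCTIVE.match(sql):
        raise ProductionSafetyError(
            f"Refusing destructive statement in '{env}' environment: {sql!r}"
        )
    return sql


if __name__ == "__main__":
    os.environ["APP_ENV"] = "development"
    print(guard_statement("DELETE FROM sessions WHERE expired = true"))  # allowed in dev

    os.environ["APP_ENV"] = "production"
    try:
        guard_statement("DROP TABLE users")
    except ProductionSafetyError as err:
        print(err)  # blocked outside development
```

The point is not this particular filter but the shape of the control: the agent (or junior engineer) never gets a code path to production-destroying operations in the first place.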

Full-body AI swaps, 3D photo edits, AI Word of the Year, AI memory fix, and more...

#ai

The developers of Terminal-Bench, a benchmark suite for evaluating the performance of autonomous AI agents on real-world terminal-based tasks, have released version 2.0 alongside Harbor, a new framework for testing, improving and optimizing AI agents in containerized environments. The dual release aims to address long-standing pain points in testing and optimizing AI agents, particularly those built to operate autonomously in realistic developer environments.

With a more difficult and rigorously verified task set, Terminal-Bench 2.0 replaces version 1.0 as the standard for assessing frontier model capabilities. Harbor, the accompanying runtime framework, enables developers and researchers to scale evaluations across thousands of cloud containers and integrates with both open-source and proprietary agents and training pipelines.

“Harbor is the package we wish we had had while making Terminal-Bench," wrote co-creator Alex Shaw on X. "It’s for agent, model, and benchmark developers and researchers who want to evaluate and improve agents and models."

Higher Bar, Cleaner Data

Terminal-Bench 1.0 saw rapid adoption after its release in May 2025, becoming a default benchmark for AI-powered agents that operate in developer-style terminal environments. These agents interact with systems through the command line, mimicking how developers work behind the scenes of the graphical user interface. However, its broad scope came with inconsistencies. Several tasks were identified by the community as poorly specified or unstable due to external service changes.

Version 2.0 addresses those issues directly. The updated suite includes 89 tasks, each subjected to several hours of manual and LLM-assisted validation. The emphasis is on making tasks solvable, realistic, and clearly specified, raising the difficulty ceiling while improving reliability and reproducibility. A notable example is the download-youtube task, which was removed or refactored in 2.0 due to its dependence on unstable third-party APIs.

“Astute Terminal-Bench fans may notice that SOTA performance is comparable to TB1.0 despite our claim that TB2.0 is harder,” Shaw noted on X. “We believe this is because task quality is substantially higher in the new benchmark.”

Harbor: Unified Rollouts at Scale

Alongside the benchmark update, the team launched Harbor, a new framework for running and evaluating agents in cloud-deployed containers. Harbor supports large-scale rollout infrastructure, with compatibility for major providers like Daytona and Modal. Designed to generalize across agent architectures, Harbor supports:

Evaluation of any container-installable agent
Scalable supervised fine-tuning (SFT) and reinforcement learning (RL) pipelines
Custom benchmark creation and deployment
Full integration with Terminal-Bench 2.0

Harbor was used internally to run tens of thousands of rollouts during the creation of the new benchmark. It is now publicly available via harborframework.com, with documentation for testing and submitting agents to the public leaderboard.

Early Results: GPT-5 Leads in Task Success

Initial results from the Terminal-Bench 2.0 leaderboard show OpenAI's Codex CLI (command line interface), a GPT-5 powered variant, in the lead with a 49.6% success rate — the highest among all agents tested so far.
Close behind are other GPT-5 variants and Claude Sonnet 4.5-based agents.

Top 5 Agent Results (Terminal-Bench 2.0):

Codex CLI (GPT-5) — 49.6%
Codex CLI (GPT-5-Codex) — 44.3%
OpenHands (GPT-5) — 43.8%
Terminus 2 (GPT-5-Codex) — 43.4%
Terminus 2 (Claude Sonnet 4.5) — 42.8%

The close clustering among top models indicates active competition across platforms, with no single agent solving more than half the tasks.

Submission and Use

To test or submit an agent, users install Harbor and run the benchmark using simple CLI commands. Submissions to the leaderboard require five benchmark runs, and results can be emailed to the developers along with job directories for validation.

harbor run -d terminal-bench@2.0 -m "" -a "" --n-attempts 5 --jobs-dir

Terminal-Bench 2.0 is already being integrated into research workflows focused on agentic reasoning, code generation, and tool use. According to co-creator Mike Merrill, a postdoctoral researcher at Stanford, a detailed preprint is in progress covering the verification process and design methodology behind the benchmark.

Aiming for Standardization

The combined release of Terminal-Bench 2.0 and Harbor marks a step toward more consistent and scalable agent evaluation infrastructure. As LLM agents proliferate in developer and operational environments, the need for controlled, reproducible testing has grown. These tools offer a potential foundation for a unified evaluation stack — supporting model improvement, environment simulation, and benchmark standardization across the AI ecosystem.

#amazon bedrock #amazon machine learning #artificial intelligence #customer solutions #generative ai #ai/ml #aws customer

In this blog post, we explore how Thomson Reuters (TR) addressed key business use cases with Open Arena, a highly scalable and flexible no-code AI solution powered by Amazon Bedrock and other AWS services such as Amazon OpenSearch Service, Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, and AWS Lambda. We'll explain how TR used AWS services to build this solution, including how the architecture was designed, the use cases it solves, and the business profiles that use it.

#company announcements #google.org #ai #public policy #small business #grow with google #student programs

AI is creating new opportunities for Oklahomans to learn, grow and succeed. Google is committed to making sure the Sooner State is not just ready for this transformation…

#amazon bedrock #generative ai

Today, we are excited to announce the addition of structured output to Custom Model Import. Structured output constrains a model's generation process in real time so that every token it produces conforms to a schema you define. Rather than relying on prompt-engineering tricks or brittle post-processing scripts, you can now generate structured outputs directly at inference time.
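As a rough illustration of what schema-conforming output buys you, the sketch below defines a JSON Schema and checks a model response against it with the jsonschema library. The schema fields and the sample payload are hypothetical, and the Bedrock invocation itself is omitted; with structured output enabled, conformance is enforced at inference time rather than checked after the fact as done here.

```python
# Conceptual sketch: define a schema and verify that model output conforms to it.
# Schema and sample response are hypothetical; structured output moves this
# guarantee into the generation step itself.
import json

from jsonschema import ValidationError, validate

INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_id": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string", "enum": ["USD", "EUR"]},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["invoice_id", "total", "currency"],
}

# Stand-in for text returned by the model.
model_output = '{"invoice_id": "INV-1001", "total": 42.5, "currency": "USD", "line_items": []}'

try:
    validate(instance=json.loads(model_output), schema=INVOICE_SCHEMA)
    print("Output conforms to the schema.")
except ValidationError as err:
    print(f"Schema violation: {err.message}")
```

The appeal of enforcing this at decode time is that the brittle post-processing branch (the except path above) largely disappears from downstream code.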

#mit energy initiative #energy #energy efficiency #electric grid #artificial intelligence #emissions #technology and policy #cleaner industry #sustainability #climate change #research

MIT faculty and MITEI member company experts address power demand from data centers.

#beginner #career #linear regression #machine learning

When it comes to machine learning interviews, Linear Regression almost always shows up. It’s one of those algorithms that looks simple at first, and that’s exactly why interviewers love it. It’s like the “hello world” of ML: easy to understand on the surface, but full of details that reveal how well you actually know your […]
The post Crash Course to Crack Machine Learning Interview – Part 2: Linear Regression appeared first on Analytics Vidhya.

When we ask ourselves the question, "What is inside machine learning systems?", many of us picture frameworks and models that make predictions or perform tasks.

#data infrastructure #ai

Across industries, rising compute expenses are often cited as a barrier to AI adoption — but leading companies are finding that cost is no longer the real constraint.

The tougher challenges (and the ones top of mind for many tech leaders)? Latency, flexibility and capacity.

At Wonder, for instance, AI adds a mere few cents per order; the food delivery and takeout company is much more concerned with cloud capacity as demand skyrockets. Recursion, for its part, has focused on balancing small and larger-scale training and deployment via on-premises clusters and the cloud; this has afforded the biotech company flexibility for rapid experimentation.

The companies’ real-world experiences highlight a broader industry trend: for enterprises operating AI at scale, economics are no longer the decisive factor — the conversation has shifted from how to pay for AI to how fast it can be deployed and sustained.

AI leaders from the two companies recently sat down with VentureBeat’s CEO and editor-in-chief Matt Marshall as part of VB’s traveling AI Impact Series. Here’s what they shared.

Wonder: Rethink what you assume about capacity

Wonder uses AI to power everything from recommendations to logistics — yet, as of now, reported CTO James Chen, AI adds just a few cents per order. Chen explained that the technology component of a meal order costs 14 cents and the AI adds 2 to 3 cents, although that’s “going up really rapidly” to 5 to 8 cents. Still, that seems almost immaterial compared with total operating costs.

Instead, the 100% cloud-native AI company’s main concern has been capacity amid growing demand. Wonder was built on “the assumption” (which proved to be incorrect) that there would be “unlimited capacity,” so the team could move “super fast” and wouldn’t have to worry about managing infrastructure, Chen noted.

But the company has grown quite a bit over the last few years, he said; as a result, about six months ago, “we started getting little signals from the cloud providers, ‘Hey, you might need to consider going to region two,’” because the providers were running out of CPU and data-storage capacity at their facilities as demand grew.

It was “very shocking” that they had to move to plan B earlier than they anticipated. “Obviously it's good practice to be multi-region, but we were thinking maybe two more years down the road,” said Chen.

What's not economically feasible (yet)

Wonder built its own model to maximize its conversion rate, Chen noted; the goal is to surface new restaurants to relevant customers as much as possible. These are “isolated scenarios” where models are trained over time to be “very, very efficient and very fast.”

Currently, the best bet for Wonder’s use case is large models, Chen noted. But in the long term, they’d like to move to small models that are hyper-customized to individuals (via AI agents or concierges) based on their purchase history and even their clickstream. “Having these micro models is definitely the best, but right now the cost is very expensive,” Chen noted. “If you try to create one for each person, it's just not economically feasible.”

Budgeting is an art, not a science

Wonder gives its devs and data scientists as much room to experiment as possible, and internal teams review the costs of use to make sure nobody turned on a model and “jacked up massive compute around a huge bill,” said Chen.

The company is trying different things to offload to AI and operate within margins. “But then it's very hard to budget because you have no idea,” he said. One of the challenging things is the pace of development; when a new model comes out, “we can’t just sit there, right? We have to use it.”

Budgeting for the unknown economics of a token-based system is “definitely art versus science.”

A critical component of the software development lifecycle is preserving context when using large language models, he explained. When you find something that works, you can add it to your company’s “corpus of context” that is sent with every request. That corpus gets big, and it costs money every time it is sent.

“Over 50%, up to 80% of your costs is just resending the same information back into the same engine again on every request,” said Chen. In theory, the more volume they handle, the less each unit should cost. “I know when a transaction happens, I'll pay the X cent tax for each one, but I don't want to be limited to use the technology for all these other creative ideas."
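Chen’s point about resending the same context can be made concrete with a little arithmetic. The prices and token counts below are hypothetical placeholders, not Wonder’s numbers; the point is how quickly a fixed context corpus comes to dominate per-request spend when it is not cached.

```python
# Back-of-the-envelope sketch: share of input spend that is just resending a
# fixed context corpus on every request. All numbers are hypothetical.
PRICE_PER_M_INPUT_TOKENS = 1.25   # USD per 1M input tokens (assumed)
CONTEXT_TOKENS = 4_000            # shared "corpus of context" sent with each request
QUERY_TOKENS = 1_000              # the part that actually changes per request
REQUESTS_PER_DAY = 100_000


def daily_input_cost(tokens_per_request: int, requests: int, price_per_m: float) -> float:
    return tokens_per_request * requests / 1_000_000 * price_per_m


context_cost = daily_input_cost(CONTEXT_TOKENS, REQUESTS_PER_DAY, PRICE_PER_M_INPUT_TOKENS)
query_cost = daily_input_cost(QUERY_TOKENS, REQUESTS_PER_DAY, PRICE_PER_M_INPUT_TOKENS)

share = context_cost / (context_cost + query_cost)
print(f"Context: ${context_cost:,.2f}/day, queries: ${query_cost:,.2f}/day")
print(f"Repeated context is {share:.0%} of input spend")  # 80% with these placeholder numbers
```

Provider-side prompt caching, or trimming what gets resent, attacks exactly that first term.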

The 'vindication moment' for Recursion

Recursion, for its part, has focused on meeting broad-ranging compute needs via a hybrid infrastructure of on-premises clusters and cloud inference.

When initially looking to build out its AI infrastructure, the company had to go with its own setup, as “the cloud providers didn't have very many good offerings,” explained CTO Ben Mabey. “The vindication moment was that we needed more compute and we looked to the cloud providers and they were like, ‘Maybe in a year or so.’”

The company’s first cluster in 2017 incorporated Nvidia gaming GPUs (1080s, launched in 2016); they have since added Nvidia H100s and A100s, and use a Kubernetes cluster that they run in the cloud or on-prem.

Addressing the longevity question, Mabey noted: “These gaming GPUs are actually still being used today, which is crazy, right? The myth that a GPU's life span is only three years, that's definitely not the case. A100s are still top of the list, they're the workhorse of the industry.”

Best use cases on-prem vs cloud; cost differences

More recently, Mabey’s team has been training a foundation model on Recursion’s image repository (which consists of petabytes of data and more than 200 pictures). This and other types of big training jobs have required a “massive cluster” and connected, multi-node setups.

“When we need that fully-connected network and access to a lot of our data in a high parallel file system, we go on-prem,” he explained. On the other hand, shorter workloads run in the cloud.

Recursion’s method is to “pre-empt” GPUs and Google tensor processing units (TPUs), which is the process of interrupting running GPU tasks to work on higher-priority ones. “Because we don't care about the speed in some of these inference workloads where we're uploading biological data, whether that's an image or sequencing data, DNA data,” Mabey explained. “We can say, ‘Give this to us in an hour,’ and we're fine if it kills the job.”

From a cost perspective, moving large workloads on-prem is “conservatively” 10 times cheaper, Mabey noted; for a five-year TCO, it's half the cost. On the other hand, for smaller storage needs, the cloud can be “pretty competitive” cost-wise.

Ultimately, Mabey urged tech leaders to step back and determine whether they’re truly willing to commit to AI; cost-effective solutions typically require multi-year buy-ins.

“From a psychological perspective, I've seen peers of ours who will not invest in compute, and as a result they're always paying on demand," said Mabey. "Their teams use far less compute because they don't want to run up the cloud bill. Innovation really gets hampered by people not wanting to burn money.”

#ai

Researchers at New York University have developed a new architecture for diffusion models that improves the semantic representation of the images they generate. “Diffusion Transformer with Representation Autoencoders” (RAE) challenges some of the accepted norms of building diffusion models. The NYU researchers’ model is more efficient and accurate than standard diffusion models, takes advantage of the latest research in representation learning and could pave the way for new applications that were previously too difficult or expensive.

This breakthrough could unlock more reliable and powerful features for enterprise applications. "To edit images well, a model has to really understand what’s in them," paper co-author Saining Xie told VentureBeat. "RAE helps connect that understanding part with the generation part." He also pointed to future applications in "RAG-based generation, where you use RAE encoder features for search and then generate new images based on the search results," as well as in "video generation and action-conditioned world models."

The state of generative modeling

Diffusion models, the technology behind most of today’s powerful image generators, frame generation as a process of learning to compress and decompress images. A variational autoencoder (VAE) learns a compact representation of an image’s key features in a so-called “latent space.” The model is then trained to generate new images by reversing this process from random noise.

While the diffusion part of these models has advanced, the autoencoder used in most of them has remained largely unchanged in recent years. According to the NYU researchers, this standard autoencoder (SD-VAE) is suitable for capturing low-level features and local appearance, but lacks the “global semantic structure crucial for generalization and generative performance.”

At the same time, the field has seen impressive advances in image representation learning with models such as DINO, MAE and CLIP. These models learn semantically structured visual features that generalize across tasks and can serve as a natural basis for visual understanding. However, a widely held belief has kept devs from using these architectures in image generation: models focused on semantics are not suitable for generating images because they don’t capture granular, pixel-level features. Practitioners also believe that diffusion models do not work well with the kind of high-dimensional representations that semantic models produce.

Diffusion with representation encoders

The NYU researchers propose replacing the standard VAE with “representation autoencoders” (RAE). This new type of autoencoder pairs a pretrained representation encoder, like Meta’s DINO, with a trained vision transformer decoder. This approach simplifies the training process by using existing, powerful encoders that have already been trained on massive datasets.

To make this work, the team developed a variant of the diffusion transformer (DiT), the backbone of most image generation models. This modified DiT can be trained efficiently in the high-dimensional space of RAEs without incurring huge compute costs. The researchers show that frozen representation encoders, even those optimized for semantics, can be adapted for image generation tasks. Their method yields reconstructions that are superior to the standard SD-VAE without adding architectural complexity. However, adopting this approach requires a shift in thinking.
"RAE isn’t a simple plug-and-play autoencoder; the diffusion modeling part also needs to evolve," Xie explained. "One key point we want to highlight is that latent space modeling and generative modeling should be co-designed rather than treated separately."With the right architectural adjustments, the researchers found that higher-dimensional representations are an advantage, offering richer structure, faster convergence and better generation quality. In their paper, the researchers note that these "higher-dimensional latents introduce effectively no extra compute or memory costs." Furthermore, the standard SD-VAE is more computationally expensive, requiring about six times more compute for the encoder and three times more for the decoder, compared to RAE.Stronger performance and efficiencyThe new model architecture delivers significant gains in both training efficiency and generation quality. The team's improved diffusion recipe achieves strong results after only 80 training epochs. Compared to prior diffusion models trained on VAEs, the RAE-based model achieves a 47x training speedup. It also outperforms recent methods based on representation alignment with a 16x training speedup. This level of efficiency translates directly into lower training costs and faster model development cycles.For enterprise use, this translates into more reliable and consistent outputs. Xie noted that RAE-based models are less prone to semantic errors seen in classic diffusion, adding that RAE gives the model "a much smarter lens on the data." He observed that leading models like ChatGPT-4o and Google's Nano Banana are moving toward "subject-driven, highly consistent and knowledge-augmented generation," and that RAE's semantically rich foundation is key to achieving this reliability at scale and in open source models.The researchers demonstrated this performance on the ImageNet benchmark. Using the Fréchet Inception Distance (FID) metric, where a lower score indicates higher-quality images, the RAE-based model achieved a state-of-the-art score of 1.51 without guidance. With AutoGuidance, a technique that uses a smaller model to steer the generation process, the FID score dropped to an even more impressive 1.13 for both 256x256 and 512x512 images.By successfully integrating modern representation learning into the diffusion framework, this work opens a new path for building more capable and cost-effective generative models. This unification points toward a future of more integrated AI systems. "We believe that in the future, there will be a single, unified representation model that captures the rich, underlying structure of reality... capable of decoding into many different output modalities," Xie said. He added that RAE offers a unique path toward this goal: "The high-dimensional latent space should be learned separately to provide a strong prior that can then be decoded into various modalities — rather than relying on a brute-force approach of mixing all data and training with multiple objectives at once."

AI Tools of the Month, female AI robots, Siri + Gemini, AI finance tools, and more...

#ai

Even as concern and skepticism grow over U.S. AI startup OpenAI's buildout strategy and heavy spending commitments, Chinese open-source AI providers are escalating their competition, and one has even caught up to OpenAI's flagship paid proprietary model, GPT-5, on key third-party performance benchmarks with a new, free model. The Chinese AI startup Moonshot AI’s new Kimi K2 Thinking model, released today, has vaulted past both proprietary and open-weight competitors to claim the top position in reasoning, coding, and agentic tool-use benchmarks. Despite being fully open source, the model now outperforms OpenAI’s GPT-5, Anthropic’s Claude Sonnet 4.5 (Thinking mode), and xAI's Grok-4 on several standard evaluations — an inflection point for the competitiveness of open AI systems.

Developers can access the model via platform.moonshot.ai and kimi.com; weights and code are hosted on Hugging Face. The open release includes APIs for chat, reasoning, and multi-tool workflows. Users can try out Kimi K2 Thinking directly through its own ChatGPT-like website and on a Hugging Face space as well.

Modified Standard Open Source License

Moonshot AI has formally released Kimi K2 Thinking under a Modified MIT License on Hugging Face. The license grants full commercial and derivative rights — meaning individual researchers and developers working on behalf of enterprise clients can access it freely and use it in commercial applications — but adds one restriction:

"If the software or any derivative product serves over 100 million monthly active users or generates over $20 million USD per month in revenue, the deployer must prominently display 'Kimi K2' on the product’s user interface."

For most research and enterprise applications, this clause functions as a light-touch attribution requirement while preserving the freedoms of standard MIT licensing. It makes K2 Thinking one of the most permissively licensed frontier-class models currently available.

A New Benchmark Leader

Kimi K2 Thinking is a Mixture-of-Experts (MoE) model built around one trillion parameters, of which 32 billion activate per inference. It combines long-horizon reasoning with structured tool use, executing up to 200–300 sequential tool calls without human intervention. According to Moonshot’s published test results, K2 Thinking achieved:

44.9% on Humanity’s Last Exam (HLE), a state-of-the-art score
60.2% on BrowseComp, an agentic web-search and reasoning test
71.3% on SWE-Bench Verified and 83.1% on LiveCodeBench v6, key coding evaluations
56.3% on Seal-0, a benchmark for real-world information retrieval

Across these tasks, K2 Thinking consistently outperforms GPT-5’s corresponding scores and surpasses the previous open-weight leader MiniMax-M2, released just weeks earlier by Chinese rival MiniMax AI.

Open Model Outperforms Proprietary Systems

GPT-5 and Claude Sonnet 4.5 Thinking remain the leading proprietary “thinking” models. Yet in the same benchmark suite, K2 Thinking’s agentic reasoning scores exceed both: for instance, on BrowseComp the open model’s 60.2% decisively leads GPT-5’s 54.9% and Claude 4.5’s 24.1%. K2 Thinking also edges GPT-5 on GPQA Diamond (85.7% vs. 84.5%) and matches it on mathematical reasoning tasks such as AIME 2025 and HMMT 2025. Only in certain heavy-mode configurations — where GPT-5 aggregates multiple trajectories — does the proprietary model regain parity.

That Moonshot’s fully open-weight release can meet or exceed GPT-5’s scores marks a turning point.
The gap between closed frontier systems and publicly available models has effectively collapsed for high-end reasoning and coding.

Surpassing MiniMax-M2: The Previous Open-Source Benchmark

When VentureBeat profiled MiniMax-M2 just a week and a half ago, it was hailed as the “new king of open-source LLMs,” achieving top scores among open-weight systems:

τ²-Bench — 77.2
BrowseComp — 44.0
FinSearchComp-global — 65.5
SWE-Bench Verified — 69.4

Those results placed MiniMax-M2 near GPT-5-level capability in agentic tool use. Yet Kimi K2 Thinking now eclipses them by wide margins. Its BrowseComp result of 60.2% exceeds M2’s 44.0%, and its SWE-Bench Verified score of 71.3% edges out M2’s 69.4%. Even on financial-reasoning tasks such as FinSearchComp-T3 (47.4%), K2 Thinking performs comparably while maintaining superior general-purpose reasoning.

Technically, both models adopt sparse Mixture-of-Experts architectures for compute efficiency, but Moonshot’s network activates more experts and deploys advanced quantization-aware training (INT4 QAT). This design doubles inference speed relative to standard precision without degrading accuracy — critical for long “thinking-token” sessions reaching 256K context windows.

Agentic Reasoning and Tool Use

K2 Thinking’s defining capability lies in its explicit reasoning trace. The model outputs an auxiliary field, reasoning_content, revealing intermediate logic before each final response. This transparency preserves coherence across long multi-turn tasks and multi-step tool calls. A reference implementation published by Moonshot demonstrates how the model autonomously conducts a “daily news report” workflow: invoking date and web-search tools, analyzing retrieved content, and composing structured output — all while maintaining internal reasoning state. This end-to-end autonomy enables the model to plan, search, execute, and synthesize evidence across hundreds of steps, mirroring the emerging class of “agentic AI” systems that operate with minimal supervision.

Efficiency and Access

Despite its trillion-parameter scale, K2 Thinking’s runtime cost remains modest. Moonshot lists usage at:

$0.15 per 1M tokens (cache hit)
$0.60 per 1M tokens (cache miss)
$2.50 per 1M output tokens

These rates are competitive even against MiniMax-M2’s $0.30 input / $1.20 output pricing — and an order of magnitude below GPT-5 ($1.25 input / $10 output).

Comparative Context: Open-Weight Acceleration

The rapid succession of M2 and K2 Thinking illustrates how quickly open-source research is catching up to frontier systems. MiniMax-M2 demonstrated that open models could approach GPT-5-class agentic capability at a fraction of the compute cost. Moonshot has now advanced that frontier further, pushing open weights beyond parity into outright leadership. Both models rely on sparse activation for efficiency, but K2 Thinking’s higher activation count (32B vs. 10B active parameters) yields stronger reasoning fidelity across domains. Its test-time scaling — expanding “thinking tokens” and tool-calling turns — provides measurable performance gains without retraining, a feature not yet observed in MiniMax-M2.

Technical Outlook

Moonshot reports that K2 Thinking supports native INT4 inference and 256K-token contexts with minimal performance degradation.
Its architecture integrates quantization, parallel trajectory aggregation (“heavy mode”), and Mixture-of-Experts routing tuned for reasoning tasks. In practice, these optimizations allow K2 Thinking to sustain complex planning loops — code compile–test–fix, search–analyze–summarize — over hundreds of tool calls. This capability underpins its superior results on BrowseComp and SWE-Bench, where reasoning continuity is decisive.

Enormous Implications for the AI Ecosystem

The convergence of open and closed models at the high end signals a structural shift in the AI landscape. Enterprises that once relied exclusively on proprietary APIs can now deploy open alternatives matching GPT-5-level reasoning while retaining full control of weights, data, and compliance. Moonshot’s open publication strategy follows the precedent set by DeepSeek R1, Qwen3, GLM-4.6 and MiniMax-M2 but extends it to full agentic reasoning. For academic and enterprise developers, K2 Thinking provides both transparency and interoperability — the ability to inspect reasoning traces and fine-tune performance for domain-specific agents.

The arrival of K2 Thinking signals that Moonshot — a young startup founded in 2023 with investment from some of China's biggest apps and tech companies — is here to play in an intensifying competition, and it comes amid growing scrutiny of the financial sustainability of AI’s largest players. Just a day ago, OpenAI CFO Sarah Friar sparked controversy after suggesting at the WSJ Tech Live event that the U.S. government might eventually need to provide a “backstop” for the company’s more than $1.4 trillion in compute and data-center commitments — a comment widely interpreted as a call for taxpayer-backed loan guarantees.

Although Friar later clarified that OpenAI was not seeking direct federal support, the episode reignited debate about the scale and concentration of AI capital spending. With OpenAI, Microsoft, Meta, and Google all racing to secure long-term chip supply, critics warn of an unsustainable investment bubble and an “AI arms race” driven more by strategic fear than by commercial returns. Because so many trades and valuations now assume continued heavy AI investment and massive returns, they argue, hesitation or market uncertainty could cause that bubble to "blow up" and drag down the broader global economy with it.

Against that backdrop, Moonshot AI’s and MiniMax’s open-weight releases put more pressure on U.S. proprietary AI firms and their backers to justify the size of their investments and their paths to profitability. If an enterprise customer can get comparable or better performance from a free, open-source Chinese AI model than from paid, proprietary solutions like OpenAI's GPT-5, Anthropic's Claude Sonnet 4.5, or Google's Gemini 2.5 Pro, why would they continue paying to access the proprietary models? Already, Silicon Valley stalwarts like Airbnb have raised eyebrows by admitting to heavily using Chinese open-source alternatives like Alibaba's Qwen over OpenAI's proprietary offerings.

For investors and enterprises, these developments suggest that high-end AI capability is no longer synonymous with high-end capital expenditure.
The most advanced reasoning systems may now come not from companies building gigascale data centers, but from research groups optimizing architectures and quantization for efficiency. In that sense, K2 Thinking’s benchmark dominance is not just a technical milestone — it’s a strategic one, arriving at a moment when the AI market’s biggest question has shifted from how powerful models can become to who can afford to sustain them.

What It Means for Enterprises Going Forward

Within weeks of MiniMax-M2’s ascent, Kimi K2 Thinking has overtaken it — along with GPT-5 and Claude 4.5 — across nearly every reasoning and agentic benchmark. The model demonstrates that open-weight systems can now meet or surpass proprietary frontier models in both capability and efficiency. For the AI research community, K2 Thinking represents more than another open model: it is evidence that the frontier has become collaborative. The best-performing reasoning model available today is not a closed commercial product but an open-source system accessible to anyone.
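For developers who want to see the reasoning trace the article describes, the sketch below reads the auxiliary reasoning_content field through an OpenAI-compatible client. The base URL, model identifier and the exact shape of the response are assumptions inferred from the article and Moonshot's developer site, not verified API documentation, so treat this as a starting point rather than a reference.

```python
# Hedged sketch: reading K2 Thinking's reasoning trace via an OpenAI-compatible
# client. base_url, model id, and the reasoning_content attribute are assumptions
# drawn from the article's description, not confirmed API details.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://platform.moonshot.ai/v1",  # assumed endpoint
    api_key=os.environ["MOONSHOT_API_KEY"],
)

response = client.chat.completions.create(
    model="kimi-k2-thinking",  # assumed model identifier
    messages=[{"role": "user", "content": "Summarize today's top AI research news."}],
)

message = response.choices[0].message
# The auxiliary reasoning trace described in the article, if the endpoint returns it.
print(getattr(message, "reasoning_content", "<no reasoning trace returned>"))
print(message.content)
```

Exposing the trace as a separate field, rather than interleaving it with the answer, is what makes it practical to log and audit long multi-step tool-calling runs.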

#developers #ai

File Search is a fully managed Retrieval Augmented Generation (RAG) system built directly into the Gemini API.

#amazon bedrock #amazon bedrock agentcore #amazon machine learning #artificial intelligence #intermediate (200)

Earlier this year, we introduced Amazon Bedrock AgentCore Gateway, a fully managed service that serves as a centralized MCP tool server, providing a unified interface where agents can discover, access, and invoke tools. Today, we're extending support for existing MCP servers as a new target type in AgentCore Gateway. With this capability, you can group multiple task-specific MCP servers aligned to agent goals behind a single, manageable MCP gateway interface. This reduces the operational complexity of maintaining separate gateways, while providing the same centralized tool and authentication management that existed for REST APIs and AWS Lambda functions.
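The pattern of fronting several task-specific MCP servers with one gateway can be illustrated with a toy dispatcher. This is a conceptual sketch with hypothetical server and tool names; it is not the AgentCore Gateway API or the MCP wire protocol, which additionally handle discovery, authentication and invocation centrally.

```python
# Toy illustration of the gateway pattern: one entry point exposes a unified
# tool catalog and routes each call to the task-specific backend that owns it.
# Server names, tools, and the call interface are hypothetical.
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolServer:
    name: str
    tools: dict[str, Callable[[dict], dict]]


billing_server = ToolServer(
    "billing-mcp", {"get_invoice": lambda args: {"invoice": args["id"], "total": 42.5}}
)
hr_server = ToolServer(
    "hr-mcp", {"lookup_employee": lambda args: {"employee": args["email"], "dept": "finance"}}
)


class Gateway:
    def __init__(self, servers: list[ToolServer]):
        # Build a single catalog mapping tool name -> owning backend server.
        self.routes = {tool: srv for srv in servers for tool in srv.tools}

    def list_tools(self) -> list[str]:
        return sorted(self.routes)

    def call(self, tool: str, args: dict) -> dict:
        server = self.routes[tool]  # central place to attach auth, quotas, logging
        return server.tools[tool](args)


gw = Gateway([billing_server, hr_server])
print(gw.list_tools())                          # agent discovers every tool in one place
print(gw.call("get_invoice", {"id": "INV-7"}))  # call is routed to billing-mcp
```

The operational win described in the announcement is exactly this consolidation: one interface to discover and govern, however many backends sit behind it.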

#learning & education #ai

How Google approaches AI and education, from our tools to our commitment to responsibility.

#search #ai

Learn more about the new Google Finance, including new features like Deep Search and prediction markets data.

#classes and programs #computer science and technology #artificial intelligence #algorithms #machine learning #data #students #graduate, postdoctoral #research #electrical engineering and computer science (eecs) #laboratory for information and decision systems (lids) #mit-ibm watson ai lab #school of engineering #institute for medical engineering and science (imes) #mit schwarzman college of computing #computer science and artificial intelligence laboratory (csail)

MIT PhD students who interned with the MIT-IBM Watson AI Lab Summer Program are pushing AI tools to be more flexible, efficient, and grounded in truth.

From November 6 to November 21, 2025 (starting at 8:00 a.

#ai #big data #data infrastructure

Google Cloud is introducing what it calls its most powerful artificial intelligence infrastructure to date, unveiling a seventh-generation Tensor Processing Unit and expanded Arm-based computing options designed to meet surging demand for AI model deployment — what the company characterizes as a fundamental industry shift from training models to serving them to billions of users.

The announcement, made Thursday, centers on Ironwood, Google's latest custom AI accelerator chip, which will become generally available in the coming weeks. In a striking validation of the technology, Anthropic, the AI safety company behind the Claude family of models, disclosed plans to access up to one million of these TPU chips — a commitment worth tens of billions of dollars and among the largest known AI infrastructure deals to date.

The move underscores an intensifying competition among cloud providers to control the infrastructure layer powering artificial intelligence, even as questions mount about whether the industry can sustain its current pace of capital expenditure. Google's approach — building custom silicon rather than relying solely on Nvidia's dominant GPU chips — amounts to a long-term bet that vertical integration from chip design through software will deliver superior economics and performance.

Why companies are racing to serve AI models, not just train them

Google executives framed the announcements around what they call "the age of inference" — a transition point where companies shift resources from training frontier AI models to deploying them in production applications serving millions or billions of requests daily.

"Today's frontier models, including Google's Gemini, Veo, and Imagen and Anthropic's Claude train and serve on Tensor Processing Units," said Amin Vahdat, vice president and general manager of AI and Infrastructure at Google Cloud. "For many organizations, the focus is shifting from training these models to powering useful, responsive interactions with them."

This transition has profound implications for infrastructure requirements. Where training workloads can often tolerate batch processing and longer completion times, inference — the process of actually running a trained model to generate responses — demands consistently low latency, high throughput, and unwavering reliability. A chatbot that takes 30 seconds to respond, or a coding assistant that frequently times out, becomes unusable regardless of the underlying model's capabilities. Agentic workflows — where AI systems take autonomous actions rather than simply responding to prompts — create particularly complex infrastructure challenges, requiring tight coordination between specialized AI accelerators and general-purpose computing.

Inside Ironwood's architecture: 9,216 chips working as one supercomputer

Ironwood is more than an incremental improvement over Google's sixth-generation TPUs. According to technical specifications shared by the company, it delivers more than four times better performance for both training and inference workloads compared to its predecessor — gains that Google attributes to a system-level co-design approach rather than simply increasing transistor counts.

The architecture's most striking feature is its scale. A single Ironwood "pod" — a tightly integrated unit of TPU chips functioning as one supercomputer — can connect up to 9,216 individual chips through Google's proprietary Inter-Chip Interconnect network operating at 9.6 terabits per second.
Anthropic's billion-dollar bet validates Google's custom silicon strategy

Perhaps the most significant external validation of Ironwood's capabilities comes from Anthropic's commitment to access up to one million TPU chips — a staggering figure in an industry where even clusters of 10,000 to 50,000 accelerators are considered massive.

"Anthropic and Google have a longstanding partnership and this latest expansion will help us continue to grow the compute we need to define the frontier of AI," said Krishna Rao, Anthropic's chief financial officer, in the official partnership agreement. "Our customers — from Fortune 500 companies to AI-native startups — depend on Claude for their most important work, and this expanded capacity ensures we can meet our exponentially growing demand."

According to a separate statement, Anthropic will have access to "well over a gigawatt of capacity coming online in 2026" — enough electricity to power a small city. The company specifically cited TPUs' "price-performance and efficiency" as key factors in the decision, along with "existing experience in training and serving its models with TPUs."

Industry analysts estimate that a commitment to access one million TPU chips, with associated infrastructure, networking, power, and cooling, likely represents a multi-year contract worth tens of billions of dollars — among the largest known cloud infrastructure commitments in history.

James Bradbury, Anthropic's head of compute, elaborated on the inference focus: "Ironwood's improvements in both inference performance and training scalability will help us scale efficiently while maintaining the speed and reliability our customers expect."

Google's Axion processors target the computing workloads that make AI possible

Alongside Ironwood, Google introduced expanded options for its Axion processor family — custom Arm-based CPUs designed for general-purpose workloads that support AI applications but don't require specialized accelerators.

The N4A instance type, now entering preview, targets what Google describes as "microservices, containerized applications, open-source databases, batch, data analytics, development environments, experimentation, data preparation and web serving jobs that make AI applications possible."
The company claims N4A delivers up to 2X better price-performance than comparable current-generation x86-based virtual machines.

Google is also previewing C4A metal, its first bare-metal Arm instance, which provides dedicated physical servers for specialized workloads such as Android development, automotive systems, and software with strict licensing requirements.

The Axion strategy reflects a growing conviction that the future of computing infrastructure requires both specialized AI accelerators and highly efficient general-purpose processors. While a TPU handles the computationally intensive task of running an AI model, Axion-class processors manage data ingestion, preprocessing, application logic, API serving, and countless other tasks in a modern AI application stack.

Early customer results suggest the approach delivers measurable economic benefits. Vimeo reported observing "a 30% improvement in performance for our core transcoding workload compared to comparable x86 VMs" in initial N4A tests. ZoomInfo measured "a 60% improvement in price-performance" for data processing pipelines running on Java services, according to Sergei Koren, the company's chief infrastructure architect.

Software tools turn raw silicon performance into developer productivity

Hardware performance means little if developers cannot easily harness it. Google emphasized that Ironwood and Axion are integrated into what it calls AI Hypercomputer — "an integrated supercomputing system that brings together compute, networking, storage, and software to improve system-level performance and efficiency."

According to an October 2025 IDC Business Value Snapshot study, AI Hypercomputer customers achieved on average a 353% three-year return on investment, 28% lower IT costs, and 55% more efficient IT teams.

Google disclosed several software enhancements designed to maximize Ironwood utilization. Google Kubernetes Engine now offers advanced maintenance and topology awareness for TPU clusters, enabling intelligent scheduling and highly resilient deployments. The company's open-source MaxText framework now supports advanced training techniques including Supervised Fine-Tuning and Generative Reinforcement Policy Optimization.

Perhaps most significant for production deployments, Google's Inference Gateway intelligently load-balances requests across model servers to optimize critical metrics. According to Google, it can reduce time-to-first-token latency by 96% and serving costs by up to 30% through techniques like prefix-cache-aware routing.

The Inference Gateway monitors key metrics including KV cache hits, GPU or TPU utilization, and request queue length, then routes incoming requests to the optimal replica. For conversational AI applications where multiple requests might share context, routing requests with shared prefixes to the same server instance can dramatically reduce redundant computation.
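Google has not detailed the Inference Gateway's internals here, but the prefix-cache-aware idea can be sketched in a few lines: hash a fixed-length prefix of each prompt, pin that prefix to the replica that served it first so its KV cache stays warm, and send unseen prefixes to the least-loaded replica. The sketch below is a hypothetical illustration, not Google's implementation; the `Replica` and `PrefixAwareRouter` names are invented for the example.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    queue_len: int = 0                      # stand-in for the load signals the article mentions
    warm_prefixes: set = field(default_factory=set)

class PrefixAwareRouter:
    """Toy prefix-cache-aware router (illustrative only)."""

    def __init__(self, replicas, prefix_chars=64):
        self.replicas = {r.name: r for r in replicas}
        self.prefix_chars = prefix_chars    # how much of the prompt counts as the shared prefix
        self.prefix_owner = {}              # prefix hash -> replica name

    def _prefix_key(self, prompt: str) -> str:
        head = prompt[: self.prefix_chars]  # e.g. a shared system prompt
        return hashlib.sha256(head.encode("utf-8")).hexdigest()

    def route(self, prompt: str) -> Replica:
        key = self._prefix_key(prompt)
        owner = self.prefix_owner.get(key)
        if owner is None:
            # Unseen prefix: pick the least-loaded replica and remember the assignment,
            # so later requests with the same prefix hit its warm KV cache.
            target = min(self.replicas.values(), key=lambda r: r.queue_len)
            self.prefix_owner[key] = target.name
            target.warm_prefixes.add(key)
        else:
            target = self.replicas[owner]
        target.queue_len += 1               # crude load accounting for the example
        return target

system_prompt = "You are a support assistant for ExampleCo. Keep answers short and cite sources. "
router = PrefixAwareRouter([Replica("tpu-0"), Replica("tpu-1"), Replica("tpu-2")])
print(router.route(system_prompt + "How do I reset my password?").name)
print(router.route(system_prompt + "What is the refund policy?").name)  # same replica: warm cache
```

A production gateway would also need cache eviction, health checks, and load-aware rebalancing on top of this basic affinity rule.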
"ML will require more than 500 kW per IT rack before 2030."Google is collaborating with Meta and Microsoft to standardize electrical and mechanical interfaces for high-voltage DC distribution. The company selected 400 VDC specifically to leverage the supply chain established by electric vehicles, "for greater economies of scale, more efficient manufacturing, and improved quality and scale."On cooling, Google revealed it will contribute its fifth-generation cooling distribution unit design to the Open Compute Project. The company has deployed liquid cooling "at GigaWatt scale across more than 2,000 TPU Pods in the past seven years" with fleet-wide availability of approximately 99.999%.Water can transport approximately 4,000 times more heat per unit volume than air for a given temperature change — critical as individual AI accelerator chips increasingly dissipate 1,000 watts or more.Custom silicon gambit challenges Nvidia's AI accelerator dominanceGoogle's announcements come as the AI infrastructure market reaches an inflection point. While Nvidia maintains overwhelming dominance in AI accelerators — holding an estimated 80-95% market share — cloud providers are increasingly investing in custom silicon to differentiate their offerings and improve unit economics.Amazon Web Services pioneered this approach with Graviton Arm-based CPUs and Inferentia / Trainium AI chips. Microsoft has developed Cobalt processors and is reportedly working on AI accelerators. Google now offers the most comprehensive custom silicon portfolio among major cloud providers.The strategy faces inherent challenges. Custom chip development requires enormous upfront investment — often billions of dollars. The software ecosystem for specialized accelerators lags behind Nvidia's CUDA platform, which benefits from 15+ years of developer tools. And rapid AI model architecture evolution creates risk that custom silicon optimized for today's models becomes less relevant as new techniques emerge.Yet Google argues its approach delivers unique advantages. "This is how we built the first TPU ten years ago, which in turn unlocked the invention of the Transformer eight years ago — the very architecture that powers most of modern AI," the company noted, referring to the seminal "Attention Is All You Need" paper from Google researchers in 2017.The argument is that tight integration — "model research, software, and hardware development under one roof" — enables optimizations impossible with off-the-shelf components.Beyond Anthropic, several other customers provided early feedback. Lightricks, which develops creative AI tools, reported that early Ironwood testing "makes us highly enthusiastic" about creating "more nuanced, precise, and higher-fidelity image and video generation for our millions of global customers," said Yoav HaCohen, the company's research director.Google's announcements raise questions that will play out over coming quarters. Can the industry sustain current infrastructure spending, with major AI companies collectively committing hundreds of billions of dollars? Will custom silicon prove economically superior to Nvidia GPUs? 
For now, Google appears committed to a strategy that has defined the company for decades: building custom infrastructure to enable applications impossible on commodity hardware, then making that infrastructure available to customers who want similar capabilities without the capital investment.

As the AI industry transitions from research labs to production deployments serving billions of users, that infrastructure layer — the silicon, software, networking, power, and cooling that make it all run — may prove as important as the models themselves.

And if Anthropic's willingness to commit to accessing up to one million chips is any indication, Google's bet on custom silicon designed specifically for the age of inference may be paying off just as demand reaches its inflection point.

#ai & ml #commentary

Much like the introduction of the personal computer, the internet, and the iPhone into the public sphere, recent developments in the AI space, from generative AI to agentic AI, have fundamentally changed the way people live and work. Since ChatGPT’s release in late 2022, it’s reached a threshold of 700 million users per week, approximately […]

Every large language model (LLM) application that retrieves information faces a simple problem: how do you break down a 50-page document into pieces that a model can actually use? When you're building a retrieval-augmented generation (RAG) app, your documents need to be split into chunks before your vector database retrieves anything and your LLM generates responses.
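As a concrete starting point, here is a minimal chunking sketch, assuming plain fixed-size character splitting with overlap; the `chunk_text` function and its parameters are illustrative rather than taken from any particular library.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    Overlapping the chunks helps preserve context that would otherwise be
    cut in half at a boundary; chunk_size and overlap are tuning knobs that
    depend on the embedding model and the retrieval task.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():                  # skip whitespace-only tails
            chunks.append(chunk)
    return chunks

# Example: a 50-page document (roughly 150,000 characters) becomes ~215 chunks.
document = "..." * 50_000                  # placeholder for the real document text
pieces = chunk_text(document, chunk_size=800, overlap=100)
print(len(pieces), "chunks ready for embedding and indexing")
```

Many teams later swap in sentence-aware or token-aware splitters once a simple approach like this reveals where answers get cut off mid-thought.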

#ai & ml #commentary

In a recent newsletter, Ben Thompson suggested paying attention to a portion of Jensen Huang’s keynote at NVIDIA’s GPU Technology Conference (GTC) in DC, calling it “an excellent articulation of the thesis that the AI market is orders of magnitude bigger than the software market.” While I’m reluctant to contradict as astute an observer as […]
