AI Tools of the Month, AI gets rich and famous, a Nano Banana camera, and more...
Receiving the Robert A. Muh award, the technologist and author heralded a bright future for AI, breakthroughs in longevity, and more.
In this post, we demonstrate how to integrate Amazon SageMaker HyperPod with the Anyscale platform to address critical infrastructure challenges in building and deploying large-scale AI models. The combined solution provides robust infrastructure for distributed AI workloads with high-performance hardware, continuous monitoring, and seamless integration with Ray, the leading AI compute engine, enabling organizations to reduce time-to-market and lower total cost of ownership.
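To make the Ray integration concrete, here is a minimal, illustrative sketch of the kind of distributed workload Ray orchestrates on a cluster such as one provisioned through SageMaker HyperPod and Anyscale; the training function and cluster setup are hypothetical placeholders, not part of the referenced solution.

```python
import ray

# Connect to an existing Ray cluster (for example, one provisioned by Anyscale);
# calling ray.init() with no address instead starts a local cluster for testing.
ray.init(address="auto")

@ray.remote(num_gpus=1)
def train_shard(shard_id: int) -> float:
    # Hypothetical placeholder for a per-shard training step; a real workload
    # would load the shard's data and run a training loop here.
    return 0.0  # e.g., the shard's final training loss

# Fan work out across the cluster and gather results.
futures = [train_shard.remote(i) for i in range(8)]
losses = ray.get(futures)
print(f"mean loss across shards: {sum(losses) / len(losses):.4f}")
```

The same pattern scales from a laptop to a multi-node GPU cluster without code changes, which is the property the post attributes to running Ray on HyperPod-managed hardware.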
In this post, we introduce Amazon Nova customization for text content moderation through Amazon SageMaker AI, enabling organizations to fine-tune models for their specific moderation needs. The evaluation across three benchmarks shows that customized Nova models achieve an average improvement of 7.3% in F1 scores compared to the baseline Nova Lite, with individual improvements ranging from 4.2% to 9.2% across different content moderation tasks.
A snapshot of how top companies, governments, researchers and startups are enhancing their work with Google's AI solutions.
Echelon, an artificial intelligence startup that automates enterprise software implementations, emerged from stealth mode today with $4.75 million in seed funding led by Bain Capital Ventures, targeting a fundamental shift in how companies deploy and maintain critical business systems.

The San Francisco-based company has developed AI agents specifically trained to handle end-to-end ServiceNow implementations — complex enterprise software deployments that traditionally require months of work by offshore consulting teams and cost companies millions of dollars annually.

"The biggest barrier to digital transformation isn't technology — it's the time it takes to implement it," said Rahul Kayala, Echelon's founder and CEO, who previously worked at AI-powered IT company Moveworks. "AI agents are eliminating that constraint entirely, allowing enterprises to experiment, iterate, and deploy platform changes with unprecedented speed."

The announcement signals a potential disruption to the $1.5 trillion global IT services market, where companies like Accenture, Deloitte, and Capgemini have long dominated through labor-intensive consulting models that Echelon argues are becoming obsolete in the age of artificial intelligence.

Why ServiceNow deployments take months and cost millions

ServiceNow, a cloud-based platform used by enterprises to manage IT services, human resources, and business workflows, has become critical infrastructure for large organizations. However, implementing and customizing the platform typically requires specialized expertise that most companies lack internally.

The complexity stems from ServiceNow's vast customization capabilities. Organizations often need hundreds of "catalog items" — digital forms and workflows for employee requests — each requiring specific configurations, approval processes, and integrations with existing systems. According to Echelon's research, these implementations frequently stretch far beyond planned timelines due to technical complexity and communication bottlenecks between business stakeholders and development teams.

"What starts out simple often turns into weeks of effort once the actual work begins," the company noted in its analysis of common implementation challenges. "A basic request form turns out to be five requests stuffed into one. We had catalog items with 50+ variables, 10 or more UI policies, all connected. Update one field, and something else would break."

The traditional solution involves hiring offshore development teams or expensive consultants, creating what Echelon describes as a problematic cycle: "One question here, one delay there, and suddenly you're weeks behind."

How AI agents replace expensive offshore consulting teams

Echelon's approach replaces human consultants with AI agents trained by elite ServiceNow experts from top consulting firms. These agents can analyze business requirements, ask clarifying questions in real time, and automatically generate complete ServiceNow configurations including forms, workflows, testing scenarios, and documentation.

The technology represents a significant advance over general-purpose AI tools. Rather than providing generic code suggestions, Echelon's agents understand ServiceNow's specific architecture, best practices, and common integration patterns.
They can identify gaps in requirements and propose solutions that align with enterprise governance standards.

"Instead of routing every piece of input through five people, the business process owner directly uploaded their requirements," Kayala explained, describing a recent customer implementation. "The AI developer analyzes it and asks follow-up questions like: 'I see a process flow with 3 branches, but only 2 triggers. Should there be a 3rd?' The kinds of things a seasoned developer would ask. With AI, these questions came instantly."

Early customers report dramatic time savings. One financial services company completed a service catalog migration project that was projected to take six months in six weeks using Echelon's AI agents.

What makes Echelon's AI different from coding assistants

Echelon's technology addresses several technical challenges that have prevented broader AI adoption in enterprise software implementation. The agents are trained not just on ServiceNow's technical capabilities but on the accumulated expertise of senior consultants who understand complex enterprise requirements, governance frameworks, and integration patterns.

This approach differs from general-purpose AI coding assistants like GitHub Copilot, which provide syntax suggestions but lack domain-specific expertise. Echelon's agents understand ServiceNow's data models, security frameworks, and upgrade considerations — knowledge typically acquired through years of consulting experience.

The company's training methodology involves elite ServiceNow experts from consulting firms like Accenture and specialized ServiceNow partner Thirdera. This embedded expertise enables the AI to handle complex requirements and edge cases that typically require senior consultant intervention.

The real challenge isn't teaching AI to write code — it's capturing the intuitive expertise that separates junior developers from seasoned architects. Senior ServiceNow consultants instinctively know which customizations will break during upgrades and how simple requests spiral into complex integration problems. This institutional knowledge creates a far more defensible moat than general-purpose coding assistants can offer.

The $1.5 trillion consulting market faces disruption

Echelon's emergence reflects broader trends reshaping the enterprise software market. As companies accelerate digital transformation initiatives, the traditional consulting model increasingly appears inadequate for the speed and scale required.

ServiceNow itself has grown rapidly, reporting over $10.98 billion in annual revenue in 2024, and $12.06 billion for the trailing twelve months ending June 30, 2025, as organizations continue to digitize more business processes. However, this growth has created a persistent talent shortage, with demand for skilled ServiceNow professionals — particularly those with AI expertise — significantly outpacing supply.

The startup's approach could fundamentally alter the economics of enterprise software implementation. Traditional consulting engagements often involve large teams working for months, with costs scaling linearly with project complexity. AI agents, by contrast, can handle multiple projects simultaneously and apply learned knowledge across customers.

Rak Garg, the Bain Capital Ventures partner who led Echelon's funding round, sees this as part of a larger shift toward AI-powered professional services.
"We see the same trend with other BCV companies like Prophet Security, which automates security operations, and Crosby, which automates legal services for startups. AI is quickly becoming the delivery layer across multiple functions."Scaling beyond ServiceNow while maintaining enterprise reliabilityDespite early success, Echelon faces significant challenges in scaling its approach. Enterprise customers prioritize reliability above speed, and any AI-generated configurations must meet strict security and compliance requirements."Inertia is the biggest risk," Garg acknowledged. "IT systems shouldn't ever go down, and companies lose thousands of man-hours of productivity with every outage. Proving reliability at scale, and building on repeatable results will be critical for Echelon."The company plans to expand beyond ServiceNow to other enterprise platforms including SAP, Salesforce, and Workday — each creating substantial additional market opportunities. However, each platform requires developing new domain expertise and training models on platform-specific best practices.Echelon also faces potential competition from established consulting firms that are developing their own AI capabilities. However, Garg views these firms as potential partners rather than competitors, noting that many have already approached Echelon about collaboration opportunities."They know that AI is shifting their business model in real-time," he said. "Customers are placing immense pricing pressure on larger firms and asking hard questions, and these firms can use Echelon agents to accelerate their projects."How AI agents could reshape all professional servicesEchelon's funding and emergence from stealth marks a significant milestone in the application of AI to professional services. Unlike consumer AI applications that primarily enhance individual productivity, enterprise AI agents like Echelon's directly replace skilled labor at scale.The company's approach — training AI systems on expert knowledge rather than just technical documentation — could serve as a model for automating other complex professional services. Legal research, financial analysis, and technical consulting all involve similar patterns of applying specialized expertise to unique customer requirements.For enterprise customers, the promise extends beyond cost savings to strategic agility. Organizations that can rapidly implement and modify business processes gain competitive advantages in markets where customer expectations and regulatory requirements change frequently.As Kayala noted, "This unlocks a completely different approach to business agility and competitive advantage."The implications extend far beyond ServiceNow implementations. If AI agents can master the intricacies of enterprise software deployment—one of the most complex and relationship-dependent areas of professional services — few knowledge work domains may remain immune to automation.The question isn't whether AI will transform professional services, but how quickly human expertise can be converted into autonomous digital workers that never sleep, never leave for competitors, and get smarter with every project they complete.
The future of reporting will be about encoding the value proposition of a product into prompt design.
The post Past is Prologue: How Conversational Analytics Is Changing Data Work appeared first on Towards Data Science.
A turning point for data analysis?
The post How the Rise of Tabular Foundation Models Is Reshaping Data Science appeared first on Towards Data Science.
These 7 prompt templates will make LLMs your most useful assistant.
True business transformation in the era of AI must go beyond simple chatbots. That’s what Gemini Enterprise does.
Here are four ways Gemini Enterprise can help you and your team get time back in your day.
This article originally appeared on Medium. Tim O’Brien has given us permission to repost here on Radar. When you’re working with AI tools like Cursor or GitHub Copilot, the real power isn’t just having access to different models—it’s knowing when to use them. Some jobs are OK with Auto. Others need a stronger model. And […]
The online trend takes a comedic approach to spreading anti-AI messaging, but some creators are using racist references to make their point.
OpenAI's annual developer conference on Monday was a spectacle of ambitious AI product launches, from an app store for ChatGPT to a stunning video-generation API that brought creative concepts to life. But for the enterprises and technical leaders watching closely, the most consequential announcement was the quiet general availability of Codex, the company's AI software engineer. This release signals a profound shift in how software — and by extension, modern business — is built.

While other announcements captured the public's imagination, the production-ready release of Codex, supercharged by a new specialized model and a suite of enterprise-grade tools, is the engine behind OpenAI's entire vision. It is the tool that builds the tools, the proven agent in a world buzzing with agentic potential, and the clearest articulation of the company's strategy to win the enterprise.

The general availability of Codex moves it from a "research preview" to a fully supported product, complete with a new software development kit (SDK), a Slack integration, and administrative controls for security and monitoring. This transition declares that Codex is ready for mission-critical work inside the world's largest companies.

"We think this is the best time in history to be a builder; it has never been faster to go from idea to product," said OpenAI CEO Sam Altman during the opening keynote presentation. "Software used to take months or years to build. You saw that it can take minutes now to build with AI."

That acceleration is not theoretical. It's a reality born from OpenAI's own internal use — a massive "dogfooding" effort that serves as the ultimate case study for enterprise customers.

Inside GPT-5-Codex: The AI model that codes autonomously for hours and drives 70% productivity gains

At the heart of the Codex upgrade is GPT-5-Codex, a version of OpenAI's latest flagship model that has been "purposely trained for Codex and agentic coding." The new model is designed to function as an autonomous teammate, moving far beyond simple code autocompletion.

"I personally like to think about it as a little bit like a human teammate," explained Tibo Sottiaux, an OpenAI engineer, during a technical session on Codex. "You can pair a program with it on your computer, you can delegate to it, or as you'll see, you can give it a job without explicit prompting."

This new model enables "adaptive thinking," allowing it to dynamically adjust the time and computational effort spent on a task based on its complexity. For simple requests, it's fast and efficient, but for complex refactoring projects, it can work for hours. One engineer during the technical session noted, "I've seen the GPT-5-Codex model work for over seven hours productively... on a marathon session." This capability to handle long-running, complex tasks is a significant leap beyond the simple, single-shot interactions that define most AI coding assistants.

The results inside OpenAI have been dramatic. The company reported that 92% of its technical staff now uses Codex daily, and those engineers complete 70% more pull requests (a measure of code contribution) each week. Usage has surged tenfold since August. "When we as a team see the stats, it feels great," Sottiaux shared. "But even better is being at lunch with someone who then goes 'Hey I use Codex all the time. Here's a cool thing that I do with it.
Do you want to hear about it?'"

How OpenAI uses Codex to build its own AI products and catch hundreds of bugs daily

Perhaps the most compelling argument for Codex's importance is that it is the foundational layer upon which OpenAI's other flashy announcements were built. During the DevDay event, the company showcased custom-built arcade games and a dynamic, AI-powered website for the conference itself, all developed using Codex.

In one session, engineers demonstrated how they built "Storyboard," a custom creative tool for the film industry, in just 48 hours during an internal hackathon. "We decided to test Codex, our coding agent... we would send tasks to Codex in between meetings. We really easily reviewed and merged PRs into production, which Codex even allowed us to do from our phones," said Allison August, a solutions engineering leader at OpenAI. This reveals a critical insight: the rapid innovation showcased at DevDay is a direct result of the productivity flywheel created by Codex. The AI is a core part of the manufacturing process for all other AI products.

A key enterprise-focused feature is the new, more robust code review capability. OpenAI said it "purposely trained GPT-5-Codex to be great at ultra thorough code review," enabling it to explore dependencies and validate a programmer's intent against the actual implementation to find high-quality bugs. Internally, nearly every pull request at OpenAI is now reviewed by Codex, catching hundreds of issues daily before they reach a human reviewer.

"It saves you time, you ship with more confidence," Sottiaux said. "There's nothing worse than finding a bug after we actually ship the feature."

Why enterprise software teams are choosing Codex over GitHub Copilot for mission-critical development

The maturation of Codex is central to OpenAI's broader strategy to conquer the enterprise market, a move essential to justifying its massive valuation and unprecedented compute expenditures. During a press conference, CEO Sam Altman confirmed the strategic shift.

"The models are there now, and you should expect a huge focus from us on really winning enterprises with amazing products, starting here," Altman said during a private press conference. OpenAI President and Co-founder Greg Brockman immediately added, "And you can see it already with Codex, which I think has been just an incredible success and has really grown super fast."

For technical decision-makers, the message is clear. While consumer-facing agents that book dinner reservations are still finding their footing, Codex is a proven enterprise agent delivering substantial ROI today. Companies like Cisco have already rolled out Codex to their engineering organizations, cutting code review times by 50% and reducing project timelines from weeks to days.

With the new Codex SDK, companies can now embed this agentic power directly into their own custom workflows, such as automating fixes in a CI/CD pipeline or even creating self-evolving applications. During a live demo, an engineer showcased a mobile app that updated its own user interface in real-time based on a natural language prompt, all powered by the embedded Codex SDK. While the launch of an app ecosystem in ChatGPT and the breathtaking visuals of the Sora 2 API rightfully generated headlines, the general availability of Codex marks a more fundamental and immediate transformation.
It is the quiet but powerful engine driving the next era of software development, turning the abstract promise of AI-driven productivity into a tangible, deployable reality for businesses today.
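As a rough illustration of the CI/CD pattern described above, here is a purely hypothetical sketch of a pipeline step that hands a failing test to a coding agent and asks for a proposed patch; the `run_codex_task` helper and its behavior are invented for illustration and are not the actual Codex SDK API.

```python
import subprocess

def run_codex_task(prompt: str) -> str:
    """Hypothetical wrapper around an agentic coding SDK; placeholder only."""
    # A real integration would call the vendor's SDK or CLI here; this stub
    # simply marks where the agent invocation would sit in a CI step.
    raise NotImplementedError("replace with the vendor SDK call")

def ci_autofix_step() -> None:
    # 1. Run the test suite and capture any failure output.
    result = subprocess.run(["pytest", "-x"], capture_output=True, text=True)
    if result.returncode == 0:
        return  # nothing to fix

    # 2. Hand the failure to the agent and ask for a minimal patch,
    #    which a human reviewer still approves before merge.
    patch = run_codex_task(
        "The following test failure occurred in CI. Propose a minimal fix "
        "as a unified diff:\n" + result.stdout[-4000:]
    )
    print(patch)

if __name__ == "__main__":
    ci_autofix_step()
```

The key design choice in this kind of integration is keeping the agent's output as a reviewable artifact (a diff) rather than letting it push directly, which mirrors the human-review step the article describes inside OpenAI.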
Presented by Zendesk

Zendesk powers nearly 5 billion resolutions every year for over 100,000 customers around the world, with about 20,000 of its customers (and growing) using its AI services. Zendesk is poised to generate about $200 million in AI-related revenue this year, double that of some of its largest competitors, while investing $400 million in R&D. Much of that research is focused on upgrading the Zendesk Resolution Platform, a complete AI-first solution for customer service, employee service, and contact center teams, announced at Relate this past March.

During AI Summit, Chief Executive Officer Tom Eggemeier, along with members of the Zendesk team, took to the stage to announce several major advancements, including voice AI agents, video calling, and screen sharing for Zendesk Contact Center, and improved IT asset management, as well as the introduction of next-generation analytics, in the wake of its acquisition of HyperArc.

"We have built the only platform that is purpose-built for service and purpose-built for AI," Eggemeier said. "That focus is why we lead in AI for all types of service. And it is why we can deliver what no one else can for every service need you have in your organization."

New capabilities across use cases and companies

At its core, the Resolution Platform powers autonomous AI agents that solve complex issues in real time, leveraging leading LLMs like GPT-5, developed in collaboration with OpenAI, and supporting Model Context Protocol (MCP) to instantly access data, which streamlines workflows and improves autonomous problem-solving.

"Since our launch in March, we've been building fast, focused on making AI agents smarter, more flexible, and ready for even more channels," said Shashi Upadhyay, president of product, engineering, and AI at Zendesk. "And now, these AI agents are getting even better. They work across messaging, email, and now voice. They are getting smarter; able to handle multiple intents in a single message, detecting, remembering, and resolving many issues at once."

As the only platform with native built-in QA, Zendesk automatically scores resolutions down to the conversation level, so teams can track resolution quality at scale. For startups, these insights are critical. They not only show what worked, but what needs fixing before it costs them time, reputation, or growth, and importantly, fit within a startup budget. That's because Zendesk is the only company that charges only for successful resolutions, which are verified through the industry's longest validation window, with two layers of quality checks.

Making the product CX admin a hero

Zendesk demonstrated the platform's new features by highlighting a hypothetical wearable device company's product launch. Service leaders at every stop along the product launch journey — from design to manufacturing — manage emerging issues with the support of the upgraded Resolution Platform.

For a global manufacturer that builds complex, state-of-the-art wearable tech, the pressure starts the moment a new product hits the market: tickets start pouring in, and a red-flagged backlog piles up. "It is not a product issue, it is a resolution bottleneck," Upadhyay said. But, he added, "What once took days can now be resolved instantly."

The new Zendesk Admin Copilot is designed specifically to assist human agents, helping them spot what is not working, decide what to do next, and carry out changes quickly.
It flags operational issues, like missing intent tags, broken internal processes, or routing conflicts that delay resolution. Copilot explains what is happening in plain language, recommends specific fixes, and, with the admin's approval, can make the changes itself. It's grounded in live Zendesk data, like tickets, triggers, and knowledge, so every recommendation is specific, current, and based on how the service operation actually runs.

Once the admin identifies the issue and implements a fix, the next step is ensuring everyone has access to the right knowledge to support it. For many organizations, that information lives outside of Zendesk. The newly launched Knowledge Connectors allow admins to pull in relevant content, like configuration guides or policy details, without needing to migrate anything, so both human and AI agents have access to real-time instructions tied to the exact product version.

The admin also creates a smarter feedback loop with the new Action Builder, which automatically tags, summarizes, and sends notifications to the product team through Microsoft Teams. And finally, Zendesk HyperArc will bring customers insights that combine AI and human analysis in a clear, narrative-driven view of what is happening and why, instead of siloed dashboards or static reports.

"With these innovations in place, change at the manufacturing plant cascades quickly, tickets are routed cleanly, support agents know what to say, engineering sees real signals instead of scattered anecdotes, and customers who just want the product to work get fast, reliable resolutions," Upadhyay said. "The CX Admin becomes the quiet hero of the manufacturer's story."

Solutions for the retail CX leader

As a CX or contact center leader for a retail company, when a must-have wearable drops, how do you deliver service for your new hit product that feels personal and consistent when your team is stretched across multiple countries, channels, and customer expectations at once? "Intelligent automation doesn't just streamline operations — it enhances the customer experience across borders and channels," said Lisa Kant, senior vice president of marketing at Zendesk.

Zendesk's Voice AI Agents are fully autonomous AI agents designed to understand natural speech, take action, and resolve issues without needing to escalate. They can verify identity, track orders, update deliveries, and answer setup questions in multiple languages, while keeping the brand experience consistent. Meanwhile, Video Calling lets a live agent spin up a video session, confirm the device is working, and walk the customer through setup or troubleshooting. And because a help center is a critical part of delivering great service, especially when scaling fast across multiple countries and languages, Zendesk built Knowledge Builder, an AI-powered tool that helps teams build and maintain their help center content automatically. It analyzes real customer conversations and turns them into localized help articles for trending issues.

Giving IT leaders a strong edge

When a company adopts that new product, it becomes critical to resolve issues fast, to ensure employee productivity stays strong. Available with early access in November, Zendesk's new employee service offering, IT Asset Management (ITAM), natively integrates service and asset data together into the Zendesk service desk to help IT move from reactive troubleshooting to proactive service.
Now, when a vague “tablet not working” ticket comes in, Zendesk ITAM surfaces the device details right inside the ticket, so IT knows exactly what they are dealing with. Zendesk Copilot uses that same asset data to recommend model-specific troubleshooting steps. And with Knowledge Connectors, those steps can be pulled directly from SharePoint or Confluence without migration. If the fix does not work, the IT specialist confirms in seconds that the device is under warranty and issues a replacement without any back-and-forth. With real-time visibility across every hardware asset, the IT leader can spot patterns before they become a flood of tickets, or failures at the point of care, so IT resolves issues faster and prevents problems before they happen. "With Zendesk, IT is not just reacting to issues — it is setting the standard for how proactive employee service is delivered," Upadhyay said.

For more on the latest Zendesk updates and improvements, and for a conversation with Zendesk's special guest, LinkedIn co-founder Reid Hoffman, and more, watch the full videos here. And for the latest updates, detailed information, and product availability, visit Zendesk's official announcements page.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they're always clearly marked. For more information, contact sales@venturebeat.com.
Presented by Certinia

Every professional services leader knows the feeling: a pipeline full of promising deals, but a bench that's already stretched thin. That's because growth has always been tied to a finite supply of consultants with finite availability to work on projects. Even with strong market demand, most firms only capture 10-20% of their potential pipeline because they simply can't staff the work fast enough. Professional Services Automation (PSA) software emerged to help optimize operations, but the core model has remained the same.

Thankfully, that limitation is about to change. The proliferation of AI agents is sparking a new model — Autonomous PSA — blending human expertise with a digital workforce, all managed by a central orchestration engine. The result is a system that allows firms to capture 70-90% of demand instead of leaving it on the table.

Why professional services has the biggest transformation opportunity with agentic AI

Many industries will be transformed by AI agents, but perhaps none more than professional services. Understanding why requires us to explore the difference between current-state automation and future-state autonomy.

Traditional automation follows pre-set rules: When X happens, do Y. It's a logical workflow. Autonomy, on the other hand, is goal-oriented: The goal is Z. Analyze the data, select and deploy the best resources, and execute the necessary steps to achieve Z. It's the difference between executing a workflow, and executing a full-on strategy.

This distinction is key because the core operation of a professional services business is a complex strategy. Unlike a sales team managing a linear pipeline or a support team clearing a reactive queue, a services firm is constantly solving a multi-dimensional problem. The "product" isn't a license or a physical item; it's the expertise of its people, working on a diverse set of tasks, typically delivered over discrete units of time.

That means the business model of a services organization contains layers of operational complexity that product-based businesses inherently get to avoid. The manual effort and guesswork involved often lead to conservative bidding on new business, underutilized experts, and reactive staffing that can put project margins and timelines at risk. Added up, this complexity represents a trillion-dollar opportunity cost for the global services economy.

The orchestration engine that makes autonomous PSA possible

"Autonomous PSA" describes an intelligent system designed to manage and orchestrate a blended team of human experts and their AI agent counterparts. It works by integrating a digital workforce of AI agents directly into your service delivery operations, providing a nearly limitless supply of labor for repeatable tasks, all governed by a single engine. It's a fundamental shift from a model constrained by human supply to one amplified by digital scale.

There is one enterprise software ecosystem uniquely positioned to make Autonomous PSA possible: Salesforce. Autonomous PSA emerges from the combination of three of its core technologies:

The Salesforce platform as the foundation: Everything will start with a single source of truth. The Salesforce platform provides the unified data fabric for every aspect of the customer relationship. This foundation extends across the entire platform, giving the autonomous engine the complete data context it needs to function.
Agentforce as the AI engine: Agentforce represents the industry's most secure, trusted layer for building and deploying AI agents that provide digital labor. It gives organizations the power to execute complex tasks at scale, transforming AI capabilities from concept to a tangible part of the future resource pool.

Salesforce-native Professional Services Automation software as the orchestration brain: The data foundation and AI engine need a command center. A Salesforce-native solution for Professional Services Automation like Certinia acts as the orchestration brain that defines the goals, rules, and workflows for the agents, deploying them alongside human resources to optimize project outcomes from sale to delivery.

The keystone of this new model is the orchestration brain, akin to a control tower for the hybrid human-AI agent workforce. It's a system built to manage an elastic supply of resources, instantly scaling delivery by pairing consultants with digital agents. Instead of scrambling with spreadsheets, staffing becomes a real-time, AI-driven allocation based on skills, availability, and project needs.

The combination creates a unified platform that gives the orchestration engine the context it needs for smarter, faster decision-making across the entire project lifecycle.

For executives, the impact is direct. Now empowered to overcome human capacity limits, PSOs can expand pipeline capture from a mere 10–20% to as high as 70–90%. This growth is also more profitable, as margins improve when lower-value work is offloaded to digital labor, allowing people to focus on high-value delivery. Furthermore, project timelines are accelerated, with 24/7 AI capacity shortening schedules and speeding time-to-value. Crucially, this speed and efficiency do not come at the expense of quality; human oversight remains embedded in every engagement, ensuring client trust is maintained through strong governance.

Preparing your organization for autonomous PSA

Adapting to Autonomous Professional Services requires leadership and foresight. For organizations ready to start, the journey begins with three key steps:

Re-architect your workforce model. The traditional pyramid workforce hierarchy is shifting to a diamond structure with AI agents handling the base of repeatable work. This will create new roles like orchestration analysts and agent supervisors to manage this blended workforce. Your first move is to audit your delivery processes and identify the high-volume, low-complexity tasks primed for this new digital workforce.

Invest in a native orchestration engine. An autonomous system needs a central brain. This is your PSA solution, and it must be native to your CRM platform to access real-time data across sales, service, and finance. If your project, resource, and financial data live in different systems, your priority is to unify them on a single platform to create the foundation for intelligent decision-making.

Experiment, then scale. Don't try to transform everything at once. Start by automating a single, high-friction process, like project creation from a closed-won opportunity or initial budget drafting. Proving value on a small scale builds the business case and the operational muscle for a systematic expansion across your entire services lifecycle.

The model behind the trillion-dollar opportunity

Our analysis of over 2,000 global professional services organizations indicates that firms today leave most of their pipeline untouched.
With human capacity alone, they typically capture only 10–20% of qualified demand. By blending digital labor into the mix, that capacity can rise to 70–90%. The difference — what we call ΔR — is massive. For a large professional services organization (PSO) with a $6B pipeline, that shift alone unlocks about $3.6B in incremental revenue.

And that is just the starting point. Once you add amplifiers like faster delivery (acceleration), lower delivery cost (margin gains), and access to niche expertise (skill-gap coverage), the impact multiplies. In our model, those amplifiers nearly triple the base gain, raising the total opportunity to $10 billion per firm. Scale that across 100 of the world's largest PSOs, and you arrive at the trillion-dollar prize.

Seize the full market potential

The idea presented here represents a once-in-a-generation opportunity to redefine the economics of professional services. Firms that adopt Autonomous PSA will capture a greater share of demand, deliver faster outcomes, and free their experts to focus on what matters most: client success.

The era of Autonomous Professional Services has begun. The orchestration engine is the key. How quickly will your organization seize the opportunity?

The full framework and analytical model are detailed in this new white paper, Unlocking a Trillion Dollar Opportunity for Professional Services with Autonomous PSA. I encourage you to download it and explore how your organization can prepare for this shift.

Raju Malhotra is Chief Product & Technology Officer at Certinia.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they're always clearly marked. For more information, contact sales@venturebeat.com.
The friction of having to open a separate chat window to prompt an agent could be a hassle for many enterprises. And AI companies are seeing an opportunity to bring more and more AI services into one platform, even integrating into where employees do their work. OpenAI's ChatGPT, although still a separate window, is gradually introducing more integrations into its platform. Rivals like Google and Amazon Web Services believe they can compete with new platforms directly aimed at enterprise users who just want a more streamlined AI experience. And these two new platforms are the latest volley in the race to bring enterprise AI users into one central place for their AI needs.

Google and AWS are separately introducing new platforms designed for full-stack agent workflows, hoping to usher in a world where users don't need to open other windows to access agents. Google unveiled Gemini Enterprise, a platform that Google Cloud CEO Thomas Kurian said "brings the best of Google AI to every employee." Meanwhile, AWS announced Quick Suite, a series of services intended to exist as a browser extension for enterprises to call on agents. Both these platforms aim to keep enterprise employees working within one ecosystem, keeping the needed context in more local storage.

Quick Suite

AWS, through Bedrock, allowed enterprises to build applications and agents, test these and then begin deployment in one space. However, Bedrock remains a backend tool. AWS is banking that organizations will want a better way to access those agents without having to leave their workspace. Quick Suite will be AWS's front-facing agentic application for enterprises. It will be a browser extension for Chrome and Firefox and accessible on Microsoft Outlook, Word and Slack. AWS vice president for agentic AI Swami Sivasubramanian said Quick Suite is the company's way of "entering a new era of work," in that it gives employees access to AI applications they like with privacy considerations and context from their enterprise data.

Quick Suite connects with Adobe Analytics, SharePoint, Snowflake, Google Drive, OneDrive, Outlook, Salesforce, ServiceNow, Slack, Databricks, Amazon Redshift, and Amazon S3. Through MCP servers, users can also access information from Atlassian, Asana, Box, Canva, PagerDuty, Workato or Zapier.

The platform consists of several services users can toggle to:

An agent builder accessible through a chat assistant

Quick Sight to analyze and visualize data

Quick Research, which can find information and build out research reports; users can choose to limit the search to internal or uploaded documents only or to access the internet

Quick Flows to allow people to build routine tasks through simple prompts

Quick Automate for more complicated workflows, where the model can begin coordinating agents and data sharing to complete tasks

AWS said it orchestrates through several foundation models to power Quick Suite's services.

Gemini Enterprise

Google had already begun offering enterprise AI solutions, often in fragmented products. Its newest offering, Gemini Enterprise, brings together the company's AI offerings in a single place. Products like Gemini CLI and Google Vids will be integrated and accessible through Gemini Enterprise. "By bringing all of these components together through a single interface, Gemini Enterprise transforms how teams work," Kurian said in a blog post. It is powered by Gemini models and connects to an enterprise's data sources.
Gemini has always connected to Google's Workspace services such as Docs and Drive, but Gemini Enterprise can now grab information from Microsoft 365 or other platforms like Salesforce. The idea behind Gemini Enterprise is to offer "a no-code workbench" for any user to surface information and orchestrate agents for automation. The platform includes pre-built agents for deep research and insights, but customers can bring in their own agents or other third-party agents. Administrators can manage these agents and workflows through a visual governance framework within Gemini Enterprise. Google said some customers have already begun using Gemini Enterprise, including Macquarie Bank, legal AI provider Harvey and Banco BV. Google told VentureBeat that other platforms, like Vertex AI, remain separate products. Pricing for Gemini Enterprise, both the Standard and Plus editions, starts at $30 per seat per month. A new pricing tier, Gemini Business, costs $21 per seat per month for a year.

Uninterrupted work in one place

In many ways, enterprise AI was always going to move to this more full-stack, end-to-end environment where people access all AI tools in one place. After all, fragmented offerings and lost context turn off many employees who already have a lot on their plate. Removing the friction of moving between windows and possibly losing context on what you're working on could save people a lot more time, and make the idea of using an AI agent or chatbot more appealing. This was the reasoning behind OpenAI's decision to create a desktop app for ChatGPT and why we see so many product announcements around integrations. But now, competitors have to offer more differentiated platforms or they risk being labeled as copycats of products most people already use. I felt the same during a demo of Quick Suite, thinking it felt similar to ChatGPT.

The battle to be the one full-stack platform for the enterprise is just beginning. And as more AI tools and agents become more useful for employees, there will be more demand to make calling up these services as simple as a tap from their preferred workspace.
TAAFT surpasses 2 MILLION, Sora soars, Altman's AI master plan, and more...
Check your research, MIT: 95% of AI projects aren't failing — far from it.

According to new data from G2, nearly 60% of companies already have AI agents in production, and fewer than 2% actually fail once deployed. That paints a very different picture from recent academic forecasts suggesting widespread AI project stagnation.

As one of the world's largest crowdsourced software review platforms, G2's dataset reflects real-world adoption trends — which show that AI agents are proving far more durable and "sticky" than early generative AI pilots.

"Our report's really pointing out that agentic is a different beast when it comes to AI with respect to failure or success," Tim Sanders, G2's head of research, told VentureBeat.

Handing off to AI in customer service, BI, software development

The now oft-referenced MIT study, released in July, only considered custom gen AI projects, Sanders argues, and many media outlets generalized that to AI failing 95% of the time. He points out that university researchers analyzed public announcements, rather than closed-loop data. If companies didn't announce a P&L impact, their projects were considered a failure — even if they really weren't.

G2's 2025 AI Agents Insights Report, by contrast, surveyed more than 1,300 B2B decision-makers, finding that:

57% of companies have agents in production and 70% say agents are "core to operations";

83% are satisfied with agent performance;

Enterprises are now investing an average of $1 million-plus annually, with 1 in 4 spending $5 million-plus;

9 out of 10 plan to increase that investment over the next 12 months;

Organizations have seen 40% cost savings, 23% faster workflows, and 1 in 3 report 50%-plus speed gains, particularly in marketing and sales;

Nearly 90% of study participants reported higher employee satisfaction in departments where agents were deployed.

The leading use cases for AI agents? Customer service, business intelligence (BI) and software development.

Interestingly, G2 found a "surprising number" (about 1 in 3) of what Sanders calls 'let it rip' organizations. "They basically allowed the agent to do a task and then they would either roll it back immediately if it was a bad action, or do QA so that they could retract the bad actions very, very quickly," he explained.

At the same time, though, agent programs with a human in the loop were twice as likely to deliver cost savings — 75% or more — than fully autonomous agent strategies. This reflects what Sanders called a "dead heat" between 'let it rip' organizations and 'leave some human gates' organizations. "There's going to be a human in the loop years from now," he said. "Over half of our respondents told us there's more human oversight than we expected." However, nearly half of IT buyers are comfortable with granting agents full autonomy in low-risk workflows such as data remediation or data pipeline management.

Meanwhile, think of BI and research as prep work, Sanders said; agents gather information in the background to prepare humans to make last passes and final decisions. A classic example of this is a mortgage loan, Sanders noted: Agents do everything right up until the human analyzes their findings and yays or nays the loan. If there are mistakes, they're in the background. "It just doesn't publish on your behalf and put your name on it," said Sanders. "So as a result, you trust it more.
You use it more.” When it comes to specific deployment methods, Salesforce's Agentforce “is winning” over ready-made agents and in-house builds, taking up 38% of all market share, Sanders reported. However, many organizations seem to be going hybrid with a goal to eventually stand up in-house tools. Then, because they want a trusted source of data, “they're going to crystallize around Microsoft, ServiceNow, Salesforce, companies with a real system of record,” he predicted. AI agents aren't deadline-drivenWhy are agents (in some instances at least) so much better than humans? Sanders pointed to a concept called Parkinson's Law, which states that ‘work expands so as to fill the time available for its completion.’“Individual productivity doesn't lead to organizational productivity because humans are only really driven by deadlines,” said Sanders. When organizations looked at gen AI projects, they didn’t move the goal posts; the deadlines didn’t change. “The only way that you fix that is to either move the goal post up or deal with non-humans, because non-humans aren't subject to Parkinson's Law,” he said, pointing out that they’re not afflicted with “the human procrastination syndrome.”Agents don't take breaks. They don't get distracted. “They just grind so you don't have to change the deadlines,” said Sanders. “If you focus on faster and faster QA cycles that may even be automated, you fix your agents faster than you fix your humans.” Start with business problems, understand that trust is a slow buildStill, Sanders sees AI following the cloud when it comes to trust: He remembers in 2007 when everyone was quick to deploy cloud tools; then by 2009 or 2010, “there was kind of a trough of trust.” Mix this in with security concerns: 39% of all respondents to G2’s survey said they’d experienced a security incident since deploying AI; 25% of the time, it was severe. Sanders emphasized that companies must think about measuring in milliseconds how quickly an agent can be retrained to never repeat a bad action again. Always include IT operations in AI deployments, he advised. They know what went wrong with gen AI and robotic process automation (RPA) and can get to the bottom of explainability, which leads to a lot more trust. On the flip side, though: Don't blindly trust vendors. In fact, only half of respondents said they did; Sanders noted that the No. 1 trust signal is agent explainability. “In qualitative interviews, we were told over and over again, if you [a vendor] can't explain it, you can't deploy it and manage it.” It’s also critical to begin with the business problem and work backwards, he advised: Don't buy agents, then look for a proof of concept. If leaders apply agents to the biggest pain points, internal users will be more forgiving when incidents occur, and more willing to iterate, therefore building up their skillsets. “People still don't trust the cloud, they definitely don't trust gen AI, they might not trust agents until they experience it, and then the game changes,” said Sanders. “Trust arrives on a mule — you don’t just get forgiveness.”
Researchers at the University of Illinois Urbana-Champaign and Google Cloud AI Research have developed a framework that enables large language model (LLM) agents to organize their experiences into a memory bank, helping them get better at complex tasks over time.

The framework, called ReasoningBank, distills "generalizable reasoning strategies" from an agent's successful and failed attempts to solve problems. The agent then uses this memory during inference to avoid repeating past mistakes and make better decisions as it faces new problems. The researchers show that when combined with test-time scaling techniques, where an agent makes multiple attempts at a problem, ReasoningBank significantly improves the performance and efficiency of LLM agents.

Their findings show that ReasoningBank consistently outperforms classic memory mechanisms across web browsing and software engineering benchmarks, offering a practical path toward building more adaptive and reliable AI agents for enterprise applications.

The challenge of LLM agent memory

As LLM agents are deployed in applications that run for long periods, they encounter a continuous stream of tasks. One of the key limitations of current LLM agents is their failure to learn from this accumulated experience. By approaching each task in isolation, they inevitably repeat past mistakes, discard valuable insights from related problems, and fail to develop skills that would make them more capable over time.

The solution to this limitation is to give agents some kind of memory. Previous efforts to give agents memory have focused on storing past interactions for reuse by organizing information in various forms, from plain text to structured graphs. However, these approaches often fall short. Many use raw interaction logs or only store successful task examples. This means they can't distill higher-level, transferable reasoning patterns and, crucially, they don't extract and use the valuable information from the agent's failures. As the researchers note in their paper, "existing memory designs often remain limited to passive record-keeping rather than providing actionable, generalizable guidance for future decisions."

How ReasoningBank works

ReasoningBank is a memory framework designed to overcome these limitations. Its central idea is to distill useful strategies and reasoning hints from past experiences into structured memory items that can be stored and reused.

According to Jun Yan, a research scientist at Google and co-author of the paper, this marks a fundamental shift in how agents operate. "Traditional agents operate statically—each task is processed in isolation," Yan explained. "ReasoningBank changes this by turning every task experience (successful or failed) into structured, reusable reasoning memory. As a result, the agent doesn't start from scratch with each customer; it recalls and adapts proven strategies from similar past cases."

The framework processes both successful and failed experiences and turns them into a collection of useful strategies and preventive lessons. The agent judges success and failure through LLM-as-a-judge schemes to obviate the need for human labeling.

Yan provides a practical example of this process in action. An agent tasked with finding Sony headphones might fail because its broad search query returns over 4,000 irrelevant products. "ReasoningBank will first try to figure out why this approach failed," Yan said.
"It will then distill strategies such as ‘optimize search query’ and ‘confine products with category filtering.’ Those strategies will be extremely useful to get future similar tasks successfully done."The process operates in a closed loop. When an agent faces a new task, it uses an embedding-based search to retrieve relevant memories from ReasoningBank to guide its actions. These memories are inserted into the agent’s system prompt, providing context for its decision-making. Once the task is completed, the framework creates new memory items to extract insights from successes and failures. This new knowledge is then analyzed, distilled, and merged into the ReasoningBank, allowing the agent to continuously evolve and improve its capabilities.Supercharging memory with scalingThe researchers found a powerful synergy between memory and test-time scaling. Classic test-time scaling involves generating multiple independent answers to the same question, but the researchers argue that this “vanilla form is suboptimal because it does not leverage inherent contrastive signal that arises from redundant exploration on the same problem.”To address this, they propose Memory-aware Test-Time Scaling (MaTTS), which integrates scaling with ReasoningBank. MaTTS comes in two forms. In “parallel scaling,” the system generates multiple trajectories for the same query, then compares and contrasts them to identify consistent reasoning patterns. In sequential scaling, the agent iteratively refines its reasoning within a single attempt, with the intermediate notes and corrections also serving as valuable memory signals.This creates a virtuous cycle: the existing memory in ReasoningBank steers the agent toward more promising solutions, while the diverse experiences generated through scaling enable the agent to create higher-quality memories to store in ReasoningBank. “This positive feedback loop positions memory-driven experience scaling as a new scaling dimension for agents,” the researchers write.ReasoningBank in actionThe researchers tested their framework on WebArena (web browsing) and SWE-Bench-Verified (software engineering) benchmarks, using models like Google’s Gemini 2.5 Pro and Anthropic’s Claude 3.7 Sonnet. They compared ReasoningBank against baselines including memory-free agents and agents using trajectory-based or workflow-based memory frameworks.The results show that ReasoningBank consistently outperforms these baselines across all datasets and LLM backbones. On WebArena, it improved the overall success rate by up to 8.3 percentage points compared to a memory-free agent. It also generalized better on more difficult, cross-domain tasks, while reducing the number of interaction steps needed to complete tasks. When combined with MaTTS, both parallel and sequential scaling further boosted performance, consistently outperforming standard test-time scaling.This efficiency gain has a direct impact on operational costs. Yan points to a case where a memory-free agent took eight trial-and-error steps just to find the right product filter on a website. "Those trial and error costs could be avoided by leveraging relevant insights from ReasoningBank," he noted. "In this case, we save almost twice the operational costs," which also improves the user experience by resolving issues faster.For enterprises, ReasoningBank can help develop cost-effective agents that can learn from experience and adapt over time in complex workflows and areas like software development, customer support, and data analysis. 
As the paper concludes, "Our findings suggest a practical pathway toward building adaptive and lifelong-learning agents."

Yan confirmed that their findings point toward a future of truly compositional intelligence. For example, a coding agent could learn discrete skills like API integration and database management from separate tasks. "Over time, these modular skills... become building blocks the agent can flexibly recombine to solve more complex tasks," he said, suggesting a future where agents can autonomously assemble their knowledge to manage entire workflows with minimal human oversight.
Many organizations would be hesitant to overhaul their tech stack and start from scratch.
Not Notion.
For the 3.0 version of its productivity software (released in September), the company didn’t hesitate to rebuild from the ground up; it recognized that doing so was necessary to support agentic AI at enterprise scale.
Whereas traditional AI-powered workflows involve explicit, step-by-step instructions based on few-shot learning, AI agents powered by advanced reasoning models are thoughtful about tool definition: they can identify and comprehend the tools they have at their disposal and plan their next steps.
“Rather than trying to retrofit into what we were building, we wanted to play to the strengths of reasoning models,” Sarah Sachs, Notion’s head of AI modeling, told VentureBeat. “We've rebuilt a new architecture because workflows are different from agents.”

Re-orchestrating so models can work autonomously

Notion has been adopted by 94% of Forbes AI 50 companies, has 100 million total users and counts among its customers OpenAI, Cursor, Figma, Ramp and Vercel.
In a rapidly evolving AI landscape, the company identified the need to move beyond simpler, task-based workflows to goal-oriented reasoning systems that allow agents to autonomously select, orchestrate, and execute tools across connected environments. Very quickly, reasoning models have become “far better” at learning to use tools and follow chain-of-thought (CoT) instructions, Sachs noted. This allows them to be “far more independent” and make multiple decisions within one agentic workflow. “We rebuilt our AI system to play to that," she said.
From an engineering perspective, this meant replacing rigid prompt-based flows with a unified orchestration model, Sachs explained. This core model is supported by modular sub-agents that search Notion and the web, query and add to databases and edit content.
Each agent uses tools contextually; for instance, it can decide whether to search Notion itself or another platform like Slack. The model will perform successive searches until the relevant information is found. It can then, for instance, convert notes into proposals, create follow-up messages, track tasks, and spot and make updates in knowledge bases.
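For a more concrete picture of what "self-selecting on tools" means in practice, here is a minimal, hypothetical sketch of an orchestration loop in which the reasoning model, rather than a fixed script, chooses which sub-agent tool to call next. The tool names, the decide() policy, and the loop structure are illustrative assumptions, not Notion's actual architecture or code.

```python
# Generic sketch of tool self-selection by an orchestrating reasoning model
# (illustrative assumptions only).
from typing import Callable, Dict

# Modular sub-agents exposed to the core model as callable tools.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search_workspace": lambda q: f"[workspace results for: {q}]",
    "search_web":       lambda q: f"[web results for: {q}]",
    "query_database":   lambda q: f"[database rows for: {q}]",
    "edit_content":     lambda q: f"[edited content: {q}]",
}

def run_goal(goal: str, decide: Callable[[str, list], dict], max_steps: int = 10) -> list:
    """Let the model choose tools step by step until it declares the goal done.

    `decide` stands in for the reasoning model: given the goal and the history of
    tool observations, it returns either {"tool": name, "input": text} or {"done": answer}.
    """
    history = []
    for _ in range(max_steps):
        action = decide(goal, history)
        if "done" in action:                      # model judges the goal is satisfied
            history.append(("final", action["done"]))
            break
        tool = TOOLS[action["tool"]]              # contextual tool choice, not a fixed script
        history.append((action["tool"], tool(action["input"])))
    return history
```

The design point is that the few-shot scripting moves out of the prompt and into the model's own planning: the orchestrator only defines the tools and the stopping condition, and the reasoning model decides how many searches to run and in what order.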
In Notion 2.0, the team focused on having AI perform specific tasks, which required them to “think exhaustively” about how to prompt the model, Sachs noted. However, with version 3.0, users can assign tasks to agents, and agents can actually take action and perform multiple tasks concurrently.
"We reorchestrated it to be self-selecting on the tools, rather than few-shotting, which is explicitly prompting how to go through all these different scenarios," Sachs explained. The aim is to ensure everything interfaces with AI and that "anything you can do, your Notion agent can do."

Bifurcating to isolate hallucinations

Notion's philosophy of "better, faster, cheaper" drives a continuous iteration cycle that balances latency and accuracy through fine-tuned vector embeddings and elastic search optimization. Sachs' team employs a rigorous evaluation framework that combines deterministic tests, vernacular optimization, human-annotated data and LLMs-as-a-judge, with model-based scoring identifying discrepancies and inaccuracies.
“By bifurcating the evaluation, we're able to identify where the problems come from, and that helps us isolate unnecessary hallucinations,” Sachs explained. Further, making the architecture itself simpler means it’s easier to make changes as models and techniques evolve.
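As an illustration of what a bifurcated evaluation can look like, here is a minimal sketch that scores retrieval and generation separately so a failure can be attributed to the right stage. The metric names, the judge prompt, and the case format are assumptions for this sketch, not Notion's evaluation harness; `retrieve`, `generate`, and `judge` stand in for the system under test and an LLM-as-a-judge scorer.

```python
# Sketch of a bifurcated evaluation: score retrieval and generation independently
# (illustrative assumptions only).
def evaluate_case(case, retrieve, generate, judge):
    """`case` carries a query, gold documents, and an expected grounding."""
    docs = retrieve(case["query"])
    # Deterministic retrieval check: did we surface the documents the answer needs?
    retrieval_ok = any(d in docs for d in case["gold_docs"])
    answer = generate(case["query"], docs)
    # Model-based check: does the answer stay grounded in the retrieved context?
    verdict = judge(
        f"Question: {case['query']}\nContext: {docs}\nAnswer: {answer}\n"
        "Is every claim in the answer supported by the context? Reply yes or no."
    )
    generation_ok = verdict.strip().lower().startswith("yes")
    return {
        "retrieval_ok": retrieval_ok,                       # upstream problem if False
        "generation_ok": generation_ok,
        "hallucination": retrieval_ok and not generation_ok,  # model invented content despite good context
    }
```

Splitting the scores this way is what makes the attribution possible: a wrong answer with failed retrieval points to search, while a wrong answer on top of good retrieval points to the generation step.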
“We optimize latency and parallel thinking as much as possible,” which leads to “way better accuracy,” Sachs noted. Models are grounded in data from the web and the Notion connected workspace.
Ultimately, Sachs reported, the investment in rebuilding its architecture has already paid off for Notion in greater capability and a faster rate of change.
She added, "We are fully open to rebuilding it again, when the next breakthrough happens, if we have to."

Understanding contextual latency

When building and fine-tuning models, it's important to understand that latency is subjective: AI must provide the most relevant information, not necessarily the most information at the cost of speed.
“You'd be surprised at the different ways customers are willing to wait for things and not wait for things,” Sachs said. It makes for an interesting experiment: How slow can you go before people abandon the model?
With pure navigational search, for instance, users may not be as patient; they want answers near-immediately. “If you ask, ‘What's two plus two,’ you don't want to wait for your agent to be searching everywhere in Slack and JIRA,” Sachs pointed out.
But the longer the time it's given, the more exhaustive a reasoning agent can be. For instance, Notion can perform 20 minutes of autonomous work across hundreds of websites, files and other materials. In these instances, users are more willing to wait, Sachs explained; they allow the model to execute in the background while they attend to other tasks.
"It's a product question," said Sachs. "How do we set user expectations from the UI? How do we ascertain user expectations on latency?"

Notion is its biggest user

Notion understands the importance of using its own product — in fact, its employees are among its biggest power users.
Sachs explained that teams have active sandboxes that generate training and evaluation data, as well as a “really active” thumbs-up-thumbs-down user feedback loop. Users aren’t shy about saying what they think should be improved or features they’d like to see.
Sachs emphasized that when a user thumbs down an interaction, they are explicitly giving permission to a human annotator to analyze that interaction in a way that de-identifies them as much as possible.
“We are using our own tool as a company all day, every day, and so we get really fast feedback loops,” said Sachs. “We’re really dogfooding our own product.”
That said, it’s their own product they’re building, Sachs noted, so they understand that they may have goggles on when it comes to quality and functionality. To balance this out, Notion has trusted "very AI-savvy" design partners who are granted early access to new capabilities and provide important feedback.
Sachs emphasized that this is just as important as internal prototyping.
“We're all about experimenting in the open, I think you get much richer feedback,” said Sachs. “Because at the end of the day, if we just look at how Notion uses Notion, we're not really giving the best experience to our customers.”
Just as importantly, continuous internal testing allows teams to track progress and make sure models aren't regressing (when accuracy and performance degrade over time). "Everything you're doing stays faithful," Sachs explained. "You know that your latency is within bounds." Many companies make the mistake of focusing too intensely on retrospective evals, which makes it difficult for them to understand how or where they're improving, Sachs pointed out. Notion treats some evals as a "litmus test" of development and forward-looking progress, and others as tools for observability and regression-proofing.
"I think a big mistake a lot of companies make is conflating the two," said Sachs. "We use them for both purposes; we think about them really differently."

Takeaways from Notion's journey

For enterprises, Notion can serve as a blueprint for how to responsibly and dynamically operationalize agentic AI in a connected, permissioned enterprise workspace.
Sachs' takeaways for other tech leaders:
Don't be afraid to rebuild when foundational capabilities change; Notion fully re-engineered its architecture to align with reasoning-based models.
Treat latency as contextual: optimize per use case rather than universally.
Ground all outputs in trustworthy, curated enterprise data to ensure accuracy and trust.
She advised: “Be willing to make the hard decisions. Be willing to sit at the top of the frontier, so to speak, on what you're developing to build the best product you can for your customers.”
A scavenger hunt campaign to promote Taylor Swift’s new album The Life of a Showgirl resulted in a viral #SwiftiesAgainstAI campaign.
In a packed theater at Fort Mason, after a whirlwind keynote of product announcements, OpenAI CEO Sam Altman sat down with Sir Jony Ive, the legendary designer behind Apple's most iconic products. The conversation, held exclusively for the 1,500 developers in attendance and not part of the public livestream, offered the clearest glimpse yet into the philosophy and ambition behind their secretive collaboration to build a new "family" of AI-powered devices.

The partnership, solidified by OpenAI's staggering $6.5 billion acquisition of Ive's hardware startup Io in May, has been the subject of intense speculation. While concrete product details remained under wraps, the discussion pivoted away from specifications and toward a profound, almost therapeutic mission: to fix our broken relationship with technology.

For nearly 45 minutes, Ive, in his signature thoughtful cadence, articulated a vision that feels like both a continuation of and a repentance for his life's work. The man who designed the iPhone, a device that arguably defined the modern era of personal computing, is now on a quest to cure the very anxieties it helped create.

Jony Ive's post-Apple mission, clarified by ChatGPT

The collaboration, Ive explained, was years in the making, but it was the launch of ChatGPT that provided a sudden, clarifying purpose for his post-Apple design collective, LoveFrom.

"With the launch of ChatGPT, it felt like our purpose for the last six years became clear," Ive said. "We were starting to develop some ideas for an interface based on the capability of the technology these guys were developing... I've never in my career come across anything vaguely like the affordance, like the capability that we're now starting to sense."

This capability, he argued, demands a fundamental rethinking of the devices we use, which he described as "legacy products" from a bygone era. The core motivation, he stressed, is not about corporate agendas but about a sense of duty to humanity.

"The reason we're doing this is we love our species and we want to be useful," Ive said. "We think that humanity deserves much better than humanity generally is given."

An 'obscene understatement': Jony Ive's quest to cure our tech anxiety

The most striking theme of the conversation was Ive's candid critique of the current state of technology — the very ecosystem he was instrumental in building. He described our current dynamic with our devices as deeply flawed, a problem he now sees AI as the solution to, not an extension of.

"I don't think we have an easy relationship with our technology at the moment," Ive began, before adding, "When I said we have an uncomfortable relationship with our technology, I mean, that's the most obscene understatement."

Instead of chasing productivity, the primary goal for this new family of devices is emotional well-being. It's a radical departure from the efficiency-obsessed ethos that dominates Silicon Valley. When asked about his ambitions for the new devices, Ive prioritized emotional well-being over simple productivity. "I know I should care about productivity, and I do," he said, but his ultimate goal is that the tools "make us happy and fulfilled, and more peaceful and less anxious, and less disconnected."

He framed it as a chance to reject the current, fraught relationship people have with their technology. "We have a chance to... absolutely change the situation that we find ourselves in," he stated.
"We don't accept this has to be the norm."Buried in brilliance: why '15 to 20 compelling ideas' have become Ive's biggest challengeWhile the vision is clear, the path is fraught with challenges. Reports have surfaced about technical hurdles and philosophical debates delaying the project. Ive himself gave voice to this struggle, admitting the sheer pace of AI's progress has been overwhelming. The rapid advancement has generated a torrent of possibilities, making the crucial act of focusing incredibly difficult."The momentum is so extraordinary... it has led us to generate 15 to 20 really compelling product ideas. And the challenge is trying to focus," Ive confessed."I used to be good at that, and I've lost some confidence, because the choices are, it'll be easy if you really knew there were three good ones... it's just not like that."This admission provides context to reports that the team is grappling with unresolved issues around the device's "personality" and computing infrastructure. The goal, according to one source, is to create an AI companion that is "accessible but not intrusive," avoiding the pitfalls of a "weird AI girlfriend."Beyond the screen: Ive's design philosophy for an 'inevitable' AI deviceWhile no devices were shown, the conversation and prior reports offer clues. The project involves a "family of devices," not a single gadget.It will likely be a departure from the screen-centric world we inhabit. Reports suggest a "palm-sized device without a screen" that relies on cameras and microphones to perceive its environment.Ive argued that it would be "absurd" to assume that today's breathtaking AI technology should be delivered through "products that are decades old." The goal is to create something that feels entirely new, yet completely natural."It should seem inevitable. It should seem obvious, as if there wasn't possibly another rational solution to the problem," Ive said, echoing a design philosophy often attributed to his time with Steve Jobs.He also spoke of bringing a sense of joy and whimsy back to technology, pushing back against a culture he feels has become overly serious."In terms of the interfaces we design, if we can't smile honestly, if it's just another deeply serious sort of exclusive thing, I think that would do us all a huge disservice," he remarked.The chat concluded without a product reveal, leaving the audience with a philosophical blueprint rather than a technical one. The central narrative is clear: Jony Ive, the designer who put a screen in every pocket, is now betting on a screenless future, powered by OpenAI's formidable intelligence, to make us all a little less anxious and a little more human.
The trend of AI researchers developing new, small open source generative models that outperform far larger, proprietary peers continued this week with yet another staggering advancement.

Alexia Jolicoeur-Martineau, Senior AI Researcher at Samsung's Advanced Institute of Technology (SAIT) in Montreal, Canada, has introduced the Tiny Recursion Model (TRM) — a neural network so small it contains just 7 million parameters (internal model settings), yet it competes with or surpasses cutting-edge language models 10,000 times larger in terms of their parameter count, including OpenAI's o3-mini and Google's Gemini 2.5 Pro, on some of the toughest reasoning benchmarks in AI research.

The goal is to show that highly performant new AI models can be created affordably, without massive investments in the graphics processing units (GPUs) and power needed to train the larger, multi-trillion-parameter flagship models powering many LLM chatbots today. The results were described in a research paper published on the open access website arxiv.org, entitled "Less is More: Recursive Reasoning with Tiny Networks."

"The idea that one must rely on massive foundational models trained for millions of dollars by some big corporation in order to solve hard tasks is a trap," wrote Jolicoeur-Martineau on the social network X. "Currently, there is too much focus on exploiting LLMs rather than devising and expanding new lines of direction."

Jolicoeur-Martineau added: "With recursive reasoning, it turns out that 'less is more'. A tiny model pretrained from scratch, recursing on itself and updating its answers over time, can achieve a lot without breaking the bank."

TRM's code is available now on GitHub under an enterprise-friendly, commercially viable MIT License — meaning anyone from researchers to companies can take it, modify it, and deploy it for their own purposes, even commercial applications.

One Big Caveat

However, readers should be aware that TRM was designed specifically to perform well on structured, visual, grid-based problems like Sudoku, mazes, and puzzles from the ARC-AGI (Abstraction and Reasoning Corpus) benchmark, the latter of which offers tasks that should be easy for humans but difficult for AI models, such as sorting colors on a grid based on a prior, but not identical, solution.

From Hierarchy to Simplicity

The TRM architecture represents a radical simplification. It builds upon a technique called the Hierarchical Reasoning Model (HRM), introduced earlier this year, which showed that small networks could tackle logical puzzles like Sudoku and mazes. HRM relied on two cooperating networks — one operating at high frequency, the other at low — supported by biologically inspired arguments and mathematical justifications involving fixed-point theorems. Jolicoeur-Martineau found this unnecessarily complicated.

TRM strips these elements away. Instead of two networks, it uses a single two-layer model that recursively refines its own predictions. The model begins with an embedded question, an initial answer, and a latent reasoning state, represented by the variables x, y, and z respectively. Through a series of reasoning steps, it updates its internal latent representation z and refines the answer y until it converges on a stable output.
Each iteration corrects potential errors from the previous step, yielding a self-improving reasoning process without extra hierarchy or mathematical overhead.

How Recursion Replaces Scale

The core idea behind TRM is that recursion can substitute for depth and size. By iteratively reasoning over its own output, the network effectively simulates a much deeper architecture without the associated memory or computational cost. This recursive cycle, run over as many as sixteen supervision steps, allows the model to make progressively better predictions — similar in spirit to how large language models use multi-step "chain-of-thought" reasoning, but achieved here with a compact, feed-forward design.

The simplicity pays off in both efficiency and generalization. The model uses fewer layers, no fixed-point approximations, and no dual-network hierarchy. A lightweight halting mechanism decides when to stop refining, preventing wasted computation while maintaining accuracy.

Performance That Punches Above Its Weight

Despite its small footprint, TRM delivers benchmark results that rival or exceed those of models thousands of times larger. In testing, the model achieved:

87.4% accuracy on Sudoku-Extreme (up from 55% for HRM)
85% accuracy on Maze-Hard puzzles
45% accuracy on ARC-AGI-1
8% accuracy on ARC-AGI-2

These results surpass or closely match performance from several high-end large language models, including DeepSeek R1, Gemini 2.5 Pro, and o3-mini, despite TRM using less than 0.01% of their parameters.

Such results suggest that recursive reasoning, not scale, may be the key to handling abstract and combinatorial reasoning problems — domains where even top-tier generative models often stumble.

Design Philosophy: Less Is More

TRM's success stems from deliberate minimalism. Jolicoeur-Martineau found that reducing complexity led to better generalization. When the researcher increased layer count or model size, performance declined due to overfitting on small datasets. By contrast, the two-layer structure, combined with recursive depth and deep supervision, achieved optimal results.

The model also performed better when self-attention was replaced with a simpler multilayer perceptron on tasks with small, fixed contexts like Sudoku. For larger grids, such as ARC puzzles, self-attention remained valuable. These findings underline that model architecture should match data structure and scale rather than default to maximal capacity.

Training Small, Thinking Big

TRM is now officially available as open source under an MIT license on GitHub. The repository includes full training and evaluation scripts, dataset builders for Sudoku, Maze, and ARC-AGI, and reference configurations for reproducing the published results. It also documents compute requirements ranging from a single NVIDIA L40S GPU for Sudoku training to multi-GPU H100 setups for ARC-AGI experiments.

The open release confirms that TRM is designed specifically for structured, grid-based reasoning tasks rather than general-purpose language modeling. Each benchmark — Sudoku-Extreme, Maze-Hard, and ARC-AGI — uses small, well-defined input–output grids, aligning with the model's recursive supervision process. Training involves substantial data augmentation (such as color permutations and geometric transformations), underscoring that TRM's efficiency lies in its parameter size rather than total compute demand.

The model's simplicity and transparency make it more accessible to researchers outside of large corporate labs.
Its codebase builds directly on the earlier Hierarchical Reasoning Model framework but removes HRM's biological analogies, multiple network hierarchies, and fixed-point dependencies. In doing so, TRM offers a reproducible baseline for exploring recursive reasoning in small models — a counterpoint to the dominant "scale is all you need" philosophy.

Community Reaction

The release of TRM and its open-source codebase prompted an immediate debate among AI researchers and practitioners on X. While many praised the achievement, others questioned how broadly its methods could generalize.

Supporters hailed TRM as proof that small models can outperform giants, calling it "10,000× smaller yet smarter" and a potential step toward architectures that think rather than merely scale. Critics countered that TRM's domain is narrow — focused on bounded, grid-based puzzles — and that its compute savings come mainly from size, not total runtime. Researcher Yunmin Cha noted that TRM's training depends on heavy augmentation and recursive passes, "more compute, same model." Cancer geneticist and data scientist Chey Loveday stressed that TRM is a solver, not a chat model or text generator: it excels at structured reasoning but not open-ended language.

Machine learning researcher Sebastian Raschka positioned TRM as an important simplification of HRM rather than a new form of general intelligence. He described its process as "a two-step loop that updates an internal reasoning state, then refines the answer."

Several researchers, including Augustin Nabele, agreed that the model's strength lies in its clear reasoning structure but noted that future work would need to show transfer to less-constrained problem types.

The consensus emerging online is that TRM may be narrow, but its message is broad: careful recursion, not constant expansion, could drive the next wave of reasoning research.

Looking Ahead

While TRM currently applies to supervised reasoning tasks, its recursive framework opens several future directions. Jolicoeur-Martineau has suggested exploring generative or multi-answer variants, where the model could produce multiple possible solutions rather than a single deterministic one. Another open question involves scaling laws for recursion — determining how far the "less is more" principle can extend as model complexity or data size grows.

Ultimately, the study offers both a practical tool and a conceptual reminder: progress in AI need not depend on ever-larger models. Sometimes, teaching a small network to think carefully — and recursively — can be more powerful than making a large one think once.
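To make the recursion idea concrete, here is a minimal PyTorch-style sketch of the mechanism described above: a single small network repeatedly updates a latent state z from the question embedding x and the current answer y, then refines y, with a lightweight halting head deciding when to stop. The layer sizes, update rules, step counts, and halting rule here are illustrative assumptions, not the official TRM implementation.

```python
# Schematic of TRM-style recursive refinement (a sketch under stated assumptions,
# not the released code).
import torch
import torch.nn as nn

class TinyRecursiveReasoner(nn.Module):
    def __init__(self, dim=128, inner_steps=6, max_supervision_steps=16):
        super().__init__()
        self.inner_steps = inner_steps
        self.max_supervision_steps = max_supervision_steps
        # A single small network is reused for every refinement step.
        self.update_z = nn.Sequential(nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.update_y = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.halt_head = nn.Linear(dim, 1)   # decides when further refinement stops helping

    def forward(self, x):
        # x: embedded question; y: current answer; z: latent reasoning state.
        y = torch.zeros_like(x)
        z = torch.zeros_like(x)
        for _ in range(self.max_supervision_steps):
            # Recursively update the latent state given the question, answer, and itself.
            for _ in range(self.inner_steps):
                z = self.update_z(torch.cat([x, y, z], dim=-1))
            # Refine the answer from the updated latent state.
            y = self.update_y(torch.cat([y, z], dim=-1))
            # Lightweight halting: stop once the model predicts refinement has converged.
            if torch.sigmoid(self.halt_head(z)).mean() > 0.5:
                break
        return y

# Usage: one forward pass reuses the same small weights many times, simulating depth.
model = TinyRecursiveReasoner()
answer = model(torch.randn(4, 128))   # batch of 4 embedded puzzle inputs
```

The point of the sketch is the weight reuse: rather than stacking more layers, the same tiny module is applied over and over, which is how a 7-million-parameter model can behave like a much deeper network at inference time.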
With the US falling behind on open source models, one startup has a bold idea for democratizing AI: let anyone run reinforcement learning.
We’re announcing a partnership with LA28, Team USA, and NBCUniversal ahead of the Winter and Summer Olympic and Paralympic Games.
Google is investing an additional €5 billion in Belgium over the next two years to expand its cloud and AI infrastructure. This includes expansions of our data center ca…
In this post, we show how Vxceed used Amazon Bedrock to develop this AI-powered multi-agent solution that generates personalized sales pitches for field sales teams at scale.
Machine learning operations (MLOps) is the combination of people, processes, and technology to productionize ML use cases efficiently. To achieve this, enterprise customers must develop MLOps platforms to support reproducibility, robustness, and end-to-end observability of the ML use case’s lifecycle. Those platforms are based on a multi-account setup by adopting strict security constraints, development best […]
The MIT–MBZUAI Collaborative Research Program will unite faculty and students from both institutions to advance AI and accelerate its use in pressing scientific and societal challenges.