Latest AI News & Updates

#ai #automation #security

Microsoft is fundamentally reimagining how people interact with their computers, announcing Thursday a sweeping transformation of Windows 11 that brings voice-activated AI assistants, autonomous software agents, and contextual intelligence to every PC running the operating system — not just premium devices with specialized chips.

The announcement represents Microsoft's most aggressive push yet to integrate generative artificial intelligence into the desktop computing experience, moving beyond the chatbot interfaces that have defined the first wave of consumer AI products toward a more ambient, conversational model where users can simply talk to their computers and have AI agents complete complex tasks on their behalf.

"When we think about what the promise of an AI PC is, it should be capable of three things," Yusuf Mehdi, Microsoft's Executive Vice President and Consumer Chief Marketing Officer, told reporters at a press conference last week. "First, you should be able to interact with it naturally, in text or voice, and have it understand you. Second, it should be able to see what you see and be able to offer guided support. And third, it should be able to take action on your behalf."

The shift could prove consequential for an industry searching for the "killer app" for generative AI. While hundreds of millions of people have experimented with ChatGPT and similar chatbots, integrating AI directly into the operating system that powers the vast majority of workplace computers could dramatically accelerate mainstream adoption — or create new security and privacy headaches for organizations already struggling to govern employee use of AI tools.

How 'Hey Copilot' aims to replace typing with talking on Windows PCs

At the heart of Microsoft's vision is voice interaction, which the company is positioning as the third fundamental input method for PCs after the mouse and keyboard — a comparison that underscores Microsoft's ambitions for reshaping human-computer interaction nearly four decades after the graphical user interface became standard.

Starting this week, any Windows 11 user can enable the "Hey Copilot" wake word with a single click, allowing them to summon Microsoft's AI assistant by voice from anywhere in the operating system. The feature, which had been in limited testing, is now being rolled out to hundreds of millions of devices globally.

"It's been almost four decades since the PC has changed the way you interact with it, which is primarily mouse and keyboard," Mehdi said. "When you think about it, we find that people type on a given day up to 14,000 words on their keyboard, which is really kind of mind-boggling. But what if now you can go beyond that and talk to it?"

The emphasis on voice reflects internal Microsoft data showing that users engage with Copilot twice as much when using voice compared to text input — a finding the company attributes to the lower cognitive barrier of speaking versus crafting precise written prompts.

"The magic unlock with Copilot Voice and Copilot Vision is the ease of interaction," according to the company's announcement. "Using the new wake word, 'Hey Copilot,' getting something done is as easy as just asking for it."

But Microsoft's bet on voice computing faces real-world constraints that Mehdi acknowledged during the briefing.
When asked whether workers in shared office environments would use voice features, potentially compromising privacy, Mehdi noted that millions already conduct voice calls through their PCs with headphones, and predicted users would adapt: "Just like when the mouse came out, people have to figure out when to use it, what's the right way, how to make it happen."

Crucially, Microsoft is hedging its voice-first strategy by making all features accessible through traditional text input as well, recognizing that voice isn't always appropriate or accessible.

AI that sees your screen: Copilot Vision expands worldwide with new capabilities

Perhaps more transformative than voice control is the expansion of Copilot Vision, a feature Microsoft introduced earlier this year that allows the AI to analyze what's displayed on a user's screen and provide contextual assistance.

Previously limited to voice interaction, Copilot Vision is now rolling out worldwide with a new text-based interface, allowing users to type questions about what they're viewing rather than speaking them aloud. The feature can now access full document context in Microsoft Office applications — meaning it can analyze an entire PowerPoint presentation or Excel spreadsheet without the user needing to scroll through every page.

"With 68 percent of consumers reporting using AI to support their decision making, voice is making this easier," Microsoft explained in its announcement. "The magic unlock with Copilot Voice and Copilot Vision is the ease of interaction."

During the press briefing, Microsoft demonstrated Copilot Vision helping users navigate Spotify's settings to enable lossless audio streaming, coaching an artist through writing a professional bio based on their visual portfolio, and providing shopping recommendations based on products visible in YouTube videos.

"What brings AI to life is when you can give it rich context, when you can type great prompts," Mehdi explained. "The big challenge for the majority of people is we've been trained with search to do the opposite. We've been trained to essentially type in fewer keywords, because it turns out the less keywords you type on search, the better your answers are."

He noted that average search queries remain just 2.3 keywords, while AI systems perform better with detailed prompts — creating a disconnect between user habits and AI capabilities.
Copilot Vision aims to bridge that gap by automatically gathering visual context.

"With Copilot Vision, you can simply share your screen and Copilot in literally milliseconds can understand everything on the screen and then provide intelligence," Mehdi said.

The vision capabilities work with any application without requiring developers to build specific integrations, using computer vision to interpret on-screen content — a powerful capability that also raises questions about what the AI can access and when.

Software robots take control: Inside Copilot Actions' controversial autonomy

The most ambitious—and potentially controversial—new capability is Copilot Actions, an experimental feature that allows AI to take control of a user's computer to complete tasks autonomously.

Coming first to Windows Insiders enrolled in Copilot Labs, the feature builds on Microsoft's May announcement of Copilot Actions on the web, extending the capability to manipulate local files and applications on Windows PCs.

During demonstrations, Microsoft showed the AI agent organizing photo libraries, extracting data from documents, and working through multi-step tasks while users attended to other work. The agent operates in a separate, sandboxed environment and provides running commentary on its actions, with users able to take control at any time.

"As a general-purpose agent — simply describe the task you want to complete in your own words, and the agent will attempt to complete it by interacting with desktop and web applications," according to the announcement. "While this is happening, you can choose to focus on other tasks. At any time, you can take over the task or check in on the progress of the action, including reviewing what actions have been taken."

Navjot Virk, Microsoft's Windows Experience Leader, acknowledged the technology's current limitations during the briefing. "We'll be starting with a narrow set of use cases while we optimize model performance and learn," Virk said. "You may see the agent make mistakes or encounter challenges with complex interfaces, which is why real-world testing of this experience is so critical."

The experimental nature of Copilot Actions reflects broader industry challenges with agentic AI — systems that can take actions rather than simply providing information. While the potential productivity gains are substantial, AI systems still occasionally "hallucinate" incorrect information and can be vulnerable to novel attacks.

Can AI agents be trusted? Microsoft's new security framework explained

Recognizing the security implications of giving AI control over users' computers and files, Microsoft introduced a new security framework built on four core principles: user control, operational transparency, limited privileges, and privacy-preserving design.

Central to this approach is the concept of "agent accounts" — separate Windows user accounts under which AI agents operate, distinct from the human user's account. Combined with a new "agent workspace" that provides a sandboxed desktop environment, the architecture aims to create clear boundaries around what agents can access and modify.

Peter Waxman, Microsoft's Windows Security Engineering Leader, emphasized that Copilot Actions is disabled by default and requires explicit user opt-in. "You're always in control of what Copilot Actions can do," Waxman said.
"Copilot Actions is turned off by default and you're able to pause, take control, or disable it at any time."During operation, users can monitor the agent's progress in real-time, and the system requests additional approval before taking "sensitive or important" actions. All agent activity occurs under the dedicated agent account, creating an audit trail that distinguishes AI actions from human ones.However, the agent will have default access to users' Documents, Downloads, Desktop, and Pictures folders—a broad permission grant that could concern enterprise IT administrators.Dana Huang, Corporate Vice President for Windows Security, acknowledged in a blog post that "agentic AI applications introduce novel security risks, such as cross-prompt injection (XPIA), where malicious content embedded in UI elements or documents can override agent instructions, leading to unintended actions like data exfiltration or malware installation."Microsoft promises more details about enterprise controls at its Ignite conference in November.Gaming, taskbar redesign, and deeper Office integration round out updatesBeyond voice and autonomous agents, Microsoft introduced changes across Windows 11's core interfaces and extended AI to new domains.A new "Ask Copilot" feature integrates AI directly into the Windows taskbar, providing one-click access to start conversations, activate vision capabilities, or search for files and settings with "lightning-fast" results. The opt-in feature doesn't replace traditional Windows search.File Explorer gains AI capabilities through integration with third-party services. A partnership with Manus AI allows users to right-click on local image files and generate complete websites without manual uploading or coding. Integration with Filmora enables quick jumps into video editing workflows.Microsoft also introduced Copilot Connectors, allowing users to link cloud services like OneDrive, Outlook, Google Drive, Gmail, and Google Calendar directly to Copilot on Windows. Once connected, users can query personal content across platforms using natural language.In a notable expansion beyond productivity, Microsoft and Xbox introduced Gaming Copilot for the ROG Xbox Ally handheld gaming devices developed with ASUS. The feature, accessible via a dedicated hardware button, provides an AI assistant that can answer gameplay questions, offer strategic advice, and help navigate game interfaces through natural voice conversation.Why Microsoft is racing to embed AI everywhere before Apple and GoogleMicrosoft's announcement comes as technology giants race to embed generative AI into their core products following the November 2022 launch of ChatGPT. While Microsoft moved quickly to integrate OpenAI's technology into Bing search and introduce Copilot across its product line, the company has faced questions about whether AI features are driving meaningful engagement. Recent data shows Bing's search market share remaining largely flat despite AI integration.The Windows integration represents a different approach: rather than charging separately for AI features, Microsoft is building them into the operating system itself, betting that embedded AI will drive Windows 11 adoption and competitive differentiation against Apple and Google.Apple has taken a more cautious approach with Apple Intelligence, introducing AI features gradually and emphasizing privacy through on-device processing. 
Gaming, taskbar redesign, and deeper Office integration round out updates

Beyond voice and autonomous agents, Microsoft introduced changes across Windows 11's core interfaces and extended AI to new domains.

A new "Ask Copilot" feature integrates AI directly into the Windows taskbar, providing one-click access to start conversations, activate vision capabilities, or search for files and settings with "lightning-fast" results. The opt-in feature doesn't replace traditional Windows search.

File Explorer gains AI capabilities through integration with third-party services. A partnership with Manus AI allows users to right-click on local image files and generate complete websites without manual uploading or coding. Integration with Filmora enables quick jumps into video editing workflows.

Microsoft also introduced Copilot Connectors, allowing users to link cloud services like OneDrive, Outlook, Google Drive, Gmail, and Google Calendar directly to Copilot on Windows. Once connected, users can query personal content across platforms using natural language.

In a notable expansion beyond productivity, Microsoft and Xbox introduced Gaming Copilot for the ROG Xbox Ally handheld gaming devices developed with ASUS. The feature, accessible via a dedicated hardware button, provides an AI assistant that can answer gameplay questions, offer strategic advice, and help navigate game interfaces through natural voice conversation.

Why Microsoft is racing to embed AI everywhere before Apple and Google

Microsoft's announcement comes as technology giants race to embed generative AI into their core products following the November 2022 launch of ChatGPT. While Microsoft moved quickly to integrate OpenAI's technology into Bing search and introduce Copilot across its product line, the company has faced questions about whether AI features are driving meaningful engagement. Recent data shows Bing's search market share remaining largely flat despite AI integration.

The Windows integration represents a different approach: rather than charging separately for AI features, Microsoft is building them into the operating system itself, betting that embedded AI will drive Windows 11 adoption and competitive differentiation against Apple and Google.

Apple has taken a more cautious approach with Apple Intelligence, introducing AI features gradually and emphasizing privacy through on-device processing. Google has integrated AI across its services but has faced challenges with accuracy and reliability.

Crucially, while Microsoft highlighted new Copilot+ PC models from partners with prices ranging from $649.99 to $1,499.99, the core AI features announced today work on any Windows 11 PC — a significant departure from earlier positioning that suggested AI capabilities required new hardware with specialized neural processing units.

"Everything we showed you here is for all Windows 11 PCs. You don't need to run it on a copilot plus PC. It works on any Windows 11 PC," Mehdi clarified.

This democratization of AI features across the Windows 11 installed base potentially accelerates adoption but also complicates Microsoft's hardware sales pitch for premium devices.

What Microsoft's AI bet means for the future of computing

Mehdi framed the announcement in sweeping terms, describing Microsoft's goal as fundamentally reimagining the operating system for the AI era.

"We're taking kind of a bold view of it. We really feel that the vision that we have is, let's rewrite the entire operating system around AI and build essentially what becomes truly the AI PC," he said.

For Microsoft, the success of AI-powered Windows 11 could help drive the company's next phase of growth as PC sales have matured and cloud growth faces increased competition.

For users and organizations, the announcement represents a potential inflection point in how humans interact with computers — one that could significantly boost productivity if executed well, or create new security headaches if the AI proves unreliable or difficult to control.

The technology industry will be watching closely to see whether Microsoft's bet on conversational computing and agentic AI marks the beginning of a genuine paradigm shift, or proves to be another ambitious interface reimagining that fails to gain mainstream traction.

What's clear is that Microsoft is moving aggressively to stake its claim as the leader in AI-powered personal computing, leveraging its dominant position in desktop operating systems to bring generative AI directly into the daily workflows of potentially a billion users.

Copilot Voice and Vision are available today to Windows 11 users worldwide, with experimental capabilities coming to Windows Insiders in the coming weeks.

We’re partnering with Commonwealth Fusion Systems (CFS) to bring clean, safe, limitless fusion energy closer to reality.

#artificial intelligence #beginner #machine learning

Databricks is one of the leading platforms for building and executing machine learning notebooks at scale. It combines Apache Spark capabilities with a notebook-first interface, experiment tracking, and integrated data tooling. In this article, I’ll guide you through the process of hosting your ML notebook in Databricks step by step. Databricks offers several plans, but […]
The post How to Run your ML Notebook on Databricks? appeared first on Analytics Vidhya.

#ai

A new framework from Stanford University and SambaNova addresses a critical challenge in building robust AI agents: context engineering. Called Agentic Context Engineering (ACE), the framework automatically populates and modifies the context window of large language model (LLM) applications by treating it as an “evolving playbook” that creates and refines strategies as the agent gains experience in its environment.

ACE is designed to overcome key limitations of other context-engineering frameworks, preventing the model’s context from degrading as it accumulates more information. Experiments show that ACE works for both optimizing system prompts and managing an agent's memory, outperforming other methods while also being significantly more efficient.

The challenge of context engineering

Advanced AI applications that use LLMs largely rely on "context adaptation," or context engineering, to guide their behavior. Instead of the costly process of retraining or fine-tuning the model, developers use the LLM’s in-context learning abilities to guide its behavior by modifying the input prompts with specific instructions, reasoning steps, or domain-specific knowledge. This additional information is usually obtained as the agent interacts with its environment and gathers new data and experience. The key goal of context engineering is to organize this new information in a way that improves the model’s performance and avoids confusing it. This approach is becoming a central paradigm for building capable, scalable, and self-improving AI systems.

Context engineering has several advantages for enterprise applications. Contexts are interpretable for both users and developers, can be updated with new knowledge at runtime, and can be shared across different models. Context engineering also benefits from ongoing hardware and software advances, such as the growing context windows of LLMs and efficient inference techniques like prompt and context caching.

There are various automated context-engineering techniques, but most of them face two key limitations. The first is a “brevity bias,” where prompt optimization methods tend to favor concise, generic instructions over comprehensive, detailed ones. This can undermine performance in complex domains. The second, more severe issue is "context collapse." When an LLM is tasked with repeatedly rewriting its entire accumulated context, it can suffer from a kind of digital amnesia.

“What we call ‘context collapse’ happens when an AI tries to rewrite or compress everything it has learned into a single new version of its prompt or memory,” the researchers said in written comments to VentureBeat. “Over time, that rewriting process erases important details—like overwriting a document so many times that key notes disappear. In customer-facing systems, this could mean a support agent suddenly losing awareness of past interactions... causing erratic or inconsistent behavior.”

The researchers argue that “contexts should function not as concise summaries, but as comprehensive, evolving playbooks—detailed, inclusive, and rich with domain insights.” This approach leans into the strength of modern LLMs, which can effectively distill relevance from long and detailed contexts.

How Agentic Context Engineering (ACE) works

ACE is a framework for comprehensive context adaptation designed for both offline tasks, like system prompt optimization, and online scenarios, such as real-time memory updates for agents.
Rather than compressing information, ACE treats the context like a dynamic playbook that gathers and organizes strategies over time.

The framework divides the labor across three specialized roles: a Generator, a Reflector, and a Curator. This modular design is inspired by “how humans learn—experimenting, reflecting, and consolidating—while avoiding the bottleneck of overloading a single model with all responsibilities,” according to the paper.

The workflow starts with the Generator, which produces reasoning paths for input prompts, highlighting both effective strategies and common mistakes. The Reflector then analyzes these paths to extract key lessons. Finally, the Curator synthesizes these lessons into compact updates and merges them into the existing playbook.

To prevent context collapse and brevity bias, ACE incorporates two key design principles. First, it uses incremental updates. The context is represented as a collection of structured, itemized bullets instead of a single block of text. This allows ACE to make granular changes and retrieve the most relevant information without rewriting the entire context.

Second, ACE uses a “grow-and-refine” mechanism. As new experiences are gathered, new bullets are appended to the playbook and existing ones are updated. A de-duplication step regularly removes redundant entries, ensuring the context remains comprehensive yet relevant and compact over time.
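To picture those playbook mechanics, here is a minimal, hypothetical sketch of how an itemized, incrementally updated context might be represented in code. It stubs out the LLM-backed Generator/Reflector/Curator roles and is not the authors' implementation; every name in it is invented for illustration.

```python
# A minimal sketch of the "evolving playbook" idea described above -- not the
# authors' code. LLM calls are stubbed out; the point is the data structure:
# itemized bullets, incremental updates, and periodic de-duplication instead
# of rewriting the whole context.
from dataclasses import dataclass

@dataclass
class Bullet:
    topic: str
    lesson: str
    uses: int = 0

class Playbook:
    def __init__(self) -> None:
        self.bullets: list[Bullet] = []

    def merge(self, new_lessons: list[Bullet]) -> None:
        """Grow-and-refine: append or update bullets, never rewrite everything."""
        for new in new_lessons:
            existing = next((b for b in self.bullets if b.topic == new.topic), None)
            if existing:                      # refine an existing entry
                existing.lesson = new.lesson
                existing.uses += 1
            else:                             # grow the playbook
                self.bullets.append(new)
        self._deduplicate()

    def _deduplicate(self) -> None:
        seen: dict[tuple[str, str], Bullet] = {}
        for b in self.bullets:
            seen[(b.topic, b.lesson)] = b     # drop exact duplicates
        self.bullets = list(seen.values())

    def render(self, query_topic: str | None = None) -> str:
        """Return only the relevant slice of context for the next prompt."""
        items = [b for b in self.bullets if query_topic is None or b.topic == query_topic]
        return "\n".join(f"- [{b.topic}] {b.lesson}" for b in items)

# The three roles reduce to: a Generator produces a trajectory, a Reflector turns
# it into lessons, a Curator merges them. Here the Reflector is a placeholder.
def reflector(trajectory: str) -> list[Bullet]:
    return [Bullet(topic="api-errors", lesson="Retry idempotent calls once before failing.")]

playbook = Playbook()
playbook.merge(reflector("...generator output for one episode..."))
print(playbook.render("api-errors"))
```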
ACE in action

The researchers evaluated ACE on two types of tasks that benefit from evolving context: agent benchmarks requiring multi-turn reasoning and tool use, and domain-specific financial analysis benchmarks demanding specialized knowledge. For high-stakes industries like finance, the benefits extend beyond pure performance. As the researchers said, the framework is “far more transparent: a compliance officer can literally read what the AI learned, since it’s stored in human-readable text rather than hidden in billions of parameters.”

The results showed that ACE consistently outperformed strong baselines such as GEPA and classic in-context learning, achieving average performance gains of 10.6% on agent tasks and 8.6% on domain-specific benchmarks in both offline and online settings.

Critically, ACE can build effective contexts by analyzing the feedback from its actions and environment instead of requiring manually labeled data. The researchers note that this ability is a "key ingredient for self-improving LLMs and agents." On the public AppWorld benchmark, designed to evaluate agentic systems, an agent using ACE with a smaller open-source model (DeepSeek-V3.1) matched the performance of the top-ranked, GPT-4.1-powered agent on average and surpassed it on the more difficult test set.

The takeaway for businesses is significant. “This means companies don’t have to depend on massive proprietary models to stay competitive,” the research team said. “They can deploy local models, protect sensitive data, and still get top-tier results by continuously refining context instead of retraining weights.”

Beyond accuracy, ACE proved to be highly efficient. It adapts to new tasks with an average 86.9% lower latency than existing methods and requires fewer steps and tokens. The researchers point out that this efficiency demonstrates that “scalable self-improvement can be achieved with both higher accuracy and lower overhead.”

For enterprises concerned about inference costs, the researchers point out that the longer contexts produced by ACE do not translate to proportionally higher costs. Modern serving infrastructures are increasingly optimized for long-context workloads with techniques like KV cache reuse, compression, and offloading, which amortize the cost of handling extensive context.

Ultimately, ACE points toward a future where AI systems are dynamic and continuously improving. "Today, only AI engineers can update models, but context engineering opens the door for domain experts—lawyers, analysts, doctors—to directly shape what the AI knows by editing its contextual playbook," the researchers said. This also makes governance more practical. "Selective unlearning becomes much more tractable: if a piece of information is outdated or legally sensitive, it can simply be removed or replaced in the context, without retraining the model."

We’re launching a new 27 billion parameter foundation model for single-cell analysis built on the Gemma family of open models.

#ai

When Walmart and OpenAI announced that the retailer would integrate with ChatGPT, the question became how quickly OpenAI could deliver on the promise of agents buying things for people. In the battle of AI-enabled commerce, getting agents to securely complete transactions is one of the biggest hurdles.

More and more, chat platforms like ChatGPT are replacing browsers and getting very good at surfacing information people search for. Users will ask ChatGPT for the best humidifiers on the market, and when the model returns results, people have no choice but to click the item link and complete the purchase online. AI agents, as of now, don't have the ability or the trust infrastructure to make people and banking institutions feel safe enough to let them loose on someone's cash.

Enterprises and other industry players understand that, to allow agents to pay for purchases, there must be a common language shared among the model and agent providers, the bank, the merchant, and, to a lesser extent, the buyer.

And so, over the past few weeks, three competing agentic commerce standards have emerged: Google announced the Agent Payments Protocol (AP2) with partners including PayPal, American Express, Mastercard, Salesforce and ServiceNow. Soon after, OpenAI and Stripe debuted the Agentic Commerce Protocol (ACP), and just this week, Visa launched the Trusted Agent Protocol (TAP).

All these protocols aim to give agents the trust layer they need to convince banks and their customers that their money is safe in the hands of an AI agent. But these may also create walled gardens, showing just how immature agentic commerce really is. This is a problem that could cause enterprises to bet on one chat platform and the agentic pay protocol it runs on, instead of interoperability.

How are they different?

It's not new for players to propose several standards. It usually takes years for the industry to coalesce around a single standard, or even to use different protocols and figure out a way to harmonize them. However, the pace of innovation in enterprise AI has moved the needle on that.

Fairly quickly, MCP became the de facto channel for tool-use identification, and most companies began setting up MCP servers or connecting to one. (To be clear, it is not a standard yet.) But having three different potential standards might slow that process down a bit, because it's harder to coalesce on a single standard when there are so many to choose from.

These protocols all aim to prove authorization. Both AP2 and TAP rely on cryptographic proofs to show an agent is acting on an individual's behalf. For TAP, agents are added to an approved list and get a digital key identifying them. AP2 uses a digital contract that serves as a proxy for human approval for the agent. OpenAI's ACP doesn't require much of an infrastructure change; it essentially acts as a courier, with the agent relaying information to the merchant.
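To illustrate the general idea of a cryptographic proof of authorization, without reproducing any of these protocols, here is a toy sketch in which a wallet issues a signed spending "mandate" and a merchant verifies it before accepting a charge. It uses a shared HMAC secret for brevity; AP2, ACP, and TAP each define their own, richer mechanisms, and nothing here reflects their actual formats.

```python
# Illustrative only: a toy "mandate" showing the general idea of cryptographic
# proof that an agent acts on a shopper's behalf. None of AP2, ACP, or TAP
# works exactly like this; real protocols use asymmetric signatures, agent
# registries, and richer payloads.
import hashlib, hmac, json, time

SHARED_SECRET = b"demo-secret-shared-with-issuer"   # stand-in for real key material

def issue_mandate(user_id: str, agent_id: str, max_amount_usd: float) -> dict:
    """The user's wallet/bank issues a signed spending mandate to the agent."""
    payload = {
        "user": user_id,
        "agent": agent_id,
        "max_amount_usd": max_amount_usd,
        "expires": int(time.time()) + 3600,
    }
    body = json.dumps(payload, sort_keys=True).encode()
    payload["signature"] = hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()
    return payload

def merchant_accepts(mandate: dict, charge_usd: float) -> bool:
    """The merchant verifies the signature, expiry, and spending cap."""
    claimed = mandate.get("signature", "")
    body = json.dumps({k: v for k, v in mandate.items() if k != "signature"},
                      sort_keys=True).encode()
    expected = hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(claimed, expected)
            and time.time() < mandate["expires"]
            and charge_usd <= mandate["max_amount_usd"])

mandate = issue_mandate("shopper-42", "chat-agent-7", max_amount_usd=120.0)
print(merchant_accepts(mandate, charge_usd=89.99))   # True
print(merchant_accepts(mandate, charge_usd=500.00))  # False: exceeds the cap
```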
Walled gardens

These three protocols ideally work across different chat platforms, but that is never guaranteed, especially when your biggest chat platform competitor has its own protocol. A danger with competing protocols is that they can create walled gardens, where they only work on specific platforms. Enterprises face the problem of getting stuck in a platform and an agentic payment standard that will not interoperate with another.

Enterprises don't just supply the products an agent recommends; they are also most often the merchants of record, and they need to trust that the agent contacting them is acting on behalf of a customer.

Louis Amira, cofounder and CEO of agent commerce startup Circuit and Chisel, told VentureBeat that while this creates an opportunity for companies in the interoperability layer like his, it could create confusion for enterprises.

“The better the protocol proposals get, the more likely they are to end up being walled gardens and very hard to interoperate,” Amira said. “We suspect that they’re going to be fighting it out for the next few years, and the more they fight it out, the more you actually need somebody that sits underneath all of them.”

Unlike the internet, where anyone can use any browser to access a website, thanks in large part to the TCP/IP standard, chat platforms tend to remain very separate. I mostly use ChatGPT (because it's installed on my laptop and I don't need to open a new tab), so when I want to see how Gemini will handle my query, I actually have to open Gemini to do so—the same goes for anyone shopping via chatbot.

The number of protocol proposals underscores just how far we are from enabling shopping agents. The industry still needs to decide which standard to get behind, and no matter how many Walmarts integrate with ChatGPT, it's all moot if people don't trust the model or agent to handle their cash.
Take the best features, hopefully

The best thing for enterprises to do for now is to experiment with all the protocols and hope that a winner emerges. Eventually, there could be one agentic commerce protocol that takes the best of each proposal.

For Wayne Liu, chief growth officer and president for Americas at Perfect Corp., having multiple protocol proposals just means there's more learning.

“This is where the importance of open source exists because it will be the driving force to put everything together,” Liu said.

Of course, what will be interesting to watch over the next couple of weeks is whether these remain the only three competing agentic commerce protocols. After all, there are some large retailers and chat platforms that could still throw a wrench into the whole thing.

#ai #datadecisionmakers

Agents are the trendiest topic in AI today, and with good reason. AI agents act on their users' behalf, autonomously handling tasks like making online purchases, building software, researching business trends or booking travel. By taking generative AI out of the protected sandbox of the chat interface and allowing it to act directly on the world, agentic AI represents a leap forward in the power and utility of AI.

Agentic AI has been moving really fast: For example, one of the core building blocks of today's agents, the model context protocol (MCP), is only a year old! As in any fast-moving field, there are many competing definitions, hot takes and misleading opinions.

To cut through the noise, I'd like to describe the core components of an agentic AI system and how they fit together: It's really not as complicated as it may seem. Hopefully, when you've finished reading this post, agents won't seem as mysterious.

Agentic ecosystem

Definitions of the word "agent" abound, but I like a slight variation on the British programmer Simon Willison's minimalist take:

An LLM agent runs tools in a loop to achieve a goal.

The user prompts a large language model (LLM) with a goal: Say, booking a table at a restaurant near a specific theater. Along with the goal, the model receives a list of the tools at its disposal, such as a database of restaurant locations or a record of the user's food preferences. The model then plans how to achieve the goal and calls one of the tools, which provides a response; the model then calls a new tool. Through repetitions, the agent moves toward accomplishing the goal. In some cases, the model's orchestration and planning choices are complemented or enhanced by imperative code.

But what kind of infrastructure does it take to realize this approach? An agentic system needs a few core components:

- A way to build the agent. When you deploy an agent, you don't want to have to code it from scratch. There are several agent development frameworks out there.
- Somewhere to run the AI model. A seasoned AI developer can download an open-weight LLM, but it takes expertise to do that right. It also takes expensive hardware that's going to be poorly utilized for the average user.
- Somewhere to run the agentic code. With established frameworks, the user creates code for an agent object with a defined set of functions. Most of those functions involve sending prompts to an AI model, but the code needs to run somewhere. In practice, most agents will run in the cloud, because we want them to keep running when our laptops are closed, and we want them to scale up and out to do their work.
- A mechanism for translating between the text-based LLM and tool calls.
- A short-term memory for tracking the content of agentic interactions.
- A long-term memory for tracking the user's preferences and affinities across sessions.
- A way to trace the system's execution, to evaluate the agent's performance.

Let's dive into more detail on each of these components.

Building an agent

Asking an LLM to explain how it plans to approach a particular task improves its performance on that task.
This “chain-of-thought reasoning” is now ubiquitous in AI. The analogue in agentic systems is the ReAct (reasoning + action) model, in which the agent has a thought (“I’ll use the map function to locate nearby restaurants”), performs an action (issuing an API call to the map function), then makes an observation (“There are two pizza places and one Indian restaurant within two blocks of the movie theater”).

ReAct isn't the only way to build agents, but it is at the core of most successful agentic systems. Today, agents are commonly loops over the thought-action-observation sequence.

The tools available to the agent can include local tools and remote tools such as databases, microservices and software as a service. A tool's specification includes a natural-language explanation of how and when it's used and the syntax of its API calls.

The developer can also tell the agent to, essentially, build its own tools on the fly. Say that a tool retrieves a table stored as comma-separated text, and to fulfill its goal, the agent needs to sort the table. Sorting a table by repeatedly sending it through an LLM and evaluating the results would be a colossal waste of resources — and it's not even guaranteed to give the right result. Instead, the developer can simply instruct the agent to generate its own Python code when it encounters a simple but repetitive task. These snippets of code can run locally alongside the agent or in a dedicated secure code interpreter tool.

Available tools can divide responsibility between the LLM and the developer. Once the tools available to the agent have been specified, the developer can simply instruct the agent what tools to use when necessary. Or, the developer can specify which tool to use for which types of data, and even which data items to use as arguments during function calls.

Similarly, the developer can simply tell the agent to generate Python code when necessary to automate repetitive tasks or, alternatively, tell it which algorithms to use for which data types and even provide pseudocode. The approach can vary from agent to agent.
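A minimal sketch of that thought-action-observation loop, with the model call stubbed out and a single toy tool, might look like the following. The function names and tool registry format are invented for illustration and do not correspond to any particular framework.

```python
# A minimal sketch of the thought-action-observation loop described above.
# The LLM call is stubbed out; in a real agent it would be a model API call
# that returns either a tool invocation or a final answer.
import json

def find_restaurants(near: str) -> list[dict]:
    """Toy tool: pretend map lookup."""
    return [{"name": "Luna Pizza", "cuisine": "pizza"},
            {"name": "Saffron House", "cuisine": "indian"}]

TOOLS = {
    "find_restaurants": {
        "fn": find_restaurants,
        "description": "Find restaurants near a location.",
        "args": {"near": "string"},
    },
}

def llm(goal: str, history: list[str]) -> dict:
    """Stub for the model: returns a 'thought' plus either a tool call or an answer."""
    if not history:
        return {"thought": "I'll look up nearby restaurants.",
                "tool": "find_restaurants", "args": {"near": "the theater"}}
    return {"thought": "Two options found; the pizza place is closest.",
            "answer": "Book Luna Pizza for 7pm."}

def run_agent(goal: str, max_steps: int = 5) -> str:
    history: list[str] = []
    for _ in range(max_steps):
        step = llm(goal, history)                      # thought
        if "answer" in step:
            return step["answer"]
        tool = TOOLS[step["tool"]]["fn"]
        observation = tool(**step["args"])             # action -> observation
        history.append(json.dumps({"call": step["tool"], "result": observation}))
    return "Stopped without completing the goal."

print(run_agent("Book a table near the theater"))
```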
Runtime

Historically, there were two main ways to isolate code running on shared servers: containerization, which was efficient but offered lower security; and virtual machines, which were secure but came with a lot of computational overhead.

In 2018, Amazon Web Services' (AWS's) Lambda serverless-computing service deployed Firecracker, a new paradigm in server isolation. Firecracker creates "microVMs", complete with hardware isolation and their own Linux kernels but with reduced overhead (as low as a few megabytes) and startup times (as low as a few milliseconds). The low overhead means that each function executed on a Lambda server can have its own microVM.

However, because instantiating an agent requires deploying an LLM, together with the memory resources to track the LLM's inputs and outputs, the per-function isolation model is impractical. Instead, with session-based isolation, every session is assigned its own microVM. When the session finishes, the LLM's state information is copied to long-term memory, and the microVM is destroyed. This ensures secure and efficient deployment of hosts of agents.

Tool calls

Just as there are several existing development frameworks for agent creation, there are several existing standards for communication between agents and tools, the most popular of which — currently — is the model context protocol (MCP). MCP establishes a one-to-one connection between the agent's LLM and a dedicated MCP server that executes tool calls, and it also establishes a standard format for passing different types of data back and forth between the LLM and its server.

Many platforms use MCP by default, but are also configurable, so they will support a growing set of protocols over time.

Sometimes, however, the necessary tool is not one with an available API. In such cases, the only way to retrieve data or perform an action is through cursor movements and clicks on a website. There are a number of services available to perform such computer use. This makes any website a potential tool for agents, opening up decades of content and valuable services that aren't yet available directly through APIs.

Authorizations

With agents, authorization works in two directions. First, of course, users require authorization to run the agents they've created. But as the agent is acting on the user's behalf, it will usually require its own authorization to access networked resources.

There are a few different ways to approach the problem of authorization. One is with an access delegation algorithm like OAuth, which essentially plumbs the authorization process through the agentic system. The user enters login credentials into OAuth, and the agentic system uses OAuth to log into protected resources, but the agentic system never has direct access to the user's passwords.

In the other approach, the user logs into a secure session on a server, and the server has its own login credentials on protected resources. Permissions allow the user to select from a variety of authorization strategies and algorithms for implementing those strategies.

Memory and traces

Short-term memory

LLMs are next-word prediction engines. What makes them so astoundingly versatile is that their predictions are based on long sequences of words they've already seen, known as context. Context is, in itself, a kind of memory. But it's not the only kind an agentic system needs.

Suppose, again, that an agent is trying to book a restaurant near a movie theater, and from a map tool, it's retrieved a couple dozen restaurants within a mile radius. It doesn't want to dump information about all those restaurants into the LLM's context: All that extraneous information could wreak havoc with next-word probabilities. Instead, it can store the complete list in short-term memory and retrieve one or two records at a time, based on, say, the user's price and cuisine preferences and proximity to the theater. If none of those restaurants pans out, the agent can dip back into short-term memory, rather than having to execute another tool call.

Long-term memory

Agents also need to remember their prior interactions with their clients. If last week I told the restaurant booking agent what type of food I like, I don't want to have to tell it again this week. The same goes for my price tolerance, the sort of ambiance I'm looking for, and so on.

Long-term memory allows the agent to look up what it needs to know about prior conversations with the user. Agents don't typically create long-term memories themselves, however. Instead, after a session is complete, the whole conversation passes to a separate AI model, which creates new long-term memories or updates existing ones.

Memory creation can involve LLM summarization and "chunking", in which documents are split into sections grouped according to topic for ease of retrieval during subsequent sessions. Available systems allow the user to select strategies and algorithms for summarization, chunking and other information-extraction techniques.
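A toy version of the short-term memory pattern in the restaurant example, keeping the full tool result out of the prompt and retrieving only the matching records, could look like this (all names and fields are hypothetical):

```python
# Toy illustration of the short-term memory pattern described above: keep the
# full tool result out of the model's context and retrieve only what's relevant.
class ShortTermMemory:
    def __init__(self) -> None:
        self._store: dict[str, list[dict]] = {}

    def put(self, key: str, records: list[dict]) -> None:
        self._store[key] = records            # full result stays here, not in the prompt

    def retrieve(self, key: str, limit: int = 2, **filters) -> list[dict]:
        """Return at most `limit` records matching simple equality filters."""
        rows = self._store.get(key, [])
        rows = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]
        return rows[:limit]

memory = ShortTermMemory()
memory.put("nearby_restaurants", [
    {"name": "Luna Pizza", "cuisine": "pizza", "price": "$"},
    {"name": "Saffron House", "cuisine": "indian", "price": "$$"},
    {"name": "Bistro 9", "cuisine": "french", "price": "$$$"},
])
# Only the records that match the user's stated preferences go into the context.
print(memory.retrieve("nearby_restaurants", cuisine="indian"))
```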
Observability

Agents are a new kind of software system, and they require new ways to think about observing, monitoring and auditing their behavior. Some of the questions we ask will look familiar: whether the agents are running fast enough, how much they're costing, how many tool calls they're making and whether users are happy. But new questions will arise, too, and we can't necessarily predict what data we'll need to answer them.

Observability and tracing tools can provide an end-to-end view of the execution of a session with an agent, breaking down step-by-step which actions were taken and why. For the agent builder, these traces are key to understanding how well agents are working — and provide the data to make them work better.

I hope this explanation has demystified agentic AI enough that you're willing to try building your own agents!

#research #computer science and technology #artificial intelligence #machine learning #algorithms #video #data #computer science and artificial intelligence laboratory (csail) #electrical engineering and computer science (eecs) #school of engineering #mit schwarzman college of computing #mit-ibm watson ai lab

After being trained with this technique, vision-language models can better identify a unique item in a new scene.

Veo 3.1, Haiku 4.5, AI + you = the ad, China shock, AI trust sinks, and more...

#ai #technology

Anthropic released Claude Haiku 4.5 on Wednesday, a smaller and significantly cheaper artificial intelligence model that matches the coding capabilities of systems that were considered cutting-edge just months ago, marking the latest salvo in an intensifying competition to dominate enterprise AI.

The model costs $1 per million input tokens and $5 per million output tokens — roughly one-third the price of Anthropic's mid-sized Sonnet 4 model released in May, while operating more than twice as fast. In certain tasks, particularly operating computers autonomously, Haiku 4.5 actually surpasses its more expensive predecessor.

"Haiku 4.5 is a clear leap in performance and is now largely as smart as Sonnet 4 while being significantly faster and one-third of the cost," an Anthropic spokesperson told VentureBeat, underscoring how rapidly AI capabilities are becoming commoditized as the technology matures.

The launch comes just two weeks after Anthropic released Claude Sonnet 4.5, which the company bills as the world's best coding model, and two months after introducing Opus 4.1. The breakneck pace of releases reflects mounting pressure from OpenAI, whose $500 billion valuation dwarfs Anthropic's $183 billion, and which has inked a series of multibillion-dollar infrastructure deals while expanding its product lineup.

How free access to advanced AI could reshape the enterprise market

In an unusual move that could reshape competitive dynamics in the AI market, Anthropic is making Haiku 4.5 available for all free users of its Claude.ai platform. The decision effectively democratizes access to what the company characterizes as "near-frontier-level intelligence" — capabilities that would have been available only in expensive, premium models months ago.

"The launch of Claude Haiku 4.5 means that near-frontier-level intelligence is available for free to all users through Claude.ai," the Anthropic spokesperson told VentureBeat. "It also offers significant advantages to our enterprise customers: Sonnet 4.5 can handle frontier planning while Haiku 4.5 powers sub-agents, enabling multi-agent systems that tackle complex refactors, migrations, and large features builds with speed and quality."

This multi-agent architecture signals a significant shift in how AI systems are deployed. Rather than relying on a single, monolithic model, enterprises can now orchestrate teams of specialized AI agents: a more sophisticated Sonnet 4.5 model breaking down complex problems and delegating subtasks to multiple Haiku 4.5 agents working in parallel. For software development teams, this could mean Sonnet 4.5 plans a major code refactoring while Haiku 4.5 agents simultaneously execute changes across dozens of files.

The approach mirrors how human organizations distribute work, and could prove particularly valuable for enterprises seeking to balance performance with cost efficiency — a critical consideration as AI deployment scales.
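As a rough illustration of that planner/sub-agent division of labor, the sketch below uses the Anthropic Python SDK's messages API to have a larger model split a task into subtasks and cheaper workers execute them in parallel. The model identifiers are placeholders and the prompting is deliberately simplistic; consult Anthropic's documentation for current model names and recommended agent patterns.

```python
# Sketch of the planner/sub-agent pattern described above, using the Anthropic
# Python SDK's messages API. The model IDs below are placeholders -- check
# Anthropic's documentation for the current Sonnet 4.5 and Haiku 4.5 names.
from concurrent.futures import ThreadPoolExecutor
import anthropic

PLANNER_MODEL = "claude-sonnet-4-5"   # assumed identifier
WORKER_MODEL = "claude-haiku-4-5"     # assumed identifier

client = anthropic.Anthropic()        # reads ANTHROPIC_API_KEY from the environment

def ask(model: str, prompt: str) -> str:
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

def run_refactor(task: str) -> list[str]:
    # 1. The larger model plans and splits the work into independent subtasks.
    plan = ask(PLANNER_MODEL,
               f"Break this task into independent subtasks, one per line:\n{task}")
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Cheaper, faster worker agents execute the subtasks in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(
            lambda sub: ask(WORKER_MODEL, f"Complete this subtask:\n{sub}"), subtasks))
    return results

if __name__ == "__main__":
    for r in run_refactor("Rename the logging module and update all call sites."):
        print(r[:120])
```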
Inside Anthropic's path to $7 billion in annual revenue

The model launch coincides with revelations that Anthropic's business is experiencing explosive growth. The company's annual revenue run rate is approaching $7 billion this month, Anthropic told Reuters, up from more than $5 billion reported in August. Internal projections obtained by Reuters suggest the company is targeting between $20 billion and $26 billion in annualized revenue for 2026, representing growth of more than 200% to nearly 300%.

The company now serves more than 300,000 business customers, with enterprise products accounting for approximately 80% of revenue. Among Anthropic's most successful offerings is Claude Code, a code-generation tool that has reached nearly $1 billion in annualized revenue since launching earlier this year.

Those numbers come as artificial intelligence enters what many in the industry characterize as a critical inflection point. After two years of what Anthropic Chief Product Officer Mike Krieger recently described as "AI FOMO" — where companies adopted AI tools without clear success metrics — enterprises are now demanding measurable returns on investment.

"The best products can be grounded in some kind of success metric or evaluation," Krieger said on the "Superhuman AI" podcast. "I've seen that a lot in talking to companies that are deploying AI."

For enterprises evaluating AI tools, the calculus increasingly centers on concrete productivity gains. Google CEO Sundar Pichai claimed in June that AI had generated a 10% boost in engineering velocity at his company — though measuring such improvements across different roles and use cases remains challenging, as Krieger acknowledged.

Why AI safety testing matters more than ever for enterprise adoption

Anthropic's launch comes amid heightened scrutiny of the company's approach to AI safety and regulation. On Tuesday, David Sacks, the White House's AI "czar" and a venture capitalist, accused Anthropic of "running a sophisticated regulatory capture strategy based on fear-mongering" that is "damaging the startup ecosystem."

The attack targeted remarks by Jack Clark, Anthropic's British co-founder and head of policy, who had described being "deeply afraid" of AI's trajectory. Clark told Bloomberg he found Sacks' criticism "perplexing."

Anthropic addressed such concerns head-on in its release materials, emphasizing that Haiku 4.5 underwent extensive safety testing. The company classified the model as ASL-2 — its AI Safety Level 2 standard — compared to the more restrictive ASL-3 designation for the more powerful Sonnet 4.5 and Opus 4.1 models.

"Our teams have red-teamed and tested our agentic capabilities to the limits in order to assess whether it can be used to engage in harmful activity like generating misinformation or promoting fraudulent behavior like scams," the spokesperson told VentureBeat. "In our automated alignment assessment, it showed a statistically significantly lower overall rate of misaligned behaviors than both Claude Sonnet 4.5 and Claude Opus 4.1 — making it, by this metric, our safest model yet."

The company said its safety testing showed Haiku 4.5 poses only limited risks regarding the production of chemical, biological, radiological and nuclear weapons. Anthropic has also implemented classifiers designed to detect and filter prompt injection attacks, a common method for attempting to manipulate AI systems into producing harmful content.

The emphasis on safety reflects Anthropic's founding mission. The company was established in 2021 by former OpenAI executives, including siblings Dario and Daniela Amodei, who left amid concerns about OpenAI's direction following its partnership with Microsoft.
Anthropic has positioned itself as taking a more cautious, research-oriented approach to AI development.

Benchmark results show Haiku 4.5 competing with larger, more expensive models

According to Anthropic's benchmarks, Haiku 4.5 performs competitively with or exceeds several larger models across multiple evaluation criteria. On SWE-bench Verified, a widely used test measuring AI systems' ability to solve real-world software engineering problems, Haiku 4.5 scored 73.3% — slightly ahead of Sonnet 4's 72.7% and close to GPT-5 Codex's 74.5%.

The model demonstrated particular strength in computer use tasks, achieving 50.7% on the OSWorld benchmark compared to Sonnet 4's 42.2%. This capability allows the AI to interact directly with computer interfaces — clicking buttons, filling forms, navigating applications — which could prove transformative for automating routine digital tasks.

In coding-specific benchmarks like Terminal-Bench, which tests AI agents' ability to complete complex software tasks using command-line tools, Haiku 4.5 scored 41.0%, trailing only Sonnet 4.5's 50.0% among Claude models.

The model maintains a 200,000-token context window for standard users, with developers accessing the Claude Developer Platform able to use a 1-million-token context window. That expanded capacity means the model can process extremely large codebases or documents in a single request — roughly equivalent to a 1,500-page book.

What three major AI model releases in two months say about the competition

When asked about the rapid succession of model releases, the Anthropic spokesperson emphasized the company's focus on execution rather than competitive positioning.

"We're focused on shipping the best possible products for our customers — and our shipping velocity speaks for itself," the spokesperson said. "What was state-of-the-art just five months ago is now faster, cheaper, and more accessible."

That velocity stands in contrast to the company's earlier, more measured release schedule. Anthropic appeared to have paused development of its Haiku line after releasing version 3.5 at the end of last year, leading some observers to speculate the company had deprioritized smaller models.

That rapid price-performance improvement validates a core promise of artificial intelligence: that capabilities will become dramatically cheaper over time as the technology matures and companies optimize their models. For enterprises, it suggests that today's budget constraints around AI deployment may ease considerably in coming years.

From customer service to code: Real-world applications for faster, cheaper AI

The practical applications of Haiku 4.5 span a wide range of enterprise functions, from customer service to financial analysis to software development. The model's combination of speed and intelligence makes it particularly suited for real-time, low-latency tasks like chatbot conversations and customer support interactions, where delays of even a few seconds can degrade user experience.

In financial services, the multi-agent architecture enabled by pairing Sonnet 4.5 with Haiku 4.5 could transform how firms monitor markets and manage risk. Anthropic envisions Haiku 4.5 monitoring thousands of data streams simultaneously — tracking regulatory changes, market signals and portfolio risks — while Sonnet 4.5 handles complex predictive modeling and strategic analysis.

For research organizations, the division of labor could compress timelines dramatically.
Sonnet 4.5 might orchestrate a comprehensive analysis while multiple Haiku 4.5 agents parallelize literature reviews, data gathering and document synthesis across dozens of sources, potentially "compressing weeks of research into hours," according to Anthropic's use case descriptions.

Several companies have already integrated Haiku 4.5 and reported positive results. Guy Gur-Ari, co-founder of coding startup Augment, said the model "hit a sweet spot we didn't think was possible: near-frontier coding quality with blazing speed and cost efficiency." In Augment's internal testing, Haiku 4.5 achieved 90% of Sonnet 4.5's performance while matching much larger models.

Jeff Wang, CEO of Windsurf, another coding-focused startup, said Haiku 4.5 "is blurring the lines" on traditional trade-offs between speed, cost and quality. "It's a fast frontier model that keeps costs efficient and signals where this class of models is headed."

Jon Noronha, co-founder of presentation software company Gamma, reported that Haiku 4.5 "outperformed our current models on instruction-following for slide text generation, achieving 65% accuracy versus 44% from our premium tier model — that's a game-changer for our unit economics."

The price of progress: What plummeting AI costs mean for enterprise strategy

For enterprises evaluating AI strategies, Haiku 4.5 presents both opportunity and challenge. The opportunity lies in accessing sophisticated AI capabilities at dramatically lower costs, potentially making viable entire categories of applications that were previously too expensive to deploy at scale.

The challenge is keeping pace with a technology landscape that is evolving faster than most organizations can absorb. As Krieger noted in his recent podcast appearance, companies are moving beyond "AI FOMO" to demand concrete metrics and demonstrated value. But establishing those metrics and evaluation frameworks takes time — time that may be in short supply as competitors race ahead.

The shift from single-model deployments to multi-agent architectures also requires new ways of thinking about AI systems. Rather than viewing AI as a monolithic assistant, enterprises must learn to orchestrate multiple specialized agents, each optimized for particular tasks — more akin to managing a team than operating a tool.

The fundamental economics of AI are shifting with remarkable speed. Five months ago, Sonnet 4's capabilities commanded premium pricing and represented the cutting edge. Today, Haiku 4.5 delivers similar performance at a third of the cost. If that trajectory continues — and both Anthropic's release schedule and competitive pressure from OpenAI and Google suggest it will — the AI capabilities that seem remarkable today may be routine and inexpensive within a year.

For Anthropic, the challenge will be translating technical achievements into sustainable business growth while maintaining the safety-focused approach that differentiates it from competitors. The company's projected revenue growth to as much as $26 billion by 2026 suggests strong market traction, but achieving those targets will require continued innovation and successful execution across an increasingly complex product portfolio.

Whether enterprises will choose Claude over increasingly capable alternatives from OpenAI, Google and a growing field of competitors remains an open question.
But Anthropic is making a clear bet: that the future of AI belongs not to whoever builds the single most powerful model, but to whoever can deliver the right intelligence, at the right speed, at the right price — and make it accessible to everyone.

In an industry where the promise of artificial intelligence has long outpaced reality, Anthropic is betting that delivering on that promise, faster and cheaper than anyone expected, will be enough to win. And with pricing dropping by two-thirds in just five months while performance holds steady, that promise is starting to look like reality.

#ai

As expected after days of leaks and rumors online, Google has unveiled Veo 3.1, its latest AI video generation model, bringing a suite of creative and technical upgrades aimed at improving narrative control, audio integration, and realism in AI-generated video. While the updates expand possibilities for hobbyists and content creators using Google's online AI creation app, Flow, the release also signals a growing opportunity for enterprises, developers, and creative teams seeking scalable, customizable video tools.

The quality is higher, the physics better, the pricing the same as before, and the control and editing features more robust and varied.

My initial tests showed it to be a powerful and performant model that immediately delights with each generation. However, the look is more cinematic, polished, and a little more "artificial" by default than rivals such as OpenAI's new Sora 2, released late last month, which may or may not be what a particular user is going after (Sora excels at handheld, "candid"-style videos).

Expanded Control Over Narrative and Audio

Veo 3.1 builds on its predecessor, Veo 3 (released back in May 2025), with enhanced support for dialogue, ambient sound, and other audio effects. Native audio generation is now available across several key features in Flow, including "Frames to Video," "Ingredients to Video," and "Extend," which give users the ability to, respectively: turn still images into video; use items, characters and objects from multiple images in a single video; and generate clips longer than the initial 8 seconds, to more than 30 seconds or even a minute or more when continuing from a prior clip's final frame. Before, you had to add audio manually after using these features.

This addition gives users greater command over tone, emotion, and storytelling — capabilities that have previously required post-production work.

In enterprise contexts, this level of control may reduce the need for separate audio pipelines, offering an integrated way to create training content, marketing videos, or digital experiences with synchronized sound and visuals.

Google noted in a blog post that the updates reflect user feedback calling for deeper artistic control and improved audio support. Gallegos emphasizes the importance of making edits and refinements possible directly in Flow, without reworking scenes from scratch.

Richer Inputs and Editing Capabilities

With Veo 3.1, Google introduces support for multiple input types and more granular control over generated outputs.
The model accepts text prompts, images, and video clips as input, and also supports:

- Reference images (up to three) to guide appearance and style in the final output
- First and last frame interpolation to generate seamless scenes between fixed endpoints
- Scene extension that continues a video’s action or motion beyond its current duration

These tools aim to give enterprise users a way to fine-tune the look and feel of their content—useful for brand consistency or adherence to creative briefs. Additional capabilities like “Insert” (add objects to scenes) and “Remove” (delete elements or characters) are also being introduced, though not all are immediately available through the Gemini API.

Deployment Across Platforms

Veo 3.1 is accessible through several of Google’s existing AI services:

- Flow, Google’s own interface for AI-assisted filmmaking
- Gemini API, targeted at developers building video capabilities into applications
- Vertex AI, where enterprise integration will soon support Veo’s “Scene Extension” and other key features

Availability through these platforms allows enterprise customers to choose the right environment—GUI-based or programmatic—based on their teams and workflows.

Pricing and Access

The Veo 3.1 model is currently in preview and available only on the paid tier of the Gemini API. The cost structure is the same as Veo 3, the preceding generation of AI video models from Google:

- Standard model: $0.40 per second of video
- Fast model: $0.15 per second

There is no free tier, and users are charged only if a video is successfully generated. This structure is consistent with previous Veo versions and provides predictable pricing for budget-conscious enterprise teams.

Technical Specs and Output Control

Veo 3.1 outputs video at 720p or 1080p resolution at a 24 fps frame rate. Duration options include 4, 6, or 8 seconds from a text prompt or uploaded images, with the ability to extend videos up to 148 seconds (more than two and a half minutes) when using the “Extend” feature. New functionality also includes tighter control over subjects and environments. For example, enterprises can upload a product image or visual reference, and Veo 3.1 will generate scenes that preserve its appearance and stylistic cues across the video. This could streamline creative production pipelines for retail, advertising, and virtual content production teams.

Initial Reactions

The broader creator and developer community has responded to Veo 3.1’s launch with a mix of optimism and tempered critique—particularly when comparing it to rival models like OpenAI’s Sora 2. Matt Shumer, AI founder of Otherside AI/Hyperwrite and an early adopter, described his initial reaction as “disappointment,” noting that Veo 3.1 is “noticeably worse than Sora 2” and also “quite a bit more expensive.” However, he acknowledged that Google’s tooling—such as support for references and scene extension—is a bright spot in the release.

Travis Davids, a 3D digital artist and AI content creator, echoed some of that sentiment. While he noted improvements in audio quality, particularly in sound effects and dialogue, he raised concerns about limitations that remain in the system. These include the lack of custom voice support, an inability to select generated voices directly, and the continued cap at 8-second generations—despite some public claims about longer outputs. Davids also pointed out that character consistency across changing camera angles still requires careful prompting, whereas other models like Sora 2 handle this more automatically. 
He questioned the absence of 1080p resolution for users on paid tiers like Flow Pro and expressed skepticism over feature parity.On the more positive end, @kimmonismus, an AI newsletter writer, stated that “Veo 3.1 is amazing,” though still concluded that OpenAI’s latest model remains preferable overall.Collectively, these early impressions suggest that while Veo 3.1 delivers meaningful tooling enhancements and new creative control features, expectations have shifted as competitors raise the bar on both quality and usability.Adoption and ScaleSince launching Flow five months ago, Google says over 275 million videos have been generated across various Veo models. The pace of adoption suggests significant interest not only from individuals but also from developers and businesses experimenting with automated content creation.Thomas Iljic, Director of Product Management at Google Labs, highlights that Veo 3.1’s release brings capabilities closer to how human filmmakers plan and shoot. These include scene composition, continuity across shots, and coordinated audio—all areas that enterprises increasingly look to automate or streamline.Safety and Responsible AI UseVideos generated with Veo 3.1 are watermarked using Google’s SynthID technology, which embeds an imperceptible identifier to signal that the content is AI-generated. Google applies safety filters and moderation across its APIs to help minimize privacy and copyright risks. Generated content is stored temporarily and deleted after two days unless downloaded.For developers and enterprises, these features provide reassurance around provenance and compliance—critical in regulated or brand-sensitive industries.Where Veo 3.1 Stands Among a Crowded AI Video Model SpaceVeo 3.1 is not just an iteration on prior models—it represents a deeper integration of multimodal inputs, storytelling control, and enterprise-level tooling. While creative professionals may see immediate benefits in editing workflows and fidelity, businesses exploring automation in training, advertising, or virtual experiences may find even greater value in the model’s composability and API support.The early user feedback highlights that while Veo 3.1 offers valuable tooling, expectations around realism, voice control, and generation length are evolving rapidly. As Google expands access through Vertex AI and continues refining Veo, its competitive positioning in enterprise video generation will hinge on how quickly these user pain points are addressed.
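For developers evaluating the Gemini API route described above, video generation is exposed as a long-running operation that is polled until the clip is ready. The sketch below assumes the google-genai Python SDK's generate_videos interface used for earlier Veo releases carries over to Veo 3.1; the model ID string is a guess, so check Google's current model list before relying on it.

```python
# Hedged sketch: text-to-video through the Gemini API's long-running operation flow.
# The model ID below is assumed; verify the exact Veo 3.1 identifier in Google's docs.
import time
from google import genai

client = genai.Client()  # reads the Gemini API key from the environment

operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",  # assumed identifier
    prompt="A slow dolly shot across a rain-soaked neon street, with ambient city audio",
)

# Generation is asynchronous: poll the operation until it completes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("veo_clip.mp4")

# At the published rates, an 8-second clip costs about 8 * $0.40 = $3.20 on the
# standard model, or 8 * $0.15 = $1.20 on the fast model.
```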

#amazon bedrock #amazon nova #best practices

In this post, we share four high-impact, widely adopted use cases built with Nova in Amazon Bedrock, supported by real-world customer deployments, offerings available from AWS partners, and firsthand experience. These examples are ideal for organizations researching their own AI adoption strategies and use cases across industries.
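As a rough illustration of how little code a Nova call requires once a use case is chosen, here is a hedged sketch using the Bedrock Converse API via boto3. The model ID and region are assumptions, and the prompt is a placeholder rather than one of the use cases from the post.

```python
# Minimal sketch of calling a Nova model through Amazon Bedrock's Converse API.
# The model ID and region are assumptions; check the Bedrock console for the IDs
# enabled in your account.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="amazon.nova-lite-v1:0",  # assumed Nova model ID
    messages=[{"role": "user",
               "content": [{"text": "Summarize this support ticket in two sentences: ..."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```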

We’re rolling out significant updates to Veo that give people even more creative control.

#google deepmind #ai #google labs

Today, we’re introducing new and enhanced creative capabilities to edit your clips.

#faculty #music and theater arts #music technology #music #arts #artificial intelligence #technology and society #history of mit #education, teaching, academics #school of humanities arts and social sciences #obituaries

The former department chair was an early innovator in the use of artificial intelligence to both study and influence how children learn music.

#profile #students #graduate, postdoctoral #neuroscience #music #artificial intelligence #brain and cognitive sciences #music and theater arts #media lab #school of architecture and planning #school of engineering #school of science #mit schwarzman college of computing #school of humanities arts and social sciences #undergraduate research opportunities program (urop) #mental health

Media Lab PhD student Kimaya Lecamwasam researches how music can shape well-being.

#android #safety & security #ai

As scams grow in number and sophistication, we’re sharing 6 new ways Google is fighting scams to help keep you safe.

#ai & ml #commentary

This article is part of a series on the Sens-AI Framework—practical habits for learning and coding with AI. Read the original framework introduction and explore the complete methodology in Andrew Stellman’s O’Reilly report Critical Thinking Habits for Coding with AI. AI-assisted coding is here to stay. I’ve seen many companies now require all developers to install Copilot extensions […]

#health #ai

We’re launching a new 27 billion parameter foundation model for single-cell analysis built on the Gemma family of open models.

#ai & ml #commentary

“Strange was obliged to invent most of the magic he did, working from general principles and half-remembered stories from old books.” — Susanna Clarke, Jonathan Strange & Mr Norrell Fairy tales, myths, and fantasy fiction are full of magic spells. You say “abracadabra” and something profound happens.1 Say “open sesame” and the door swings open. […]

#ai #automation #technology

The Dfinity Foundation on Wednesday released Caffeine, an artificial intelligence platform that allows users to build and deploy web applications through natural language conversation alone, bypassing traditional coding entirely. The system, which became publicly available today, represents a fundamental departure from existing AI coding assistants by building applications on a specialized decentralized infrastructure designed specifically for autonomous AI development.Unlike GitHub Copilot, Cursor, or other "vibe coding" tools that help human developers write code faster, Caffeine positions itself as a complete replacement for technical teams. Users describe what they want in plain language, and an ensemble of AI models writes, deploys, and continually updates production-grade applications — with no human intervention in the codebase itself."In the future, you as a prospective app owner or service owner… will talk to AI. AI will give you what you want on a URL," said Dominic Williams, founder and chief scientist at the Dfinity Foundation, in an exclusive interview with VentureBeat. "You will use that, completely interact productively, and you'll just keep talking to AI to evolve what that does. The AI, or an ensemble of AIs, will be your tech team."The platform has attracted significant early interest: more than 15,000 alpha users tested Caffeine before its public release, with daily active users representing 26% of those who received access codes — "early Facebook kind of levels," according to Williams. The foundation reports some users spending entire days building applications on the platform, forcing Dfinity to consider usage limits due to underlying AI infrastructure costs.Why Caffeine's custom programming language guarantees your data won't disappearCaffeine's most significant technical claim addresses a problem that has plagued AI-generated code: data loss during application updates. The platform builds applications using Motoko, a programming language developed by Dfinity specifically for AI use, which provides mathematical guarantees that upgrades cannot accidentally delete user data."When AI is updating apps and services in production, a mistake cannot lose data. That's a guarantee," Williams said. "It's not like there are some safeguards to try and stop it losing data. This language framework gives it rails that guarantee if an upgrade, an update to its app's underlying logic, would cause data loss, the upgrade fails and the AI just tries again."This addresses what Williams characterizes as critical failures in competing platforms. User forums for tools like Lovable and Replit, he notes, frequently report three major problems: applications that become irreparably broken as complexity increases, security vulnerabilities that allow unauthorized access, and mysterious data loss during updates.Traditional tech stacks evolved to meet human developer needs — familiarity with SQL databases, preference for known programming languages, existing skill investments. "That's how the traditional tech stacks evolved. It's really evolved to meet human needs," Williams explained. "But in the future, it's going to be different. You're not going to care how the AI did it. Instead, for you, AI is the tech stack."Caffeine's architecture reflects this philosophy. Applications run entirely on the Internet Computer Protocol (ICP), a blockchain-based network that Dfinity launched in May 2021 after raising over $100 million from investors including Andreessen Horowitz and Polychain Capital. 
The ICP uses what Dfinity calls "chain-key cryptography" to create what Williams describes as "tamper-proof" code — applications that are mathematically guaranteed to execute their written logic without interference from traditional cyberattacks."The code can't be affected by ransomware, so you don't have to worry about malware in the same way you do," Williams said. "Configuration errors don't result in traditional cyber attacks. That passive traditional cyber attacks isn't something you need to worry about."How 'orthogonal persistence' lets AI build apps without managing databasesAt the heart of Caffeine's technical approach is a concept called "orthogonal persistence," which fundamentally reimagines how applications store and manage data. In traditional development, programmers must write extensive code to move data between application logic and separate database systems — marshaling data in and out of SQL servers, managing connections, handling synchronization.Motoko eliminates this entirely. Williams demonstrated with a simple example: defining a blog post data type and declaring a variable to store an array of posts requires just two lines of code. "This declaration is all that's necessary to have the blog maintain its list of posts," he explained during a presentation on the technology. "Compare that to traditional IT where in order to persist the blog posts, you'd have to marshal them in and out of a database server. This is quite literally orders of magnitude more simple."This abstraction allows AI to work at a higher conceptual level, focusing on application logic rather than infrastructure plumbing. "Logic and data are kind of the same," Williams said. "This is one of the things that enables AI to build far more complicated functionality than it could otherwise do."The system also employs what Dfinity calls "loss-safe data migration." When AI needs to modify an application's data structure — adding a "likes" field to blog posts, for example — it must write migration logic in two passes. The framework automatically verifies that the transformation won't result in data loss, refusing to compile or deploy code that could delete information unless explicitly instructed.From million-dollar SaaS contracts to conversational app building in minutesWilliams positions Caffeine as particularly transformative for enterprise IT, where he claims costs could fall to "1% of what they were before" while time-to-market shrinks to similar fractions. The platform targets a spectrum from individual creators to large corporations, all of whom currently face either expensive development teams or constraining low-code templates."A corporation or government department might want to create a corporate portal or CRM, ERP functionality," Williams said, referring to customer relationship management and enterprise resource planning systems. "They will otherwise have to obtain this by signing up for some incredibly expensive SaaS service where they become locked in, their data gets stuck, and they still have to spend a lot of money on consultants customizing the functionality."Applications built through Caffeine are owned entirely by their creators and cannot be shut down by centralized parties — a consequence of running on the decentralized Internet Computer network rather than traditional cloud providers like Amazon Web Services. 
"When someone says built on the internet computer, it actually means built on the internet computer," Williams emphasized, contrasting this with blockchain projects that merely host tokens while running actual applications on centralized infrastructure.The platform demonstrated this versatility during a July 2025 hackathon in San Francisco, where participants created applications ranging from a "Will Maker" tool for generating legal documents, to "Blue Lens," a voice-AI water quality monitoring system, to "Road Patrol," a gamified community reporting app for infrastructure problems. Critically, many of these came from non-technical participants with no coding background."I'm from a non-technical background, I'm actually a quality assurance professional," said the creator of Blue Lens in a video testimonial. "Through Caffeine I can build something really intuitive and next-gen to the public." The application integrated multiple external services — Eleven Labs for voice AI, real-time government water data through retrieval-augmented generation, and Midjourney-generated visual assets — all coordinated through conversational prompts.What separates Caffeine from GitHub Copilot, Cursor, and the 'vibe coding' waveCaffeine enters a crowded market of AI-assisted development tools, but Williams argues the competition isn't truly comparable. GitHub Copilot, Cursor, and similar tools serve human developers working with traditional technology stacks. Platforms like Replit and Lovable occupy a middle ground, offering "vibe coding" that mixes AI generation with human editing."If you're a Node.js developer, you know you're working with the traditional stack, and you might want to do your coding with Copilot or using Claude or using Cursor," Williams said. "That's a very different thing to what Caffeine is offering. There'll always be cases where you probably wouldn't want to hand over the logic of the control system for a new nuclear missile silo to AI. But there's going to be these holdout areas, right? And there's all the legacy stuff that has to be maintained."The key distinction, according to Williams, lies in production readiness. Existing AI coding tools excel at rapid prototyping but stumble when applications grow complex or require guaranteed reliability. Reddit forums for these platforms document users hitting insurmountable walls where applications break irreparably, or where AI-generated code introduces security vulnerabilities."As the demands and the requirements become more complicated, eventually you can hit a limit, and when you hit that limit, not only can you not go any further, but sometimes your app will get broken and there's no way of going back to where you were before," Williams said. "That can't happen with productive apps, and it also can't be the case that you're getting hacked and losing data, because once you go hands-free, if you like, and there's no tech team, there's no technical people involved, who's going to run the backups and restore your app?"The Internet Computer's architecture addresses this through Byzantine fault tolerance — even if attackers gain physical control over some network hardware, they cannot corrupt applications or their data. 
"This is the beginning of a compute revolution and it's also the perfect platform for AI to build on," Williams said.Inside the vision: A web that programs itself through natural languageDfinity frames Caffeine within a broader vision it calls the "self-writing internet," where the web literally programs itself through natural language interaction. This represents what Williams describes as a "seismic shift coming to tech" — from human developers selecting technology stacks based on their existing skills, to AI selecting optimal implementations invisible to users."You don't care about whether some human being has learned all of the different platforms and Amazon Web Services or something like that. You don't care about that. You just care: Is it secure? Do you get security guarantees? Is it resilient? What's the level of resilience?" Williams said. "Those are the new parameters."The platform demonstrated this during live demonstrations, including at the World Computer Summit 2025 in Zurich. Williams created a talent recruitment application from scratch in under two minutes, then modified it in real-time while the application ran with users already interacting with it. "You will continue talking to the AI and just keep on refreshing the URL to see the changes," he explained.This capability extends to complex scenarios. During demonstrations, Williams showed building a tennis lesson booking system, an e-commerce platform, and an event registration system — all simultaneously, working on multiple applications in parallel. "We predict that as people get very proficient with Caffeine, they could be working on even 10 apps in parallel," he said.The system writes substantial code: a simple personal blog generated 700 lines of code in a couple of minutes. More complex applications can involve thousands of lines across frontend and backend components, all abstracted away from the user who only describes desired functionality.The economics of cloning: How Caffeine's app market challenges traditional storesCaffeine's economic model differs fundamentally from traditional software-as-a-service platforms. Applications run on the Internet Computer Protocol, which uses a "reverse gas model" where developers pay for computation rather than users paying transaction fees. The platform includes an integrated App Market where creators can publish applications for others to clone and adapt — creating what Dfinity envisions as a new economic ecosystem."App stores today obviously operate on gatekeeping," said Pierre Samaties, chief business officer at Dfinity, during the World Computer Summit. "That's going to erode." Rather than purchasing applications, users can clone them and modify them for their own purposes — fundamentally different from Apple's App Store or Google Play models.Williams acknowledges that Caffeine itself currently runs on centralized infrastructure, despite building applications on the decentralized Internet Computer. "Caffeine itself actually is centralized. It uses aspects of the Internet Computer. We want Caffeine itself to run on the Internet Computer in the future, but it's not there now," he said. The platform leverages commercially available foundation models from companies like Anthropic, whose Claude Sonnet model powers much of Caffeine's backend logic.This pragmatic approach reflects Dfinity's strategy of using best-in-class AI models while focusing its own development on the specialized infrastructure and programming language designed for AI use. 
"These content models have been developed by companies with enormous budgets, absolutely enormous budgets," Williams said. "I don't think in the near future we'll run AI on the Internet Computer for that reason, unless there's a special case."A decade in the making: From Ethereum roots to the self-writing internetThe Dfinity Foundation has pursued this vision since Williams began researching decentralized networks in late 2013. After involvement with Ethereum before its 2015 launch, Williams became fascinated with the concept of a "world computer"—a public blockchain network that could host not just tokens but entire applications and services."By 2015 I was talking about network-focused drivers, Dfinity back then, and that could really operate as an alternative tech stack, and eventually host even things like social networks and massive enterprise systems," Williams said. The foundation launched the Internet Computer Protocol in May 2021, initially focusing on Web3 developers. Despite not being among the highest-valued blockchain projects, ICP consistently ranks in the top 10 for developer numbers.The pivot to AI-driven development came from recognizing that "in the future, the tech stack will be AI," according to Williams. This realization led to Caffeine's development, announced on Dfinity's public roadmap in March 2025 and demonstrated at the World Computer Summit in June 2025.One successful example of the Dfinity vision running in production is OpenChat, a messaging application that runs entirely on the Internet Computer and is governed by a decentralized autonomous organization (DAO) with tens of thousands of participants voting on source code updates through algorithmic governance. "The community is actually controlling the source code updates," Williams explained. "Developers propose updates, community reads the updates, and if the community is happy, OpenChat updates itself."The skeptics weigh in: Crypto baggage and real-world testing aheadThe platform faces several challenges. Dfinity's crypto industry roots may create perception problems in enterprise markets, Williams acknowledges. "The Web3 industry's reputation is a bit tarnished and probably rightfully so," he said during the World Computer Summit. "Now people can, for themselves, experience what a decentralized network is. We're going to see self-writing take over the enterprise space because the speed and efficiency are just incredible."The foundation's history includes controversy: ICP's token launched in 2021 at over $100 per token with an all-time high around $700, then crashed below $3 in 2023 before recovering. The project has faced legal challenges, including class action lawsuits alleging misleading investors, and Dfinity filed defamation claims against industry critics.Technical limitations also remain. Caffeine cannot yet compile React front-ends on the Internet Computer itself, requiring some off-chain processing. Complex integrations with traditional systems — payment processing through Stripe, for example — still require centralized components. "Your app is running end-to-end on the Internet Computer, then when it needs to actually accept payment, it's going to hand over to your Stripe account," Williams explained.The platform's claims about data loss prevention and security guarantees, while technically grounded in the Motoko language design and Internet Computer architecture, remain to be tested at scale with diverse real-world applications. 
The 26% daily active user rate from alpha testing is impressive but comes from a self-selected group of early adopters.When five billion smartphone users become developersWilliams rejects concerns that AI-driven development will eliminate software engineering jobs, arguing instead for market expansion. "The self-writing internet empowers eight billion non-technical people," he said. "Some of these people will enter roles in tech, becoming prompt engineers, tech entrepreneurs, or helping run online communities. Humanity will create millions of new custom apps and services, and a subset of those will require professional human assistance."During his World Computer Summit demonstration, Williams was explicit about the scale of transformation Dfinity envisions. "Today there are about 35,000 Web3 engineers in the world. Worldwide there are about 15 million full-stack engineers," he said. "But tomorrow with the self-writing internet, everyone will be a builder. Today there are already about five billion people with internet-connected smartphones and they'll all be able to use Caffeine."The hackathon results suggest this isn't pure hyperbole. A dentist built "Dental Tracks" to help patients manage their dental records. A transportation industry professional created "Road Patrol" for gamified infrastructure reporting. A frustrated knitting student built "Skill Sprout," a garden-themed app for learning new hobbies, complete with material checklists and step-by-step skill breakdowns—all without writing a single line of code."I was learning to knit. I got irritated because I had the wrong materials," the creator explained in a video interview. "I don't know how to do the stitches, so I have to individually search, and it's really intimidating when you're trying to learn something you don't—you don't even know what you don't know."Whether Caffeine succeeds depends on factors still unknown: how production applications perform under real-world stress, whether the Internet Computer scales to millions of applications, whether enterprises can overcome their skepticism of blockchain-adjacent technology. But if Williams is right about the fundamental shift — that AI will be the tech stack, not just a tool for human developers — then someone will build what Caffeine promises.The question isn't whether the future looks like this. It's who gets there first, and whether they can do it without losing everyone's data along the way.
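The "loss-safe data migration" idea described earlier is easiest to grasp as a schema diff that refuses to drop fields. The sketch below is a loose Python analogy of that behavior — it is not Motoko code and not Dfinity's actual mechanism.

```python
# Illustrative analogy only — not Motoko and not Dfinity's actual mechanism.
# The idea: a migration is rejected unless every existing field survives,
# so adding a "likes" field passes while silently dropping "body" would fail.

def check_migration(old_schema: set[str], new_schema: set[str],
                    explicitly_dropped: set[str] = frozenset()) -> None:
    lost = old_schema - new_schema - explicitly_dropped
    if lost:
        raise ValueError(f"Upgrade refused: fields {lost} would lose data")

old = {"title", "body", "created_at"}

check_migration(old, old | {"likes"})          # OK: purely additive change
check_migration(old, {"title", "created_at"})  # raises: "body" would be lost
```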

#ai

2025 was supposed to be the year of "AI agents," according to Nvidia CEO Jensen Huang and other AI industry leaders. And it has been, in many ways, with numerous leading AI model providers such as OpenAI, Google, and even Chinese competitors like Alibaba releasing fine-tuned AI models or applications designed to focus on a narrow set of tasks, such as web search and report writing. But one big hurdle to a future of highly performant, reliable AI agents remains: getting them to stay on task when the task extends over many steps. Third-party benchmark tests show that even the most powerful AI models experience higher failure rates the more steps they take to complete a task and the longer they spend on it (often stretching into hours).

A new academic framework called EAGLET proposes a practical and efficient method to improve long-horizon task performance in LLM-based agents — without the need for manual data labeling or retraining. Developed by researchers from Tsinghua University, Peking University, DeepLang AI, and the University of Illinois Urbana-Champaign, EAGLET offers a "global planner" that can be integrated into existing agent workflows to reduce hallucinations and improve task efficiency. EAGLET is a fine-tuned language model that interprets task instructions — typically provided as prompts by the user or the agent's operating environment — and generates a high-level plan for the agent (powered by its own LLM). It does not intervene during execution, but its up-front guidance helps reduce planning errors and improve task completion rates.

Addressing the Planning Problem in Long-Horizon Agents

Many LLM-based agents struggle with long-horizon tasks because they rely on reactive, step-by-step reasoning. This approach often leads to trial-and-error behavior, planning hallucinations, and inefficient trajectories. EAGLET tackles this limitation by introducing a global planning module that works alongside the executor agent. Instead of blending planning and action generation in a single model, EAGLET separates them, enabling more coherent, task-level strategies.

A Two-Stage Training Pipeline with No Human Annotations

EAGLET’s planner is trained using a two-stage process that requires no human-written plans or annotations. The first stage involves generating synthetic plans with high-capability LLMs, such as GPT-5 and DeepSeek-V3.1-Think. These plans are then filtered using a novel strategy called homologous consensus filtering, which retains only those that improve task performance for both expert and novice executor agents. In the second stage, a rule-based reinforcement learning process further refines the planner, using a custom-designed reward function to assess how much each plan helps multiple agents succeed.

Introducing the Executor Capability Gain Reward (ECGR)

One of EAGLET’s key innovations is the Executor Capability Gain Reward (ECGR). This reward measures the value of a generated plan by checking whether it helps both high- and low-capability agents complete tasks more successfully and with fewer steps. It also includes a decay factor to favor shorter, more efficient task trajectories. This approach avoids over-rewarding plans that are only useful to already-competent agents and promotes more generalizable planning guidance.

Compatible with Existing Agents and Models

The EAGLET planner is designed to be modular and "plug-and-play," meaning it can be inserted into existing agent pipelines without requiring executor retraining. 
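Based on the paper's description, a plug-and-play integration amounts to generating one global plan up front and prepending it to the executor's prompt at every step. The sketch below is an interpretation of that flow, with a loose reading of the ECGR reward; function names, prompts, the environment interface, and the reward expression are assumptions, not the authors' released code (none has been published).

```python
# Illustrative sketch of plan-then-execute integration, based on the paper's
# description; prompts, the env interface, and the reward expression are assumptions.

def global_plan(planner_llm, task_instruction: str) -> str:
    # EAGLET's planner emits a high-level plan once, before execution begins.
    return planner_llm(f"Produce a concise step-by-step global plan for:\n{task_instruction}")

def run_agent(executor_llm, env, task_instruction: str, planner_llm=None, max_steps=30):
    plan = global_plan(planner_llm, task_instruction) if planner_llm else ""
    obs, history = env.reset(task_instruction), []
    for _ in range(max_steps):
        action = executor_llm(
            f"Task: {task_instruction}\nGlobal plan: {plan}\n"
            f"History: {history}\nObservation: {obs}\nNext action:"
        )
        obs, done, success = env.step(action)
        history.append((action, obs))
        if done:
            return success, len(history)
    return False, max_steps

# One reading of the Executor Capability Gain Reward (not the paper's exact formula):
# reward a plan by how much it lifts success across expert and novice executors,
# discounted by trajectory length so shorter solutions score higher.
def ecgr(results_with_plan, results_without_plan, gamma=0.95):
    gain = 0.0
    for (s_p, t_p), (s_0, t_0) in zip(results_with_plan, results_without_plan):
        gain += s_p * gamma ** t_p - s_0 * gamma ** t_0
    return gain / len(results_with_plan)
```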
In evaluations, the planner boosted performance across a variety of foundational models, including GPT-4.1, GPT-5, Llama-3.1, and Qwen2.5. It also proved effective regardless of prompting strategy, working well with standard ReAct-style prompts as well as approaches like Reflexion.State-of-the-Art Performance Across BenchmarksEAGLET was tested on three widely used benchmarks for long-horizon agent tasks: ScienceWorld, which simulates scientific experiments in a text-based lab environment; ALFWorld, which tasks agents with completing household activities through natural language in a simulated home setting; and WebShop, which evaluates goal-driven behavior in a realistic online shopping interface.Across all three, executor agents equipped with EAGLET outperformed their non-planning counterparts and other planning baselines, including MPO and KnowAgent. In experiments with the open source Llama-3.1-8B-Instruct model, EAGLET boosted average performance from 39.5 to 59.4, a +19.9 point gain across tasks. On ScienceWorld unseen scenarios, it raised performance from 42.2 to 61.6. In ALFWorld seen scenarios, EAGLET improved outcomes from 22.9 to 54.3, a more than 2.3× increase in performance.Even stronger gains were seen with more capable models. For instance, GPT-4.1 improved from 75.5 to 82.2 average score with EAGLET, and GPT-5 rose from 84.5 to 88.1, despite already being strong performers. In some benchmarks, performance gains were as high as +11.8 points, such as when combining EAGLET with the ETO executor method on ALFWorld unseen tasks.Compared to other planning baselines like MPO, EAGLET consistently delivered higher task completion rates. For example, on ALFWorld unseen tasks with GPT-4.1, MPO achieved 79.1, while EAGLET scored 83.6—a +4.5 point advantage.Additionally, the paper reports that agents using EAGLET complete tasks in fewer steps on average. With GPT-4.1 as executor, average step count dropped from 13.0 (no planner) to 11.1 (EAGLET). With GPT-5, it dropped from 11.4 to 9.4, supporting the claim of improved execution efficiency.Efficiency Gains in Training and ExecutionCompared to RL-based methods like GiGPO, which can require hundreds of training iterations, EAGLET achieved better or comparable results with roughly one-eighth the training effort. This efficiency also carries over into execution: agents using EAGLET typically needed fewer steps to complete tasks. This translates into reduced inference time and compute cost in production scenarios.No Public Code—YetAs of the version submitted to arXiv, the authors have not released an open-source implementation of EAGLET. It is unclear if or when the code will be released, under what license, or how it will be maintained, which may limit the near-term utility of the framework for enterprise deployment. VentureBeat has reached out to the authors to clarify these points and will update this piece when we hear back.Enterprise Deployment Questions RemainWhile the planner is described as plug-and-play, it remains unclear whether EAGLET can be easily integrated into popular enterprise agent frameworks such as LangChain or AutoGen, or if it requires a custom stack to support plan-execute separation. Similarly, the training setup leverages multiple executor agents, which may be difficult to replicate in enterprise environments with limited model access. 
VentureBeat has asked the researchers whether the homologous consensus filtering method can be adapted for teams that only have access to one executor model or limited compute resources.EAGLET’s authors report success across model types and sizes, but it is not yet known what the minimal viable model scale is for practical deployment. For example, can enterprise teams use the planner effectively with sub-10B parameter open models in latency-sensitive environments? Additionally, the framework may offer industry-specific value in domains like customer support or IT automation, but it remains to be seen how easily the planner can be fine-tuned or customized for such verticals.Real-Time vs. Pre-Generated PlanningAnother open question is how EAGLET is best deployed in practice. Should the planner operate in real-time alongside executors within a loop, or is it better used offline to pre-generate global plans for known task types? Each approach has implications for latency, cost, and operational complexity. VentureBeat has posed this question to the authors and will report any insights that emerge.Strategic Tradeoffs for Enterprise TeamsFor technical leaders at medium-to-large enterprises, EAGLET represents a compelling proof of concept for improving the reliability and efficiency of LLM agents. But without public tooling or implementation guidelines, the framework still presents a build-versus-wait decision. Enterprises must weigh the potential gains in task performance and efficiency against the costs of reproducing or approximating the training process in-house.Potential Use Cases in Enterprise SettingsFor enterprises developing agentic AI systems—especially in environments requiring stepwise planning, such as IT automation, customer support, or online interactions—EAGLET offers a template for how to incorporate planning without retraining. Its ability to guide both open- and closed-source models, along with its efficient training method, may make it an appealing starting point for teams seeking to improve agent performance with minimal overhead.

5,000-year lost civilization, OpenAI adult apps, Microsoft's image AI, and more...

#photos #ai

Learn more about how you can ask Google Photos to edit your pictures for you, and common ways it can improve photos.

#profile #research #faculty #alumni/ae #food #poverty #algorithms #sustainability #artificial intelligence #data #abdul latif jameel water and food systems lab (j-wafs) #mit sloan school of management #india #policy

An algorithm can change the face of food assistance policy in the Global South, says MIT assistant professor and J-WAFS researcher Ali Aouad.

#ai & ml #commentary

In a fascinating op-ed, David Bell, a professor of history at Princeton, argues that “AI is shedding enlightenment values.” As someone who has taught writing at a similarly prestigious university, and as someone who has written about technology for the past 35 or so years, I had a deep response. Bell’s is not the argument […]

You don’t always need a heavy wrapper, a big client class, or dozens of lines of boilerplate to call a large language model.
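To that point, a single HTTP POST is often enough. The sketch below calls an OpenAI-compatible chat completions endpoint with the requests library; the endpoint URL and model name are placeholders, and any OpenAI-compatible server will accept the same payload shape.

```python
# A bare-bones chat completion call with no client library — just requests.
# Endpoint and model name are placeholders; any OpenAI-compatible server will do.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user",
                      "content": "Explain retrieval-augmented generation in one sentence."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```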

#artificial intelligence #imaging #machine learning #manufacturing #materials science and engineering #mechanical engineering #research #school of engineering #light

Acting as a “virtual spectrometer,” SpectroGen generates spectroscopic data in any modality, such as X-ray or infrared, to quickly assess a material’s quality.

#startups #innovation and entrepreneurship (i&e) #alumni/ae #health #algorithms #medicine #disease #artificial intelligence #machine learning #school of engineering #electrical engineering and computer science (eecs)

Co-founded by an MIT alumnus, Watershed Bio offers researchers who aren’t software engineers a way to run large-scale analyses to accelerate biology.

#artificial intelligence #app

From addictive algorithms to exploitative apps, data mining to misinformation, the internet today can be a hazardous place. Books by three influential figures—the intellect behind “net neutrality,” a former Meta executive, and the web’s own inventor—propose radical approaches to fixing it. But are these luminaries the right people for the job? Though each shows conviction,…
