Latest AI News & Updates

#ai #data infrastructure #security

Visa is introducing a new security framework designed to solve one of the thorniest problems emerging in artificial intelligence-powered commerce: how retailers can tell the difference between legitimate AI shopping assistants and the malicious bots that plague their websites.

The payments giant unveiled its Trusted Agent Protocol on Tuesday, establishing what it describes as foundational infrastructure for "agentic commerce" — a term for the rapidly growing practice of consumers delegating shopping tasks to AI agents that can search products, compare prices, and complete purchases autonomously.

The protocol enables merchants to cryptographically verify that an AI agent browsing their site is authorized and trustworthy, rather than a bot designed to scrape pricing data, test stolen credit cards, or carry out other fraudulent activities.

The launch comes as AI-driven traffic to U.S. retail websites has exploded by more than 4,700% over the past year, according to data from Adobe cited by Visa. That dramatic surge has created an acute challenge for merchants whose existing bot detection systems — designed to block automated traffic — now risk accidentally blocking legitimate AI shoppers along with bad actors.

"Merchants need additional tools that provide them with greater insight and transparency into agentic commerce activities to ensure they can participate safely," said Rubail Birwadker, Visa's Global Head of Growth, in an exclusive interview with VentureBeat. "Without common standards, potential risks include ecosystem fragmentation and the proliferation of closed loop models."

The stakes are substantial. While 85% of shoppers who have used AI to shop report improved experiences, merchants face the prospect of either turning away legitimate AI-powered customers or exposing themselves to sophisticated bot attacks. Visa's own data shows the company prevented $40 billion in fraudulent activity between October 2022 and September 2023, nearly double the previous year, much of it involving AI-powered enumeration attacks where bots systematically test combinations of card numbers until finding valid credentials.

Inside the cryptographic handshake: How Visa verifies AI shopping agents

Visa's Trusted Agent Protocol operates through what Birwadker describes as a "cryptographic trust handshake" between merchants and approved AI agents. The system works in three steps:

First, AI agents must be approved and onboarded through Visa's Intelligent Commerce program, where they undergo vetting to meet trust and reliability standards. Each approved agent receives a unique digital signature key — essentially a cryptographic credential that proves its identity.

When an approved agent visits a merchant's website, it creates a digital signature using its key and transmits three categories of information: Agent Intent (indicating the agent is trusted and intends to retrieve product details or make a purchase), Consumer Recognition (data showing whether the underlying consumer has an existing account with the merchant), and Payment Information (optional payment data to support checkout).

Merchants or their infrastructure providers, such as content delivery networks, then validate these digital signatures against Visa's registry of approved agents. "Upon proper validation of these fields, the merchant can confirm the signature is a trusted agent," Birwadker explained.
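To make the handshake concrete, here is a minimal, illustrative sketch in Python of how signature-based agent verification generally works. The field names, registry lookup, and key handling below are hypothetical stand-ins, not Visa's specification; a real integration would follow the HTTP Message Signatures standard the protocol builds on (discussed below) and Visa's published documentation.

```python
# Illustrative sketch only: a simplified version of the signature handshake
# described above. Field names and the registry lookup are hypothetical;
# a production integration would follow the HTTP Message Signatures standard
# (RFC 9421) and Visa's published Trusted Agent Protocol spec.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# --- Agent side: an onboarded agent holds a private key issued during vetting ---
agent_key = Ed25519PrivateKey.generate()          # stand-in for the issued credential
agent_id = "agent-demo-001"                       # hypothetical registry identifier

headers = {
    "agent-intent": "retrieve_product_details",   # what the agent intends to do
    "consumer-recognition": "returning_customer", # optional consumer context
}
# Build a canonical "signature base" from the headers, then sign it.
signature_base = "\n".join(f"{k}: {v}" for k, v in sorted(headers.items()))
signature = agent_key.sign(signature_base.encode())

# --- Merchant / CDN side: validate the signature against a registry of
# --- approved agents (here, a simple dict mapping agent IDs to public keys).
approved_agents = {agent_id: agent_key.public_key()}

def is_trusted_agent(agent_id: str, headers: dict, signature: bytes) -> bool:
    public_key = approved_agents.get(agent_id)
    if public_key is None:
        return False                              # unknown agent: treat as ordinary bot traffic
    base = "\n".join(f"{k}: {v}" for k, v in sorted(headers.items()))
    try:
        public_key.verify(signature, base.encode())
        return True
    except InvalidSignature:
        return False

print(is_trusted_agent(agent_id, headers, signature))  # True for a valid, registered agent
```

The essential point is that the merchant never trusts a self-declared header on its own; it trusts a signature that only a vetted, registered agent could have produced.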
Crucially, Visa designed the protocol to require minimal changes to existing merchant infrastructure. Built on the HTTP Message Signature standard and aligned with Web Bot Auth, the protocol works with existing web infrastructure without requiring merchants to overhaul their checkout pages. "This is no-code functionality," Birwadker emphasized, though merchants may need to integrate with Visa's Developer Center to access the verification system.

The race for AI commerce standards: Visa faces competition from Google, OpenAI, and Stripe

Visa developed the protocol in collaboration with Cloudflare, the web infrastructure and security company that already provides bot management services to millions of websites. The partnership reflects Visa's recognition that solving bot verification requires cooperation across the entire web stack, not just the payments layer.

"Trusted Agent Protocol supplements traditional bot management by providing merchants insights that enable agentic commerce," Birwadker said. "Agents are providing additional context they otherwise would not, including what it intends to do, who the underlying consumer is, and payment information."

The protocol arrives as multiple technology giants race to establish competing standards for AI commerce. Google recently introduced its Agent Payments Protocol (AP2), while OpenAI and Stripe have discussed their own approaches to enabling AI agents to make purchases. Microsoft, Shopify, Adyen, Ant International, Checkout.com, Cybersource, Elavon, Fiserv, Nuvei, and Worldpay provided feedback during Trusted Agent Protocol's development, according to Visa.

When asked how Visa's protocol relates to these competing efforts, Birwadker struck a collaborative tone. "Both Google's AP2 and Visa's Trusted Agent Protocol are working toward the same goal of building trust in agent-initiated payments," he said. "We are engaged with Google, OpenAI, and Stripe and are looking to create compatibility across the ecosystem."

Visa says it is working with global standards bodies including the Internet Engineering Task Force (IETF), OpenID Foundation, and EMVCo to ensure the protocol can eventually become interoperable with other emerging standards. "While these specifications apply to the Visa network in this initial phase, enabling agents to safely and securely act on a consumer's behalf requires an open, ecosystem-wide approach," Birwadker noted.

Who pays when AI agents go rogue? Unanswered questions about liability and authorization

The protocol raises important questions about authorization and liability when AI agents make purchases on behalf of consumers. If an agent completes an unauthorized transaction — perhaps misunderstanding a user's intent or exceeding its delegated authority — who bears responsibility?

Birwadker emphasized that the protocol helps merchants "leverage this information to enable experiences tied to existing consumer relationships and more secure checkout," but he did not provide specific details about how disputes would be handled when agents make unauthorized purchases. Visa's existing fraud protection and chargeback systems would presumably apply, though the company has not yet published detailed guidance on agent-initiated transaction disputes.

The protocol also places Visa in the position of gatekeeper for the emerging agentic commerce ecosystem. Because Visa determines which AI agents get approved for the Intelligent Commerce program and receive cryptographic credentials, the company effectively controls which agents merchants can easily trust.
"Agents are approved and onboarded through the Visa Intelligent Commerce program, ensuring they meet our standards for trust and reliability," Birwadker said, though he did not detail the specific criteria agents must meet or whether Visa charges fees for approval.This gatekeeping role could prove contentious, particularly if Visa's approval process favors large technology companies over startups, or if the company faces pressure to block agents from competitors or politically controversial entities. Visa declined to provide details about how many agents it has approved so far or how long the vetting process typically takes.Visa's legal battles and the long road to merchant adoptionThe protocol launch comes at a complex moment for Visa, which continues to navigate significant legal and regulatory challenges even as its core business remains robust. The company's latest earnings report for the third quarter of fiscal year 2025 showed a 10% increase in net revenues to $9.2 billion, driven by resilient consumer spending and strong growth in cross-border transaction volume. For the full fiscal year ending September 30, 2024, Visa processed 289 billion transactions, with a total payments volume of $15.2 trillion.
However, the company's legal headwinds have intensified. In July 2025, a federal judge rejected a landmark $30 billion settlement that Visa and Mastercard had reached with merchants over long-disputed credit card swipe fees, sending the parties back to the negotiating table and extending the long-running legal battle. Simultaneously, Visa remains under investigation by the Department of Justice over its rules for routing debit card transactions, with regulators scrutinizing whether the company's practices unlawfully limit merchant choice and stifle competition. These domestic challenges are mirrored abroad, where European regulators have continued their own antitrust investigations into the fee structures of both Visa and its primary competitor, Mastercard.

Against this backdrop of regulatory pressure, Birwadker acknowledged that adoption of the Trusted Agent Protocol will take time. "As agentic commerce continues to rise, we recognize that consumer trust is still in its early stages," he said. "That's why our focus through 2025 is on building foundational credibility and demonstrating real-world value."

The protocol is available immediately in Visa's Developer Center and on GitHub, with agent onboarding already active and merchant integration resources available. But Birwadker declined to provide specific targets for how many merchants might adopt the protocol by the end of 2026. "Adoption is aligned with the momentum we're already seeing," he said. "The launch of our protocol marks another big step — it's not just a technical milestone, but a signal that the industry is beginning to unify."

Industry analysts say merchant adoption will likely depend on how quickly agentic commerce grows as a percentage of overall e-commerce. While AI-driven traffic has surged dramatically, much of that consists of agents browsing and researching rather than completing purchases. If AI agents begin accounting for a significant share of completed transactions, merchants will face stronger incentives to adopt verification systems like Visa's protocol.

From fraud detection to AI gatekeeping: Visa's $10 billion bet on artificial intelligence

Visa's move reflects broader strategic bets on AI across the financial services industry. The company has invested $10 billion in technology over the past five years to reduce fraud and increase network security, with AI and machine learning central to those efforts. Visa's fraud detection system analyzes over 500 different attributes for each transaction, using AI models to assign real-time risk scores to the 300 billion annual transactions flowing through its network.

"Every single one of those transactions has been processed by AI," James Mirfin, Visa's global head of risk and identity solutions, said in a July 2024 CNBC interview discussing the company's fraud prevention efforts. "If you see a new type of fraud happening, our model will see that, it will catch it, it will score those transactions as high risk and then our customers can decide not to approve those transactions."

The company has also moved aggressively into new payment territories beyond its core card business. In January 2025, Visa partnered with Elon Musk's X (formerly Twitter) to provide the infrastructure for a digital wallet and peer-to-peer payment service called the X Money Account, competing with services like Venmo and Zelle.
That deal marked Visa's first major partnership in the social media payments space and reflected the company's recognition that payment flows are increasingly happening outside traditional e-commerce channels.

The agentic commerce protocol represents an extension of this strategy — an attempt to ensure Visa remains central to payment flows even as the mechanics of shopping shift from direct human interaction to AI intermediation. Jack Forestell, Visa's Chief Product & Strategy Officer, framed the protocol in expansive terms: "We believe the entire payments ecosystem has a responsibility to ensure sellers trust AI agents with the same confidence they place in their most valued customers and networks."

The coming battle for control of AI shopping

The real test for Visa's protocol won't be technical — it will be political. As AI agents become a larger force in retail, whoever controls the verification infrastructure controls access to hundreds of billions of dollars in commerce. Visa's position as gatekeeper gives it enormous leverage, but also makes it a target.

Merchants chafing under Visa's existing fee structure may resist ceding even more power to a payments giant already facing multiple antitrust investigations. Competitors like Google and OpenAI, each with their own ambitions in commerce, have little incentive to let Visa dictate standards. Regulators already scrutinizing Visa's market dominance will surely examine whether its agent approval process unfairly advantages certain players.

And there's a deeper question lurking beneath the technical specifications and corporate partnerships: In an economy increasingly mediated by AI, who decides which algorithms get to spend our money? Visa is making an aggressive bid to be that arbiter, wrapping its answer in the language of security and interoperability. Whether merchants, consumers, and regulators accept that proposition will determine not just the fate of the Trusted Agent Protocol, but the structure of AI-powered commerce itself.

For now, Visa is moving forward with the confidence of a company that has weathered disruption before. But in the emerging world of agentic commerce, being too trusted might prove just as dangerous as not being trusted enough.

UMass Amherst engineers have built an artificial neuron powered by bacterial protein nanowires that functions like a real one, but at extremely low voltage. This allows for seamless communication with biological cells and drastically improved energy efficiency. The discovery could lead to bio-inspired computers and wearable electronics that no longer need power-hungry amplifiers. Future applications may include sensors powered by sweat or devices that harvest electricity from thin air.

#ai

Researchers at the Massachusetts Institute of Technology (MIT) are gaining renewed attention for developing and open sourcing a technique that allows large language models (LLMs) — like those underpinning ChatGPT and most modern AI chatbots — to improve themselves by generating synthetic data to fine-tune on. The technique, known as SEAL (Self-Adapting LLMs), was first described in a paper published back in June and covered by VentureBeat at the time.

A significantly expanded and updated version of the paper was released last month, along with open source code posted on GitHub (under an MIT License, allowing for commercial and enterprise usage), and it is making new waves among AI power users on the social network X this week.

SEAL allows LLMs to autonomously generate and apply their own fine-tuning strategies. Unlike conventional models that rely on fixed external data and human-crafted optimization pipelines, SEAL enables models to evolve by producing their own synthetic training data and corresponding optimization directives.

The development comes from a team affiliated with MIT’s Improbable AI Lab, including Adam Zweiger, Jyothish Pari, Han Guo, Ekin Akyürek, Yoon Kim, and Pulkit Agrawal. Their research was recently presented at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025).

Background: From “Beyond Static AI” to Self-Adaptive Systems

Earlier this year, VentureBeat first reported on SEAL as an early-stage framework that allowed language models to generate and train on their own synthetic data — a potential remedy for the stagnation of pretrained models once deployed. At that stage, SEAL was framed as a proof-of-concept that could let enterprise AI agents continuously learn in dynamic environments without manual retraining.

Since then, the research has advanced considerably. The new version expands on the prior framework by demonstrating that SEAL’s self-adaptation ability scales with model size, integrates reinforcement learning more effectively to reduce catastrophic forgetting, and formalizes SEAL’s dual-loop structure (inner supervised fine-tuning and outer reinforcement optimization) for reproducibility. The updated paper also introduces evaluations across different prompting formats, improved stability during learning cycles, and a discussion of practical deployment challenges at inference time.

Addressing the Limitations of Static Models

While LLMs have demonstrated remarkable capabilities in text generation and understanding, their adaptation to new tasks or knowledge is often manual, brittle, or dependent on context. SEAL challenges this status quo by equipping models with the ability to generate what the authors call “self-edits” — natural language outputs that specify how the model should update its weights.

These self-edits may take the form of reformulated information, logical implications, or tool configurations for augmentation and training. Once generated, the model fine-tunes itself based on these edits. The process is guided by reinforcement learning, where the reward signal comes from improved performance on a downstream task.

The design mimics how human learners might rephrase or reorganize study materials to better internalize information.
This restructuring of knowledge before assimilation serves as a key advantage over models that passively consume new data “as-is.”

Performance Across Tasks

SEAL has been tested across two main domains: knowledge incorporation and few-shot learning.

In the knowledge incorporation setting, the researchers evaluated how well a model could internalize new factual content from passages similar to those in the SQuAD dataset, a benchmark reading comprehension dataset introduced by Stanford University in 2016, consisting of over 100,000 crowd-sourced question–answer pairs based on Wikipedia articles (Rajpurkar et al., 2016). Rather than fine-tuning directly on passage text, the model generated synthetic implications of the passage and then fine-tuned on them. After two rounds of reinforcement learning, the model improved question-answering accuracy from 33.5% to 47.0% on a no-context version of SQuAD — surpassing results obtained using synthetic data generated by GPT-4.1.

In the few-shot learning setting, SEAL was evaluated using a subset of the ARC benchmark, where tasks require reasoning from only a few examples. Here, SEAL generated self-edits specifying data augmentations and hyperparameters. After reinforcement learning, the success rate in correctly solving held-out tasks jumped to 72.5%, up from 20% using self-edits generated without reinforcement learning. Models that relied solely on in-context learning without any adaptation scored 0%.

Technical Framework

SEAL operates using a two-loop structure: an inner loop performs supervised fine-tuning based on the self-edit, while an outer loop uses reinforcement learning to refine the policy that generates those self-edits.

The reinforcement learning algorithm used is based on ReSTEM, which combines sampling with filtered behavior cloning. During training, only self-edits that lead to performance improvements are reinforced. This approach effectively teaches the model which kinds of edits are most beneficial for learning.

For efficiency, SEAL applies LoRA-based fine-tuning rather than full parameter updates, enabling rapid experimentation and low-cost adaptation.

Strengths and Limitations

The researchers report that SEAL can produce high-utility training data with minimal supervision, outperforming even large external models like GPT-4.1 in specific tasks. They also demonstrate that SEAL generalizes beyond its original setup: it continues to perform well when scaling from single-pass updates to multi-document continued pretraining scenarios.

However, the framework is not without limitations. One issue is catastrophic forgetting, where updates to incorporate new information can degrade performance on previously learned tasks. In response to this concern, co-author Jyo Pari told VentureBeat via email that reinforcement learning (RL) appears to mitigate forgetting more effectively than standard supervised fine-tuning (SFT), citing a recent paper on the topic. He added that combining this insight with SEAL could lead to new variants where SEAL learns not just training data, but reward functions.

Another challenge is computational overhead: evaluating each self-edit requires fine-tuning and performance testing, which can take 30–45 seconds per edit — significantly more than standard reinforcement learning tasks. As Jyo explained, “Training SEAL is non-trivial because it requires 2 loops of optimization, an outer RL one and an inner SFT one.
At inference time, updating model weights will also require new systems infrastructure.” He emphasized the need for future research into deployment systems as a critical path to making SEAL practical.

Additionally, SEAL’s current design assumes the presence of paired tasks and reference answers for every context, limiting its direct applicability to unlabeled corpora. However, Jyo clarified that as long as there is a downstream task with a computable reward, SEAL can be trained to adapt accordingly — even in safety-critical domains. In principle, a SEAL-trained model could learn to avoid training on harmful or malicious inputs if guided by the appropriate reward signal.

AI Community Reactions

The AI research and builder community has reacted with a mix of excitement and speculation to the SEAL paper. On X, formerly Twitter, several prominent AI-focused accounts weighed in on the potential impact.

User @VraserX, a self-described educator and AI enthusiast, called SEAL “the birth of continuous self-learning AI” and predicted that models like OpenAI's GPT-6 could adopt similar architecture. In their words, SEAL represents “the end of the frozen-weights era,” ushering in systems that evolve as the world around them changes. They highlighted SEAL's ability to form persistent memories, repair knowledge, and learn from real-time data, comparing it to a foundational step toward models that don’t just use information but absorb it.

Meanwhile, @alex_prompter, co-founder of an AI-powered marketing venture, framed SEAL as a leap toward models that literally rewrite themselves. “MIT just built an AI that can rewrite its own code to get smarter,” he wrote. Citing the paper’s key results — a 40% boost in factual recall and outperforming GPT-4.1 using self-generated data — he described the findings as confirmation that “LLMs that finetune themselves are no longer sci-fi.”

The enthusiasm reflects a broader appetite in the AI space for models that can evolve without constant retraining or human oversight — particularly in rapidly changing domains or personalized use cases.

Future Directions and Open Questions

In response to questions about scaling SEAL to larger models and tasks, Jyo pointed to experiments (Appendix B.7) showing that as model size increases, so does their self-adaptation ability. He compared this to students improving their study techniques over time — larger models are simply better at generating useful self-edits.

When asked whether SEAL generalizes to new prompting styles, he confirmed it does, citing Table 10 in the paper. However, he also acknowledged that the team has not yet tested SEAL’s ability to transfer across entirely new domains or model architectures. “SEAL is an initial work showcasing the possibilities,” he said. “But it requires much more testing.” He added that generalization may improve as SEAL is trained on a broader distribution of tasks.

Interestingly, the team found that only a few reinforcement learning steps already led to measurable performance gains. “This is exciting,” Jyo noted, “because it means that with more compute, we could hopefully get even more improvements.” He suggested future experiments could explore more advanced reinforcement learning methods beyond ReSTEM, such as Group Relative Policy Optimization (GRPO).

Toward More Adaptive and Agentic Models

SEAL represents a step toward models that can autonomously improve over time, both by integrating new knowledge and by reconfiguring how they learn.
The authors envision future extensions where SEAL could assist in self-pretraining, continual learning, and the development of agentic systems — models that interact with evolving environments and adapt incrementally.

In such settings, a model could use SEAL to synthesize weight updates after each interaction, gradually internalizing behaviors or insights. This could reduce the need for repeated supervision and manual intervention, particularly in data-constrained or specialized domains.

As public web text becomes saturated and further scaling of LLMs becomes bottlenecked by data availability, self-directed approaches like SEAL could play a critical role in pushing the boundaries of what LLMs can achieve.

You can access the SEAL project, including code and further documentation, at: https://jyopari.github.io/posts/seal
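For readers who want a more concrete picture of the dual-loop structure described above, here is a rough, illustrative sketch in Python. The function bodies are stand-ins rather than the released implementation (which trains an actual LLM with LoRA updates and a ReSTEM-style outer loop); see the GitHub repository linked above for the real code.

```python
# Conceptual sketch of SEAL's two-loop structure (illustrative pseudocode, not the
# released implementation). All function bodies are stand-ins.
import random

def generate_self_edit(model, context):
    """Policy step: the model writes a 'self-edit' (synthetic training data plus
    optimization directives) for the given context."""
    return {"synthetic_data": f"implications of {context}", "lr": 1e-4}

def finetune(model, self_edit):
    """Inner loop: apply a lightweight (e.g., LoRA-style) supervised update
    based on the self-edit and return the updated model."""
    return model  # stand-in

def evaluate(model, task):
    """Reward signal: downstream task performance after the update."""
    return random.random()  # stand-in

def reinforce(model, good_edits):
    """Outer loop (ReSTEM-style filtered behavior cloning): train the
    edit-generating policy only on self-edits that improved performance."""
    return model  # stand-in

model = object()  # stand-in for an LLM
tasks = [("context A", "task A"), ("context B", "task B")]

for rl_round in range(2):                      # outer RL rounds
    good_edits = []
    for context, task in tasks:
        baseline = evaluate(model, task)
        candidates = [generate_self_edit(model, context) for _ in range(4)]
        for edit in candidates:
            updated = finetune(model, edit)    # inner SFT loop
            if evaluate(updated, task) > baseline:
                good_edits.append((context, edit))   # keep only improving edits
    model = reinforce(model, good_edits)       # update the self-edit policy
```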

#ai

Fine-tuning is one of the most effective ways for enterprises to make a large language model (LLM) fit for purpose and grounded in their data, but it often comes at a cost: after fine-tuning, models can “forget” how to perform tasks they had already learned. Research from the University of Illinois Urbana-Champaign proposes a new method for retraining models that avoids this “catastrophic forgetting,” in which the model loses some of its prior knowledge. The paper focuses on two vision-language models that generate responses from images: LLaVA and Qwen 2.5-VL.

The approach encourages enterprises to retrain only narrow parts of a model rather than the entire model, avoiding a significant increase in compute costs. The team claims that catastrophic forgetting isn’t true memory loss, but rather a side effect of bias drift. “Training a new LMM can cost millions of dollars, weeks of time, and emit hundreds of tons of CO2, so finding ways to more efficiently and effectively update existing models is a pressing concern,” the team wrote in the paper. “Guided by this result, we explore tuning recipes that preserve learning while limiting output shift.”

The researchers focused on the model's multi-layer perceptron (MLP) layers, the feed-forward blocks inside each transformer layer.
Catastrophic forgetting

The researchers first wanted to verify the existence and the cause of catastrophic forgetting in models. To do this, they created a set of target tasks for the models to complete. The models were then fine-tuned and evaluated to determine whether the process led to substantial forgetting. But as the process went on, the researchers found that the models were recovering some of their abilities.

“We also noticed a surprising result, that while the model performance would drop significantly in held out benchmarks after training on the counting task, it would mostly recover on PathVQA, another specialized task that is not well represented in the benchmarks,” they said. “Meanwhile, while performing the forgetting mitigation experiments, we also tried separately tuning only the self-attention projection (SA Proj) or MLP layers, motivated by the finding that tuning only the LLM was generally better than tuning the full model. This led to another very surprising result – that tuning only self-attention projection layers led to very good learning of the target tasks with no drop in performance in held out tasks, even after training all five target tasks in a sequence.”

The researchers said they believe that “what looks like forgetting or interference after fine-tuning on a narrow target task is actually bias in the output distribution due to the task distribution shift.”

Narrow retraining

That finding turned out to be the key to the experiment. The researchers noted that tuning the MLP increases the likelihood of “outputting numeric tokens and a highly correlated drop in held out task accuracy.” What it showed is that a model forgetting some of its knowledge is only temporary, not a long-term loss.

“To avoid biasing the output distribution, we tune the MLP up/gating projections while keeping the down projection frozen, and find that it achieves similar learning to full MLP tuning with little forgetting,” the researchers said.

This allows for a simpler and more reproducible method for fine-tuning a model. By focusing on a narrow segment of the model, rather than a wholesale retraining, enterprises can cut compute costs. It also allows better control of output drift. However, the research covers only two models, both of them vision-language models; the researchers noted that, due to limited resources, they were unable to run the experiment on others. Their findings may nonetheless extend to other LLMs and other modalities.
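As a rough illustration of the selective-tuning recipe the paper describes, the sketch below freezes everything except the MLP up/gating projections of a Hugging Face transformer model. The model name and the gate_proj/up_proj/down_proj module names are assumptions based on Llama/Qwen-style architectures, not the authors' released code; the exact names vary by model, and the paper's experiments were run on vision-language models rather than this small text-only stand-in.

```python
# Minimal sketch of selective fine-tuning: train only the MLP up/gating
# projections while the down projection (and everything else) stays frozen.
# Module names assume a Llama/Qwen-style architecture; adjust for your model.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")  # small stand-in model

TRAINABLE = ("gate_proj", "up_proj")   # tune MLP up/gating projections only
                                       # (down_proj and all other weights stay frozen)

for name, param in model.named_parameters():
    param.requires_grad = any(key in name for key in TRAINABLE)

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(f"Training {len(trainable)} tensors; example: {trainable[0]}")

# The optimizer only sees the unfrozen parameters.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)
```

The same pattern can be pointed at the self-attention projections instead, which is the other low-forgetting recipe the researchers report.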

#search #ai

We’re introducing two new AI-powered features in Search and Discover to help you connect with fresh content and links from across the web.

#search #ai

Here’s your guide for editing images in Search using Lens with Nano Banana.

#search #google labs #ai #photos #google lens

Google is bringing AI image editing with Nano Banana to Search, NotebookLM and Photos.

#gemini models #google labs #ai

NotebookLM's Video Overviews get an upgrade with visuals powered by Nano Banana and a new "Brief" format for quick summaries.

#ai #ecommerce

A new research paper quietly published last week outlines a breakthrough method that allows large language models (LLMs) to simulate human consumer behavior with startling accuracy, a development that could reshape the multi-billion-dollar market research industry. The technique promises to create armies of synthetic consumers who can provide not just realistic product ratings, but also the qualitative reasoning behind them, at a scale and speed currently unattainable.

For years, companies have sought to use AI for market research, but have been stymied by a fundamental flaw: when asked to provide a numerical rating on a scale of 1 to 5, LLMs produce unrealistic and poorly distributed responses. A new paper, "LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings," submitted to the pre-print server arXiv on October 9, proposes an elegant solution that sidesteps this problem entirely.

The international team of researchers, led by Benjamin F. Maier, developed a method they call semantic similarity rating (SSR). Instead of asking an LLM for a number, SSR prompts the model for a rich, textual opinion on a product. This text is then converted into a numerical vector — an "embedding" — and its similarity is measured against a set of pre-defined reference statements. For example, a response of "I would absolutely buy this, it's exactly what I'm looking for" would be semantically closer to the reference statement for a "5" rating than to the statement for a "1."

The results are striking. Tested against a massive real-world dataset from a leading personal care corporation — comprising 57 product surveys and 9,300 human responses — the SSR method achieved 90% of human test-retest reliability. Crucially, the distribution of AI-generated ratings was statistically almost indistinguishable from the human panel. The authors state, "This framework enables scalable consumer research simulations while preserving traditional survey metrics and interpretability."

A timely solution as AI threatens survey integrity

This development arrives at a critical time, as the integrity of traditional online survey panels is increasingly under threat from AI. A 2024 analysis from the Stanford Graduate School of Business highlighted a growing problem of human survey-takers using chatbots to generate their answers. These AI-generated responses were found to be "suspiciously nice," overly verbose, and lacking the "snark" and authenticity of genuine human feedback, leading to what researchers called a "homogenization" of data that could mask serious issues like discrimination or product flaws.

Maier's research offers a starkly different approach: instead of fighting to purge contaminated data, it creates a controlled environment for generating high-fidelity synthetic data from the ground up.

"What we're seeing is a pivot from defense to offense," said one analyst not affiliated with the study. "The Stanford paper showed the chaos of uncontrolled AI polluting human datasets. This new paper shows the order and utility of controlled AI creating its own datasets. For a Chief Data Officer, this is the difference between cleaning a contaminated well and tapping into a fresh spring."

From text to intent: The technical leap behind the synthetic consumer

The technical validity of the new method hinges on the quality of the text embeddings, a concept explored in a 2022 paper in EPJ Data Science.
That research argued for a rigorous "construct validity" framework to ensure that text embeddings — the numerical representations of text — truly "measure what they are supposed to." The success of the SSR method suggests its embeddings effectively capture the nuances of purchase intent. For this new technique to be widely adopted, enterprises will need to be confident that the underlying models are not just generating plausible text, but are mapping that text to scores in a way that is robust and meaningful.

The approach also represents a significant leap from prior research, which has largely focused on using text embeddings to analyze and predict ratings from existing online reviews. A 2022 study, for example, evaluated the performance of models like BERT and word2vec in predicting review scores on retail sites, finding that newer models like BERT performed better for general use. The new research moves beyond analyzing existing data to generating novel, predictive insights before a product even hits the market.

The dawn of the digital focus group

For technical decision-makers, the implications are profound. The ability to spin up a "digital twin" of a target consumer segment and test product concepts, ad copy, or packaging variations in a matter of hours could drastically accelerate innovation cycles. As the paper notes, these synthetic respondents also provide "rich qualitative feedback explaining their ratings," offering a treasure trove of data for product development that is both scalable and interpretable.

But the business case extends beyond speed and scale. Consider the economics: a traditional survey panel for a national product launch might cost tens of thousands of dollars and take weeks to field. An SSR-based simulation could deliver comparable insights in a fraction of the time, at a fraction of the cost, and with the ability to iterate instantly based on findings. For companies in fast-moving consumer goods categories — where the window between concept and shelf can determine market leadership — this velocity advantage could be decisive.

There are, of course, caveats. The method was validated on personal care products; its performance on complex B2B purchasing decisions, luxury goods, or culturally specific products remains unproven. And while the paper demonstrates that SSR can replicate aggregate human behavior, it does not claim to predict individual consumer choices. The technique works at the population level, not the person level — a distinction that matters greatly for applications like personalized marketing.

Yet even with these limitations, the research is a watershed. While the era of human-only focus groups is far from over, this paper provides the most compelling evidence yet that their synthetic counterparts are ready for business. The question is no longer whether AI can simulate consumer sentiment, but whether enterprises can move fast enough to capitalize on it before their competitors do.
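To illustrate the core mechanic of semantic similarity rating, here is a small sketch in Python. The embedding model and the wording of the reference statements are stand-ins chosen for the example, not the ones used in the paper, and a production implementation might convert the similarities into a full probability distribution over ratings rather than taking the single best match.

```python
# Illustrative sketch of semantic similarity rating (SSR): map a free-text
# purchase-intent response onto a 1-5 Likert scale via embedding similarity
# to reference statements. Model choice and reference wording are stand-ins.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model works here

references = {
    1: "I would definitely not buy this product.",
    2: "I probably would not buy this product.",
    3: "I might or might not buy this product.",
    4: "I probably would buy this product.",
    5: "I would definitely buy this product.",
}

def ssr_score(response: str) -> int:
    """Return the Likert rating whose reference statement is most similar."""
    texts = [response] + list(references.values())
    emb = encoder.encode(texts, normalize_embeddings=True)
    sims = emb[0] @ emb[1:].T                     # cosine similarities to each anchor
    return list(references.keys())[int(np.argmax(sims))]

print(ssr_score("I would absolutely buy this, it's exactly what I'm looking for."))  # likely 5
```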

#ai #automation #data infrastructure #enterprise

As 50,000 attendees descend on Salesforce's Dreamforce conference this week, the enterprise software giant is making its most aggressive bet yet on artificial intelligence agents, positioning itself as the antidote to what it calls an industry-wide "pilot purgatory" where 95% of enterprise AI projects never reach production.

The company on Monday launched Agentforce 360, a sweeping reimagination of its entire product portfolio designed to transform businesses into what it calls "agentic enterprises" — organizations where AI agents work alongside humans to handle up to 40% of work across sales, service, marketing, and operations.

"We are truly in the agentic AI era, and I think it's probably the biggest revolution, the biggest transition in technology I've ever experienced in my career," said Parker Harris, Salesforce's co-founder and chief technology officer, during a recent press briefing. "In the future, 40% of the work in the Fortune 1000 is probably going to be done by AI, and it's going to be humans and AI actually working together."

The announcement comes at a pivotal moment for Salesforce, which has deployed more than 12,000 AI agent implementations over the past year while building what Harris called a "$7 billion business" around its AI platform. Yet the launch also arrives amid unusual turbulence, as CEO Marc Benioff faces fierce backlash for recent comments supporting President Trump and suggesting National Guard troops should patrol San Francisco streets.

Why 95% of enterprise AI projects never launch

The stakes are enormous. While companies have rushed to experiment with AI following ChatGPT's emergence two years ago, most enterprise deployments have stalled before reaching production, according to recent MIT research that Salesforce executives cited extensively.

"Customers have invested a lot in AI, but they're not getting the value," said Srini Tallapragada, Salesforce's president and chief engineering and customer success officer. "95% of enterprise AI pilots fail before production. It's not because of lack of intent. People want to do this. Everybody understands the power of the technology. But why is it so hard?"

The answer, according to Tallapragada, is that AI tools remain disconnected from enterprise workflows, data, and governance systems. "You're writing prompts, prompts, you're getting frustrated because the context is not there," he said, describing what he called a "prompt doom loop."

Salesforce's solution is a deeply integrated platform connecting what it calls four ingredients: the Agentforce 360 agent platform, Data 360 for unified data access, Customer 360 apps containing business logic, and Slack as the "conversational interface" where humans and agents collaborate.

Slack becomes the front door to Salesforce

Perhaps the most significant strategic shift is the elevation of Slack — acquired by Salesforce in 2021 for $27.7 billion — as the primary interface for Salesforce itself. The company is effectively reimagining its traditional Lightning interface around Slack channels, where sales deals, service cases, and data insights will surface conversationally rather than through forms and dashboards.

"Imagine that you maybe don't log into Salesforce, you don't see Salesforce, but it's there.
It's coming to you in Slack, because that's where you're getting your work done," Harris explained.

The strategy includes embedding Salesforce's Agentforce agents for sales, IT service, HR service, and analytics directly into Slack, alongside a completely rebuilt Slackbot that acts as a personal AI companion. The company is also launching "Channel Expert," an always-on agent that provides instant answers from channel conversations.

To enable third-party AI tools to access Slack's conversational data, Salesforce is releasing a Real-Time Search API and Model Context Protocol server. Partners including OpenAI, Anthropic, Google, Perplexity, Writer, Dropbox, Notion, and Cursor are building agents that will live natively in Slack.

"The best way to see the power of the platform is through the AI apps and agents already being built," Rob Seaman, a Salesforce executive, said during a technical briefing, citing examples of startups "achieving tens of thousands of customers that have it installed in 120 days or less."

Voice and IT service take aim at new markets

Beyond Slack integration, Salesforce announced major expansions into voice-based interactions and employee service. Agentforce Voice, now generally available, transforms traditional IVR systems into natural conversations that can update CRM records, trigger workflows, and seamlessly hand off to human agents.

The IT Service offering represents Salesforce's most direct challenge to ServiceNow, the market leader. Mudhu Sudhakar, who joined Salesforce two months ago as senior vice president for IT and HR Service, positioned the product as a fundamental reimagining of employee support.

"Legacy IT service management is very portals, forms, tickets focused, manual process," Sudhakar said. "We had a few key tenets: conversation first and agent first, really focused on having a conversational experience for the people requesting the support and for the people providing the support."

The IT Service platform includes what Salesforce describes as 25+ specialized agents and 100+ pre-built workflows and connectors that can handle everything from password resets to complex incident management.

Early customers report dramatic efficiency gains

Customer results suggest the approach is gaining traction. Reddit reduced average support resolution time from 8.9 minutes to 1.4 minutes — an 84% improvement — while deflecting 46% of cases entirely to AI agents. "This efficiency has allowed us to provide on-demand help for complex tasks and boost advertiser satisfaction scores by 20%," said John Thompson, Reddit's VP of sales strategy and operations, in a statement.

Engine, a travel management company, reduced average handle time by 15%, saving over $2 million annually. OpenTable resolved 70% of restaurant and diner inquiries autonomously. And 1-800Accountant achieved a 90% case deflection rate during the critical tax week period.

Salesforce's own internal deployments may be most telling. Tallapragada's customer success organization now handles 1.8 million AI-powered conversations weekly, with metrics published at help.salesforce.com showing how many queries agents answer versus escalate to humans.

Even more significantly, Salesforce has deployed AI-powered sales development representatives to follow up on leads that would previously have gone uncontacted due to cost constraints. "Now, Agentforce has an SDR which is doing thousands of leads following up," Tallapragada explained.
The company also increased proactive customer outreach by 40% by shifting staff from reactive support.

The trust layer problem enterprises can't ignore

Given enterprise concerns about AI reliability, Salesforce has invested heavily in what it calls the "trust layer" — audit trails, compliance checks, and observability tools that let organizations monitor agent behavior at scale.

"You should think of an agent as a human. Digital labor. You need to manage performance just like a human. And you need these audit trails," Tallapragada explained.

The company encountered this challenge firsthand when its own agent deployment scaled. "When we started at Agentforce at Salesforce, we would track every message, which is great until 1,000, 3,000," Tallapragada said. "Once you have a million chats, there's no human, we cannot do it."

The platform now includes "Agentforce Grid" for searching across millions of conversations to identify and fix problematic patterns. The company also introduced Agent Script, a new scripting language that allows developers to define precise guardrails and deterministic controls for agent behavior.

Data infrastructure gets a major upgrade

Underlying the agent capabilities is significant infrastructure investment. Salesforce's Data 360 includes "Intelligent Context," which automatically extracts structured information from unstructured content like PDFs, diagrams, and flowcharts using what the company describes as "AI-powered unstructured data pipelines."

The company is also collaborating with Databricks, dbt Labs, and Snowflake on the "Universal Semantic Interchange," an attempt to standardize how different platforms define business metrics. The pending $8 billion acquisition of Informatica, expected to close soon, will expand metadata management capabilities across the enterprise.

The competitive landscape keeps intensifying

Salesforce's aggressive AI agent push comes as virtually every major enterprise software vendor pursues similar strategies. Microsoft has embedded Copilot across its product line, Google offers agent capabilities through Vertex AI and Gemini, and ServiceNow has launched its own agentic offerings.

When asked how Salesforce's announcement compared to OpenAI's recent releases, Tallapragada emphasized that customers will use multiple AI tools simultaneously. "Most of the time I'm seeing they're using OpenAI, they're using Gemini, they're using Anthropic, just like Salesforce, we use all three," he said.

The real differentiation, executives argued, lies not in the AI models but in the integration with business processes and data. Harris framed the competition in terms familiar from Salesforce's founding: "26 years ago, we just said, let's make Salesforce automation as easy as buying a book on Amazon.com. We're doing that same thing. We want to make agentic AI as easy as buying a book on Amazon."

The company's customer success stories are impressive but remain a small fraction of its customer base. With 150,000 Salesforce customers and one million Slack customers, the 12,000 Agentforce deployments represent roughly 8% penetration — strong for a one-year-old product line, but hardly ubiquitous.

The company's stock, down roughly 28% year to date with a Relative Strength rating of just 15, suggests investors remain skeptical.
This week's Dreamforce demonstrations — and the months of customer deployments that follow — will begin to show whether Salesforce can finally move enterprise AI from pilots to production at scale, or whether the "$7 billion business" remains more aspiration than reality.

#ai & ml #deep dive

The agentic AI landscape is exploding. Every new framework, demo, and announcement promises to let your AI assistant book flights, query databases, and manage calendars. This rapid advancement of capabilities is thrilling for users, but for the architects and engineers building these systems, it poses a fundamental question: When should a new capability be a […]

Handling large datasets in Python comes with challenges such as memory constraints and slow processing workflows.

AI 911 hoax, Nvidia CEO’s regret, tiny AI crushes giants, robot painter, and more...

Vast amounts of valuable research data remain unused, trapped in labs or lost to time. Frontiers aims to change that with FAIR² Data Management, a groundbreaking AI-driven system that makes datasets reusable, verifiable, and citable. By uniting curation, compliance, peer review, and interactive visualization in one platform, FAIR² empowers scientists to share their work responsibly and gain recognition.

#ai

Presented by Solidigm

As AI adoption surges, data centers face a critical bottleneck in storage — and traditional HDDs are at the center of it. Data that once sat idle as cold archives is now being pulled into frequent use to build more accurate models and deliver better inference results. This shift from cold data to warm data demands low-latency, high-throughput storage that can handle parallel computations. HDDs will remain the workhorse for low-cost cold storage, but without rethinking their role, the high-capacity storage layer risks becoming the weakest link in the AI factory.

"Modern AI workloads, combined with data center constraints, have created new challenges for HDDs," says Jeff Janukowicz, research vice president at IDC. "While HDD suppliers are addressing data storage growth by offering larger drives, this often comes at the expense of slower performance. As a result, the concept of 'nearline SSDs' is becoming an increasingly relevant topic of discussion within the industry."

Today, AI operators need to maximize GPU utilization, manage network-attached storage efficiently, and scale compute — all while cutting costs on increasingly scarce power and space. In an environment where every watt and every square inch counts, says Roger Corell, senior director of AI and leadership marketing at Solidigm, success requires more than a technical refresh. It calls for a deeper realignment.

“It speaks to the tectonic shift in the value of data for AI,” Corell says. “That’s where high-capacity SSDs come into play. Along with capacity, they bring performance and efficiency — enabling exabyte-scale storage pipelines to keep pace with the relentless pace of data set size. All of that consumes power and space, so we need to do it as efficiently as possible to enable more GPU scale in this constrained environment.”

High-capacity SSDs aren’t just displacing HDDs — they’re removing one of the biggest bottlenecks on the AI factory floor. By delivering massive gains in performance, efficiency, and density, SSDs free up the power and space needed to push GPU scale further. It’s less a storage upgrade than a structural shift in how data infrastructure is designed for the AI era.

HDDs vs. SSDs: More than just a hardware refresh

HDDs have impressive mechanical designs, but they're made up of many moving parts that at scale use more energy, take up more space, and fail at a higher rate than solid state drives. The reliance on spinning platters and mechanical read/write heads inherently limits Input/Output Operations Per Second (IOPS), creating bottlenecks for AI workloads that demand low latency, high concurrency, and sustained throughput. HDDs also struggle with latency-sensitive tasks, as the physical act of seeking data introduces mechanical delays unsuited for real-time AI inference and training. Moreover, their power and cooling requirements increase significantly under frequent and intensive data access, reducing efficiency as data scales and warms.

In contrast, the SSD-based VAST storage solution cuts energy costs by roughly $1 million a year, and in an AI environment where every watt matters, this is a huge advantage for SSDs. To demonstrate, Solidigm and VAST Data completed a study examining the economics of data storage at exabyte scale — a quintillion bytes, or a billion gigabytes — with an analysis of storage power consumption versus HDDs over a 10-year period. As a starting reference point, you’d need four 30TB HDDs to equal the capacity of a single 122TB Solidigm SSD.
After factoring in VAST’s data reduction techniques, made possible by the superior performance of SSDs, the exabyte solution comprises 3,738 Solidigm SSDs versus over 40,000 high-capacity HDDs. The study found that the SSD-based VAST solution consumes 77% less storage energy.

Minimizing data center footprints

"We’re shipping 122-terabyte drives to some of the top OEMs and leading AI cloud service providers in the world," Corell says. "When you compare an all-122TB SSD configuration to a hybrid HDD + TLC SSD configuration, they're getting a nine-to-one savings in data center footprint. And yes, it’s important in these massive data centers that are building their own nuclear reactors and signing hefty power purchase agreements with renewable energy providers, but it’s increasingly important as you get to the regional data centers, the local data centers, and all the way out to your edge deployments where space can come at a premium."

That nine-to-one savings goes beyond space and power — it lets organizations fit infrastructure into previously unavailable spaces, expand GPU scale, or build smaller footprints.

"If you’re given X amount of land and Y amount of power, you’re going to use it. You’re AI," Corell explains, "where every watt and square inch counts, so why not use it in the most efficient way? Get the most efficient storage possible on the planet and enable greater GPU scale within that envelope that you have to fit in. On an ongoing basis, it’s going to save you operational cost as well. You have 90 percent fewer storage bays to maintain, and the cost associated with that is gone."

Another often-overlooked element: the (much) larger physical footprint of data stored on mechanical HDDs results in a greater construction materials footprint. Collectively, concrete and steel production accounts for over 15% of global greenhouse gas emissions. By reducing the physical footprint of storage, high-capacity SSDs can help reduce embodied concrete and steel-based emissions by more than 80% compared to HDDs. And in the last phase of the sustainability life cycle, drive end-of-life, there will be 90 percent fewer drives to disposition.

Reshaping cold and archival storage strategies

The move to SSDs isn't just a storage upgrade; it's a fundamental realignment of data infrastructure strategy in the AI era, and it's picking up speed.

"Big hyperscalers are looking to wring the most out of their existing infrastructure, doing unnatural acts, if you will, with HDDs like overprovisioning them to near 90% to try to wring out as many IOPS per terabyte as possible, but they’re beginning to come around," Corell says. "Once they turn to a modern all high-capacity storage infrastructure, the industry at large will be on that trajectory. Plus, we're starting to see these lessons learned on the value of modern storage in AI applied to other segments as well, such as big data analytics, HPC, and many more."

While all-flash solutions are being embraced almost universally, there will always be a place for HDDs, he adds. HDDs will persist in usages like archival, cold storage, and scenarios where pure cost per gigabyte concerns outweigh the need for real-time access. But as the token economy heats up and enterprises realize value in monetizing data, the warm and warming data segments will continue to grow.
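For a rough sense of how those drive counts relate to the raw capacity figures quoted above, here is a back-of-the-envelope calculation. It uses illustrative assumptions only (raw capacity, no data reduction, no RAID or overprovisioning), which is why it lands near, but not exactly on, the study's numbers.

```python
# Back-of-the-envelope check of the exabyte-scale comparison above.
# Illustrative assumptions: raw capacity only, no data reduction, no RAID
# or overprovisioning overheads.
EXABYTE_TB = 1_000_000             # 1 EB = 1,000,000 TB

raw_ssd_count = EXABYTE_TB / 122   # 122 TB SSDs
raw_hdd_count = EXABYTE_TB / 30    # 30 TB HDDs

print(f"~{raw_ssd_count:,.0f} x 122TB SSDs vs ~{raw_hdd_count:,.0f} x 30TB HDDs (raw)")
# The study's figures (3,738 SSDs vs 40,000+ HDDs) additionally reflect VAST's
# data-reduction techniques on the SSD side and real-world provisioning overheads
# on the HDD side.
```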
Solving power challenges of the future

Now in its 4th generation, with more than 122 cumulative exabytes shipped to date, Solidigm’s QLC (Quad-Level Cell) technology has led the industry in balancing higher drive capacities with cost efficiency.

"We don’t think of storage as just storing bits and bytes. We think about how we can develop these amazing drives that are able to deliver benefits at a solution level," Corell says. "The shining star on that is our recently launched E1.S, designed specifically for dense and efficient storage in direct attach storage configurations for the next-generation fanless GPU server."

The Solidigm D7-PS1010 E1.S is a breakthrough, the industry’s first eSSD with single-sided direct-to-chip liquid cooling technology. Solidigm worked with NVIDIA to address the dual challenges of heat management and cost efficiency, while delivering the high performance required for demanding AI workloads.

"We’re rapidly moving to an environment where all critical IT components will be direct-to-chip liquid-cooled on the direct attach side," he says. "I think the market needs to be looking at their approach to cooling, because power limitations, power challenges are not going to abate in my lifetime, at least. They need to be applying a neocloud mindset to how they’re architecting the most efficient infrastructure."

Increasingly complex inference is pushing against a memory wall, which makes storage architecture a front-line design challenge, not an afterthought. High-capacity SSDs, paired with liquid cooling and efficient design, are emerging as the only path to meet AI’s escalating demands. The mandate now is to build infrastructure not just for efficiency, but for storage that can efficiently scale as data grows. The organizations that realign storage now will be the ones able to scale AI tomorrow.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

#ai #datadecisionmakers

Imagine you do two things on a Monday morning. First, you ask a chatbot to summarize your new emails. Next, you ask an AI tool to figure out why your top competitor grew so fast last quarter. The AI silently gets to work. It scours financial reports, news articles and social media sentiment. It cross-references that data with your internal sales numbers, drafts a strategy outlining three potential reasons for the competitor's success and schedules a 30-minute meeting with your team to present its findings.

We're calling both of these "AI agents," but they represent worlds of difference in intelligence, capability and the level of trust we place in them. This ambiguity creates a fog that makes it difficult to build, evaluate, and safely govern these powerful new tools. If we can't agree on what we're building, how can we know when we've succeeded?

This post won't try to sell you on yet another definitive framework. Instead, think of it as a survey of the current landscape of agent autonomy, a map to help us all navigate the terrain together.

What are we even talking about? Defining an "AI agent"

Before we can measure an agent's autonomy, we need to agree on what an "agent" actually is. The most widely accepted starting point comes from the foundational textbook on AI, Stuart Russell and Peter Norvig’s “Artificial Intelligence: A Modern Approach.” They define an agent as anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators. A thermostat is a simple agent: Its sensor perceives the room temperature, and its actuator acts by turning the heat on or off.

ReAct Model for AI Agents (Credit: Confluent)

That classic definition provides a solid mental model. For today's technology, we can translate it into four key components that make up a modern AI agent:

Perception (the "senses"): This is how an agent takes in information about its digital or physical environment. It’s the input stream that allows the agent to understand the current state of the world relevant to its task.

Reasoning engine (the "brain"): This is the core logic that processes the perceptions and decides what to do next. For modern agents, this is typically powered by a large language model (LLM). The engine is responsible for planning, breaking down large goals into smaller steps, handling errors and choosing the right tools for the job.

Action (the "hands"): This is how an agent affects its environment to move closer to its goal. The ability to take action via tools is what gives an agent its power.

Goal/objective: This is the overarching task or purpose that guides all of the agent's actions. It is the "why" that turns a collection of tools into a purposeful system. The goal can be simple ("Find the best price for this book") or complex ("Launch the marketing campaign for our new product").

Putting it all together, a true agent is a full-body system. The reasoning engine is the brain, but it’s useless without the senses (perception) to understand the world and the hands (actions) to change it. This complete system, all guided by a central goal, is what creates genuine agency.

With these components in mind, the distinction we made earlier becomes clear. A standard chatbot isn't a true agent. It perceives your question and acts by providing an answer, but it lacks an overarching goal and the ability to use external tools to accomplish it.

An agent, on the other hand, is software that has agency. It has the capacity to act independently and dynamically toward a goal.
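To make the four components tangible, here is a toy agent loop in Python built around the email-summarizing example from the opening. The "environment," the single tool, and the termination check are deliberately simplistic stand-ins; a real agent would call an LLM inside reason() and real tools inside act().

```python
# A minimal, illustrative agent loop tying the four components together:
# perception (senses), reasoning (brain), action (hands), and a guiding goal.
from dataclasses import dataclass, field

@dataclass
class Agent:
    goal: str
    memory: list = field(default_factory=list)

    def perceive(self, environment: dict) -> dict:
        """Senses: read the slice of the environment relevant to the goal."""
        return {"observation": environment.get("inbox", [])}

    def reason(self, observation: dict) -> dict:
        """Brain: decide the next step (an LLM call in a real agent)."""
        if observation["observation"]:
            return {"tool": "summarize", "args": observation["observation"]}
        return {"tool": "done", "args": None}

    def act(self, decision: dict, environment: dict) -> None:
        """Hands: affect the environment via a tool call."""
        if decision["tool"] == "summarize":
            environment["summary"] = f"{len(decision['args'])} new emails"
            environment["inbox"] = []
        self.memory.append(decision)

environment = {"inbox": ["msg1", "msg2", "msg3"]}
agent = Agent(goal="summarize my new emails")

while True:                                   # the agentic loop
    decision = agent.reason(agent.perceive(environment))
    if decision["tool"] == "done":
        break
    agent.act(decision, environment)

print(environment["summary"])   # -> "3 new emails"
```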
Learning from the past: How we learned to classify autonomy

The dizzying pace of AI can make it feel like we're navigating uncharted territory. But when it comes to classifying autonomy, we're not starting from scratch. Other industries have been working on this problem for decades, and their playbooks offer powerful lessons for the world of AI agents. The core challenge is always the same: How do you create a clear, shared language for the gradual handover of responsibility from a human to a machine?

SAE levels of driving automation

Perhaps the most successful framework comes from the automotive industry. The SAE J3016 standard defines six levels of driving automation, from Level 0 (fully manual) to Level 5 (fully autonomous).

The SAE J3016 Levels of Driving Automation (Credit: SAE International)

What makes this model so effective isn't its technical detail, but its focus on two simple concepts:

- Dynamic driving task (DDT): This is everything involved in the real-time act of driving: steering, braking, accelerating and monitoring the road.
- Operational design domain (ODD): These are the specific conditions under which the system is designed to work. For example, "only on divided highways" or "only in clear weather during the daytime."

The question for each level is simple: Who is doing the DDT, and what is the ODD? At Level 2, the human must supervise at all times. At Level 3, the car handles the DDT within its ODD, but the human must be ready to take over. At Level 4, the car can handle everything within its ODD, and if it encounters a problem, it can safely pull over on its own.

The key insight for AI agents: A robust framework isn't about the sophistication of the AI "brain." It's about clearly defining the division of responsibility between human and machine under specific, well-defined conditions.

Aviation's 10 Levels of Automation

While the SAE's six levels are great for broad classification, aviation offers a more granular model for systems designed for close human-machine collaboration. The Parasuraman, Sheridan, and Wickens model proposes a detailed 10-level spectrum of automation.

Levels of Automation of Decision and Action Selection for Aviation (Credit: The MITRE Corporation)

This framework is less about full autonomy and more about the nuances of interaction. For example:

- At Level 3, the computer "narrows the selection down to a few" for the human to choose from.
- At Level 6, the computer "allows the human a restricted time to veto before it executes" an action.
- At Level 9, the computer "informs the human only if it, the computer, decides to."

The key insight for AI agents: This model is perfect for describing the collaborative "centaur" systems we're seeing today. Most AI agents won't be fully autonomous (Level 10) but will exist somewhere on this spectrum, acting as a co-pilot that suggests, executes with approval or acts with a veto window.
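The middle of that spectrum translates naturally into code. The sketch below is illustrative only: act_with_approval corresponds roughly to an execute-with-approval level and act_with_veto_window to a veto-window level, and the veto_requested callback is a hypothetical hook for whatever UI or queue a real system would poll.

```python
# Hypothetical sketch of two mid-spectrum interaction modes: execute with
# human approval, and execute unless the human vetoes within a time window.
import time

def act_with_approval(proposed_action: str) -> bool:
    """The computer executes only if the human explicitly approves."""
    answer = input(f"Agent proposes: {proposed_action!r}. Approve? [y/N] ")
    return answer.strip().lower() == "y"

def act_with_veto_window(proposed_action: str, veto_seconds: float = 10,
                         veto_requested=lambda: False) -> bool:
    """The computer executes unless the human vetoes before the deadline."""
    print(f"Will run {proposed_action!r} in {veto_seconds}s unless vetoed.")
    deadline = time.monotonic() + veto_seconds
    while time.monotonic() < deadline:
        if veto_requested():          # e.g. poll a UI flag or a message queue
            print("Vetoed by human.")
            return False
        time.sleep(0.2)
    return True

if act_with_veto_window("send weekly report email", veto_seconds=1):
    print("Executed.")
```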
Robotics and unmanned systems

Finally, the world of robotics brings in another critical dimension: context. The National Institute of Standards and Technology's (NIST) Autonomy Levels for Unmanned Systems (ALFUS) framework was designed for systems like drones and industrial robots.

The Three-Axis Model for ALFUS (Credit: NIST)

Its main contribution is adding context to the definition of autonomy, assessing it along three axes:

- Human independence: How much human supervision is required?
- Mission complexity: How difficult or unstructured is the task?
- Environmental complexity: How predictable and stable is the environment in which the agent operates?

The key insight for AI agents: This framework reminds us that autonomy isn't a single number. An agent performing a simple task in a stable, predictable digital environment (like sorting files in a single folder) is fundamentally less autonomous than an agent performing a complex task across the chaotic, unpredictable environment of the open internet, even if the level of human supervision is the same.

The emerging frameworks for AI agents

Having looked at the lessons from automotive, aviation and robotics, we can now examine the emerging frameworks designed for AI agents. While the field is still new and no single standard has won out, most proposals fall into three distinct, but often overlapping, categories based on the primary question they seek to answer.

Category 1: The "What can it do?" frameworks (capability-focused)

These frameworks classify agents based on their underlying technical architecture and what they are capable of achieving. They provide a roadmap for developers, outlining a progression of increasingly sophisticated technical milestones that often correspond directly to code patterns. A prime example of this developer-centric approach comes from Hugging Face. Their framework uses a star rating to show the gradual shift in control from human to AI:

Five Levels of AI Agent Autonomy, as proposed by HuggingFace (Credit: Hugging Face)
- Zero stars (simple processor): The AI has no impact on the program's flow. It simply processes information and its output is displayed, like a print statement. The human is in complete control.
- One star (router): The AI makes a basic decision that directs program flow, like choosing between two predefined paths (if/else). The human still defines how everything is done.
- Two stars (tool call): The AI chooses which predefined tool to use and what arguments to use with it. The human has defined the available tools, but the AI decides how to execute them.
- Three stars (multi-step agent): The AI now controls the iteration loop. It decides which tool to use, when to use it and whether to continue working on the task.
- Four stars (fully autonomous): The AI can generate and execute entirely new code to accomplish a goal, going beyond the predefined tools it was given.

Strengths: This model is excellent for engineers. It's concrete, maps directly to code and clearly benchmarks the transfer of executive control to the AI.
Weaknesses: It is highly technical and less intuitive for non-developers trying to understand an agent's real-world impact.
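Because these levels map so directly onto code patterns, they can be sketched in a few lines. The snippet below is a rough illustration of the first four levels, with a canned llm() stub standing in for a real model call.

```python
# Hypothetical sketch: the star levels as code shapes. llm() is a canned stub
# standing in for a real model call so the file runs on its own.
def llm(prompt: str) -> str:
    canned = {"route": "billing", "pick_tool": "lookup_price: laptop"}
    return canned.get(prompt, "a plain text answer")

def lookup_price(item: str) -> str:
    return f"{item}: $999"

# Zero stars (simple processor): the output is just displayed.
print(llm("summarize this email"))

# One star (router): the model's answer picks between predefined paths.
handle = "billing_queue" if llm("route") == "billing" else "general_queue"
print(f"routed to {handle}")

# Two stars (tool call): the model picks a predefined tool and its argument.
tool_name, arg = llm("pick_tool").split(": ")
print({"lookup_price": lookup_price}[tool_name](arg))

# Three stars (multi-step agent): the model also controls the loop and decides
# when the task is finished. (Four stars would let it write and run new code.)
for step in range(5):
    if llm(f"done after step {step}?") == "yes":  # the stub never says yes
        break
```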
Category 2: The "How do we work together?" frameworks (interaction-focused)

This second category defines autonomy not by the agent's internal skills, but by the nature of its relationship with the human user. The central question is: Who is in control, and how do we collaborate? This approach often mirrors the nuance we saw in the aviation models. For instance, a framework detailed in the paper Levels of Autonomy for AI Agents defines levels based on the user's role:

- L1 - user as an operator: The human is in direct control (like a person using Photoshop with AI-assist features).
- L4 - user as an approver: The agent proposes a full plan or action, and the human must give a simple "yes" or "no" before it proceeds.
- L5 - user as an observer: The agent has full autonomy to pursue a goal and simply reports its progress and results back to the human.

Levels of Autonomy for AI Agents

Strengths: These frameworks are highly intuitive and user-centric. They directly address the critical issues of control, trust, and oversight.
Weaknesses: An agent with simple capabilities and one with highly advanced reasoning could both fall into the "Approver" level, so this approach can sometimes obscure the underlying technical sophistication.

Category 3: The "Who is responsible?" frameworks (governance-focused)

The final category is less concerned with how an agent works and more with what happens when it fails. These frameworks are designed to help answer crucial questions about law, safety and ethics. Think tanks like Germany's Stiftung Neue Verantwortung have analyzed AI agents through the lens of legal liability. Their work aims to classify agents in a way that helps regulators determine who is responsible for an agent's actions: The user who deployed it, the developer who built it or the company that owns the platform it runs on? This perspective is essential for navigating complex regulations like the EU's Artificial Intelligence Act, which will treat AI systems differently based on the level of risk they pose.

Strengths: This approach is absolutely essential for real-world deployment. It forces the difficult but necessary conversations about accountability that build public trust.
Weaknesses: It's more of a legal or policy guide than a technical roadmap for developers.

A comprehensive understanding requires looking at all three questions at once: An agent's capabilities, how we interact with it and who is responsible for the outcome.

Identifying the gaps and challenges

Looking at the landscape of autonomy frameworks shows us that no single model is sufficient, because the true challenges lie in the gaps between them, in areas that are incredibly difficult to define and measure.

What is the "Road" for a digital agent?

The SAE framework for self-driving cars gave us the powerful concept of an ODD, the specific conditions under which a system can operate safely. For a car, that might be "divided highways, in clear weather, during the day." This is a great solution for a physical environment, but what's the ODD for a digital agent?

The "road" for an agent is the entire internet: an infinite, chaotic and constantly changing environment. Websites get redesigned overnight, APIs are deprecated and social norms in online communities shift. How do we define a "safe" operational boundary for an agent that can browse websites, access databases and interact with third-party services? Answering this is one of the biggest unsolved problems. Without a clear digital ODD, we can't make the same safety guarantees that are becoming standard in the automotive world.

This is why, for now, the most effective and reliable agents operate within well-defined, closed-world scenarios. As I argued in a recent VentureBeat article, forgetting the open-world fantasies and focusing on "bounded problems" is the key to real-world success. This means defining a clear, limited set of tools, data sources and potential actions.
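One pragmatic way to approximate a digital ODD is to declare that bounded set explicitly and check every proposed action against it before anything runs. The sketch below is illustrative; the tool names, domain and spending limit are invented.

```python
# Illustrative only: an explicit operational boundary for a digital agent,
# declared as data and checked before any action runs. Names are invented.
from dataclasses import dataclass

@dataclass(frozen=True)
class DigitalODD:
    allowed_tools: frozenset = frozenset({"search_catalog", "create_quote"})
    allowed_domains: frozenset = frozenset({"internal-catalog.example.com"})
    max_spend_usd: float = 0.0            # any purchase requires a human

def within_odd(odd: DigitalODD, tool: str, domain: str, spend: float) -> bool:
    """True only if the proposed action stays inside the declared boundary."""
    return (tool in odd.allowed_tools
            and domain in odd.allowed_domains
            and spend <= odd.max_spend_usd)

odd = DigitalODD()
print(within_odd(odd, "search_catalog", "internal-catalog.example.com", 0.0))  # True
print(within_odd(odd, "place_order", "retailer.example.com", 129.99))          # False
```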
Beyond simple tool use

Today's agents are getting very good at executing straightforward plans. If you tell one to "find the price of this item using Tool A, then book a meeting with Tool B," it can often succeed. But true autonomy requires much more. Many systems today hit a technical wall when faced with tasks that require:

- Long-term reasoning and planning: Agents struggle to create and adapt complex, multi-step plans in the face of uncertainty. They can follow a recipe, but they can't yet invent one from scratch when things go wrong.
- Robust self-correction: What happens when an API call fails or a website returns an unexpected error? A truly autonomous agent needs the resilience to diagnose the problem, form a new hypothesis and try a different approach, all without a human stepping in.
- Composability: The future likely involves not one agent, but a team of specialized agents working together. Getting them to collaborate reliably, to pass information back and forth, delegate tasks and resolve conflicts is a monumental software engineering challenge that we are just beginning to tackle.

The elephant in the room: Alignment and control

This is the most critical challenge of all, because it's not just technical, it's deeply human. Alignment is the problem of ensuring an agent's goals and actions are consistent with our intentions and values, even when those values are complex, unstated or nuanced.

Imagine you give an agent the seemingly harmless goal of "maximizing customer engagement for our new product." The agent might correctly determine that the most effective strategy is to send a dozen notifications a day to every user. The agent has achieved its literal goal perfectly, but it has violated the unstated, common-sense goal of "don't be incredibly annoying." This is a failure of alignment.

The core difficulty, which organizations like the AI Alignment Forum are dedicated to studying, is that it is incredibly hard to specify fuzzy, complex human preferences in the precise, literal language of code. As agents become more powerful, ensuring they are not just capable but also safe, predictable and aligned with our true intent becomes the most important challenge we face.

The future is agentic (and collaborative)

The path forward for AI agents is not a single leap to a god-like super-intelligence, but a more practical and collaborative journey. The immense challenges of open-world reasoning and perfect alignment mean that the future is a team effort. We will see less of the single, all-powerful agent and more of an "agentic mesh" — a network of specialized agents, each operating within a bounded domain, working together to tackle complex problems. More importantly, they will work with us. The most valuable and safest applications will keep a human on the loop, with the agent cast as a co-pilot or strategist that augments our intellect with the speed of machine execution. This "centaur" model will be the most effective and responsible path forward.

The frameworks we've explored aren't just theoretical. They're practical tools for building trust, assigning responsibility and setting clear expectations. They help developers define limits and leaders shape vision, laying the groundwork for AI to become a dependable partner in our work and lives.

Sean Falconer is Confluent's AI entrepreneur in residence.

Major AI superhero ban, AI job = film chores, robot butler, Altman dilemma, and more...

#ai #datadecisionmakers

Your best data science team just spent six months building a model that predicts customer churn with 90% accuracy. It's sitting on a server, unused. Why? Because it's been stuck in a risk review queue for far too long, waiting for a committee that doesn't understand stochastic models to sign off. This isn't a hypothetical — it's the daily reality in most large companies.

In AI, the models move at internet speed. Enterprises don’t.

Every few weeks, a new model family drops, open-source toolchains mutate and entire MLOps practices get rewritten. But in most companies, anything touching production AI has to pass through risk reviews, audit trails, change-management boards and model-risk sign-off. The result is a widening velocity gap: The research community accelerates; the enterprise stalls.

This gap isn't a headline problem like "AI will take your job." It's quieter and more expensive: missed productivity, shadow AI sprawl, duplicated spend and compliance drag that turns promising pilots into perpetual proofs-of-concept.

The numbers say the quiet part out loud

Two trends collide. First, the pace of innovation: Industry is now the dominant force, producing the vast majority of notable AI models, according to Stanford's 2024 AI Index Report. The core inputs for this innovation are compounding at a historic rate, with training compute needs doubling every few years. That pace all but guarantees rapid model churn and tool fragmentation.

Second, enterprise adoption is accelerating. According to IBM's Global AI Adoption Index, 42% of enterprise-scale companies have actively deployed AI, with many more actively exploring it. Yet the same surveys show governance roles are only now being formalized, leaving many companies to retrofit control after deployment.

Layer on new regulation. The EU AI Act's staged obligations are locked in — unacceptable-risk bans are already active and General Purpose AI (GPAI) transparency duties hit in mid-2025, with high-risk rules following. Brussels has made clear there's no pause coming. If your governance isn't ready, your roadmap will stall.

The real blocker isn't modeling, it's audit

In most enterprises, the slowest step isn't fine-tuning a model; it's proving that the model complies with the applicable guidelines.

Three frictions dominate:

- Audit debt: Policies were written for static software, not stochastic models. You can ship a microservice with unit tests; you can't "unit test" fairness drift without data access, lineage and ongoing monitoring. When controls don't map, reviews balloon.
- MRM overload: Model risk management (MRM), a discipline perfected in banking, is spreading beyond finance — often translated literally, not functionally. Explainability and data-governance checks make sense; forcing every retrieval-augmented chatbot through credit-risk style documentation does not.
- Shadow AI sprawl: Teams adopt vertical AI inside SaaS tools without central oversight. It feels fast — until the third audit asks who owns the prompts, where embeddings live and how to revoke data. Sprawl is speed's illusion; integration and governance are the long-term velocity.

Frameworks exist, but they're not operational by default

The NIST AI Risk Management Framework is a solid north star: govern, map, measure, manage. It's voluntary, adaptable and aligned with international standards. But it's a blueprint, not a building. Companies still need concrete control catalogs, evidence templates and tooling that turn principles into repeatable reviews.

Similarly, the EU AI Act sets deadlines and duties. It doesn't install your model registry, wire your dataset lineage or resolve the age-old question of who signs off when accuracy and bias trade off. That's on you, and soon.

What winning enterprises are doing differently

The leaders I see closing the velocity gap aren't chasing every model; they're making the path to production routine. Five moves show up again and again:

- Ship a control plane, not a memo: Codify governance as code. Create a small library or service that enforces non-negotiables: Dataset lineage required, evaluation suite attached, risk tier chosen, PII scan passed, human-in-the-loop defined (if required). If a project can't satisfy the checks, it can't deploy. (A minimal sketch of such a gate follows this section.)
- Pre-approve patterns: Approve reference architectures — "GPAI with retrieval augmented generation (RAG) on approved vector store," "high-risk tabular model with feature store X and bias audit Y," "vendor LLM via API with no data retention." Pre-approval shifts review from bespoke debates to pattern conformance. (Your auditors will thank you.)
- Stage your governance by risk, not by team: Tie review depth to use-case criticality (safety, finance, regulated outcomes). A marketing copy assistant shouldn't endure the same gauntlet as a loan adjudicator. Risk-proportionate review is both defensible and fast.
- Create an "evidence once, reuse everywhere" backbone: Centralize model cards, eval results, data sheets, prompt templates and vendor attestations. Every subsequent audit should start at 60% done because you've already proven the common pieces.
- Make audit a product: Give legal, risk and compliance a real roadmap. Instrument dashboards that show: Models in production by risk tier, upcoming re-evals, incidents and data-retention attestations. If audit can self-serve, engineering can ship.

A pragmatic cadence for the next 12 months

If you're serious about catching up, pick a 12-month governance sprint:

- Quarter 1: Stand up a minimal AI registry (models, datasets, prompts, evaluations). Draft risk-tiering and control mapping aligned to NIST AI RMF functions; publish two pre-approved patterns.
- Quarter 2: Turn controls into pipelines (CI checks for evals, data scans, model cards). Convert two fast-moving teams from shadow AI to platform AI by making the paved road easier than the side road.
- Quarter 3: Pilot a GxP-style review (a rigorous documentation standard from life sciences) for one high-risk use case; automate evidence capture. Start your EU AI Act gap analysis if you touch Europe; assign owners and deadlines.
- Quarter 4: Expand your pattern catalog (RAG, batch inference, streaming prediction). Roll out dashboards for risk/compliance. Bake governance SLAs into your OKRs.
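To make the control-plane idea concrete, here is a minimal sketch of such a deployment gate. The fields, checks and risk tiers are illustrative examples of the non-negotiables listed above, not a standard control catalog.

```python
# Illustrative sketch of "governance as code": a pre-deployment gate over the
# evidence a model release must carry. Checks and tiers are examples, not a
# standard control catalog.
from dataclasses import dataclass

@dataclass
class ModelRelease:
    name: str
    risk_tier: str               # e.g. "low", "medium", "high"
    dataset_lineage: bool
    eval_suite_attached: bool
    pii_scan_passed: bool
    human_in_the_loop: bool

def deployment_gate(release: ModelRelease) -> list[str]:
    """Return failed controls; an empty list means the release may ship."""
    failures = []
    if not release.dataset_lineage:
        failures.append("dataset lineage missing")
    if not release.eval_suite_attached:
        failures.append("evaluation suite not attached")
    if not release.pii_scan_passed:
        failures.append("PII scan failed or absent")
    # Risk-proportionate review: only high-risk releases need a human in the loop.
    if release.risk_tier == "high" and not release.human_in_the_loop:
        failures.append("high-risk release requires a human-in-the-loop plan")
    return failures

loan_model = ModelRelease("loan-adjudicator-v2", "high", True, True, True, False)
blockers = deployment_gate(loan_model)
print("ship it" if not blockers else f"blocked: {blockers}")
```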

By this point, you haven't slowed down innovation — you've standardized it. The research community can keep moving at light speed; you can keep shipping at enterprise speed — without the audit queue becoming your critical path.

The competitive edge isn't the next model — it's the next mile

It's tempting to chase each week's leaderboard. But the durable advantage is the mile between a paper and production: The platform, the patterns, the proofs. That's what your competitors can't copy from GitHub, and it's the only way to keep velocity without trading compliance for chaos.

In other words: Make governance the grease, not the grit.

Jayachander Reddy Kandakatla is senior machine learning operations (MLOps) engineer at Ford Motor Credit Company.


#ai #datadecisionmakers

AI tools are revolutionizing software development by automating repetitive tasks, refactoring bloated code, and identifying bugs in real-time. Developers can now generate well-structured code from plain language prompts, saving hours of manual effort. These tools learn from vast codebases, offering context-aware recommendations that enhance productivity and reduce errors. Rather than starting from scratch, engineers can prototype quickly, iterate faster and focus on solving increasingly complex problems.As code generation tools grow in popularity, they raise questions about the future size and structure of engineering teams. Earlier this year, Garry Tan, CEO of startup accelerator Y Combinator, noted that about one-quarter of its current clients use AI to write 95% or more of their software. In an interview with CNBC, Tan said: “What that means for founders is that you don’t need a team of 50 or 100 engineers, you don’t have to raise as much. The capital goes much longer.”AI-powered coding may offer a fast solution for businesses under budget pressure — but its long-term effects on the field and labor pool cannot be ignored.As AI-powered coding rises, human expertise may diminish
In the era of AI, the traditional journey to coding expertise that has long supported senior developers may be at risk. Easy access to large language models (LLMs) enables junior coders to quickly identify issues in code. While this speeds up software development, it can distance developers from their own work, delaying the growth of core problem-solving skills. As a result, they may avoid the focused, sometimes uncomfortable hours required to build expertise and progress on the path to becoming successful senior developers.Consider Anthropic’s Claude Code, a terminal-based assistant built on the Claude 3.7 Sonnet model, which automates bug detection and resolution, test creation and code refactoring. Using natural language commands, it reduces repetitive manual work and boosts productivity.Microsoft has also released two open-source frameworks — AutoGen and Semantic Kernel — to support the development of agentic AI systems. AutoGen enables asynchronous messaging, modular components, and distributed agent collaboration to build complex workflows with minimal human input. Semantic Kernel is an SDK that integrates LLMs with languages like C#, Python and Java, letting developers build AI agents to automate tasks and manage enterprise applications.The increasing availability of these tools from Anthropic, Microsoft and others may reduce opportunities for coders to refine and deepen their skills. Rather than “banging their heads against the wall” to debug a few lines or select a library to unlock new features, junior developers may simply turn to AI for an assist. This means senior coders with problem-solving skills honed over decades may become an endangered species.Overreliance on AI for writing code risks weakening developers’ hands-on experience and understanding of key programming concepts. Without regular practice, they may struggle to independently debug, optimize or design systems. Ultimately, this erosion of skill can undermine critical thinking, creativity and adaptability — qualities that are essential not just for coding, but for assessing the quality and logic of AI-generated solutions.AI as mentor: Turning code automation into hands-on learningWhile concerns about AI diminishing human developer skills are valid, businesses shouldn’t dismiss AI-supported coding. They just need to think carefully about when and how to deploy AI tools in development. These tools can be more than productivity boosters; they can act as interactive mentors, guiding coders in real time with explanations, alternatives and best practices.When used as a training tool, AI can reinforce learning by showing coders why code is broken and how to fix it—rather than simply applying a solution. For example, a junior developer using Claude Code might receive immediate feedback on inefficient syntax or logic errors, along with suggestions linked to detailed explanations. This enables active learning, not passive correction. It’s a win-win: Accelerating project timelines without doing all the work for junior coders.Additionally, coding frameworks can support experimentation by letting developers prototype agent workflows or integrate LLMs without needing expert-level knowledge upfront. 
By observing how AI builds and refines code, junior developers who actively engage with these tools can internalize patterns, architectural decisions and debugging strategies — mirroring the traditional learning process of trial and error, code reviews and mentorship.However, AI coding assistants shouldn’t replace real mentorship or pair programming. Pull requests and formal code reviews remain essential for guiding newer, less experienced team members. We are nowhere near the point at which AI can single-handedly upskill a junior developer.Companies and educators can build structured development programs around these tools that emphasize code comprehension to ensure AI is used as a training partner rather than a crutch. This encourages coders to question AI outputs and requires manual refactoring exercises. In this way, AI becomes less of a replacement for human ingenuity and more of a catalyst for accelerated, experiential learning.Bridging the gap between automation and educationWhen utilized with intention, AI doesn’t just write code; it teaches coding, blending automation with education to prepare developers for a future where deep understanding and adaptability remain indispensable.By embracing AI as a mentor, as a programming partner and as a team of developers we can direct to the problem at hand, we can bridge the gap between effective automation and education. We can empower developers to grow alongside the tools they use. We can ensure that, as AI evolves, so too does the human skill set, fostering a generation of coders who are both efficient and deeply knowledgeable.Richard Sonnenblick is chief data scientist at Planview.

#ai

How a semiconductor veteran turned over a century of horticultural wisdom into AI-led competitive advantage For decades, a ritual played out across ScottsMiracle-Gro’s media facilities. Every few weeks, workers walked acres of towering compost and wood chip piles with nothing more than measuring sticks. They wrapped rulers around each mound, estimated height, and did what company President Nate Baxter now describes as “sixth-grade geometry to figure out volume.”Today, drones glide over those same plants with mechanical precision. Vision systems calculate volumes in real time. The move from measuring sticks to artificial intelligence signals more than efficiency. It is the visible proof of one of corporate America’s most unlikely technology stories.The AI revolution finds an unexpected leaderEnterprise AI has been led by predictable players. Software companies with cloud-native architectures. Financial services firms with vast data lakes. Retailers with rich digital touchpoints. Consumer packaged goods companies that handle physical products like fertilizer and soil were not expected to lead.Yet ScottsMiracle-Gro has realized more than half of a targeted $150 million in supply chain savings. It reports a 90 percent improvement in customer service response times. Its predictive models enable weekly reallocation of marketing resources across regional markets.A Silicon Valley veteran bets on soil scienceBaxter’s path to ScottsMiracle-Gro (SMG) reads like a calculated pivot, not a corporate rescue. After two decades in semiconductor manufacturing at Intel and Tokyo Electron, he knew how to apply advanced technology to complex operations.“I sort of initially said, ‘Why would I do this? I’m running a tech company. It’s an industry I’ve been in for 25 years,’” Baxter recalls of his reaction when ScottsMiracle-Gro CEO Jim Hagedorn approached him in 2023. The company was reeling from a collapsed $1.2 billion hydroponics investment and facing what he describes as “pressure from a leverage standpoint.”His wife challenged him with a direct prompt. If you are not learning or putting yourself in uncomfortable situations, you should change that.Baxter saw clear parallels between semiconductor manufacturing and SMG’s operations. Both require precision, quality control, and the optimization of complex systems. He also saw untapped potential in SMG’s domain knowledge. One hundred fifty years of horticultural expertise, regulatory know-how, and customer insight had never been fully digitized.“It became apparent to me whether it was on the backend with data analytics, business process transformation, and obviously now with AI being front and center of the consumer experience, a lot of opportunities are there,” he explains.The declaration that changed everythingThe pivot began at an all-hands meeting. “I just said, you know, guys, we’re a tech company. You just don’t know it yet,” Baxter recalls. “There’s so much opportunity here to drive this company to where it needs to go.”The first challenge was organizational. SMG had evolved into functional silos. IT, supply chain, and brand teams ran independent systems with little coordination. Drawing on his experience with complex technology organizations, Baxter restructured the consumer business into three business units. General managers became accountable not just for financial results but also for technology implementation within their domains.“I came in and said, we’re going to create new business units,” he explains. 
“The buck stops with you and I’m holding you accountable not only for the business results, for the quality of the creative and marketing, but for the implementation of technology.”To support the new structure, SMG set up centers of excellence for digital capabilities, insights and analytics, and creative functions. The hybrid design placed centralized expertise behind distributed accountability.Mining corporate memory for AI goldTurning legacy knowledge into machine-ready intelligence required what Fausto Fleites, VP of Data Intelligence, calls “archaeological work.” The team excavated decades of business logic embedded in legacy SAP systems and converted filing cabinets of research into AI-ready datasets. Fleites, a Cuban immigrant with a doctorate from FIU who led Florida’s public hurricane loss model before roles at Sears and Cemex, understood the stakes.“The costly part of the migration was the business reporting layer we have in SAP Business Warehouse,” Fleites explains. “You need to uncover business logic created in many cases over decades.”SMG chose Databricks as its unified data platform. The team had Apache Spark expertise. Databricks offered strong SAP integration and aligned with a preference for open-source technologies that minimize vendor lock-in.The breakthrough came through systematic knowledge management. SMG built an AI bot using Google’s Gemini large language model to catalog and clean internal repositories. The system identified duplicates, grouped content by topic, and restructured information for AI consumption. The effort reduced knowledge articles by 30 percent while increasing their utility.“We used Gemini LLMs to actually categorize them into topics, find similar documents,” Fleites explains. A hybrid approach that combined modern AI with techniques like cosine similarity became the foundation for later applications.Building AI systems that actually understand fertilizerEarly trials with off-the-shelf AI exposed a real risk. General-purpose models confused products designed for killing weeds with those for preventing them. That mistake can ruin a lawn.“Different products, if you use one in the wrong place, would actually have a very negative outcome,” Fleites notes. “But those are kind of synonyms in certain contexts to the LLM. So they were recommending the wrong products.”The solution was a new architecture. SMG created what Fleites calls a “hierarchy of agents.” A supervisor agent routes queries to specialized worker agents organized by brand. Each agent draws on deep product knowledge encoded from a 400-page internal training manual.The system also changes the conversation. When users ask for recommendations, the agents start with questions about location, goals, and lawn conditions. They narrow possibilities step by step before offering suggestions. The stack integrates with APIs for product availability and state-specific regulatory compliance.From drones to demand forecasting across the enterpriseThe transformation runs across the company. Drones measure inventory piles. Demand forecasting models analyze more than 60 factors, including weather patterns, consumer sentiment, and macroeconomic indicators.These predictions enable faster moves. When drought struck Texas, the models supported a shift in promotional spending to regions with favorable weather. 
The reallocation helped drive positive quarterly results.“We not only have the ability to move marketing and promotion dollars around, but we’ve even gotten to the point where if it’s going to be a big weekend in the Northeast, we’ll shift our field sales resources from other regions up there,” Baxter explains.Consumer Services changed as well. AI agents now process incoming emails through Salesforce, draft responses based on the knowledge base, and flag them for brief human review. Draft times dropped from ten minutes to seconds and response quality improved.The company emphasizes explainable AI. Using SHAP, SMG built dashboards that decompose each forecast and show how weather, promotions, or media spending contribute to predictions.“Typically, if you open a prediction to a business person and you don’t say why, they’ll say, ‘I don’t believe you,’” Fleites explains. Transparency made it possible to move resource allocation from quarterly to weekly cycles.Competing like a startupSMG’s results challenge assumptions about AI readiness in traditional industries. The advantage does not come from owning the most sophisticated models. It comes from combining general-purpose AI with unique, structured domain knowledge.“LLMs are going to be a commodity,” Fleites observes. “The strategic differentiator is what is the additional level of [internal] knowledge we can fit to them.”Partnerships are central. SMG works with Google Vertex AI for foundational models, Sierra.ai for production-ready conversational agents, and Kindwise for computer vision. The ecosystem approach lets a small internal team recruited from Meta, Google, and AI startups deliver outsized impact without building everything from scratch.Talent follows impact. Conventional wisdom says traditional companies cannot compete with Meta salaries or Google stock. SMG offered something different. It offered the chance to build transformative AI applications with immediate business impact.“When we have these interviews, what we propose to them is basically the ability to have real value with the latest knowledge in these spaces,” Fleites explains. “A lot of people feel motivated to come to us” because much of big tech AI work, despite the hype, “doesn’t really have an impact.”Team design mirrors that philosophy. “My direct reports are leaders and not only manage people, but are technically savvy,” Fleites notes. “We always are constantly switching hands between developing or maintaining a solution versus strategy versus managing people.” He still writes code weekly. The small team of 15 to 20 AI and engineering professionals stays lean by contracting out implementation while keeping “the know-how and the direction and the architecture” in-house.When innovation meets immovable objectsNot every pilot succeeded. SMG tested semi-autonomous forklifts in a 1.3 million square foot distribution facility. Remote drivers in the Philippines controlled up to five vehicles at once with strong safety records.“The technology was actually really great,” Baxter acknowledges. The vehicles could not lift enough weight for SMG’s heavy products. The company paused implementation.“Not everything we’ve tried has gone smoothly,” Baxter admits. “But I think another important point is you have to focus on a few critical ones and you have to know when something isn’t going to work and readjust.”The lesson tracks with semiconductor discipline. Investments must show measurable returns within set timeframes. Regulatory complexity adds difficulty. 
Products must comply with EPA rules and a patchwork of state restrictions, which AI systems must navigate correctly.The gardening sommelier and agent-to-agent futuresThe roadmap reflects a long-term view. SMG plans a “gardening sommelier” mobile app in 2026 that identifies plants, weeds, and lawn problems from photos and provides instant guidance. A beta already helps field sales teams answer complex product questions by querying the 400-page knowledge base.The company is exploring agent-to-agent communication so its specialized AI can interface with retail partners’ systems. A customer who asks a Walmart chatbot for lawn advice could trigger an SMG query that returns accurate, regulation-compliant recommendations.SMG has launched AI-powered search on its website, replacing keyword systems with conversational engines based on the internal stack. The future vision pairs predictive models with conversational agents so the system can reach out when conditions suggest a customer may need help.What traditional industries can learnScottsMiracle-Gro's transformation offers a clear playbook for enterprises. The advantage doesn't come from deploying the most sophisticated models. Instead, it comes from combining AI with proprietary domain knowledge that competitors can't easily replicate.By making general managers responsible for both business results and technology implementation, SMG ensured AI wasn't just an IT initiative but a business imperative. The 150 years of horticultural expertise only became valuable when it was digitized, structured, and made accessible to AI systems.Legacy companies competing for AI engineers can't match Silicon Valley compensation packages. But they can offer something tech giants often can't: immediate, measurable impact. When engineers see their weather forecasting models directly influence quarterly results or their agent architecture prevent customers from ruining their lawns, the work carries weight that another incremental improvement to an ad algorithm never will.“We have a right to win,” Baxter says. “We have 150 years of this experience.” That experience is now data, and data is the company’s competitive edge. ScottsMiracle-Gro didn’t outspend its rivals or chase the newest AI model. It turned knowledge into an operating system for growth. For a company built on soil, its biggest breakthrough might be cultivating data.
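To make the "hierarchy of agents" pattern described in this piece easier to picture, here is a minimal, hypothetical sketch. The brands, clarifying questions and keyword routing are invented stand-ins; a production system of the kind described would route with an LLM and draw on a vetted knowledge base.

```python
# Hypothetical illustration of a "hierarchy of agents": a supervisor routes each
# query to a brand-specific worker, which asks clarifying questions before it
# recommends anything. Not ScottsMiracle-Gro's implementation; brands, questions
# and the keyword router are invented stand-ins for an LLM-based system.

class WorkerAgent:
    def __init__(self, brand: str, clarifying_questions: list):
        self.brand = brand
        self.clarifying_questions = clarifying_questions

    def handle(self, query: str, answers=None) -> str:
        missing = [q for q in self.clarifying_questions if not (answers or {}).get(q)]
        if missing:
            return f"[{self.brand}] Before I recommend anything: {missing[0]}"
        return f"[{self.brand}] Recommendation for {query!r}, given {answers}"

class SupervisorAgent:
    def __init__(self, workers: dict):
        self.workers = workers

    def route(self, query: str, answers=None) -> str:
        # A real supervisor would route with an LLM; keywords keep this runnable.
        for keyword, worker in self.workers.items():
            if keyword in query.lower():
                return worker.handle(query, answers)
        return "No specialist found; escalating to a human."

supervisor = SupervisorAgent({
    "lawn": WorkerAgent("LawnCare", ["What state are you in?", "Sun or shade?"]),
    "plant": WorkerAgent("PlantFood", ["Indoor or outdoor?"]),
})
print(supervisor.route("My lawn has brown patches"))
print(supervisor.route("My lawn has brown patches",
                       {"What state are you in?": "TX", "Sun or shade?": "sun"}))
```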

Never forget again, voice typing, Oxford AI guide, 40 jobs at risk, and more...

See the results of comparing speed and memory efficiency of DuckDB, SQLite, and Pandas on a million-row dataset.

#agentic ai #artificial intelligence #data engineering #data science #science and technology

What's happening—and what's next—for data and AI at the close of 2025.
The post 10 Data + AI Observations for Fall 2025 appeared first on Towards Data Science.

#artificial intelligence #deep dives #deep learning

Explaining "MineWorld: A real-time and open-source interactive world model on Minecraft" in simple terms.
The post Dreaming in Blocks — MineWorld, the Minecraft World Model appeared first on Towards Data Science.

#ai

Enterprises expanding AI deployments are hitting an invisible performance wall. The culprit? Static speculators that can't keep up with shifting workloads.

Speculators are smaller AI models that work alongside large language models during inference. They draft multiple tokens ahead, which the main model then verifies in parallel. This technique (called speculative decoding) has become essential for enterprises trying to reduce inference costs and latency. Instead of generating tokens one at a time, the system can accept multiple tokens at once, dramatically improving throughput.

Together AI today announced research and a new system called ATLAS (AdapTive-LeArning Speculator System) that aims to help enterprises overcome the challenge of static speculators. The technique provides a self-learning inference optimization capability that can deliver up to 400% faster inference than the baseline performance available in existing inference technologies such as vLLM. The system addresses a critical problem: As AI workloads evolve, inference speeds degrade, even with specialized speculators in place.

The company, which got its start in 2023, has been focused on optimizing inference on its enterprise AI platform. Earlier this year the company raised $305 million as customer adoption and demand have grown.

"Companies we work with generally, as they scale up, they see shifting workloads, and then they don't see as much speedup from speculative execution as before," Tri Dao, chief scientist at Together AI, told VentureBeat in an exclusive interview. "These speculators generally don't work well when their workload domain starts to shift."

The workload drift problem no one talks about

Most speculators in production today are "static" models. They're trained once on a fixed dataset representing expected workloads, then deployed without any ability to adapt. Companies like Meta and Mistral ship pre-trained speculators alongside their main models. Inference platforms like vLLM use these static speculators to boost throughput without changing output quality.

But there's a catch. When an enterprise's AI usage evolves, the static speculator's accuracy plummets.

"If you're a company producing coding agents, and most of your developers have been writing in Python, all of a sudden some of them switch to writing Rust or C, then you see the speed starts to go down," Dao explained. "The speculator has a mismatch between what it was trained on versus what the actual workload is."

This workload drift represents a hidden tax on scaling AI. Enterprises either accept degraded performance or invest in retraining custom speculators. That process captures only a snapshot in time and quickly becomes outdated.

How adaptive speculators work: A dual-model approach

ATLAS uses a dual-speculator architecture that combines stability with adaptation:

- The static speculator: A heavyweight model trained on broad data provides consistent baseline performance. It serves as a "speed floor."
- The adaptive speculator: A lightweight model learns continuously from live traffic. It specializes on-the-fly to emerging domains and usage patterns.
- The confidence-aware controller: An orchestration layer dynamically chooses which speculator to use. It adjusts the speculation "lookahead" based on confidence scores.

"Before the adaptive speculator learns anything, we still have the static speculator to help provide the speed boost in the beginning," Ben Athiwaratkun, staff AI scientist at Together AI, explained to VentureBeat.
"Once the adaptive speculator becomes more confident, then the speed grows over time."The technical innovation lies in balancing acceptance rate (how often the target model agrees with drafted tokens) and draft latency. As the adaptive model learns from traffic patterns, the controller relies more on the lightweight speculator and extends lookahead. This compounds performance gains.Users don't need to tune any parameters. "On the user side, users don't have to turn any knobs," Dao said. "On our side, we have turned these knobs for users to adjust in a configuration that gets good speedup."Performance that rivals custom siliconTogether AI's testing shows ATLAS reaching 500 tokens per second on DeepSeek-V3.1 when fully adapted. More impressively, those numbers on Nvidia B200 GPUs match or exceed specialized inference chips like Groq's custom hardware."The software and algorithmic improvement is able to close the gap with really specialized hardware," Dao said. "We were seeing 500 tokens per second on these huge models that are even faster than some of the customized chips."The 400% speedup that the company claims for inference represents the cumulative effect of Together's Turbo optimization suite. FP4 quantization delivers 80% speedup over FP8 baseline. The static Turbo Speculator adds another 80-100% gain. The adaptive system layers on top. Each optimization compounds the benefits of the others.Compared to standard inference engines like vLLM or Nvidia's TensorRT-LLM, the improvement is substantial. Together AI benchmarks against the stronger baseline between the two for each workload before applying speculative optimizations.The memory-compute tradeoff explainedThe performance gains stem from exploiting a fundamental inefficiency in modern inference: wasted compute capacity.Dao explained that typically during inference, much of the compute power is not fully utilized."During inference, which is actually the dominant workload nowadays, you're mostly using the memory subsystem," he said.Speculative decoding trades idle compute for reduced memory access. When a model generates one token at a time, it's memory-bound. The GPU sits idle while waiting for memory. But when the speculator proposes five tokens and the target model verifies them simultaneously, compute utilization spikes while memory access remains roughly constant."The total amount of compute to generate five tokens is the same, but you only had to access memory once, instead of five times," Dao said.Think of it as intelligent caching for AIFor infrastructure teams familiar with traditional database optimization, adaptive speculators function like an intelligent caching layer, but with a crucial difference.Traditional caching systems like Redis or memcached require exact matches. You store the exact same query result and retrieve it when that specific query runs again. Adaptive speculators work differently."You can view it as an intelligent way of caching, not storing exactly, but figuring out some patterns that you see," Dao explained. "Broadly, we're observing that you're working with similar code, or working with similar, you know, controlling compute in a similar way. We can then predict what the big model is going to say. We just get better and better at predicting that."Rather than storing exact responses, the system learns patterns in how the model generates tokens. It recognizes that if you're editing Python files in a specific codebase, certain token sequences become more likely. 
The speculator adapts to those patterns, improving its predictions over time without requiring identical inputs.Use cases: RL training and evolving workloadsTwo enterprise scenarios particularly benefit from adaptive speculators:Reinforcement learning training: Static speculators quickly fall out of alignment as the policy evolves during training. ATLAS adapts continuously to the shifting policy distribution.Evolving workloads: As enterprises discover new AI use cases, workload composition shifts. "Maybe they started using AI for chatbots, but then they realized, hey, it can write code, so they start shifting to code," Dao said. "Or they realize these AIs can actually call tools and control computers and do accounting and things like that."In a vibe-coding session, the adaptive system can specialize for the specific codebase being edited. These are files not seen during training. This further increases acceptance rates and decoding speed.What it means for enterprises and the inference ecosystemATLAS is available now on Together AI's dedicated endpoints as part of the platform at no additional cost. The company's 800,000-plus developers (up from 450,000 in February) have access to the optimization.But the broader implications extend beyond one vendor's product. The shift from static to adaptive optimization represents a fundamental rethinking of how inference platforms should work. As enterprises deploy AI across multiple domains, the industry will need to move beyond one-time trained models toward systems that learn and improve continuously.Together AI has historically released some of its research techniques as open source and collaborated with projects like vLLM. While the fully integrated ATLAS system is proprietary, some of the underlying techniques may eventually influence the broader inference ecosystem. For enterprises looking to lead in AI, the message is clear: adaptive algorithms on commodity hardware can match custom silicon at a fraction of the cost. As this approach matures across the industry, software optimization increasingly trumps specialized hardware.
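For readers new to speculative decoding, the toy sketch below shows the draft-and-verify mechanic the article describes, in its simplest greedy-acceptance form. Both models are canned stubs, and this is not Together AI's ATLAS implementation, which layers the adaptive speculator and confidence-aware controller on top of a loop like this.

```python
# Toy illustration of speculative decoding's draft-and-verify loop (greedy
# acceptance). Both "models" are canned stubs; this shows the mechanic only.

def draft_model(context: list, k: int) -> list:
    # A small, fast speculator proposes the next k tokens.
    guesses = ["the", "cat", "sat", "on", "the", "mat"]
    return guesses[len(context):len(context) + k]

def target_model_next(context: list) -> str:
    # The large model's next token for any prefix (stubbed).
    truth = ["the", "cat", "sat", "on", "a", "mat"]
    return truth[len(context)] if len(context) < len(truth) else "<eos>"

def speculative_step(context: list, k: int = 4):
    drafted = draft_model(context, k)
    accepted = []
    for token in drafted:
        if target_model_next(context + accepted) == token:
            accepted.append(token)       # verified in parallel in real systems
        else:
            break
    # On a mismatch (or an empty draft), emit one token from the target model.
    if len(accepted) < k:
        accepted.append(target_model_next(context + accepted))
    return context + accepted, len(accepted)

context: list = []
while context[-1:] != ["<eos>"]:
    context, n = speculative_step(context)
    print(f"accepted {n} token(s): {context}")
```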

#ai & ml #events

A common misconception about O’Reilly is that we cater only to the deeply technical learner. While we’re proud of our deep roots in the tech community, the breadth of our offerings, both in books and on our learning platform, has always aimed to reach a broader audience of tech-adjacent and tech-curious people who want to […]

Agentic artificial intelligence (AI) represents the most significant shift in machine learning since deep learning transformed the field.

#business #business / artificial intelligence

Mark Zuckerberg's metaverse chief is urging employees to adopt AI across every workflow as part of a broader shift inside the company.

#ai

It seems like almost every week for the last two years since ChatGPT launched, new large language models (LLMs) from rival labs or from OpenAI itself have been released. Enterprises are hard pressed to keep up with the massive pace of change, let alone understand how to adapt to it — which of these new models should they adopt, if any, to power their workflows and the custom AI agents they're building to carry them out? Help has arrived: AI applications observability startup Raindrop has launched Experiments, a new analytics feature that the company describes as the first A/B testing suite designed specifically for enterprise AI agents — allowing companies to see and compare how updating agents to new underlying models, or changing their instructions and tool access, will impact their performance with real end users. The release extends Raindrop’s existing observability tools, giving developers and teams a way to see how their agents behave and evolve in real-world conditions.With Experiments, teams can track how changes — such as a new tool, prompt, model update, or full pipeline refactor — affect AI performance across millions of user interactions. The new feature is available now for users on Raindrop’s Pro subscription plan ($350 monthly) at raindrop.ai. A Data-Driven Lens on Agent DevelopmentRaindrop co-founder and chief technology officer Ben Hylak noted in a product announcement video (above) that Experiments helps teams see “how literally anything changed,” including tool usage, user intents, and issue rates, and to explore differences by demographic factors such as language. The goal is to make model iteration more transparent and measurable.The Experiments interface presents results visually, showing when an experiment performs better or worse than its baseline. Increases in negative signals might indicate higher task failure or partial code output, while improvements in positive signals could reflect more complete responses or better user experiences.By making this data easy to interpret, Raindrop encourages AI teams to approach agent iteration with the same rigor as modern software deployment—tracking outcomes, sharing insights, and addressing regressions before they compound.Background: From AI Observability to ExperimentationRaindrop’s launch of Experiments builds on the company’s foundation as one of the first AI-native observability platforms, designed to help enterprises monitor and understand how their generative AI systems behave in production. As VentureBeat reported earlier this year, the company — originally known as Dawn AI — emerged to address what Hylak, a former Apple human interface designer, called the “black box problem” of AI performance, helping teams catch failures “as they happen and explain to enterprises what went wrong and why." At the time, Hylak described how “AI products fail constantly—in ways both hilarious and terrifying,” noting that unlike traditional software, which throws clear exceptions, “AI products fail silently.” Raindrop’s original platform focused on detecting those silent failures by analyzing signals such as user feedback, task failures, refusals, and other conversational anomalies across millions of daily events.The company’s co-founders— Hylak, Alexis Gauba, and Zubin Singh Koticha — built Raindrop after encountering firsthand the difficulty of debugging AI systems in production. “We started by building AI products, not infrastructure,” Hylak told VentureBeat. 
“But pretty quickly, we saw that to grow anything serious, we needed tooling to understand AI behavior—and that tooling didn’t exist.”With Experiments, Raindrop extends that same mission from detecting failures to measuring improvements. The new tool transforms observability data into actionable comparisons, letting enterprises test whether changes to their models, prompts, or pipelines actually make their AI agents better—or just different.Solving the “Evals Pass, Agents Fail” ProblemTraditional evaluation frameworks, while useful for benchmarking, rarely capture the unpredictable behavior of AI agents operating in dynamic environments. As Raindrop co-founder Alexis Gauba explained in her LinkedIn announcement, “Traditional evals don’t really answer this question. They’re great unit tests, but you can’t predict your user’s actions and your agent is running for hours, calling hundreds of tools.”Gauba said the company consistently heard a common frustration from teams: “Evals pass, agents fail.”Experiments is meant to close that gap by showing what actually changes when developers ship updates to their systems. The tool enables side-by-side comparisons of models, tools, intents, or properties, surfacing measurable differences in behavior and performance.Designed for Real-World AI BehaviorIn the announcement video, Raindrop described Experiments as a way to “compare anything and measure how your agent’s behavior actually changed in production across millions of real interactions.”The platform helps users spot issues such as task failure spikes, forgetting, or new tools that trigger unexpected errors. It can also be used in reverse — starting from a known problem, such as an “agent stuck in a loop,” and tracing back to which model, tool, or flag is driving it. From there, developers can dive into detailed traces to find the root cause and ship a fix quickly.Each experiment provides a visual breakdown of metrics like tool usage frequency, error rates, conversation duration, and response length. Users can click on any comparison to access the underlying event data, giving them a clear view of how agent behavior changed over time. Shared links make it easy to collaborate with teammates or report findings.Integration, Scalability, and AccuracyAccording to Hylak, Experiments integrates directly with “the feature flag platforms companies know and love (like Statsig!)” and is designed to work seamlessly with existing telemetry and analytics pipelines. For companies without those integrations, it can still compare performance over time—such as yesterday versus today—without additional setup.Hylak said teams typically need around 2,000 users per day to produce statistically meaningful results. To ensure the accuracy of comparisons, Experiments monitors for sample size adequacy and alerts users if a test lacks enough data to draw valid conclusions.“We obsess over making sure metrics like Task Failure and User Frustration are metrics that you’d wake up an on-call engineer for,” Hylak explained. He added that teams can drill into the specific conversations or events that drive those metrics, ensuring transparency behind every aggregate number.Security and Data ProtectionRaindrop operates as a cloud-hosted platform but also offers on-premise personally identifiable information (PII) redaction for enterprises that need additional control. Hylak said the company is SOC 2 compliant and has launched a PII Guard feature that uses AI to automatically remove sensitive information from stored data. 
“We take protecting customer data very seriously,” he emphasized.Pricing and PlansExperiments is part of Raindrop’s Pro plan, which costs $350 per month or $0.0007 per interaction. The Pro tier also includes deep research tools, topic clustering, custom issue tracking, and semantic search capabilities.Raindrop’s Starter plan — $65 per month or $0.001 per interaction — offers core analytics including issue detection, user feedback signals, Slack alerts, and user tracking. Both plans come with a 14-day free trial.Larger organizations can opt for an Enterprise plan with custom pricing and advanced features like SSO login, custom alerts, integrations, edge-PII redaction, and priority support.Continuous Improvement for AI SystemsWith Experiments, Raindrop positions itself at the intersection of AI analytics and software observability. Its focus on “measure truth,” as stated in the product video, reflects a broader push within the industry toward accountability and transparency in AI operations.Rather than relying solely on offline benchmarks, Raindrop’s approach emphasizes real user data and contextual understanding. The company hopes this will allow AI developers to move faster, identify root causes sooner, and ship better-performing models with confidence.
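To illustrate the kind of baseline-versus-variant comparison and sample-size sanity check described above, here is a generic sketch using a two-proportion z-test on a task-failure metric. It is not Raindrop's API, and the traffic numbers are invented.

```python
# Generic sketch of a baseline-vs-variant comparison on an agent metric
# (task-failure rate), with a two-proportion z-test as a sample-size sanity
# check. Illustrates the idea only; not Raindrop's API, numbers are invented.
from math import erf, sqrt

def two_proportion_z(fail_a: int, n_a: int, fail_b: int, n_b: int) -> float:
    p_pool = (fail_a + fail_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return ((fail_a / n_a) - (fail_b / n_b)) / se if se else 0.0

def p_value(z: float) -> float:
    # Two-sided p-value from the normal CDF.
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Hypothetical day of traffic: failures out of total interactions per arm.
baseline_failures, baseline_n = 100, 2000   # 5.0% task failure on the old model
variant_failures, variant_n = 70, 2000      # 3.5% on the candidate change

z = two_proportion_z(baseline_failures, baseline_n, variant_failures, variant_n)
print(f"baseline {baseline_failures / baseline_n:.1%}, "
      f"variant {variant_failures / variant_n:.1%}, p = {p_value(z):.3f}")
```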
