Latest AI News & Updates

#ai #datadecisionmakers

AI tools are revolutionizing software development by automating repetitive tasks, refactoring bloated code, and identifying bugs in real time. Developers can now generate well-structured code from plain-language prompts, saving hours of manual effort. These tools learn from vast codebases, offering context-aware recommendations that enhance productivity and reduce errors. Rather than starting from scratch, engineers can prototype quickly, iterate faster and focus on solving increasingly complex problems.

As code generation tools grow in popularity, they raise questions about the future size and structure of engineering teams. Earlier this year, Garry Tan, CEO of startup accelerator Y Combinator, noted that about one-quarter of the startups in its current batch use AI to write 95% or more of their software. In an interview with CNBC, Tan said: “What that means for founders is that you don’t need a team of 50 or 100 engineers, you don’t have to raise as much. The capital goes much longer.”

AI-powered coding may offer a fast solution for businesses under budget pressure — but its long-term effects on the field and labor pool cannot be ignored.

As AI-powered coding rises, human expertise may diminish
In the era of AI, the traditional journey to coding expertise that has long supported senior developers may be at risk. Easy access to large language models (LLMs) enables junior coders to quickly identify issues in code. While this speeds up software development, it can distance developers from their own work, delaying the growth of core problem-solving skills. As a result, they may avoid the focused, sometimes uncomfortable hours required to build expertise and progress on the path to becoming successful senior developers.

Consider Anthropic’s Claude Code, a terminal-based assistant built on the Claude 3.7 Sonnet model, which automates bug detection and resolution, test creation and code refactoring. Using natural language commands, it reduces repetitive manual work and boosts productivity.

Microsoft has also released two open-source frameworks — AutoGen and Semantic Kernel — to support the development of agentic AI systems. AutoGen enables asynchronous messaging, modular components, and distributed agent collaboration to build complex workflows with minimal human input. Semantic Kernel is an SDK that integrates LLMs with languages like C#, Python and Java, letting developers build AI agents to automate tasks and manage enterprise applications.

The increasing availability of these tools from Anthropic, Microsoft and others may reduce opportunities for coders to refine and deepen their skills. Rather than “banging their heads against the wall” to debug a few lines or select a library to unlock new features, junior developers may simply turn to AI for an assist. This means senior coders with problem-solving skills honed over decades may become an endangered species.

Overreliance on AI for writing code risks weakening developers’ hands-on experience and understanding of key programming concepts. Without regular practice, they may struggle to independently debug, optimize or design systems. Ultimately, this erosion of skill can undermine critical thinking, creativity and adaptability — qualities that are essential not just for coding, but for assessing the quality and logic of AI-generated solutions.

AI as mentor: Turning code automation into hands-on learning

While concerns about AI diminishing human developer skills are valid, businesses shouldn’t dismiss AI-supported coding. They just need to think carefully about when and how to deploy AI tools in development. These tools can be more than productivity boosters; they can act as interactive mentors, guiding coders in real time with explanations, alternatives and best practices.

When used as a training tool, AI can reinforce learning by showing coders why code is broken and how to fix it — rather than simply applying a solution. For example, a junior developer using Claude Code might receive immediate feedback on inefficient syntax or logic errors, along with suggestions linked to detailed explanations. This enables active learning, not passive correction. It’s a win-win: accelerating project timelines without doing all the work for junior coders.

Additionally, coding frameworks can support experimentation by letting developers prototype agent workflows or integrate LLMs without needing expert-level knowledge upfront.
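To make that concrete, here is a minimal sketch of the kind of two-agent workflow a junior developer might prototype with AutoGen. It is illustrative only: it follows the classic pyautogen (0.2-style) API, whose interfaces have changed in newer AutoGen releases, and the model name and environment variable are assumptions rather than anything from the article.

```python
# Illustrative sketch of a two-agent AutoGen workflow (classic pyautogen 0.2-style
# API; newer AutoGen releases restructure these interfaces). The model name and
# environment variable are assumptions for the example.
import os
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]}]}

# The assistant writes code; the user proxy executes it locally and reports results back.
assistant = AssistantAgent("coder", llm_config=llm_config)
runner = UserProxyAgent(
    "runner",
    human_input_mode="NEVER",  # fully automated loop for the demo
    code_execution_config={"work_dir": "scratch", "use_docker": False},
)

runner.initiate_chat(
    assistant,
    message="Write and run a Python script that prints the first 10 Fibonacci numbers.",
)
```

Reading the resulting transcript, and seeing how the assistant decomposes the task, writes code and reacts to execution results, is exactly the kind of observational learning the article argues these frameworks can support.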
By observing how AI builds and refines code, junior developers who actively engage with these tools can internalize patterns, architectural decisions and debugging strategies — mirroring the traditional learning process of trial and error, code reviews and mentorship.

However, AI coding assistants shouldn’t replace real mentorship or pair programming. Pull requests and formal code reviews remain essential for guiding newer, less experienced team members. We are nowhere near the point at which AI can single-handedly upskill a junior developer.

Companies and educators can build structured development programs around these tools that emphasize code comprehension, encourage coders to question AI outputs and require manual refactoring exercises, so that AI is used as a training partner rather than a crutch. In this way, AI becomes less of a replacement for human ingenuity and more of a catalyst for accelerated, experiential learning.

Bridging the gap between automation and education

When utilized with intention, AI doesn’t just write code; it teaches coding, blending automation with education to prepare developers for a future where deep understanding and adaptability remain indispensable.

By embracing AI as a mentor, as a programming partner and as a team of developers we can direct to the problem at hand, we can bridge the gap between effective automation and education. We can empower developers to grow alongside the tools they use. We can ensure that, as AI evolves, so too does the human skill set, fostering a generation of coders who are both efficient and deeply knowledgeable.

Richard Sonnenblick is chief data scientist at Planview.

#ai

How a semiconductor veteran turned over a century of horticultural wisdom into AI-led competitive advantage

For decades, a ritual played out across ScottsMiracle-Gro’s media facilities. Every few weeks, workers walked acres of towering compost and wood chip piles with nothing more than measuring sticks. They wrapped rulers around each mound, estimated height, and did what company President Nate Baxter now describes as “sixth-grade geometry to figure out volume.”

Today, drones glide over those same plants with mechanical precision. Vision systems calculate volumes in real time. The move from measuring sticks to artificial intelligence signals more than efficiency. It is the visible proof of one of corporate America’s most unlikely technology stories.

The AI revolution finds an unexpected leader

Enterprise AI has been led by predictable players. Software companies with cloud-native architectures. Financial services firms with vast data lakes. Retailers with rich digital touchpoints. Consumer packaged goods companies that handle physical products like fertilizer and soil were not expected to lead.

Yet ScottsMiracle-Gro has realized more than half of a targeted $150 million in supply chain savings. It reports a 90 percent improvement in customer service response times. Its predictive models enable weekly reallocation of marketing resources across regional markets.

A Silicon Valley veteran bets on soil science

Baxter’s path to ScottsMiracle-Gro (SMG) reads like a calculated pivot, not a corporate rescue. After two decades in semiconductor manufacturing at Intel and Tokyo Electron, he knew how to apply advanced technology to complex operations.

“I sort of initially said, ‘Why would I do this? I’m running a tech company. It’s an industry I’ve been in for 25 years,’” Baxter recalls of his reaction when ScottsMiracle-Gro CEO Jim Hagedorn approached him in 2023. The company was reeling from a collapsed $1.2 billion hydroponics investment and facing what he describes as “pressure from a leverage standpoint.”

His wife challenged him with a direct prompt: if you are not learning or putting yourself in uncomfortable situations, you should change that.

Baxter saw clear parallels between semiconductor manufacturing and SMG’s operations. Both require precision, quality control, and the optimization of complex systems. He also saw untapped potential in SMG’s domain knowledge. One hundred fifty years of horticultural expertise, regulatory know-how, and customer insight had never been fully digitized.

“It became apparent to me whether it was on the backend with data analytics, business process transformation, and obviously now with AI being front and center of the consumer experience, a lot of opportunities are there,” he explains.

The declaration that changed everything

The pivot began at an all-hands meeting. “I just said, you know, guys, we’re a tech company. You just don’t know it yet,” Baxter recalls. “There’s so much opportunity here to drive this company to where it needs to go.”

The first challenge was organizational. SMG had evolved into functional silos. IT, supply chain, and brand teams ran independent systems with little coordination. Drawing on his experience with complex technology organizations, Baxter restructured the consumer business into three business units. General managers became accountable not just for financial results but also for technology implementation within their domains.

“I came in and said, we’re going to create new business units,” he explains. “The buck stops with you and I’m holding you accountable not only for the business results, for the quality of the creative and marketing, but for the implementation of technology.”

To support the new structure, SMG set up centers of excellence for digital capabilities, insights and analytics, and creative functions. The hybrid design placed centralized expertise behind distributed accountability.

Mining corporate memory for AI gold

Turning legacy knowledge into machine-ready intelligence required what Fausto Fleites, VP of Data Intelligence, calls “archaeological work.” The team excavated decades of business logic embedded in legacy SAP systems and converted filing cabinets of research into AI-ready datasets. Fleites, a Cuban immigrant with a doctorate from FIU who led Florida’s public hurricane loss model before roles at Sears and Cemex, understood the stakes.

“The costly part of the migration was the business reporting layer we have in SAP Business Warehouse,” Fleites explains. “You need to uncover business logic created in many cases over decades.”

SMG chose Databricks as its unified data platform. The team had Apache Spark expertise. Databricks offered strong SAP integration and aligned with a preference for open-source technologies that minimize vendor lock-in.

The breakthrough came through systematic knowledge management. SMG built an AI bot using Google’s Gemini large language model to catalog and clean internal repositories. The system identified duplicates, grouped content by topic, and restructured information for AI consumption. The effort reduced knowledge articles by 30 percent while increasing their utility.

“We used Gemini LLMs to actually categorize them into topics, find similar documents,” Fleites explains. A hybrid approach that combined modern AI with techniques like cosine similarity became the foundation for later applications.

Building AI systems that actually understand fertilizer

Early trials with off-the-shelf AI exposed a real risk. General-purpose models confused products designed for killing weeds with those for preventing them. That mistake can ruin a lawn.

“Different products, if you use one in the wrong place, would actually have a very negative outcome,” Fleites notes. “But those are kind of synonyms in certain contexts to the LLM. So they were recommending the wrong products.”

The solution was a new architecture. SMG created what Fleites calls a “hierarchy of agents.” A supervisor agent routes queries to specialized worker agents organized by brand. Each agent draws on deep product knowledge encoded from a 400-page internal training manual.

The system also changes the conversation. When users ask for recommendations, the agents start with questions about location, goals, and lawn conditions. They narrow possibilities step by step before offering suggestions. The stack integrates with APIs for product availability and state-specific regulatory compliance.

From drones to demand forecasting across the enterprise

The transformation runs across the company. Drones measure inventory piles. Demand forecasting models analyze more than 60 factors, including weather patterns, consumer sentiment, and macroeconomic indicators.

These predictions enable faster moves. When drought struck Texas, the models supported a shift in promotional spending to regions with favorable weather.
The reallocation helped drive positive quarterly results.

“We not only have the ability to move marketing and promotion dollars around, but we’ve even gotten to the point where if it’s going to be a big weekend in the Northeast, we’ll shift our field sales resources from other regions up there,” Baxter explains.

Consumer Services changed as well. AI agents now process incoming emails through Salesforce, draft responses based on the knowledge base, and flag them for brief human review. Draft times dropped from ten minutes to seconds and response quality improved.

The company emphasizes explainable AI. Using SHAP, SMG built dashboards that decompose each forecast and show how weather, promotions, or media spending contribute to predictions.

“Typically, if you open a prediction to a business person and you don’t say why, they’ll say, ‘I don’t believe you,’” Fleites explains. Transparency made it possible to move resource allocation from quarterly to weekly cycles.

Competing like a startup

SMG’s results challenge assumptions about AI readiness in traditional industries. The advantage does not come from owning the most sophisticated models. It comes from combining general-purpose AI with unique, structured domain knowledge.

“LLMs are going to be a commodity,” Fleites observes. “The strategic differentiator is what is the additional level of [internal] knowledge we can fit to them.”

Partnerships are central. SMG works with Google Vertex AI for foundational models, Sierra.ai for production-ready conversational agents, and Kindwise for computer vision. The ecosystem approach lets a small internal team recruited from Meta, Google, and AI startups deliver outsized impact without building everything from scratch.

Talent follows impact. Conventional wisdom says traditional companies cannot compete with Meta salaries or Google stock. SMG offered something different. It offered the chance to build transformative AI applications with immediate business impact.

“When we have these interviews, what we propose to them is basically the ability to have real value with the latest knowledge in these spaces,” Fleites explains. “A lot of people feel motivated to come to us” because much of big tech AI work, despite the hype, “doesn’t really have an impact.”

Team design mirrors that philosophy. “My direct reports are leaders and not only manage people, but are technically savvy,” Fleites notes. “We always are constantly switching hands between developing or maintaining a solution versus strategy versus managing people.” He still writes code weekly. The small team of 15 to 20 AI and engineering professionals stays lean by contracting out implementation while keeping “the know-how and the direction and the architecture” in-house.

When innovation meets immovable objects

Not every pilot succeeded. SMG tested semi-autonomous forklifts in a 1.3 million square foot distribution facility. Remote drivers in the Philippines controlled up to five vehicles at once with strong safety records.

“The technology was actually really great,” Baxter acknowledges. The vehicles could not lift enough weight for SMG’s heavy products. The company paused implementation.

“Not everything we’ve tried has gone smoothly,” Baxter admits. “But I think another important point is you have to focus on a few critical ones and you have to know when something isn’t going to work and readjust.”

The lesson tracks with semiconductor discipline. Investments must show measurable returns within set timeframes. Regulatory complexity adds difficulty.
Products must comply with EPA rules and a patchwork of state restrictions, which AI systems must navigate correctly.

The gardening sommelier and agent-to-agent futures

The roadmap reflects a long-term view. SMG plans a “gardening sommelier” mobile app in 2026 that identifies plants, weeds, and lawn problems from photos and provides instant guidance. A beta already helps field sales teams answer complex product questions by querying the 400-page knowledge base.

The company is exploring agent-to-agent communication so its specialized AI can interface with retail partners’ systems. A customer who asks a Walmart chatbot for lawn advice could trigger an SMG query that returns accurate, regulation-compliant recommendations.

SMG has launched AI-powered search on its website, replacing keyword systems with conversational engines based on the internal stack. The future vision pairs predictive models with conversational agents so the system can reach out when conditions suggest a customer may need help.

What traditional industries can learn

ScottsMiracle-Gro’s transformation offers a clear playbook for enterprises. The advantage doesn’t come from deploying the most sophisticated models. Instead, it comes from combining AI with proprietary domain knowledge that competitors can’t easily replicate.

By making general managers responsible for both business results and technology implementation, SMG ensured AI wasn’t just an IT initiative but a business imperative. The 150 years of horticultural expertise only became valuable when it was digitized, structured, and made accessible to AI systems.

Legacy companies competing for AI engineers can’t match Silicon Valley compensation packages. But they can offer something tech giants often can’t: immediate, measurable impact. When engineers see their weather forecasting models directly influence quarterly results or their agent architecture prevent customers from ruining their lawns, the work carries weight that another incremental improvement to an ad algorithm never will.

“We have a right to win,” Baxter says. “We have 150 years of this experience.” That experience is now data, and data is the company’s competitive edge. ScottsMiracle-Gro didn’t outspend its rivals or chase the newest AI model. It turned knowledge into an operating system for growth. For a company built on soil, its biggest breakthrough might be cultivating data.
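The knowledge-cleanup step described earlier pairs LLM categorization with classical similarity measures. As a rough illustration (not SMG's actual pipeline, which is not public), here is a minimal sketch of flagging near-duplicate knowledge articles with TF-IDF vectors and cosine similarity; the sample documents and threshold are made up.

```python
# Illustrative sketch only: flags near-duplicate knowledge articles with
# TF-IDF vectors and cosine similarity. SMG's real pipeline (Gemini-based
# categorization plus embeddings) is not public; the threshold and the
# sample documents below are assumptions for demonstration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = [
    "How to apply crabgrass preventer in early spring",
    "Applying crabgrass preventer: early spring guide",
    "Choosing potting soil for indoor herbs",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(articles)
sims = cosine_similarity(vectors)

THRESHOLD = 0.8  # assumed cutoff for "near duplicate"
for i in range(len(articles)):
    for j in range(i + 1, len(articles)):
        if sims[i, j] >= THRESHOLD:
            print(f"Possible duplicates: {i} and {j} (similarity {sims[i, j]:.2f})")
```

In a production pipeline, dense embeddings from an LLM would typically replace TF-IDF, but the cosine-similarity comparison works the same way.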

Never forget again, voice typing, Oxford AI guide, 40 jobs at risk, and more...

See the results of comparing speed and memory efficiency of DuckDB, SQLite, and Pandas on a million-row dataset.
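For readers who want to reproduce that kind of comparison, here is a minimal sketch that times the same aggregation over a synthetic million-row table in Pandas, SQLite, and DuckDB. The dataset and query are stand-ins, not the article's benchmark, so expect different numbers on your machine.

```python
# Minimal sketch of the kind of comparison the article describes: timing one
# aggregation over a synthetic million-row table in Pandas, SQLite, and DuckDB.
# The dataset, query, and resulting timings are illustrative only.
import sqlite3
import time

import duckdb
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "grp": np.random.randint(0, 1000, 1_000_000),
    "value": np.random.rand(1_000_000),
})

t0 = time.perf_counter()
pandas_result = df.groupby("grp")["value"].mean()
print(f"pandas : {time.perf_counter() - t0:.3f}s")

conn = sqlite3.connect(":memory:")
df.to_sql("t", conn, index=False)
t0 = time.perf_counter()
sqlite_result = pd.read_sql_query("SELECT grp, AVG(value) FROM t GROUP BY grp", conn)
print(f"sqlite : {time.perf_counter() - t0:.3f}s")

t0 = time.perf_counter()
duckdb_result = duckdb.sql("SELECT grp, AVG(value) FROM df GROUP BY grp").df()
print(f"duckdb : {time.perf_counter() - t0:.3f}s")
```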

#agentic ai #artificial intelligence #data engineering #data science #science and technology

What's happening — and what's next — for data and AI at the close of 2025.
The post 10 Data + AI Observations for Fall 2025 appeared first on Towards Data Science.

#artificial intelligence #deep dives #deep learning

Explaining "MineWorld: A real-time and open-source interactive world model on Minecraft" in simple terms.
The post Dreaming in Blocks — MineWorld, the Minecraft World Model appeared first on Towards Data Science.

#ai

Enterprises expanding AI deployments are hitting an invisible performance wall. The culprit? Static speculators that can't keep up with shifting workloads.

Speculators are smaller AI models that work alongside large language models during inference. They draft multiple tokens ahead, which the main model then verifies in parallel. This technique (called speculative decoding) has become essential for enterprises trying to reduce inference costs and latency. Instead of generating tokens one at a time, the system can accept multiple tokens at once, dramatically improving throughput.

Together AI today announced research and a new system called ATLAS (AdapTive-LeArning Speculator System) that aims to help enterprises overcome the challenge of static speculators. The technique provides a self-learning inference optimization capability that can deliver up to 400% faster inference performance than the baseline available in existing inference technologies such as vLLM. The system addresses a critical problem: as AI workloads evolve, inference speeds degrade, even with specialized speculators in place.

The company, which got its start in 2023, has been focused on optimizing inference on its enterprise AI platform. Earlier this year the company raised $305 million as customer adoption and demand have grown.

"Companies we work with generally, as they scale up, they see shifting workloads, and then they don't see as much speedup from speculative execution as before," Tri Dao, chief scientist at Together AI, told VentureBeat in an exclusive interview. "These speculators generally don't work well when their workload domain starts to shift."

The workload drift problem no one talks about

Most speculators in production today are "static" models. They're trained once on a fixed dataset representing expected workloads, then deployed without any ability to adapt. Companies like Meta and Mistral ship pre-trained speculators alongside their main models. Inference platforms like vLLM use these static speculators to boost throughput without changing output quality.

But there's a catch. When an enterprise's AI usage evolves, the static speculator's accuracy plummets.

"If you're a company producing coding agents, and most of your developers have been writing in Python, all of a sudden some of them switch to writing Rust or C, then you see the speed starts to go down," Dao explained. "The speculator has a mismatch between what it was trained on versus what the actual workload is."

This workload drift represents a hidden tax on scaling AI. Enterprises either accept degraded performance or invest in retraining custom speculators. That process captures only a snapshot in time and quickly becomes outdated.

How adaptive speculators work: A dual-model approach

ATLAS uses a dual-speculator architecture that combines stability with adaptation:

The static speculator - A heavyweight model trained on broad data provides consistent baseline performance. It serves as a "speed floor."

The adaptive speculator - A lightweight model learns continuously from live traffic. It specializes on the fly to emerging domains and usage patterns.

The confidence-aware controller - An orchestration layer dynamically chooses which speculator to use. It adjusts the speculation "lookahead" based on confidence scores.

"Before the adaptive speculator learns anything, we still have the static speculator to help provide the speed boost in the beginning," Ben Athiwaratkun, staff AI scientist at Together AI, explained to VentureBeat. "Once the adaptive speculator becomes more confident, then the speed grows over time."

The technical innovation lies in balancing acceptance rate (how often the target model agrees with drafted tokens) and draft latency. As the adaptive model learns from traffic patterns, the controller relies more on the lightweight speculator and extends lookahead. This compounds performance gains.

Users don't need to tune any parameters. "On the user side, users don't have to turn any knobs," Dao said. "On our side, we have turned these knobs for users to adjust in a configuration that gets good speedup."

Performance that rivals custom silicon

Together AI's testing shows ATLAS reaching 500 tokens per second on DeepSeek-V3.1 when fully adapted. More impressively, those numbers on Nvidia B200 GPUs match or exceed specialized inference chips like Groq's custom hardware.

"The software and algorithmic improvement is able to close the gap with really specialized hardware," Dao said. "We were seeing 500 tokens per second on these huge models that are even faster than some of the customized chips."

The 400% speedup that the company claims for inference represents the cumulative effect of Together's Turbo optimization suite. FP4 quantization delivers 80% speedup over the FP8 baseline. The static Turbo Speculator adds another 80-100% gain. The adaptive system layers on top. Each optimization compounds the benefits of the others.

Compared to standard inference engines like vLLM or Nvidia's TensorRT-LLM, the improvement is substantial. Together AI benchmarks against the stronger baseline between the two for each workload before applying speculative optimizations.

The memory-compute tradeoff explained

The performance gains stem from exploiting a fundamental inefficiency in modern inference: wasted compute capacity. Dao explained that typically during inference, much of the compute power is not fully utilized.

"During inference, which is actually the dominant workload nowadays, you're mostly using the memory subsystem," he said.

Speculative decoding trades idle compute for reduced memory access. When a model generates one token at a time, it's memory-bound. The GPU sits idle while waiting for memory. But when the speculator proposes five tokens and the target model verifies them simultaneously, compute utilization spikes while memory access remains roughly constant.

"The total amount of compute to generate five tokens is the same, but you only had to access memory once, instead of five times," Dao said.

Think of it as intelligent caching for AI

For infrastructure teams familiar with traditional database optimization, adaptive speculators function like an intelligent caching layer, but with a crucial difference.

Traditional caching systems like Redis or memcached require exact matches. You store the exact same query result and retrieve it when that specific query runs again. Adaptive speculators work differently.

"You can view it as an intelligent way of caching, not storing exactly, but figuring out some patterns that you see," Dao explained. "Broadly, we're observing that you're working with similar code, or working with similar, you know, controlling compute in a similar way. We can then predict what the big model is going to say. We just get better and better at predicting that."

Rather than storing exact responses, the system learns patterns in how the model generates tokens. It recognizes that if you're editing Python files in a specific codebase, certain token sequences become more likely.
The speculator adapts to those patterns, improving its predictions over time without requiring identical inputs.

Use cases: RL training and evolving workloads

Two enterprise scenarios particularly benefit from adaptive speculators:

Reinforcement learning training: Static speculators quickly fall out of alignment as the policy evolves during training. ATLAS adapts continuously to the shifting policy distribution.

Evolving workloads: As enterprises discover new AI use cases, workload composition shifts. "Maybe they started using AI for chatbots, but then they realized, hey, it can write code, so they start shifting to code," Dao said. "Or they realize these AIs can actually call tools and control computers and do accounting and things like that."

In a vibe-coding session, the adaptive system can specialize for the specific codebase being edited. These are files not seen during training. This further increases acceptance rates and decoding speed.

What it means for enterprises and the inference ecosystem

ATLAS is available now on Together AI's dedicated endpoints as part of the platform at no additional cost. The company's 800,000-plus developers (up from 450,000 in February) have access to the optimization.

But the broader implications extend beyond one vendor's product. The shift from static to adaptive optimization represents a fundamental rethinking of how inference platforms should work. As enterprises deploy AI across multiple domains, the industry will need to move beyond one-time trained models toward systems that learn and improve continuously.

Together AI has historically released some of its research techniques as open source and collaborated with projects like vLLM. While the fully integrated ATLAS system is proprietary, some of the underlying techniques may eventually influence the broader inference ecosystem. For enterprises looking to lead in AI, the message is clear: adaptive algorithms on commodity hardware can match custom silicon at a fraction of the cost. As this approach matures across the industry, software optimization increasingly trumps specialized hardware.
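To make the draft-and-verify mechanics concrete, here is a toy sketch of a single speculative decoding step with greedy acceptance. The stub models are placeholders, not ATLAS or vLLM internals; production systems use probabilistic acceptance rules and run the target model's verification as one batched forward pass.

```python
# Toy sketch of one speculative decoding step (greedy acceptance), for intuition
# only. Real systems (ATLAS, vLLM) use probabilistic accept/reject rules and run
# the target model's verification as a single batched forward pass on a GPU.
from typing import Callable, List

Token = int

def speculative_step(
    context: List[Token],
    draft_next: Callable[[List[Token]], Token],   # small, fast speculator
    target_next: Callable[[List[Token]], Token],  # large, authoritative model
    lookahead: int = 5,
) -> List[Token]:
    """Draft `lookahead` tokens with the speculator, keep the prefix the target agrees with."""
    # 1. Speculator drafts several tokens cheaply.
    draft, ctx = [], list(context)
    for _ in range(lookahead):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2. Target verifies the drafts; in production this is one parallel pass.
    accepted, ctx = [], list(context)
    for t in draft:
        expected = target_next(ctx)
        if expected != t:
            accepted.append(expected)  # correct the first mismatch, then stop
            break
        accepted.append(t)
        ctx.append(t)
    return accepted

# Stub models: the draft model deliberately guesses wrong every few tokens.
target = lambda ctx: (len(ctx) * 7) % 100
draft = lambda ctx: (len(ctx) * 7) % 100 if len(ctx) % 4 else 1

print(speculative_step([3, 1, 4], draft, target))
```

In this toy run the target model agrees with the first drafted token, so the step emits two tokens for roughly one verification pass, which is the memory-access saving Dao describes.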

#ai & ml #events

A common misconception about O’Reilly is that we cater only to the deeply technical learner. While we’re proud of our deep roots in the tech community, the breadth of our offerings, both in books and on our learning platform, has always aimed to reach a broader audience of tech-adjacent and tech-curious people who want to […]

Agentic artificial intelligence (AI) represents the most significant shift in machine learning since deep learning transformed the field.

#business #business / artificial intelligence

Mark Zuckerberg's metaverse chief is urging employees to adopt AI across every workflow as part of a broader shift inside the company.

#ai

It seems like almost every week for the last two years since ChatGPT launched, new large language models (LLMs) from rival labs or from OpenAI itself have been released. Enterprises are hard pressed to keep up with the massive pace of change, let alone understand how to adapt to it — which of these new models should they adopt, if any, to power their workflows and the custom AI agents they're building to carry them out?

Help has arrived: AI applications observability startup Raindrop has launched Experiments, a new analytics feature that the company describes as the first A/B testing suite designed specifically for enterprise AI agents — allowing companies to see and compare how updating agents to new underlying models, or changing their instructions and tool access, will impact their performance with real end users. The release extends Raindrop's existing observability tools, giving developers and teams a way to see how their agents behave and evolve in real-world conditions.

With Experiments, teams can track how changes — such as a new tool, prompt, model update, or full pipeline refactor — affect AI performance across millions of user interactions. The new feature is available now for users on Raindrop's Pro subscription plan ($350 monthly) at raindrop.ai.

A Data-Driven Lens on Agent Development

Raindrop co-founder and chief technology officer Ben Hylak noted in a product announcement video that Experiments helps teams see "how literally anything changed," including tool usage, user intents, and issue rates, and to explore differences by demographic factors such as language. The goal is to make model iteration more transparent and measurable.

The Experiments interface presents results visually, showing when an experiment performs better or worse than its baseline. Increases in negative signals might indicate higher task failure or partial code output, while improvements in positive signals could reflect more complete responses or better user experiences.

By making this data easy to interpret, Raindrop encourages AI teams to approach agent iteration with the same rigor as modern software deployment — tracking outcomes, sharing insights, and addressing regressions before they compound.

Background: From AI Observability to Experimentation

Raindrop's launch of Experiments builds on the company's foundation as one of the first AI-native observability platforms, designed to help enterprises monitor and understand how their generative AI systems behave in production. As VentureBeat reported earlier this year, the company — originally known as Dawn AI — emerged to address what Hylak, a former Apple human interface designer, called the "black box problem" of AI performance, helping teams catch failures "as they happen and explain to enterprises what went wrong and why."

At the time, Hylak described how "AI products fail constantly — in ways both hilarious and terrifying," noting that unlike traditional software, which throws clear exceptions, "AI products fail silently." Raindrop's original platform focused on detecting those silent failures by analyzing signals such as user feedback, task failures, refusals, and other conversational anomalies across millions of daily events.

The company's co-founders — Hylak, Alexis Gauba, and Zubin Singh Koticha — built Raindrop after encountering firsthand the difficulty of debugging AI systems in production. "We started by building AI products, not infrastructure," Hylak told VentureBeat. "But pretty quickly, we saw that to grow anything serious, we needed tooling to understand AI behavior — and that tooling didn't exist."

With Experiments, Raindrop extends that same mission from detecting failures to measuring improvements. The new tool transforms observability data into actionable comparisons, letting enterprises test whether changes to their models, prompts, or pipelines actually make their AI agents better — or just different.

Solving the "Evals Pass, Agents Fail" Problem

Traditional evaluation frameworks, while useful for benchmarking, rarely capture the unpredictable behavior of AI agents operating in dynamic environments. As Raindrop co-founder Alexis Gauba explained in her LinkedIn announcement, "Traditional evals don't really answer this question. They're great unit tests, but you can't predict your user's actions and your agent is running for hours, calling hundreds of tools."

Gauba said the company consistently heard a common frustration from teams: "Evals pass, agents fail."

Experiments is meant to close that gap by showing what actually changes when developers ship updates to their systems. The tool enables side-by-side comparisons of models, tools, intents, or properties, surfacing measurable differences in behavior and performance.

Designed for Real-World AI Behavior

In the announcement video, Raindrop described Experiments as a way to "compare anything and measure how your agent's behavior actually changed in production across millions of real interactions."

The platform helps users spot issues such as task failure spikes, forgetting, or new tools that trigger unexpected errors. It can also be used in reverse — starting from a known problem, such as an "agent stuck in a loop," and tracing back to which model, tool, or flag is driving it. From there, developers can dive into detailed traces to find the root cause and ship a fix quickly.

Each experiment provides a visual breakdown of metrics like tool usage frequency, error rates, conversation duration, and response length. Users can click on any comparison to access the underlying event data, giving them a clear view of how agent behavior changed over time. Shared links make it easy to collaborate with teammates or report findings.

Integration, Scalability, and Accuracy

According to Hylak, Experiments integrates directly with "the feature flag platforms companies know and love (like Statsig!)" and is designed to work seamlessly with existing telemetry and analytics pipelines. For companies without those integrations, it can still compare performance over time — such as yesterday versus today — without additional setup.

Hylak said teams typically need around 2,000 users per day to produce statistically meaningful results. To ensure the accuracy of comparisons, Experiments monitors for sample size adequacy and alerts users if a test lacks enough data to draw valid conclusions.

"We obsess over making sure metrics like Task Failure and User Frustration are metrics that you'd wake up an on-call engineer for," Hylak explained. He added that teams can drill into the specific conversations or events that drive those metrics, ensuring transparency behind every aggregate number.

Security and Data Protection

Raindrop operates as a cloud-hosted platform but also offers on-premise personally identifiable information (PII) redaction for enterprises that need additional control. Hylak said the company is SOC 2 compliant and has launched a PII Guard feature that uses AI to automatically remove sensitive information from stored data. "We take protecting customer data very seriously," he emphasized.

Pricing and Plans

Experiments is part of Raindrop's Pro plan, which costs $350 per month or $0.0007 per interaction. The Pro tier also includes deep research tools, topic clustering, custom issue tracking, and semantic search capabilities.

Raindrop's Starter plan — $65 per month or $0.001 per interaction — offers core analytics including issue detection, user feedback signals, Slack alerts, and user tracking. Both plans come with a 14-day free trial.

Larger organizations can opt for an Enterprise plan with custom pricing and advanced features like SSO login, custom alerts, integrations, edge-PII redaction, and priority support.

Continuous Improvement for AI Systems

With Experiments, Raindrop positions itself at the intersection of AI analytics and software observability. Its focus on "measure truth," as stated in the product video, reflects a broader push within the industry toward accountability and transparency in AI operations.

Rather than relying solely on offline benchmarks, Raindrop's approach emphasizes real user data and contextual understanding. The company hopes this will allow AI developers to move faster, identify root causes sooner, and ship better-performing models with confidence.
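To ground the sample-size point, here is a toy sketch of the kind of statistical check an agent A/B suite runs under the hood: a two-proportion z-test on task-failure rates between a baseline agent and a variant. The counts are invented, and this is not Raindrop's implementation.

```python
# Illustrative only: a two-proportion z-test of the kind an agent A/B suite
# automates, comparing task-failure rates between a baseline agent and a
# variant. The counts below are made up; Raindrop's internal statistics and
# thresholds are not public.
from math import sqrt, erfc

def failure_rate_comparison(fail_a: int, n_a: int, fail_b: int, n_b: int):
    p_a, p_b = fail_a / n_a, fail_b / n_b
    pooled = (fail_a + fail_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value from the normal CDF
    return p_a, p_b, z, p_value

# Baseline agent vs. updated agent over one day of traffic (hypothetical counts).
p_a, p_b, z, p = failure_rate_comparison(fail_a=140, n_a=2000, fail_b=104, n_b=2000)
print(f"failure rate {p_a:.1%} -> {p_b:.1%}, z={z:.2f}, p={p:.4f}")
```

With roughly 2,000 interactions per arm, a drop from 7% to 5.2% failure clears conventional significance (p < 0.05), which is in the same ballpark as the daily-volume rule of thumb Hylak describes.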

AI Tools of the Month, AI gets rich and famous, a Nano Banana camera, and more...

#special events and guest speakers #alumni/ae #artificial intelligence #technology and society #school of humanities arts and social sciences #mit corporation

Receiving the Robert A. Muh award, the technologist and author heralded a bright future for AI, breakthroughs in longevity, and more.

#ai

Researchers at Nvidia have developed a new technique that flips the script on how large language models (LLMs) learn to reason. The method, called reinforcement learning pre-training (RLP), integrates RL into the initial training phase rather than saving it for the end.

This approach encourages the model to "think for itself before predicting what comes next, thus teaching an independent thinking behavior earlier in the pretraining," the researchers state in their paper. By learning to reason on plain text without needing external verifiers, models trained with RLP show significant improvements in learning complex reasoning tasks downstream, hinting at a future of more capable and adaptable AI for real-world tasks.

The typical LLM training cycle

Typically, large language models are first pre-trained on vast amounts of text using a "next-token prediction" objective, where they are given a string of text and asked to continuously guess what the next word (or token) will be. In this phase, they learn grammar, facts, and basic associations.

In the later post-training phase, models usually learn complex reasoning abilities such as chain-of-thought (CoT), where a model lays out its reasoning step by step. This stage often involves supervised fine-tuning (SFT) or reinforcement learning from human feedback (RLHF), which require specialized, curated datasets.

The paper's authors argue this sequential process does not match human comprehension, which is "not a linear token-by-token process, but rather a parallel integration of input with prior knowledge." Existing pre-training methods lack this mechanism, hindering a model's ability to develop deep reasoning from the start.

How reinforcement learning pre-training works

RLP reframes this process by treating CoT generation as an action the model takes before predicting the next token. At each step, the model first generates an internal "thought" or reasoning chain. It then predicts the next word in the text, using the original context augmented with its new thought.

The model receives a reward based on how much its thought improved the accuracy of its prediction compared to a baseline that didn't generate a thought (pure next-token prediction). This reward signal is calculated automatically based on the change in probability, eliminating the need for external verifiers or human-labeled data. The reward is positive only when the generated thought helps the model better predict the next token.

By rewarding thoughts based on their predictive benefit, RLP effectively teaches the model how to think usefully on the same massive, unstructured datasets used for standard pre-training. This continuous feedback loop allows the model to learn when a simple predictive guess is sufficient and when it needs to engage in deeper reasoning. As the researchers put it, "RLP is designed to shape thinking in base models by rewarding only those thoughts that measurably help next-token prediction."

This foundational approach, however, doesn't make later fine-tuning stages obsolete. According to Bryan Catanzaro, VP of applied deep learning research at Nvidia and a co-author of the paper, RLP is designed to complement, not replace, these crucial steps. "RLP isn't meant to replace the later post-training stages like supervised fine-tuning or reinforcement learning from human feedback," Catanzaro told VentureBeat. "Those stages remain crucial for refining model behavior... It's really designed to amplify the effectiveness of those later phases by giving the model a head start."

RLP in action

In experiments with Qwen3-1.7B and Nemotron-Nano-12B, Nvidia's team tested RLP across a suite of math and science reasoning benchmarks. The results show that models enhanced with RLP consistently outperformed their conventionally trained counterparts, with particularly strong gains in reasoning-heavy tasks. For an enterprise, this improved reasoning could translate to more reliable outputs in multi-step workflows like financial analysis or legal document summarization.

"RLP encourages the model during pretraining to think before it predicts, helping the model internalize a more coherent reasoning style," said Catanzaro. "This could help reduce subtle logical errors, especially in longer workflows." While stressing that RLP-trained models will still need the usual guardrails such as verification layers, human oversight, and consistency checks, Catanzaro said that "RLP gives you a stronger baseline."

Importantly, the benefits of RLP compound instead of disappearing during subsequent fine-tuning stages (catastrophic forgetting is a common problem in LLM training, where later training stages cause the model to forget its previously learned skills and knowledge). The RLP-trained model achieved an overall score that was 7-8% higher than baselines after an identical post-training regimen. The researchers conclude that RLP "establishes robust reasoning foundations that are not washed out by downstream alignment but instead compound with post-training."

The efficiency of the technique is a key finding. On the Qwen3-1.7B model, RLP improved performance by 17% over standard continuous pre-training and also beat a similar technique called Reinforcement Pretraining via prefix-matching rewards (RPT). This advantage held even when the baseline model was trained with 35 times more data to match the computational cost, confirming the gains come from the method itself, not just more processing.

Furthermore, RLP demonstrates impressive scalability and versatility, successfully extracting a reasoning signal from general-purpose web data — not just curated datasets. When applied to the hybrid Mamba-Transformer model Nemotron-Nano-12B, RLP achieved a 35% relative improvement over a heavily trained baseline while using just a tiny fraction of the data.

While these results point toward a more efficient path for building powerful models, Catanzaro frames the innovation as a fundamental shift in the learning process itself, rather than an immediate solution to high training costs. "This research is exciting because it offers a shift in how models absorb information during pretraining leading to a smarter learning process," he explained. "It wouldn't replace large-scale pretraining, but offer another creative method in building the best possible models."

A new foundation for AI training

Ultimately, RLP points toward a future where pre-training is no longer a monolithic process of next-token prediction. Instead, the next generation of models could be built on a hybrid of objectives, creating AI that learns to think more robustly from day one. Catanzaro offers a powerful analogy to frame this shift:

"Next-token prediction teaches a model what the world looks like; reinforcement-style objectives like RLP can teach it how to think about what it's seeing," he said. "The combination of these two objectives could help models develop deeper, more structured thinking much earlier in training... Tools like RLP can build on top of that foundation, making learning more active, curious, and even more efficient."

There is still a lot to learn about the dynamics of reinforcement learning in the pre-training phase, but what seems clear is that "introducing exploration earlier in training opens a new axis for scaling — not just in size, but in how models learn to reason," Catanzaro said.
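As a rough formalization of the reward described above, the signal can be read as the log-probability improvement the thought buys on the true next token. This is an interpretation of the article's description rather than the paper's exact objective, and the probabilities in the sketch are placeholders.

```python
# Sketch of the RLP-style reward as described in the article: the thought is
# rewarded by how much it raises the (log-)probability of the true next token
# relative to a no-thought baseline. This is a reading of the article, not the
# paper's exact objective; the probabilities below are made-up placeholders.
from math import log

def rlp_reward(p_next_with_thought: float, p_next_without_thought: float) -> float:
    """Positive only when the generated thought makes the true next token more likely."""
    return log(p_next_with_thought) - log(p_next_without_thought)

print(rlp_reward(0.32, 0.08))  # helpful thought   -> positive reward (~1.39)
print(rlp_reward(0.05, 0.08))  # unhelpful thought -> negative reward (~-0.47)
```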

#advanced (300) #amazon machine learning #amazon sagemaker autopilot #amazon sagemaker hyperpod #artificial intelligence #expert (400) #generative ai #technical how-to

In this post, we demonstrate how to integrate Amazon SageMaker HyperPod with the Anyscale platform to address critical infrastructure challenges in building and deploying large-scale AI models. The combined solution provides robust infrastructure for distributed AI workloads with high-performance hardware, continuous monitoring, and seamless integration with Ray, the leading AI compute engine, enabling organizations to reduce time-to-market and lower total cost of ownership.

#amazon nova #amazon sagemaker ai #technical how-to

In this post, we introduce Amazon Nova customization for text content moderation through Amazon SageMaker AI, enabling organizations to fine-tune models for their specific moderation needs. The evaluation across three benchmarks shows that customized Nova models achieve an average improvement of 7.3% in F1 scores compared to the baseline Nova Lite, with individual improvements ranging from 4.2% to 9.2% across different content moderation tasks.

#google cloud #ai

A snapshot of how top companies, governments, researchers and startups are enhancing their work with Google's AI solutions.

#ai #software #enterprise

Echelon, an artificial intelligence startup that automates enterprise software implementations, emerged from stealth mode today with $4.75 million in seed funding led by Bain Capital Ventures, targeting a fundamental shift in how companies deploy and maintain critical business systems.

The San Francisco-based company has developed AI agents specifically trained to handle end-to-end ServiceNow implementations — complex enterprise software deployments that traditionally require months of work by offshore consulting teams and cost companies millions of dollars annually.

"The biggest barrier to digital transformation isn't technology — it's the time it takes to implement it," said Rahul Kayala, Echelon's founder and CEO, who previously worked at AI-powered IT company Moveworks. "AI agents are eliminating that constraint entirely, allowing enterprises to experiment, iterate, and deploy platform changes with unprecedented speed."

The announcement signals a potential disruption to the $1.5 trillion global IT services market, where companies like Accenture, Deloitte, and Capgemini have long dominated through labor-intensive consulting models that Echelon argues are becoming obsolete in the age of artificial intelligence.

Why ServiceNow deployments take months and cost millions

ServiceNow, a cloud-based platform used by enterprises to manage IT services, human resources, and business workflows, has become critical infrastructure for large organizations. However, implementing and customizing the platform typically requires specialized expertise that most companies lack internally.

The complexity stems from ServiceNow's vast customization capabilities. Organizations often need hundreds of "catalog items" — digital forms and workflows for employee requests — each requiring specific configurations, approval processes, and integrations with existing systems. According to Echelon's research, these implementations frequently stretch far beyond planned timelines due to technical complexity and communication bottlenecks between business stakeholders and development teams.

"What starts out simple often turns into weeks of effort once the actual work begins," the company noted in its analysis of common implementation challenges. "A basic request form turns out to be five requests stuffed into one. We had catalog items with 50+ variables, 10 or more UI policies, all connected. Update one field, and something else would break."

The traditional solution involves hiring offshore development teams or expensive consultants, creating what Echelon describes as a problematic cycle: "One question here, one delay there, and suddenly you're weeks behind."

How AI agents replace expensive offshore consulting teams

Echelon's approach replaces human consultants with AI agents trained by elite ServiceNow experts from top consulting firms. These agents can analyze business requirements, ask clarifying questions in real time, and automatically generate complete ServiceNow configurations including forms, workflows, testing scenarios, and documentation.

The technology represents a significant advance over general-purpose AI tools. Rather than providing generic code suggestions, Echelon's agents understand ServiceNow's specific architecture, best practices, and common integration patterns. They can identify gaps in requirements and propose solutions that align with enterprise governance standards.

"Instead of routing every piece of input through five people, the business process owner directly uploaded their requirements," Kayala explained, describing a recent customer implementation. "The AI developer analyzes it and asks follow-up questions like: 'I see a process flow with 3 branches, but only 2 triggers. Should there be a 3rd?' The kinds of things a seasoned developer would ask. With AI, these questions came instantly."

Early customers report dramatic time savings. One financial services company saw a service catalog migration project that was projected to take six months completed in six weeks using Echelon's AI agents.

What makes Echelon's AI different from coding assistants

Echelon's technology addresses several technical challenges that have prevented broader AI adoption in enterprise software implementation. The agents are trained not just on ServiceNow's technical capabilities but on the accumulated expertise of senior consultants who understand complex enterprise requirements, governance frameworks, and integration patterns.

This approach differs from general-purpose AI coding assistants like GitHub Copilot, which provide syntax suggestions but lack domain-specific expertise. Echelon's agents understand ServiceNow's data models, security frameworks, and upgrade considerations — knowledge typically acquired through years of consulting experience.

The company's training methodology involves elite ServiceNow experts from consulting firms like Accenture and specialized ServiceNow partner Thirdera. This embedded expertise enables the AI to handle complex requirements and edge cases that typically require senior consultant intervention.

The real challenge isn't teaching AI to write code — it's capturing the intuitive expertise that separates junior developers from seasoned architects. Senior ServiceNow consultants instinctively know which customizations will break during upgrades and how simple requests spiral into complex integration problems. This institutional knowledge creates a far more defensible moat than general-purpose coding assistants can offer.

The $1.5 trillion consulting market faces disruption

Echelon's emergence reflects broader trends reshaping the enterprise software market. As companies accelerate digital transformation initiatives, the traditional consulting model increasingly appears inadequate for the speed and scale required.

ServiceNow itself has grown rapidly, reporting over $10.98 billion in annual revenue in 2024, and $12.06 billion for the trailing twelve months ending June 30, 2025, as organizations continue to digitize more business processes. However, this growth has created a persistent talent shortage, with demand for skilled ServiceNow professionals — particularly those with AI expertise — significantly outpacing supply.

The startup's approach could fundamentally alter the economics of enterprise software implementation. Traditional consulting engagements often involve large teams working for months, with costs scaling linearly with project complexity. AI agents, by contrast, can handle multiple projects simultaneously and apply learned knowledge across customers.

Rak Garg, the Bain Capital Ventures partner who led Echelon's funding round, sees this as part of a larger shift toward AI-powered professional services. "We see the same trend with other BCV companies like Prophet Security, which automates security operations, and Crosby, which automates legal services for startups. AI is quickly becoming the delivery layer across multiple functions."

Scaling beyond ServiceNow while maintaining enterprise reliability

Despite early success, Echelon faces significant challenges in scaling its approach. Enterprise customers prioritize reliability above speed, and any AI-generated configurations must meet strict security and compliance requirements.

"Inertia is the biggest risk," Garg acknowledged. "IT systems shouldn't ever go down, and companies lose thousands of man-hours of productivity with every outage. Proving reliability at scale, and building on repeatable results will be critical for Echelon."

The company plans to expand beyond ServiceNow to other enterprise platforms including SAP, Salesforce, and Workday — each creating substantial additional market opportunities. However, each platform requires developing new domain expertise and training models on platform-specific best practices.

Echelon also faces potential competition from established consulting firms that are developing their own AI capabilities. However, Garg views these firms as potential partners rather than competitors, noting that many have already approached Echelon about collaboration opportunities.

"They know that AI is shifting their business model in real time," he said. "Customers are placing immense pricing pressure on larger firms and asking hard questions, and these firms can use Echelon agents to accelerate their projects."

How AI agents could reshape all professional services

Echelon's funding and emergence from stealth mark a significant milestone in the application of AI to professional services. Unlike consumer AI applications that primarily enhance individual productivity, enterprise AI agents like Echelon's directly replace skilled labor at scale.

The company's approach — training AI systems on expert knowledge rather than just technical documentation — could serve as a model for automating other complex professional services. Legal research, financial analysis, and technical consulting all involve similar patterns of applying specialized expertise to unique customer requirements.

For enterprise customers, the promise extends beyond cost savings to strategic agility. Organizations that can rapidly implement and modify business processes gain competitive advantages in markets where customer expectations and regulatory requirements change frequently. As Kayala noted, "This unlocks a completely different approach to business agility and competitive advantage."

The implications extend far beyond ServiceNow implementations. If AI agents can master the intricacies of enterprise software deployment — one of the most complex and relationship-dependent areas of professional services — few knowledge work domains may remain immune to automation.

The question isn't whether AI will transform professional services, but how quickly human expertise can be converted into autonomous digital workers that never sleep, never leave for competitors, and get smarter with every project they complete.

#data science #analytics #artificial intelligence #data engineering #editors pick

The future of reporting will be about encoding the value proposition of a product into prompt design.
The post Past is Prologue: How Conversational Analytics Is Changing Data Work appeared first on Towards Data Science.

#artificial intelligence #data science #editors pick #foundation models #machine learning #tabular data

A turning point for data analysis?
The post How the Rise of Tabular Foundation Models Is Reshaping Data Science appeared first on Towards Data Science.

These 7 prompt templates will make LLMs your most useful assistant.

#gemini features #google cloud #ai

True business transformation in the era of AI must go beyond simple chatbots. That’s what Gemini Enterprise does.

#gemini features #google cloud #ai

Here are four ways Gemini Enterprise can help you and your team get time back in your day.

#ai & ml #commentary

This article originally appeared on Medium. Tim O’Brien has given us permission to repost here on Radar. When you’re working with AI tools like Cursor or GitHub Copilot, the real power isn’t just having access to different models—it’s knowing when to use them. Some jobs are OK with Auto. Others need a stronger model. And […]

Large language models (LLMs) are widely used in applications like chatbots, customer support, code assistants, and more.

#culture #culture / digital culture

The online trend takes a comedic approach to spreading anti-AI messaging, but some creators are using racist references to make their point.

#ai #dev #programming & development #automation #data infrastructure #enterprise

OpenAI's annual developer conference on Monday was a spectacle of ambitious AI product launches, from an app store for ChatGPT to a stunning video-generation API that brought creative concepts to life. But for the enterprises and technical leaders watching closely, the most consequential announcement was the quiet general availability of Codex, the company's AI software engineer. This release signals a profound shift in how software—and by extension, modern business—is built.
While other announcements captured the public's imagination, the production-ready release of Codex, supercharged by a new specialized model and a suite of enterprise-grade tools, is the engine behind OpenAI's entire vision. It is the tool that builds the tools, the proven agent in a world buzzing with agentic potential, and the clearest articulation of the company's strategy to win the enterprise.
The general availability of Codex moves it from a "research preview" to a fully supported product, complete with a new software development kit (SDK), a Slack integration, and administrative controls for security and monitoring. This transition declares that Codex is ready for mission-critical work inside the world's largest companies.
"We think this is the best time in history to be a builder; it has never been faster to go from idea to product," said OpenAI CEO Sam Altman during the opening keynote presentation. "Software used to take months or years to build. You saw that it can take minutes now to build with AI."
That acceleration is not theoretical. It's a reality born from OpenAI's own internal use — a massive "dogfooding" effort that serves as the ultimate case study for enterprise customers.
Inside GPT-5-Codex: The AI model that codes autonomously for hours and drives 70% productivity gains
At the heart of the Codex upgrade is GPT-5-Codex, a version of OpenAI's latest flagship model that has been "purposely trained for Codex and agentic coding." The new model is designed to function as an autonomous teammate, moving far beyond simple code autocompletion.
"I personally like to think about it as a little bit like a human teammate," explained Tibo Sottiaux, an OpenAI engineer, during a technical session on Codex. "You can pair-program with it on your computer, you can delegate to it, or as you'll see, you can give it a job without explicit prompting."
This new model enables "adaptive thinking," allowing it to dynamically adjust the time and computational effort spent on a task based on its complexity. For simple requests, it's fast and efficient, but for complex refactoring projects, it can work for hours. One engineer during the technical session noted, "I've seen the GPT-5-Codex model work for over seven hours productively... on a marathon session." This capability to handle long-running, complex tasks is a significant leap beyond the simple, single-shot interactions that define most AI coding assistants.
The results inside OpenAI have been dramatic. The company reported that 92% of its technical staff now uses Codex daily, and those engineers complete 70% more pull requests (a measure of code contribution) each week. Usage has surged tenfold since August. "When we as a team see the stats, it feels great," Sottiaux shared. "But even better is being at lunch with someone who then goes 'Hey I use Codex all the time. Here's a cool thing that I do with it. Do you want to hear about it?'"
How OpenAI uses Codex to build its own AI products and catch hundreds of bugs daily
Perhaps the most compelling argument for Codex's importance is that it is the foundational layer upon which OpenAI's other flashy announcements were built. During the DevDay event, the company showcased custom-built arcade games and a dynamic, AI-powered website for the conference itself, all developed using Codex.
In one session, engineers demonstrated how they built "Storyboard," a custom creative tool for the film industry, in just 48 hours during an internal hackathon. "We decided to test Codex, our coding agent... we would send tasks to Codex in between meetings. We really easily reviewed and merged PRs into production, which Codex even allowed us to do from our phones," said Allison August, a solutions engineering leader at OpenAI.
This reveals a critical insight: the rapid innovation showcased at DevDay is a direct result of the productivity flywheel created by Codex. The AI is a core part of the manufacturing process for all other AI products.
A key enterprise-focused feature is the new, more robust code review capability. OpenAI said it "purposely trained GPT-5-Codex to be great at ultra thorough code review," enabling it to explore dependencies and validate a programmer's intent against the actual implementation to find high-quality bugs. Internally, nearly every pull request at OpenAI is now reviewed by Codex, catching hundreds of issues daily before they reach a human reviewer.
"It saves you time, you ship with more confidence," Sottiaux said. "There's nothing worse than finding a bug after we actually ship the feature."
Why enterprise software teams are choosing Codex over GitHub Copilot for mission-critical development
The maturation of Codex is central to OpenAI's broader strategy to conquer the enterprise market, a move essential to justifying its massive valuation and unprecedented compute expenditures. During a press conference, CEO Sam Altman confirmed the strategic shift.
"The models are there now, and you should expect a huge focus from us on really winning enterprises with amazing products, starting here," Altman said during a private press conference. OpenAI President and Co-founder Greg Brockman immediately added, "And you can see it already with Codex, which I think has been just an incredible success and has really grown super fast."
For technical decision-makers, the message is clear. While consumer-facing agents that book dinner reservations are still finding their footing, Codex is a proven enterprise agent delivering substantial ROI today. Companies like Cisco have already rolled out Codex to their engineering organizations, cutting code review times by 50% and reducing project timelines from weeks to days.
With the new Codex SDK, companies can now embed this agentic power directly into their own custom workflows, such as automating fixes in a CI/CD pipeline or even creating self-evolving applications. During a live demo, an engineer showcased a mobile app that updated its own user interface in real time based on a natural language prompt, all powered by the embedded Codex SDK.
While the launch of an app ecosystem in ChatGPT and the breathtaking visuals of the Sora 2 API rightfully generated headlines, the general availability of Codex marks a more fundamental and immediate transformation.
It is the quiet but powerful engine driving the next era of software development, turning the abstract promise of AI-driven productivity into a tangible, deployable reality for businesses today.
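For teams weighing the CI/CD pattern described above, a minimal sketch of the idea follows. It assumes a hypothetical non-interactive `codex exec "<task>"` invocation purely for illustration; the real Codex SDK and CLI surface may differ, so treat the command shape, flags, and workflow as placeholders rather than OpenAI's documented interface.

```python
# Sketch: a CI step that asks a Codex-style agent to propose a fix when tests fail.
# The "codex exec" command below is an assumption for illustration, not a documented API.
import subprocess
import sys

def run_tests() -> bool:
    """Run the project's test suite; return True if it passes."""
    return subprocess.run(["pytest", "-q"]).returncode == 0

def ask_agent_for_fix(task: str) -> None:
    """Hand a natural-language task to the (assumed) agent CLI in non-interactive mode."""
    subprocess.run(["codex", "exec", task], check=True)  # assumed command shape

if __name__ == "__main__":
    if run_tests():
        sys.exit(0)
    # Tests failed: ask the agent to investigate and draft a minimal patch.
    ask_agent_for_fix(
        "The test suite is failing on this branch. "
        "Diagnose the failures, apply a minimal fix, and summarize the change."
    )
    # A human still reviews any resulting diff before merge; re-run tests to report status.
    sys.exit(0 if run_tests() else 1)
```

In practice this would run as a pipeline step after the test job, with the agent's proposed diff surfaced for human review rather than merged automatically.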

#ai

Presented by Zendesk
Zendesk powers nearly 5 billion resolutions every year for over 100,000 customers around the world, with about 20,000 of its customers (and growing) using its AI services. Zendesk is poised to generate about $200 million in AI-related revenue this year, double that of some of its largest competitors, while investing $400 million in R&D. Much of that research is focused on upgrading the Zendesk Resolution Platform, a complete AI-first solution for customer service, employee service, and contact center teams, announced at Relate this past March.
During AI Summit, Chief Executive Officer Tom Eggemeier, along with members of the Zendesk team, took to the stage to announce several major advancements, including voice AI agents, video calling, and screen sharing for Zendesk Contact Center, improved IT asset management, and the introduction of next-generation analytics in the wake of its acquisition of HyperArc.
"We have built the only platform that is purpose-built for service and purpose-built for AI," Eggemeier said. "That focus is why we lead in AI for all types of service. And it is why we can deliver what no one else can for every service need you have in your organization."
New capabilities across use cases and companies
At its core, the Resolution Platform powers autonomous AI agents that solve complex issues in real time, leveraging leading LLMs like GPT-5, developed in collaboration with OpenAI, and supporting Model Context Protocol (MCP) to instantly access data, which streamlines workflows and improves autonomous problem-solving.
"Since our launch in March, we've been building fast, focused on making AI agents smarter, more flexible, and ready for even more channels," said Shashi Upadhyay, president of product, engineering, and AI at Zendesk. "And now, these AI agents are getting even better. They work across messaging, email, and now voice. They are getting smarter; able to handle multiple intents in a single message, detecting, remembering, and resolving many issues at once."
As the only platform with native built-in QA, Zendesk automatically scores resolutions down to the conversation level, so teams can track resolution quality at scale. For startups, these insights are critical. They not only show what worked, but what needs fixing before it costs them time, reputation, or growth, and importantly, they fit within a startup budget. That's because Zendesk is the only company that charges only for successful resolutions, which are verified through the industry's longest validation window, with two layers of quality checks.
Making the product CX admin a hero
Zendesk demonstrated the platform's new features by highlighting a hypothetical wearable device company's product launch. Service leaders at every stop along the product launch journey — from design to manufacturing — manage emerging issues with the support of the upgraded Resolution Platform.
For a global manufacturer that builds complex, state-of-the-art wearable tech, the pressure starts the moment a new product hits the market, tickets start pouring in, and a red-flagged backlog piles up. "It is not a product issue, it is a resolution bottleneck," Upadhyay said. But, he added, "What once took days can now be resolved instantly."
The new Zendesk Admin Copilot is designed specifically to assist human agents, helping them spot what is not working, decide what to do next, and carry out changes quickly.
It flags operational issues, like missing intent tags, broken internal processes, or routing conflicts that delay resolution. Copilot explains what is happening in plain language, recommends specific fixes, and with the admin's approval, can make the changes itself. It's grounded in live Zendesk data, like tickets, triggers, and knowledge, so every recommendation is specific, current, and based on how the service operation actually runs.
Once the admin identifies the issue and implements a fix, the next step is ensuring everyone has access to the right knowledge to support it. For many organizations, that information lives outside of Zendesk. The newly launched Knowledge Connectors allow admins to pull in relevant content, like configuration guides or policy details, without needing to migrate anything, so both human and AI agents have access to real-time instructions tied to the exact product version. The admin also creates a smarter feedback loop with the new Action Builder, which automatically tags, summarizes, and sends notifications to the product team through Microsoft Teams. And finally, Zendesk HyperArc will bring customers insights that combine AI and human analysis in a clear, narrative-driven view of what is happening and why, instead of siloed dashboards or static reports.
"With these innovations in place, change at the manufacturing plant cascades quickly, tickets are routed cleanly, support agents know what to say, engineering sees real signals instead of scattered anecdotes, and customers who just want the product to work get fast, reliable resolutions," Upadhyay said. "The CX Admin becomes the quiet hero of the manufacturer's story."
Solutions for the retail CX leader
As a CX or contact center leader for a retail company, when a must-have wearable drops, how do you deliver service for your new hit product that feels personal and consistent when your team is stretched across multiple countries, channels, and customer expectations at once? "Intelligent automation doesn't just streamline operations — it enhances the customer experience across borders and channels," said Lisa Kant, senior vice president of marketing at Zendesk.
Zendesk's Voice AI Agents are fully autonomous AI agents designed to understand natural speech, take action, and resolve issues without needing to escalate. They can verify identity, track orders, update deliveries, and answer setup questions in multiple languages, while keeping the brand experience consistent. Meanwhile, Video Calling lets a live agent spin up a video session, confirm the device is working, and walk the customer through setup or troubleshooting. And because a help center is a critical part of delivering great service, especially when scaling fast across multiple countries and languages, Zendesk built Knowledge Builder, an AI-powered tool that helps teams build and maintain their help center content automatically. It analyzes real customer conversations and turns them into localized help articles for trending issues.
Giving IT leaders a strong edge
When a company adopts that new product, it becomes critical to resolve issues fast to ensure employee productivity stays strong. Available with early access in November, Zendesk's new employee service offering, IT Asset Management (ITAM), natively integrates service and asset data into the Zendesk service desk to help IT move from reactive troubleshooting to proactive service.
Now, when a vague "tablet not working" ticket comes in, Zendesk ITAM surfaces the device details right inside the ticket, so IT knows exactly what they are dealing with. Zendesk Copilot uses that same asset data to recommend model-specific troubleshooting steps. And with Knowledge Connectors, those steps can be pulled directly from SharePoint or Confluence without migration. If the fix does not work, the IT specialist confirms in seconds that the device is under warranty and issues a replacement without any back-and-forth. With real-time visibility across every hardware asset, the IT leader can spot patterns before they become a flood of tickets, or failures at the point of care, so IT resolves issues faster and prevents problems before they happen.
"With Zendesk, IT is not just reacting to issues — it is setting the standard for how proactive employee service is delivered," Upadhyay said.
For more on the latest Zendesk updates and improvements, and to watch a conversation with Zendesk's special guest, co-founder of LinkedIn, Reid Hoffman, and more, watch the full videos here. And for the latest updates, detailed information, and product availability, visit Zendesk's official announcements page.
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they're always clearly marked. For more information, contact sales@venturebeat.com.

#ai

Presented by Certinia
Every professional services leader knows the feeling: a pipeline full of promising deals, but a bench that's already stretched thin. That's because growth has always been tied to a finite supply of consultants with finite availability to work on projects. Even with strong market demand, most firms only capture 10-20% of their potential pipeline because they simply can't staff the work fast enough. Professional Services Automation (PSA) software emerged to help optimize operations, but the core model has remained the same.
Thankfully, that limitation is about to change. The proliferation of AI agents is sparking a new model — Autonomous PSA — blending human expertise with a digital workforce, all managed by a central orchestration engine. The result is a system that allows firms to capture 70-90% of demand instead of leaving it on the table.
Why professional services has the biggest transformation opportunity with agentic AI
Many industries will be transformed by AI agents, but perhaps none more than professional services. Understanding why requires us to explore the difference between current-state automation and future-state autonomy.
Traditional automation follows pre-set rules: When X happens, do Y. It's a logical workflow. Autonomy, on the other hand, is goal-oriented: The goal is Z. Analyze the data, select and deploy the best resources, and execute the necessary steps to achieve Z. It's the difference between executing a workflow and executing a full-on strategy.
This distinction is key because the core operation of a professional services business is a complex strategy. Unlike a sales team managing a linear pipeline or a support team clearing a reactive queue, a services firm is constantly solving a multi-dimensional problem. The "product" isn't a license or a physical item; it's the expertise of its people, working on a diverse set of tasks, typically delivered over discrete units of time.
That means the business model of a services organization contains layers of operational complexity that product-based businesses inherently get to avoid. The manual effort and guesswork involved often lead to conservative bidding on new business, underutilized experts, and reactive staffing that can put project margins and timelines at risk. Added up, this complexity represents a trillion-dollar opportunity cost for the global services economy.
The orchestration engine that makes autonomous PSA possible
"Autonomous PSA" describes an intelligent system designed to manage and orchestrate a blended team of human experts and their AI agent counterparts. It works by integrating a digital workforce of AI agents directly into your service delivery operations, providing a nearly limitless supply of labor for repeatable tasks, all governed by a single engine. It's a fundamental shift from a model constrained by human supply to one amplified by digital scale.
There is one enterprise software ecosystem uniquely positioned to make Autonomous PSA possible: Salesforce. Autonomous PSA emerges from the combination of three of its core technologies:
The Salesforce platform as the foundation: Everything will start with a single source of truth. The Salesforce platform provides the unified data fabric for every aspect of the customer relationship. This foundation extends across the entire platform, giving the autonomous engine the complete data context it needs to function.
Agentforce as the AI engine: Agentforce represents the industry's most secure, trusted layer for building and deploying AI agents that provide digital labor. It gives organizations the power to execute complex tasks at scale, transforming AI capabilities from concept to a tangible part of the future resource pool.
Salesforce-native Professional Services Automation software as the orchestration brain: The data foundation and AI engine need a command center. A Salesforce-native solution for Professional Services Automation like Certinia acts as the orchestration brain that defines the goals, rules, and workflows for the agents, deploying them alongside human resources to optimize project outcomes from sale to delivery.
The keystone of this new model is the orchestration brain, akin to a control tower for the hybrid human-AI agent workforce. It's a system built to manage an elastic supply of resources, instantly scaling delivery by pairing consultants with digital agents. Instead of scrambling with spreadsheets, staffing becomes a real-time, AI-driven allocation based on skills, availability, and project needs.
The combination creates a unified platform that gives the orchestration engine the context it needs for smarter, faster decision-making across the entire project lifecycle.
For executives, the impact is direct. Now empowered to overcome human capacity limits, PSOs can expand pipeline capture from a mere 10–20% to as high as 70–90%. This growth is also more profitable, as margins improve when lower-value work is offloaded to digital labor, allowing people to focus on high-value delivery. Furthermore, project timelines are accelerated, with 24/7 AI capacity shortening schedules and speeding time-to-value. Crucially, this speed and efficiency do not come at the expense of quality; human oversight remains embedded in every engagement, ensuring client trust is maintained through strong governance.
Preparing your organization for autonomous PSA
Adapting to Autonomous Professional Services requires leadership and foresight. For organizations ready to start, the journey begins with three key steps:
Re-architect your workforce model. The traditional pyramid workforce hierarchy is shifting to a diamond structure with AI agents handling the base of repeatable work. This will create new roles like orchestration analysts and agent supervisors to manage this blended workforce. Your first move is to audit your delivery processes and identify the high-volume, low-complexity tasks primed for this new digital workforce.
Invest in a native orchestration engine. An autonomous system needs a central brain. This is your PSA solution, and it must be native to your CRM platform to access real-time data across sales, service, and finance. If your project, resource, and financial data live in different systems, your priority is to unify them on a single platform to create the foundation for intelligent decision-making.
Experiment, then scale. Don't try to transform everything at once. Start by automating a single, high-friction process, like project creation from a closed-won opportunity or initial budget drafting. Proving value on a small scale builds the business case and the operational muscle for a systematic expansion across your entire services lifecycle.
Model behind the trillion-dollar opportunity
Our analysis of over 2,000 global professional services organizations indicates that firms today leave most of their pipeline untouched.
With human capacity alone, they typically capture only 10–20% of qualified demand. By blending digital labor into the mix, that capacity can rise to 70–90%. The difference—what we call ΔR—is massive. For a large professional services organization (PSO) with a $6B pipeline, that shift alone unlocks about $3.6B in incremental revenue.
And that is just the starting point. Once you add amplifiers like faster delivery (acceleration), lower delivery cost (margin gains), and access to niche expertise (skill-gap coverage), the impact multiplies. In our model, those amplifiers nearly triple the base gain, raising the total opportunity to $10 billion per firm. Scale that across 100 of the world's largest PSOs, and you arrive at the trillion-dollar prize.
Seize the full market potential
The idea presented here represents a once-in-a-generation opportunity to redefine the economics of professional services. Firms that adopt Autonomous PSA will capture a greater share of demand, deliver faster outcomes, and free their experts to focus on what matters most: client success.
The era of Autonomous Professional Services has begun. The orchestration engine is the key. How quickly will your organization seize the opportunity?
The full framework and analytical model are detailed in this new white paper, Unlocking a Trillion Dollar Opportunity for Professional Services with Autonomous PSA. I encourage you to download it and explore how your organization can prepare for this shift.
Raju Malhotra is Chief Product & Technology Officer at Certinia.
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they're always clearly marked. For more information, contact sales@venturebeat.com.
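For readers who want to check the ΔR arithmetic described in the article's example, a minimal sketch follows. The capture rates and amplifier multiplier are illustrative assumptions chosen within the stated 10–20% and 70–90% ranges to reproduce the quoted figures; they are not Certinia's published model.

```python
# Sketch of the pipeline-capture arithmetic from the example above.
# All figures are illustrative assumptions, not the white paper's actual model.
pipeline = 6_000_000_000        # $6B qualified pipeline for a large PSO
human_only_capture = 0.15       # assumed value within the 10-20% range
autonomous_capture = 0.75       # assumed value within the 70-90% range

# Incremental revenue from higher demand capture (the article's delta-R)
delta_r = pipeline * (autonomous_capture - human_only_capture)
print(f"Incremental revenue (delta-R): ${delta_r / 1e9:.1f}B")   # ~ $3.6B

# The article says amplifiers (acceleration, margin gains, skill-gap coverage)
# nearly triple the base gain; 2.8x is an assumed multiplier to reach ~$10B.
total_opportunity = delta_r * 2.8
print(f"Total opportunity per firm: ~${total_opportunity / 1e9:.0f}B")
```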

#ai

The friction of having to open a separate chat window to prompt an agent could be a hassle for many enterprises. And AI companies are seeing an opportunity to bring more and more AI services into one platform, even integrating into where employees do their work. OpenAI's ChatGPT, although still a separate window, is gradually introducing more integrations into its platform. Rivals like Google and Amazon Web Services believe they can compete with new platforms directly aiming at enterprise users who want a more streamlined AI experience. And these two new platforms are the latest volley in the race to bring enterprise AI users into one central place for their AI needs.
Google and AWS are separately introducing new platforms designed for full-stack agent workflows, hoping to usher in a world where users don't need to open other windows to access agents. Google unveiled Gemini Enterprise, a platform that Google Cloud CEO Thomas Kurian said "brings the best of Google AI to every employee." Meanwhile, AWS announced Quick Suite, a series of services intended to exist as a browser extension for enterprises to call on agents. Both these platforms aim to keep enterprise employees working within one ecosystem, keeping the needed context in more local storage.
Quick Suite
AWS, through Bedrock, allowed enterprises to build applications and agents, test them, and then deploy them in a single space. However, Bedrock remains a backend tool. AWS is banking that organizations will want a better way to access those agents without leaving their workspace. Quick Suite will be AWS's front-facing agentic application for enterprises. It will also be a browser extension for Chrome and Firefox and accessible on Microsoft Outlook, Word and Slack.
AWS vice president for Agentic AI Swami Sivasubramanian said Quick Suite is the company's way of "entering a new era of work," in that it gives employees access to AI applications they like with privacy considerations and context from their enterprise data. Quick Suite connects with Adobe Analytics, SharePoint, Snowflake, Google Drive, OneDrive, Outlook, Salesforce, ServiceNow, Slack, Databricks, Amazon Redshift, and Amazon S3. Through MCP servers, users can also access information from Atlassian, Asana, Box, Canva, PagerDuty, Workato or Zapier.
The platform consists of several services users can toggle between:
An agent builder accessible through a chat assistant
Quick Sight to analyze and visualize data
Quick Research, which can find information and build out research reports. Users can choose to limit the search to internal or uploaded documents only or to access the internet
Quick Flows to allow people to build routine tasks through simple prompts
Quick Automate for more complicated workflows, where the model can begin coordinating agents and data sharing to complete tasks
AWS said it orchestrates through several foundation models to power Quick Suite's services.
Gemini Enterprise
Google had already begun offering enterprise AI solutions, often in fragmented products. Its newest offering, Gemini Enterprise, brings together the company's AI offerings in a single place. Products like Gemini CLI and Google Vids will be integrated and accessible through Gemini Enterprise. "By bringing all of these components together through a single interface, Gemini Enterprise transforms how teams work," Kurian said in a blog post. It is powered by Gemini models and connects to an enterprise's data sources.
Gemini has always connected to Google's Workspace services, such as Docs and Drive, but Gemini Enterprise can now grab information from Microsoft 365 or other platforms like Salesforce. The idea behind Gemini Enterprise is to offer "a no-code workbench" for any user to surface information and orchestrate agents for automation. The platform includes pre-built agents for deep research and insights, but customers can also bring in their own or third-party agents. Administrators can manage these agents and workflows through a visual governance framework within Gemini Enterprise.
Google said some customers have already begun using Gemini Enterprise, including Macquarie Bank, legal AI provider Harvey and Banco BV. Google told VentureBeat that other platforms, like Vertex AI, remain separate products. Pricing for Gemini Enterprise, both the standard and pulse editions, starts at $30 per seat per month. A new pricing tier, Gemini Business, costs $21 per seat per month for a year.
Uninterrupted work in one place
In many ways, enterprise AI was always going to move to this more full-stack, end-to-end environment where people access all AI tools in one place. After all, fragmented offerings and lost context turn off many employees who already have a lot on their plates. Removing the friction of switching windows and possibly losing the context of what you're working on could save people a lot more time, and make the idea of using an AI agent or chatbot more appealing. This was the reasoning behind OpenAI's decision to create a desktop app for ChatGPT and why we see so many product announcements around integrations. But now, competitors have to offer more differentiated platforms or they risk being labeled as copycats of products most people already use. I felt the same during a Quick Suite demo, thinking it felt like ChatGPT.
The battle to be the one full-stack platform for the enterprise is just beginning. And as more AI tools and agents become more useful for employees, there will be more demand to make calling up these services as simple as a tap from their preferred workspace.
