Latest AI News & Updates

#data centers and infrastructure #company announcements #grow with google #ai

Google is investing an additional €5 billion in Belgium over the next two years to expand its cloud and AI infrastructure. This includes expansions of our data center ca…

#amazon bedrock #amazon bedrock knowledge bases #amazon machine learning #artificial intelligence #customer solutions #uncategorized

In this post, we show how Vxceed used Amazon Bedrock to develop this AI-powered multi-agent solution that generates personalized sales pitches for field sales teams at scale.
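The post's own architecture isn't reproduced here; as a rough, hedged sketch of the kind of Amazon Bedrock call such a solution builds on, the snippet below issues a single Converse API request that drafts a sales pitch. The model ID and prompt are illustrative placeholders, not details from the Vxceed solution.

```python
# Hedged sketch: one Amazon Bedrock Converse API call that drafts a sales pitch.
# The model ID and prompt are illustrative placeholders, not the Vxceed setup.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder model ID
    messages=[{
        "role": "user",
        "content": [{"text": "Draft a two-paragraph sales pitch for a beverage "
                             "distributor visiting a small neighborhood store."}],
    }],
    inferenceConfig={"maxTokens": 400, "temperature": 0.5},
)

print(response["output"]["message"]["content"][0]["text"])
```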

#amazon sagemaker #amazon sagemaker studio #amazon vpc #aws identity and access management (iam) #aws lambda #technical how-to #ai/ml #mlops #aws key management service #aws service catalog

Machine learning operations (MLOps) is the combination of people, processes, and technology to productionize ML use cases efficiently. To achieve this, enterprise customers must develop MLOps platforms to support reproducibility, robustness, and end-to-end observability of the ML use case’s lifecycle. Those platforms are based on a multi-account setup by adopting strict security constraints, development best […]

#collaboration #international initiatives #funding #research #artificial intelligence #technology and society #computer science and technology #electrical engineering and computer science (eecs) #school of engineering #mit schwarzman college of computing

The MIT–MBZUAI Collaborative Research Program will unite faculty and students from both institutions to advance AI and accelerate its use in pressing scientific and societal challenges.

#search #android #chrome #google deepmind #google labs #learning & education #gemini #ai #gemini app

Here are Google’s latest AI updates from September 2025

#data science #astronomy #deep dives #programming #python #geospatial analytics

A hands-on walkthrough using skyfield, timezonefinder, geopy, and pytz, and further practical applications
The post Know Your Real Birthday: Astronomical Computation and Geospatial-Temporal Analytics in Python appeared first on Towards Data Science.
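The post's code isn't reproduced here, but for a sense of how those four libraries fit together, here is a minimal sketch under assumed inputs (the birthplace, date, and time are made up): geocode the birthplace with geopy, look up its IANA timezone with timezonefinder, localize the birth time with pytz, and hand the UTC instant to skyfield for the astronomical part.

```python
# Minimal sketch (assumed example inputs) combining the four libraries the post mentions.
from datetime import datetime

import pytz
from geopy.geocoders import Nominatim
from timezonefinder import TimezoneFinder
from skyfield.api import load

# 1. Geocode the birthplace (requires network access to Nominatim).
place = Nominatim(user_agent="birthday-sketch").geocode("Lisbon, Portugal")

# 2. Find the IANA timezone at those coordinates.
tz_name = TimezoneFinder().timezone_at(lat=place.latitude, lng=place.longitude)

# 3. Localize the (made-up) birth time and convert it to UTC.
local_birth = pytz.timezone(tz_name).localize(datetime(1990, 6, 15, 14, 30))
utc_birth = local_birth.astimezone(pytz.utc)

# 4. Ask skyfield where the Sun was at that instant (ecliptic longitude).
ts = load.timescale()
eph = load("de421.bsp")  # downloads the ephemeris file on first run
sun_position = eph["earth"].at(ts.from_datetime(utc_birth)).observe(eph["sun"])
_, ecliptic_lon, _ = sun_position.apparent().ecliptic_latlon()
print(f"{tz_name}: Sun at {ecliptic_lon.degrees:.2f} degrees ecliptic longitude at birth")
```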

Seaborn is a statistical visualization library for Python that sits on top of Matplotlib. It gives you clean defaults, tight integration with Pandas DataFrames, and high-level functions that reduce boilerplate.
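As a quick illustration of how little boilerplate that high-level API requires, here is a minimal, self-contained example; the penguins dataset ships with Seaborn's load_dataset helper (fetched over the network on first use).

```python
# Minimal Seaborn example: one high-level call on a pandas DataFrame.
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme()                          # apply Seaborn's clean defaults
penguins = sns.load_dataset("penguins")  # returns a pandas DataFrame

# A single function call handles grouping, colors, and the legend.
sns.scatterplot(data=penguins, x="flipper_length_mm", y="body_mass_g",
                hue="species")
plt.show()
```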

#research #simulation #robotics #data #artificial intelligence #computer modeling #internet #machine learning #computer science and technology #mit schwarzman college of computing #computer science and artificial intelligence laboratory (csail) #school of engineering #electrical engineering and computer science (eecs)

New tool from MIT CSAIL creates realistic virtual kitchens and living rooms where simulated robots can interact with models of real-world objects, scaling up training data for robot foundation models.

#data visualization #data science #data storytelling #editors pick #visual design

A simple and powerful guide to using color for more impactful data stories.
The post Data Visualization Explained (Part 3): The Role of Color appeared first on Towards Data Science.

Get control of your data workflows with these essential CLI tools.

#ai

The latest addition to the small model wave for enterprises comes from AI21 Labs, which is betting that bringing models to devices will free up traffic in data centers. AI21's Jamba Reasoning 3B is a "tiny" open-source model that can run extended reasoning, generate code and respond based on ground truth. Jamba Reasoning 3B handles more than 250,000 tokens and can run inference on edge devices; the company said it works on devices such as laptops and mobile phones.

Ori Goshen, co-CEO of AI21, told VentureBeat that the company sees more enterprise use cases for small models, mainly because moving most inference to devices frees up data centers. "What we're seeing right now in the industry is an economics issue where there are very expensive data center build-outs, and the revenue that is generated from the data centers versus the depreciation rate of all their chips shows the math doesn't add up," Goshen said. He added that in the future "the industry by and large would be hybrid in the sense that some of the computation will be on devices locally and other inference will move to GPUs."

Tested on a MacBook

Jamba Reasoning 3B combines the Mamba architecture with Transformers, allowing it to run a 250K-token context window on devices with 2-4x faster inference speeds. Goshen said the Mamba architecture significantly contributed to the model's speed, and the hybrid architecture also reduces memory requirements, thereby reducing its computing needs. AI21 tested the model on a standard MacBook Pro and found that it can process 35 tokens per second.

Goshen said the model works best for tasks involving function calling, policy-grounded generation and tool routing. Simple requests, such as asking for information about a forthcoming meeting and asking the model to create an agenda for it, can be handled on devices, while more complex reasoning tasks can be saved for GPU clusters.

Small models in the enterprise

Enterprises have been interested in using a mix of small models, some specifically designed for their industry and some that are condensed versions of LLMs. In September, Meta released MobileLLM-R1, a family of reasoning models ranging from 140M to 950M parameters. These models are designed for math, coding and scientific reasoning rather than chat applications, and can run on compute-constrained devices. Google's Gemma was one of the first small models to come to market, designed to run on portable devices like laptops and mobile phones, and it has since been expanded. Companies like FICO have also begun building their own models: FICO launched its FICO Focused Language and FICO Focused Sequence small models, which only answer finance-specific questions. Goshen said the big difference AI21's model offers is that it is even smaller than most, yet it can run reasoning tasks without sacrificing speed.

Benchmark testing

In benchmark testing, Jamba Reasoning 3B demonstrated strong performance compared to other small models, including Qwen 4B, Meta's Llama 3.2 3B, and Microsoft's Phi-4-Mini. It outperformed all of them on the IFBench test and Humanity's Last Exam, although it came in second to Qwen 4B on MMLU-Pro. Goshen said another advantage of small models like Jamba Reasoning 3B is that they are highly steerable and provide better privacy options to enterprises because inference is not sent to a server elsewhere. "I do believe there's a world where you can optimize for the needs and the experience of the customer, and the models that will be kept on devices are a large part of it," he said.
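The article doesn't show how to run the model locally. For readers who want to experiment, here is a minimal sketch using the Hugging Face transformers library; the model identifier is an assumption based on AI21's naming and should be checked against the official model card, as should the hardware requirements.

```python
# Minimal sketch (assumptions): running a small open-weight reasoning model locally
# with Hugging Face transformers. The model id below is assumed from AI21's naming
# and may differ from the official release; check the model card first.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/AI21-Jamba-Reasoning-3B"  # assumed identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A simple on-device task of the kind Goshen describes: drafting a meeting agenda.
prompt = "Draft a short agenda for tomorrow's 30-minute sales sync."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```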

#ai & ml #commentary

This article is part of a series on the Sens-AI Framework—practical habits for learning and coding with AI. Read the original framework introduction and explore the complete methodology in Andrew Stellman’s O’Reilly report Critical Thinking Habits for Coding with AI. Teaching developers to work effectively with AI means building habits that keep critical thinking active while leveraging AI’s […]

You've written Python that processes data in a loop.

Available in preview via the API, our Computer Use model is a specialized model built on Gemini 2.5 Pro’s capabilities to power agents that can interact with user interfaces.

AI speech, dating AI, CometJacking exploit, OpenAI + Nvidia, open Altman, and more...

#ai #programming & development #dev

In a packed hall at Fort Mason Center in San Francisco, against a backdrop of the Golden Gate Bridge, OpenAI CEO Sam Altman laid out a bold vision to remake the digital world. The company that brought generative AI to the mainstream with a simple chatbot is now building the foundations for its next act: a comprehensive computing platform designed to move beyond the screen and browser, with legendary designer Jony Ive enlisted to help shape its physical form.

At its third annual DevDay, OpenAI unveiled a suite of tools that signals a strategic pivot from a model provider to a full-fledged ecosystem. The message was clear: the era of simply asking an AI questions is over. The future is about commanding AI to perform complex tasks, build software autonomously, and live inside every application, a transition Altman framed as moving from "systems that you can ask anything to, to systems that you can ask to do anything for you."

The day's announcements were a three-pronged assault on the status quo, targeting how users interact with software, how developers build it, and how businesses deploy intelligent agents. But it was the sessions held behind closed doors, away from the public livestream, that revealed the true scope of OpenAI's ambition — a future that includes new hardware, a relentless pursuit of computational power, and a philosophical quest to redefine our relationship with technology.

From chatbot to operating system: The new 'App Store'

The centerpiece of the public-facing keynote was the transformation of ChatGPT itself. With the new Apps SDK, OpenAI is turning its wildly popular chatbot into a dynamic, interactive platform, effectively an operating system where developers can build and distribute their own applications.

"Today, we're going to open up ChatGPT for developers to build real apps inside of ChatGPT," Altman announced during the keynote presentation to applause. "This will enable a new generation of apps that are interactive, adaptive and personalized, that you can chat with."

Live demonstrations showcased apps from partners like Coursera, Canva, and Zillow running seamlessly within a chat conversation. A user could watch a machine learning lecture, ask ChatGPT to explain a concept in real time, and then use Canva to generate a poster based on the conversation, all without leaving the chat interface. The apps can render rich, interactive UIs, even going full-screen to offer a complete experience, like exploring a Zillow map of homes.

For developers, this represents a powerful new distribution channel. "When you build with the Apps SDK, your apps can reach hundreds of millions of chat users," Altman said, highlighting a direct path to a massive user base that has grown to over 800 million weekly active users.

In a private press conference later, Nick Turley, head of ChatGPT, elaborated on the grander vision. "We never meant to build a chatbot," he stated. "When we set out to make ChatGPT, we meant to build a super assistant and we got a little sidetracked. And one of the tragedies of getting a little sidetracked is that we built a great chatbot, but we are the first ones to say that not all software needs to be a chatbot, not all interaction with the commercial world needs to be a chatbot."

Turley emphasized that while OpenAI is excited about natural language interfaces, "the interface really needs to evolve, which is why you see so much UI in the demos today. In fact, you can even go full screen and chat is in the background."
He described a future where users might "start your day in ChatGPT, just because it kind of has become the de facto entry point into the commercial web and into a lot of software," but clarified that "our incentive is not to keep you in. Our product is to allow other people to build amazing businesses on top and to evolve the form factor of software."

The rise of the agents: Building the 'do anything' AI

If apps are about bringing the world into ChatGPT, the new "Agent Kit" is about sending AI out into the world to get things done. OpenAI is providing a complete "set of building blocks... to help you take agents from prototype to production," Altman explained in his keynote. Agent Kit is an integrated development environment for creating autonomous AI workers. It features a visual canvas to design complex workflows, an embeddable chat interface ("Chat Kit") for deploying agents in any app, and a sophisticated evaluation suite to measure and improve performance.

A compelling demo from financial operations platform Ramp showed how Agent Kit was used to build a procurement agent. An employee could simply type, "I need five more ChatGPT business seats," and the agent would parse the request, check it against company expense policies, find vendor details, and prepare a virtual credit card for the purchase — a process that once took weeks, now completed in minutes.

This push into agents is a direct response to a growing enterprise need to move beyond AI as a simple information retrieval tool and toward AI as a productivity engine that automates complex business processes. Brad Lightcap, OpenAI's COO, noted that for enterprise adoption, "you needed this kind of shift to more agentic AI that could actually do things for you, versus just respond with text outputs."

The future of code and the Jony Ive bombshell

Perhaps the most profound shift is occurring in software development itself. Codex, OpenAI's AI coding agent, has graduated from a research preview to a full-fledged product, now powered by a specialized version of the new GPT-5 model. It is, as one speaker put it, "a teammate that understands your context." The capabilities are staggering. Developers can now assign Codex tasks directly from Slack, and the agent can autonomously write code, create pull requests, and even review other engineers' work on GitHub. A live demo showed Codex taking a simple photo of a whiteboard sketch and turning it into a fully functional, beautifully designed mobile app screen. Another demo showed an app that could "self-evolve," reprogramming itself in real time based on a user's natural language request.

But the day's biggest surprise came in a closing fireside chat, which was not livestreamed, between Altman and Jony Ive, the iconic former chief design officer of Apple. The two revealed they have been collaborating for three years on a new family of AI-centric hardware. Ive, whose design philosophy shaped the iPhone, iMac, and Apple Watch, said his creative team's purpose "became clear" with the launch of ChatGPT. He argued that our current relationship with technology is broken and that AI presents an opportunity for a fundamental reset.

"I think it would be absurd to assume that you could have technology that is this breathtaking, delivered to us through legacy products, products that are decades old," Ive said.
"I see it as a chance to use this most remarkable capability to full-on address a lot of the overwhelm and despair that people feel right now."

While details of the devices remain secret, Ive spoke of his motivation in deeply human terms. "We love our species, and we want to be useful. We think that humanity deserves much better than humanity generally is given," he said. He emphasized the importance of "care" in the design process, stating, "We sense when people have cared... you sense carelessness. You sense when somebody does not care about you, they care about money and schedule." This collaboration confirms that OpenAI's ambitions are not confined to the cloud; it is actively exploring the physical interface through which humanity will interact with its powerful new intelligence.

The Unquenchable Thirst for Compute

Underpinning this entire platform strategy is a single, overwhelming constraint: the availability of computing power. In both the private press conference and the un-streamed Developer State of the Union, OpenAI's leadership returned to this theme again and again.

"The degree to which we are all constrained by compute... Everyone is just so constrained on being able to offer the services at the scale required to get the revenue that at this point, we're quite confident we can push it pretty far," Altman told reporters. He added that even with massive new hardware partnerships with AMD and others, "we'll be saying the same thing again. We're so convinced... There's so much more demand."

This explains the company's aggressive, multi-billion-dollar investment in infrastructure. When asked about profitability, Altman was candid that the company is in a phase of "investment and growth." He invoked a famous quote from Walt Disney, paraphrasing, "We make more money so we can make more movies." For OpenAI, the "movies" are ever-more-powerful AI models.

Greg Brockman, OpenAI's President, put the ultimate goal in stark economic terms during the Developer State of the Union. "AI is going to become, probably in the not too distant future, the fundamental driver of economic growth," he said. "Asking 'How much compute do you want?' is a little bit like asking how much workforce do you want? The answer is, you can always get more out of more."

As the day concluded and developers mingled at the reception, the scale of OpenAI's project came into focus. Fueled by new models like the powerful GPT-5 Pro and the stunning Sora 2 video generator, the company is no longer just building AI. It is building the world where AI will live — a world of intelligent apps, autonomous agents, and new physical devices, betting that in the near future, intelligence itself will be the ultimate platform.

#ai

Some of the largest providers of large language models (LLMs) have sought to move beyond multimodal chatbots, extending their models into "agents" that can take actions on behalf of the user across websites. Recall OpenAI's ChatGPT Agent (formerly known as "Operator") and Anthropic's Computer Use, both released over the last two years. Now, Google is getting into the same game.

Today, the search giant's DeepMind AI lab subsidiary unveiled a new, fine-tuned and custom-trained version of its powerful Gemini 2.5 Pro LLM known as "Gemini 2.5 Computer Use," which can use a virtual browser to surf the web on your behalf, retrieve information, fill out forms, and even take actions on websites, all from a user's single text prompt.

"These are early days, but the model's ability to interact with the web – like scrolling, filling forms + navigating dropdowns – is an important next step in building general-purpose agents," said Google CEO Sundar Pichai, as part of a longer statement on the social network X.

The model is not available to consumers directly from Google, though. Instead, Google partnered with another company, Browserbase, founded by former Twilio engineer Paul Klein in early 2024, which offers a virtual "headless" web browser specifically for use by AI agents and applications. (A "headless" browser is one that doesn't require a graphical user interface, or GUI, to navigate the web, though in this case Browserbase does show a graphical representation for the user.) Users can demo the new Gemini 2.5 Computer Use model directly on Browserbase and even compare it side-by-side with the older, rival offerings from OpenAI and Anthropic in a new "Browser Arena" launched by the startup (though only one additional model can be selected alongside Gemini at a time).

For AI builders and developers, it's being made available as a raw, albeit proprietary, LLM through the Gemini API in Google AI Studio for rapid prototyping, and through Google Cloud's Vertex AI model selector and application-building platform.

The new offering builds on the capabilities of Gemini 2.5 Pro, released back in March 2025 and updated significantly several times since then, with a specific focus on enabling AI agents to perform direct interactions with user interfaces, including browsers and mobile applications. Overall, it appears Gemini 2.5 Computer Use is designed to let developers create agents that can complete interface-driven tasks autonomously, such as clicking, typing, scrolling, filling out forms, and navigating behind login screens.
Rather than relying solely on APIs or structured inputs, this model allows AI systems to interact with software visually and functionally, much like a human would.

Brief User Hands-On Tests

In my brief, unscientific initial hands-on tests on the Browserbase website, Gemini 2.5 Computer Use successfully navigated to Taylor Swift's official website as instructed and provided me a summary of what was being sold or promoted at the top — a special edition of her newest album, "The Life of A Showgirl."

In another test, I asked Gemini 2.5 Computer Use to search Amazon for highly rated and well-reviewed solar lights I could stake into my back yard, and I was delighted to watch as it successfully completed a Google Search CAPTCHA designed to weed out non-human users ("Select all the boxes with a motorcycle."). It did so in a matter of seconds. However, once it got through, it stalled and was unable to complete the task, despite serving up a "task completed" message.

I should also note that while the ChatGPT agent from OpenAI and Anthropic's Claude can create and edit local files — such as PowerPoint presentations, spreadsheets, or text documents — on the user's behalf, Gemini 2.5 Computer Use does not currently offer direct file system access or native file creation capabilities. Instead, it is designed to control and navigate web and mobile user interfaces through actions like clicking, typing, and scrolling. Its output is limited to suggested UI actions or chatbot-style text responses; any structured output like a document or file must be handled separately by the developer, often through custom code or third-party integrations.

Performance Benchmarks

Google says Gemini 2.5 Computer Use has demonstrated leading results in multiple interface control benchmarks, particularly when compared to other major AI systems including Claude Sonnet and OpenAI's agent-based models. Evaluations were conducted via Browserbase and Google's own testing. Some highlights include:

- Online-Mind2Web (Browserbase): 65.7% for Gemini 2.5 vs. 61.0% (Claude Sonnet 4) and 44.3% (OpenAI Agent)
- WebVoyager (Browserbase): 79.9% for Gemini 2.5 vs. 69.4% (Claude Sonnet 4) and 61.0% (OpenAI Agent)
- AndroidWorld (DeepMind): 69.7% for Gemini 2.5 vs. 62.1% (Claude Sonnet 4); OpenAI's model could not be measured due to lack of access
- OSWorld: currently not supported by Gemini 2.5; the top competitor result was 61.4%

In addition to strong accuracy, Google reports that the model operates at lower latency than other browser control solutions — a key factor in production use cases like UI automation and testing.

How It Works

Agents powered by the Computer Use model operate within an interaction loop. They receive:

- A user task prompt
- A screenshot of the interface
- A history of past actions

The model analyzes this input and produces a recommended UI action, such as clicking a button or typing into a field. If needed, it can request confirmation from the end user for riskier tasks, such as making a purchase. Once the action is executed, the interface state is updated and a new screenshot is sent back to the model.
The loop continues until the task is completed or halted due to an error or a safety decision. The model uses a specialized tool called computer_use, and it can be integrated into custom environments using tools like Playwright or via the Browserbase demo sandbox.

Use Cases and Adoption

According to Google, teams internally and externally have already started using the model across several domains:

- Google's payments platform team reports that Gemini 2.5 Computer Use successfully recovers over 60% of failed test executions, reducing a major source of engineering inefficiencies.
- Autotab, a third-party AI agent platform, said the model outperformed others on complex data parsing tasks, boosting performance by up to 18% in their hardest evaluations.
- Poke.com, a proactive AI assistant provider, noted that the Gemini model often operates 50% faster than competing solutions during interface interactions.

The model is also being used in Google's own product development efforts, including in Project Mariner, the Firebase Testing Agent, and AI Mode in Search.

Safety Measures

Because this model directly controls software interfaces, Google emphasizes a multi-layered approach to safety:

- A per-step safety service inspects every proposed action before execution.
- Developers can define system-level instructions to block or require confirmation for specific actions.
- The model includes built-in safeguards to avoid actions that might compromise security or violate Google's prohibited use policies.

For example, if the model encounters a CAPTCHA, it will generate an action to click the checkbox but flag it as requiring user confirmation, ensuring the system does not proceed without human oversight.

Technical Capabilities

The model supports a wide array of built-in UI actions such as click_at, type_text_at, scroll_document, drag_and_drop, and more. User-defined functions can be added to extend its reach to mobile or custom environments. Screen coordinates are normalized (0–1000 scale) and translated back to pixel dimensions during execution. It accepts image and text input and outputs text responses or function calls to perform tasks. The recommended screen resolution for optimal results is 1440x900, though it can work with other sizes.

API Pricing Remains Almost Identical to Gemini 2.5 Pro

The pricing for Gemini 2.5 Computer Use aligns closely with the standard Gemini 2.5 Pro model. Both follow the same per-token billing structure: input tokens are priced at $1.25 per one million tokens for prompts under 200,000 tokens, and $2.50 per million tokens for prompts longer than that. Output tokens follow a similar split, priced at $10.00 per million for smaller responses and $15.00 for larger ones.

Where the models diverge is in availability and additional features. Gemini 2.5 Pro includes a free tier that allows developers to use the model at no cost, with no explicit token cap published, though usage may be subject to rate limits or quota constraints depending on the platform (e.g. Google AI Studio). This free access includes both input and output tokens. Once developers exceed their allotted quota or switch to the paid tier, standard per-token pricing applies. In contrast, Gemini 2.5 Computer Use is available exclusively through the paid tier.
There is no free access currently offered for this model, and all usage incurs token-based charges from the outset.

Feature-wise, Gemini 2.5 Pro supports optional capabilities like context caching (starting at $0.31 per million tokens) and grounding with Google Search (free for up to 1,500 requests per day, then $35 per 1,000 additional requests). These are not available for Computer Use at this time.

Another distinction is in data handling: output from the Computer Use model is not used to improve Google products in the paid tier, while free-tier usage of Gemini 2.5 Pro contributes to model improvement unless explicitly opted out.

Overall, developers can expect similar token-based costs across both models, but they should consider tier access, included capabilities, and data use policies when deciding which model fits their needs.
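To make the interaction loop described above concrete, here is a minimal, hedged sketch of a screenshot-in, action-out agent loop. The helper functions passed into run_agent are hypothetical placeholders (a real integration would call the Gemini API's computer_use tool and a browser automation layer such as Playwright or Browserbase); only the loop structure and the 0-1000 coordinate normalization are taken from the article.

```python
# Minimal sketch of a screenshot-in, action-out agent loop, as described above.
# propose_action() and execute_action() are hypothetical placeholders standing in
# for the Gemini computer_use tool call and a browser automation layer; they are
# NOT real API signatures.

def normalize_to_pixels(x_norm, y_norm, width, height):
    # The article notes coordinates are normalized to a 0-1000 scale,
    # then translated back to pixel dimensions during execution.
    return int(x_norm / 1000 * width), int(y_norm / 1000 * height)

def run_agent(task, take_screenshot, propose_action, execute_action,
              confirm, max_steps=20):
    history = []  # past actions, fed back to the model on every turn
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = propose_action(task=task, screenshot=screenshot, history=history)

        if action["type"] == "done":             # task finished
            return action.get("result")
        if action.get("requires_confirmation"):  # e.g. purchases, CAPTCHAs
            if not confirm(action):
                return None                      # halted by the user

        execute_action(action)                   # click, type, scroll, ...
        history.append(action)
    return None                                  # step budget exhausted
```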

#search #ai

Starting today, we’re bringing AI Mode in Google Search to more people around the world, launching in more than 35 new languages and over 40 new countries and territorie…

#gemini app #google one #google labs #ai

Google AI Plus — our newest AI plan — expands to 36 more countries.

#gemini models #google deepmind #ai

Today we are releasing the Gemini 2.5 Computer Use model via the API, which outperforms leading alternatives at browser and mobile tasks.

#ai

For more than a decade, conversational AI has promised human-like assistants that can do more than chat. Yet even as large language models (LLMs) like ChatGPT, Gemini, and Claude learn to reason, explain, and code, one critical category of interaction remains largely unsolved: reliably completing tasks for people outside of chat.

Even the best AI models score only in the 30-percent range on Terminal-Bench Hard, a third-party benchmark that evaluates how well AI agents complete a variety of end-to-end tasks in a terminal environment, far below the reliability demanded by most enterprises and users. Task-specific benchmarks like TAU-Bench Airline, which measures how reliably AI agents find and book flights on behalf of a user, don't show much higher pass rates either, with only 56% for the top-performing model (Claude 3.7 Sonnet), meaning the agent fails nearly half the time.

New York City-based Augmented Intelligence (AUI) Inc., co-founded by Ohad Elhelo and Ori Cohen, believes it has finally come up with a solution that boosts AI agent reliability to a level where most enterprises can trust agents to do as instructed. The company's new foundation model, called Apollo-1 — which remains in preview with early testers but is close to an impending general release — is built on a principle it calls stateful neuro-symbolic reasoning. It's a hybrid architecture championed even by LLM skeptics like Gary Marcus, designed to guarantee consistent, policy-compliant outcomes in every customer interaction.

"Conversational AI is essentially two halves," said Elhelo in a recent interview with VentureBeat. "The first half — open-ended dialogue — is handled beautifully by LLMs. They're designed for creative or exploratory use cases. The other half is task-oriented dialogue, where there's always a specific goal behind the conversation. That half has remained unsolved because it requires certainty."

AUI defines certainty as the difference between an agent that "probably" performs a task and one that almost "always" does. On TAU-Bench Airline, for example, Apollo-1 performs at a staggering 92.5% pass rate, leaving all current competitors far behind, according to benchmarks shared with VentureBeat and posted on AUI's website.

Elhelo offered simple examples: a bank that must enforce ID verification for refunds over $200, or an airline that must always offer a business-class upgrade before economy. "Those aren't preferences," he said. "They're requirements. And no purely generative approach can deliver that kind of behavioral certainty."

AUI's work on improving reliability was previously covered by the subscription news outlet The Information, but has not received widespread coverage in publicly accessible media — until now.

From Pattern Matching to Predictable Action

The team argues that transformer models, by design, can't meet that bar. Large language models generate plausible text, not guaranteed behavior. "When you tell an LLM to always offer insurance before payment, it might — usually," Elhelo said. "Configure Apollo-1 with that rule, and it will — every time."

That distinction, he said, stems from the architecture itself. Transformers predict the next token in a sequence. Apollo-1, by contrast, predicts the next action in a conversation, operating on what AUI calls a typed symbolic state. Cohen explained the idea in more technical terms. "Neuro-symbolic means we're merging the two dominant paradigms," he said.
"The symbolic layer gives you structure — it knows what an intent, an entity, and a parameter are — while the neural layer gives you language fluency. The neuro-symbolic reasoner sits between them. It's a different kind of brain for dialogue."

Where transformers treat every output as text generation, Apollo-1 runs a closed reasoning loop: an encoder translates natural language into a symbolic state, a state machine maintains that state, a decision engine determines the next action, a planner executes it, and a decoder turns the result back into language. "The process is iterative," Cohen said. "It loops until the task is done. That's how you get determinism instead of probability."

A Foundation Model for Task Execution

Unlike traditional chatbots or bespoke automation systems, Apollo-1 is meant to serve as a foundation model for task-oriented dialogue — a single, domain-agnostic system that can be configured for banking, travel, retail, or insurance through what AUI calls a System Prompt. "The System Prompt isn't a configuration file," Elhelo said. "It's a behavioral contract. You define exactly how your agent must behave in situations of interest, and Apollo-1 guarantees those behaviors will execute."

Organizations can use the prompt to encode symbolic slots — intents, parameters, and policies — as well as tool boundaries and state-dependent rules. A food delivery app, for example, might enforce "if allergy mentioned, always inform the restaurant," while a telecom provider might define "after three failed payment attempts, suspend service." In both cases, the behavior executes deterministically, not statistically.

Eight Years in the Making

AUI's path to Apollo-1 began in 2017, when the team started encoding millions of real task-oriented conversations handled by a 60,000-person human agent workforce. That work led to a symbolic language capable of separating procedural knowledge — steps, constraints, and flows — from descriptive knowledge like entities and attributes. "The insight was that task-oriented dialogue has universal procedural patterns," said Elhelo. "Food delivery, claims processing, and order management all share similar structures. Once you model that explicitly, you can compute over it deterministically."

From there, the company built the neuro-symbolic reasoner — a system that uses the symbolic state to decide what happens next rather than guessing through token prediction.

Benchmarks suggest the architecture makes a measurable difference. In AUI's own evaluations, Apollo-1 achieved over 90 percent task completion on the τ-Bench-Airline benchmark, compared with 60 percent for Claude-4. It completed 83 percent of live booking chats on Google Flights versus 22 percent for Gemini 2.5-Flash, and 91 percent of retail scenarios on Amazon versus 17 percent for Rufus. "These aren't incremental improvements," said Cohen. "They're order-of-magnitude reliability differences."

A Complement, Not a Competitor

AUI isn't pitching Apollo-1 as a replacement for large language models, but as their necessary counterpart. In Elhelo's words: "Transformers optimize for creative probability. Apollo-1 optimizes for behavioral certainty. Together, they form the complete spectrum of conversational AI."

The model is already running in limited pilots with undisclosed Fortune 500 companies across sectors including finance, travel, and retail.
AUI has also confirmed a strategic partnership with Google and plans for general availability in November 2025, when it will open APIs, release full documentation, and add voice and image capabilities. Interested customers and partners can sign up for more information via the form on AUI's website.

Until then, the company is keeping details under wraps. When asked what comes next, Elhelo smiled. "Let's just say we're preparing an announcement," he said. "Soon."

Toward Conversations That Act

For all its technical sophistication, Apollo-1's pitch is simple: make AI that businesses can trust to act — not just talk. "We're on a mission to democratize access to AI that works," Cohen said near the end of the interview.

Whether Apollo-1 becomes the new standard for task-oriented dialogue remains to be seen. But if AUI's architecture performs as promised, the long-standing divide between chatbots that sound human and agents that reliably do human work may finally start to close.
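Apollo-1's internals are not public, but the encoder / state machine / decision engine / planner / decoder loop described above maps onto a familiar pattern. The sketch below is a speculative illustration of that pattern, not AUI's implementation: it shows how a rule such as "if allergy mentioned, always inform the restaurant" could be enforced deterministically over a typed symbolic state, with the language layer reduced to hypothetical stubs.

```python
# Speculative illustration of a stateful neuro-symbolic loop (NOT AUI's code).
# encode() and decode() stand in for the neural language layer; the rules and
# typed state are the symbolic layer that makes behavior deterministic.
from dataclasses import dataclass, field

@dataclass
class DialogueState:
    intent: str | None = None
    slots: dict = field(default_factory=dict)
    flags: set = field(default_factory=set)

def encode(utterance: str, state: DialogueState) -> DialogueState:
    # Hypothetical stub: a neural encoder would extract intents/entities here.
    if "allergy" in utterance.lower():
        state.flags.add("allergy_mentioned")
    if "order" in utterance.lower():
        state.intent = "place_order"
    return state

def decide(state: DialogueState) -> str:
    # Symbolic decision engine: policy rules fire on the typed state, every time.
    if "allergy_mentioned" in state.flags and "restaurant_informed" not in state.flags:
        return "inform_restaurant"          # mandatory policy action
    if state.intent == "place_order":
        return "confirm_order"
    return "ask_clarification"

def decode(action: str) -> str:
    # Hypothetical stub: a neural decoder would phrase the action naturally.
    return {"inform_restaurant": "I've flagged your allergy to the restaurant.",
            "confirm_order": "Shall I place the order?",
            "ask_clarification": "Could you tell me more?"}[action]

state = DialogueState()
state = encode("I'd like to order pad thai, but I have a peanut allergy", state)
action = decide(state)
print(decode(action))                    # -> "I've flagged your allergy to the restaurant."
state.flags.add("restaurant_informed")   # the planner records the executed action
```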

#amazon nova #amazon quicksight #artificial intelligence #customer solutions #technical how-to #uncategorized #ai/ml

In this post, we demonstrate how Amazon Nova Act automates QuickSight data story creation, saving time so you can focus on making critical, data-driven business decisions.
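The post's workflow isn't shown here; as a hedged sketch only, the snippet below follows the quick-start pattern of the nova-act SDK (an act() call against a starting page). The URL and instruction are made-up stand-ins, not the QuickSight automation from the post, and the SDK interface may differ in your version.

```python
# Hedged sketch based on the nova-act SDK quick-start pattern; the URL and
# instruction are illustrative stand-ins, not the post's QuickSight workflow.
from nova_act import NovaAct

with NovaAct(starting_page="https://quicksight.aws.amazon.com/") as nova:
    # Natural-language instruction the agent carries out in the browser session.
    nova.act("Open the monthly sales dashboard and start building a data story "
             "summarizing the latest quarter.")
```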

#amazon eventbridge #artificial intelligence #aws lambda #generative ai

In this post, we demonstrated how a financial services company can use an FM to process large volumes of customer records and get specific data-driven product recommendations. We also showed how to implement an automated monitoring solution for Amazon Bedrock batch inference jobs. By using EventBridge, Lambda, and DynamoDB, you can gain real-time visibility into batch processing operations, so you can efficiently generate personalized product recommendations based on customer credit data.
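The post's full solution isn't reproduced here; as a rough sketch of the monitoring pattern it describes, the Lambda handler below records Bedrock batch inference job state changes delivered by EventBridge into a DynamoDB table. The table name and the event detail fields are assumptions for illustration and would need to match the actual EventBridge event schema and your deployment.

```python
# Hedged sketch of the EventBridge -> Lambda -> DynamoDB monitoring pattern.
# Table name and event "detail" fields are assumptions; check the actual
# Bedrock batch inference event schema before relying on them.
import os
from datetime import datetime, timezone

import boto3

table = boto3.resource("dynamodb").Table(os.environ.get("JOB_TABLE", "BatchInferenceJobs"))

def lambda_handler(event, context):
    detail = event.get("detail", {})                # assumed event shape
    job_arn = detail.get("batchJobArn", "unknown")  # assumed field name
    status = detail.get("status", "UNKNOWN")        # assumed field name

    # Record the latest status so dashboards and alerts can query it.
    table.put_item(Item={
        "jobArn": job_arn,
        "status": status,
        "updatedAt": datetime.now(timezone.utc).isoformat(),
    })
    return {"jobArn": job_arn, "status": status}
```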

#research #company announcements #ai

Google now celebrates five Nobel laureates, with three prizes awarded in the past two years.

#profile #faculty #energy #artificial intelligence #machine learning #algorithms #renewable energy #electric grid #climate change #sustainability #computer science and technology #technology and policy #laboratory for information and decision systems (lids) #electrical engineering and computer science (eecs) #mit schwarzman college of computing #school of engineering

Assistant Professor Priya Donti’s research applies machine learning to optimize renewable energy.

#artificial intelligence #app

Kids have always played with and talked to stuffed animals. But now their toys can talk back, thanks to a wave of companies that are fitting children’s playthings with chatbots and voice assistants.  It’s a trend that has particularly taken off in China: A recent report by the Shenzhen Toy Industry Association and JD.com predicts…

In this article, I will give you examples of how I use statistics in my data science job, along with the resources I used to gain this knowledge.

#business #business / artificial intelligence

OpenAI revealed last week the custom AI tools it uses internally. The news sent some software companies into turmoil.

How to speed up exploratory data analysis with Python’s automated tools and get 80% of the insights in 20% of the time.
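The summary doesn't name the specific tools; as one hedged example of the automated-EDA approach it describes, ydata-profiling can generate a full exploratory report from a DataFrame in a couple of lines (the CSV path below is a placeholder, and the post may cover different tools).

```python
# One example of automated EDA (the post may cover different tools):
# ydata-profiling builds a full HTML report from a DataFrame in two calls.
import pandas as pd
from ydata_profiling import ProfileReport

df = pd.read_csv("data.csv")                       # placeholder path
profile = ProfileReport(df, title="Quick EDA report", minimal=True)
profile.to_file("eda_report.html")                 # open in a browser to explore
```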

#large language models #artificial intelligence #data science #llm applications #programming #python

What took GPT-4o 2 hours to solve, Sonnet 4.5 does in 5 seconds 
The post This Puzzle Shows Just How Far LLMs Have Progressed in a Little Over a Year appeared first on Towards Data Science.
