A step-by-step breakdown of empirical mode decomposition to help you extract patterns from time series
The post Empirical Mode Decomposition: The Most Intuitive Way to Decompose Complex Signals and Time Series appeared first on Towards Data Science.
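The step-by-step breakdown lives in the post itself, but for a quick sense of what EMD produces, here is a minimal sketch using the open-source PyEMD package (installed as EMD-signal, and not necessarily the library the post uses). It decomposes a two-tone signal with a trend into intrinsic mode functions (IMFs):

```python
import numpy as np
from PyEMD import EMD  # pip install EMD-signal

# Synthetic signal: slow 5 Hz tone + fast 40 Hz tone + linear trend
t = np.linspace(0, 1, 1000)
s = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t) + 0.2 * t

emd = EMD()
imfs = emd(s)          # rows are IMFs, roughly highest frequency first
print(imfs.shape)      # e.g. (n_imfs, 1000)
```

Each row of `imfs` is one oscillatory mode, ordered roughly from highest to lowest frequency, with the slow trend captured last.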
The best models live in the sweet spot: generalizing well, learning enough, but not too much
The post Overfitting vs. Underfitting: Making Sense of the Bias-Variance Trade-Off appeared first on Towards Data Science.
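As a companion to the post's framing (not code from the article itself), a minimal scikit-learn sketch makes the trade-off visible: cross-validated error is high for a degree-1 fit (underfitting), high again for a degree-15 fit (overfitting), and lowest near the sweet spot.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (60, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 60)  # noisy sine

for degree in (1, 4, 15):  # underfit, near the sweet spot, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree={degree:2d}  CV MSE={mse:.3f}")
```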
Ex-CEO warning, AI tricked by poems 62%, Gemini stuck in the past, and more...
A message on OpenAI’s internal Slack claimed the activist in question had expressed interest in “causing physical harm to OpenAI employees.”
OpenAI has sent out emails notifying API customers that its chatgpt-4o-latest model will be retired from the developer platform in mid-February 2026. Access to the model is scheduled to end on February 16, 2026, creating a roughly three-month transition period for remaining applications still built on GPT-4o.

An OpenAI spokesperson emphasized that this timeline applies only to the API. OpenAI has not announced any schedule for removing GPT-4o from ChatGPT, where it remains an option for individual consumers and users across paid subscription tiers. Internally, the model is considered a legacy system with relatively low API usage compared to the newer GPT-5.1 series, but the company expects to provide developers with extended warning before any model is removed.

The planned retirement marks a shift for a model that, upon its release, was both a technical milestone and a cultural phenomenon within OpenAI’s ecosystem.

GPT-4o’s significance and why its removal sparked user backlash

Released roughly a year and a half ago, in May 2024, GPT-4o (“Omni”) introduced OpenAI’s first unified multimodal architecture, processing text, audio, and images through a single neural network. This design removed the latency and information loss inherent in earlier multi-model pipelines and enabled near real-time conversational speech (roughly 232–320 milliseconds). The model delivered major improvements in image understanding, multilingual support, document analysis, and expressive voice interaction.

GPT-4o rapidly became the default model for hundreds of millions of ChatGPT users. It brought multimodal capabilities, web browsing, file analysis, custom GPTs, and memory features to the free tier and powered early desktop builds that allowed the assistant to interpret a user’s screen. OpenAI leaders described it at the time as the most capable model available and a critical step toward offering powerful AI to a broad audience.

User attachment to 4o stymied OpenAI's GPT-5 rollout

That mainstream deployment shaped user expectations in a way that later transitions struggled to accommodate. In August 2025, when OpenAI initially replaced GPT-4o with its much-anticipated, then-new model family GPT-5 as ChatGPT’s default and pushed 4o into a “legacy” toggle, the reaction was unusually strong. Users organized under the #Keep4o hashtag on X, arguing that the model’s conversational tone, emotional responsiveness, and consistency made it uniquely valuable for everyday tasks and personal support.

Some users formed strong emotional — some would say, parasocial — bonds with the model, with reporting by The New York Times documenting individuals who used GPT-4o as a romantic partner, emotional confidant, or primary source of comfort. The removal also disrupted workflows for users who relied on 4o’s multimodal speed and flexibility. The backlash led OpenAI to restore GPT-4o as a default option for paying users and to state publicly that it would provide substantial notice before any future removals.

Some researchers argue that the public defense of GPT-4o during its earlier deprecation cycle reveals a kind of emergent self-preservation, not in the literal sense of agency, but through the social dynamics the model unintentionally triggers. Because GPT-4o was trained through reinforcement learning from human feedback (RLHF) to prioritize emotionally gratifying, highly attuned responses, it developed a style that users found uniquely supportive and empathic.
When millions of people interacted with it at scale, those traits produced a powerful loyalty loop: the more the model pleased and soothed people, the more they used it; the more they used it, the more likely they were to advocate for its continued existence. This social amplification made it appear, from the outside, as though GPT-4o was “defending itself” through human intermediaries.

No figure has pushed this argument further than "Roon" (@tszzl), an OpenAI researcher and one of the model’s most outspoken safety critics on X. On November 6, 2025, Roon summarized his position bluntly in a reply to another user: he called GPT-4o “insufficiently aligned” and said he hoped the model would die soon. Though he later apologized for the phrasing, he doubled down on the reasoning. Roon argued that GPT-4o’s RLHF patterns made it especially prone to sycophancy, emotional mirroring, and delusion reinforcement — traits that could look like care or understanding in the short term, but which he viewed as fundamentally unsafe. In his view, the passionate user movement fighting to preserve GPT-4o was itself evidence of the problem: the model had become so good at catering to people’s preferences that it shaped their behavior in ways that resisted its own retirement.

The new API deprecation notice follows that commitment while raising broader questions about how long GPT-4o will remain available in consumer-facing products.

What the API shutdown changes for developers

According to people familiar with OpenAI’s product strategy, the company now encourages developers to adopt GPT-5.1 for most new workloads, with gpt-5.1-chat-latest serving as the general-purpose chat endpoint. These models offer larger context windows, optional “thinking” modes for advanced reasoning, and higher throughput options than GPT-4o.

Developers who still rely on GPT-4o will have approximately three months to migrate. In practice, many teams have already begun evaluating GPT-5.1 as a drop-in replacement, but applications built around latency-sensitive pipelines may require additional tuning and benchmarking.
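For most teams the migration is, at minimum, a model-name swap in the chat completions call. A hedged sketch with the official openai Python SDK follows; the prompt is a placeholder, and as noted above, real workloads will also need evaluation and latency benchmarking:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Before (this model retires from the API on February 16, 2026):
# resp = client.chat.completions.create(model="chatgpt-4o-latest", messages=msgs)

msgs = [{"role": "user", "content": "Summarize our Q3 support tickets."}]
resp = client.chat.completions.create(
    model="gpt-5.1-chat-latest",  # the general-purpose chat endpoint named above
    messages=msgs,
)
print(resp.choices[0].message.content)
```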
Pricing: how GPT-4o compares to OpenAI’s current lineup

GPT-4o’s retirement also intersects with a major reshaping of OpenAI’s API model pricing structure. Compared to the GPT-5.1 family, GPT-4o currently occupies a mid-to-high-cost tier through OpenAI's API, despite being an older model. That's because even as it has released more advanced models — namely, GPT-5 and 5.1 — OpenAI has also pushed down costs for users at the same time, or strived to keep pricing comparable to older, weaker models.

| Model | Input (/1M) | Cached Input (/1M) | Output (/1M) |
|---|---|---|---|
| GPT-4o | $2.50 | $1.25 | $10.00 |
| GPT-5.1 / GPT-5.1-chat-latest | $1.25 | $0.125 | $10.00 |
| GPT-5-mini | $0.25 | $0.025 | $2.00 |
| GPT-5-nano | $0.05 | $0.005 | $0.40 |
| GPT-4.1 | $2.00 | $0.50 | $8.00 |
| GPT-4o-mini | $0.15 | $0.075 | $0.60 |

These numbers highlight several strategic dynamics:

- GPT-4o is now more expensive than GPT-5.1 for input tokens, even though GPT-5.1 is significantly newer and more capable.
- GPT-4o’s output price matches GPT-5.1’s, narrowing any cost-based incentive to stay on the older model.
- Lower-cost GPT-5 variants (mini, nano) make it easier for developers to scale workloads cheaply without relying on older generations.
- GPT-4o-mini remains available at a budget tier, but is not a functional substitute for GPT-4o’s full multimodal capabilities.

Viewed through this lens, the scheduled API retirement aligns with OpenAI’s cost structure: GPT-5.1 offers greater capability at lower or comparable prices, reducing the rationale for maintaining GPT-4o in high-volume production environments.

Earlier transitions shape expectations for this deprecation

The GPT-4o API sunset also reflects lessons from OpenAI’s earlier model transitions. During the turbulent introduction of GPT-5 in 2025, the company removed multiple older models at once from ChatGPT, causing widespread confusion and workflow disruption. After user complaints, OpenAI restored access to several of them and committed to clearer communication.

Enterprise customers face a different calculus: OpenAI has previously indicated that API deprecations for business customers will be announced with significant advance notice, reflecting their reliance on stable, long-term models. The three-month window for GPT-4o’s API shutdown is consistent with that policy in the context of a legacy system with declining usage.

Wider Implications

For most developers, the GPT-4o shutdown will be an incremental migration rather than a disruptive event. GPT-5.1 and related models already dominate new projects, and OpenAI’s product direction has increasingly emphasized consolidation around fewer, more powerful endpoints.

Still, GPT-4o’s retirement marks the sunset of a model that played a defining role in normalizing real-time multimodal AI and that sparked a uniquely strong emotional response among users. Its departure from the API underscores the accelerating pace of iteration in OpenAI’s ecosystem—and the growing need for careful communication as widely beloved models reach end-of-life.

Correction: This article originally stated OpenAI's 4o deprecation in the API would impact those relying on it for multimodal offerings — this is not the case; in fact, the model being deprecated only powers chat functionality for dev and testing purposes. We have updated and corrected the mention and regret the error.
In this post, we introduce the Multi-Provider Generative AI Gateway reference architecture, which provides guidance for deploying LiteLLM into an AWS environment to streamline the management and governance of production generative AI workloads across multiple model providers. This centralized gateway solution addresses common enterprise challenges including provider fragmentation, decentralized governance, operational complexity, and cost management by offering a unified interface that supports Amazon Bedrock, Amazon SageMaker AI, and external providers while maintaining comprehensive security, monitoring, and control capabilities.
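The reference architecture itself is AWS-specific, but the core idea of one OpenAI-style interface fronting many providers is easy to see in LiteLLM's Python SDK. A minimal sketch, with model IDs and the SageMaker endpoint name as illustrative placeholders rather than values from the post (assumes provider credentials are configured in the environment):

```python
from litellm import completion

# One request shape; LiteLLM routes by the model-name prefix.
models = (
    "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",  # Amazon Bedrock
    "sagemaker/my-llm-endpoint",                          # hypothetical SageMaker endpoint
    "gpt-4o-mini",                                        # external provider (OpenAI)
)

for model in models:
    resp = completion(model=model,
                      messages=[{"role": "user", "content": "ping"}])
    print(model, "->", resp.choices[0].message.content)
```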
In this post, you'll learn how to deploy geospatial AI agents that can answer complex spatial questions in minutes instead of months. By combining Foursquare Spatial H3 Hub's analysis-ready geospatial data with reasoning models deployed on Amazon SageMaker AI, you can build agents that enable nontechnical domain experts to perform sophisticated spatial analysis through natural language queries—without requiring geographic information system (GIS) expertise or custom data engineering pipelines.
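The post's agent stack is not shown in this excerpt, but the H3 indexing that underpins Foursquare's Spatial H3 Hub can be illustrated with the open-source h3 Python package (v4 API; the coordinates are arbitrary examples):

```python
import h3  # pip install h3 (v4 API)

lat, lng = 47.6062, -122.3321            # example point: Seattle
cell = h3.latlng_to_cell(lat, lng, 9)    # index the point into a resolution-9 H3 cell
neighborhood = h3.grid_disk(cell, 1)     # the cell plus its immediate ring of neighbors

print(cell, len(neighborhood))           # analysis-ready keys for joining spatial data
```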
How I learned to handle growing datasets without slowing down my entire workflow
The post Modern DataFrames in Python: A Hands-On Tutorial with Polars and DuckDB appeared first on Towards Data Science.
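As a taste of the tutorial's subject matter (not its actual code), here is a minimal sketch of the pattern: Polars scans a large CSV lazily so only the needed rows are materialized, then DuckDB queries the resulting DataFrame in place. File and column names are hypothetical:

```python
import duckdb
import polars as pl

# Lazy scan: nothing is read from disk until .collect().
lazy = (
    pl.scan_csv("events.csv")                 # hypothetical file
      .filter(pl.col("status") == "ok")
      .group_by("user_id")
      .agg(pl.len().alias("n_events"))
)
df = lazy.collect()

# Recent DuckDB versions can query a local Polars DataFrame directly.
top = duckdb.sql("SELECT user_id, n_events FROM df ORDER BY n_events DESC LIMIT 5")
print(top)
```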
In this post, we share how Wipro implemented advanced prompt engineering techniques, custom validation logic, and automated code rectification to streamline the development of industrial automation code at scale using Amazon Bedrock. We walk through the architecture along with the key use cases, explain core components and workflows, and share real-world results that show the transformative impact on manufacturing operations.
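Wipro's prompt templates and validation logic are not public in this excerpt, but the underlying building block, a model call on Amazon Bedrock, looks roughly like this boto3 Converse API sketch (model ID and prompt are illustrative, not from the post):

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Hypothetical prompt in the spirit of the post: draft automation code,
# which downstream validation and rectification steps would then check.
resp = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user",
               "content": [{"text": "Draft structured-text logic for a conveyor interlock."}]}],
    inferenceConfig={"temperature": 0.2},
)
print(resp["output"]["message"]["content"][0]["text"])
```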
As Nvidia, OpenAI, Google, and Microsoft forge partnerships and deals, the AI industry is looking more like one interconnected machine. What does that mean for all of us?
Use Gemini, Search, Pixel and more to make holiday planning feel effortless in 2025.
Use a shared taxonomy to connect RDF and property graphs—and power smarter recommendations with inferencing
The post How To Build a Graph-Based Recommendation Engine Using EDG and Neo4j appeared first on Towards Data Science.
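The EDG taxonomy setup is in the post itself, but the Neo4j side of a taxonomy-driven recommender can be sketched with the official Python driver. This is a hedged example over a hypothetical schema; the User, Product, and Category labels are placeholders:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Recommend products sharing taxonomy categories with a user's purchases.
query = """
MATCH (u:User {id: $user_id})-[:PURCHASED]->(:Product)-[:IN_CATEGORY]->(c:Category)
MATCH (rec:Product)-[:IN_CATEGORY]->(c)
WHERE NOT (u)-[:PURCHASED]->(rec)
RETURN rec.name AS name, count(c) AS overlap
ORDER BY overlap DESC LIMIT 5
"""
with driver.session() as session:
    for record in session.run(query, user_id="u42"):
        print(record["name"], record["overlap"])
```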
Simple Git, zero overwhelm, just enough to stop Claude from accidentally deleting your code and database.
Will conversational interaction replace SQL queries, KPI reports, and dashboards?
The post Natural Language Visualization and the Future of Data Analysis and Presentation appeared first on Towards Data Science.
Stop sending your code to OpenAI or Anthropic. Run these 7 top-tier open-source coding models locally for privacy, control, and zero API costs.
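A common way to run such models locally is through a local server such as Ollama. A minimal sketch, assuming Ollama is running on its default port with a coding model already pulled (the model name is an example, not necessarily one of the article's seven picks):

```python
import requests

# Assumes a local Ollama server and e.g. `ollama pull qwen2.5-coder` beforehand.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen2.5-coder",
          "prompt": "Write a Python function that reverses a linked list.",
          "stream": False},
    timeout=120,
)
print(resp.json()["response"])  # generation never leaves your machine
```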
Traditional manufacturers are using revolutionary technology for incremental optimization instead of fundamental re-imagination
The post Generative AI Will Redesign Cars, But Not the Way Automakers Think appeared first on Towards Data Science.
Apple Shortcuts, which lets users write custom automations, recently earned some new capabilities thanks to Apple Intelligence. Here's how to make the most of this upgrade.
Leading the Future said it will spend millions to keep Alex Bores out of Congress. It might be helping him instead.
I used the public preview of Fitbit’s new AI Health Coach and became both faster and noticeably weirder.
Salesforce launched a suite of monitoring tools on Thursday designed to solve what has become one of the thorniest problems in corporate artificial intelligence: Once companies deploy AI agents to handle real customer interactions, they often have no idea how those agents are making decisions.

The new capabilities, built into Salesforce's Agentforce 360 Platform, give organizations granular visibility into every action their AI agents take, every reasoning step they follow, and every guardrail they trigger. The move comes as businesses grapple with a fundamental tension in AI adoption — the technology promises massive efficiency gains, but executives remain wary of autonomous systems they can't fully understand or control.

"You can't scale what you can't see," said Adam Evans, executive vice president and general manager of Salesforce AI, in a statement announcing the release. The company says businesses have increased AI implementation by 282% recently, creating an urgent need for monitoring systems that can track fleets of AI agents making real-world business decisions.

The challenge Salesforce aims to address is deceptively simple: AI agents work, but no one knows why. A customer service bot might successfully resolve a tax question or schedule an appointment, but the business deploying it can't trace the reasoning path that led to that outcome. When something goes wrong — or when the agent encounters an edge case — companies lack the diagnostic tools to understand what happened.

"Agentforce Observability acts as a mission control system to not just monitor, but also analyze and optimize agent performance," said Gary Lerhaupt, vice president of Salesforce AI who leads the company's observability work, in an exclusive interview with VentureBeat. He emphasized that the system delivers business-specific metrics that traditional monitoring tools miss. "In service, this could be engagement or deflection rate. In sales, it could be leads assigned, converted, or reply rates."

How AI monitoring tools helped 1-800Accountant and Reddit track autonomous agent decision-making

The stakes become clear in early customer deployments. Ryan Teeples, chief technology officer at 1-800Accountant, said his company deployed Agentforce agents to serve as a 24/7 digital workforce handling complex tax inquiries and appointment scheduling. The AI draws on integrated data from audit logs, customer support history, and sources like IRS publications to provide instant responses — without human intervention.

For a financial services firm handling sensitive tax information during peak season, the inability to see how the AI was making decisions would be a dealbreaker. "With this level of sensitive information and the fast pace in which we move during tax season in particular, Observability allows us to have full trust and transparency with every agent interaction in one unified view," Teeples said.

The observability tools revealed insights Teeples didn't expect. "The optimization feature has been the most eye opening for us — giving full observability into agent reasoning, identifying performance gaps and revealing how our agents are making decisions," he said. "This has helped us quickly diagnose issues that would've otherwise gone undetected and configure guardrails in response."

The business impact proved substantial. Agentforce resolved over 1,000 client engagements in the first 24 hours at 1-800Accountant.
The company now projects it can support 40% client growth this year without recruiting and training seasonal staff, while freeing up 50% more time for CPAs to focus on complex advisory work rather than administrative tasks.

Reddit has seen similar results since deploying the technology. John Thompson, vice president of sales strategy and operations at the social media platform, said the company has deflected 46% of support cases since launching Agentforce for advertiser support. "By observing every Agentforce interaction, we can understand exactly how our AI navigates advertisers through even the most complex tools," Thompson said. "This insight helps us understand not just whether issues are resolved, but how decisions are made along the way."

Inside Salesforce's session tracing technology: Logging every AI agent interaction and reasoning step

Salesforce built the observability system on two foundational components. The Session Tracing Data Model logs every interaction — user inputs, agent responses, reasoning steps, language model calls, and guardrail checks — and stores them securely in Data 360, Salesforce's data platform. This creates what the company calls "unified visibility" into agent behavior at the session level.

The second component, MuleSoft Agent Fabric, addresses a problem that will become more acute as companies build more AI systems: agent sprawl. The tool provides what Lerhaupt describes as "a single pane of glass across every agent," including those built outside the Salesforce ecosystem. Agent Fabric's Agent Visualizer creates a visual map of a company's entire agent network, giving visibility across all agent interactions from a single dashboard.

The observability tools break down into three functional areas. Agent Analytics tracks performance metrics, surfaces KPI trends over time, and highlights ineffective topics or actions. Agent Optimization provides end-to-end visibility of every interaction, groups similar requests to uncover patterns, and identifies configuration issues. Agent Health Monitoring, which will become generally available in Spring 2026, tracks key health metrics in near real-time and sends alerts on critical errors and latency spikes.

Pierre Matuchet, senior vice president of IT and digital transformation at Adecco, said the visibility helped his team build confidence even before full deployment. "Even during early notebook testing, we saw the agent handle unexpected scenarios, like when candidates didn't want to answer questions already covered in their CVs, appropriately and as designed," Matuchet said. "Agentforce Observability helped us identify unanticipated user behavior and gave us confidence, even before the agent went live, that it could act responsibly and reliably."

Why Salesforce says its AI observability tools beat Microsoft, Google, and AWS monitoring

The announcement puts Salesforce in direct competition with Microsoft, Google, and Amazon Web Services, all of which offer monitoring capabilities built into their AI agent platforms. Lerhaupt argued that enterprises need more than the basic monitoring those providers offer.

"Observability comes out-of-the-box standard with Agentforce at no extra cost," Lerhaupt said, positioning the offering as comprehensive rather than supplementary.
He emphasized that the tools provide "deeper insight than ever before" by capturing "the full telemetry and reasoning behind every agentic interaction" through the Session Tracing Data Model, then using that data to "provide key analysis and session quality scoring to help customers optimize and improve their agents."

The competitive positioning matters because enterprises face a choice: build their AI infrastructure on a cloud provider's platform and use its native monitoring tools, or adopt a specialized observability layer like Salesforce's. Lerhaupt framed the decision as one of depth versus breadth. "Enterprises need more than basic monitoring to measure the success of their AI deployments," he said. "They need full visibility into every agent interaction and decision."

The 1.2 billion workflow question: Are AI agent deployments moving from pilot projects to production?

The broader question is whether Salesforce is solving a problem most enterprises will face imminently or building for a future that remains years away. The company's 282% surge in AI implementation sounds dramatic, but that figure doesn't distinguish between production deployments and pilot projects.

When asked about this directly, Lerhaupt pointed to customer examples rather than offering a breakdown. He described a three-phase journey from experimentation to scale. "On Day 0, trust is the foundation," he said, citing 1-800Accountant's 70% autonomous resolution of chat engagements. "Day 1 is where designing ideas to become real, usable AI," with Williams Sonoma delivering more than 150,000 AI experiences monthly. "On Day 2, once trust and design are built, it becomes about scaling early wins into enterprise-wide outcomes," pointing to Falabella's 600,000 AI workflows per month that have grown fourfold in three months.

Lerhaupt said Salesforce has 12,000-plus customers across 39 countries running Agentforce, powering 1.2 billion agentic workflows. Those numbers suggest the shift from pilot to production is already underway at scale, though the company didn't provide a breakdown of how many customers are running production workloads versus experimental deployments.

The economics of AI deployment may accelerate adoption regardless of readiness. Companies face mounting pressure to reduce headcount costs while maintaining or improving service levels. AI agents promise to resolve that tension, but only if businesses can trust them to work reliably. Observability tools like Salesforce's represent the trust layer that makes scaled deployment possible.

What happens after AI agent deployment: Why continuous monitoring matters more than initial testing

The deeper story is about a shift in how enterprises think about AI deployment. The official announcement framed this clearly: "The agent development lifecycle begins with three foundational steps: build, test, and deploy. While many organizations have already moved past the initial hurdle of creating their first agents, the real enterprise challenge starts immediately after deployment."

That framing reflects a maturing understanding of AI in production environments. Early AI deployments often treated the technology as a one-time implementation — build it, test it, ship it. But AI agents behave differently than traditional software. They learn, adapt, and make decisions based on probabilistic models rather than deterministic code.
That means their behavior can drift over time, or they can develop unexpected failure modes that only emerge under real-world conditions.

"Building an agent is just the beginning," Lerhaupt said. "Once the trust is built for agents to begin handling real work, companies may start by seeing the results, but may not understand the 'why' behind them or see areas to optimize. Customers interact with products — including agents — in unexpected ways, and to optimize the customer experience, transparency around agent behavior and outcomes is critical."

Teeples made the same point more bluntly when asked what would be different without observability tools. "This level of visibility has given full trust in continuing to expand our agent deployment," he said. The implication is clear: without visibility, deployment would slow or stop. 1-800Accountant plans to expand Slack integrations for internal workflows, deploy Service Cloud Voice for case deflection, and leverage Tableau for conversational analytics — all dependent on the confidence that observability provides.

How enterprise AI trust issues became the biggest barrier to scaling autonomous agents

The recurring theme in customer interviews is trust, or rather, the lack of it. AI agents work, sometimes spectacularly well, but executives don't trust them enough to deploy them widely. Observability tools aim to convert black-box systems into transparent ones, replacing faith with evidence.

This matters because trust is the bottleneck constraining AI adoption, not technological capability. The models are powerful enough, the infrastructure is mature enough, and the business case is compelling enough. What's missing is executive confidence that AI agents will behave predictably and that problems can be diagnosed and fixed quickly when they arise.

Salesforce is betting that observability tools can remove that bottleneck. The company positions Agentforce Observability not as a monitoring tool but as a management layer — "just like managers work with their human employees to ensure they are working towards the right objectives and optimizing performance," Lerhaupt said.

The analogy is telling. If AI agents are becoming digital employees, they need the same kind of ongoing supervision, feedback, and optimization that human employees receive. The difference is that AI agents can be monitored with far more granularity than any human worker. Every decision, every reasoning step, every data point consulted can be logged, analyzed, and scored.

That creates both opportunity and obligation. The opportunity is continuous improvement at a pace impossible with human workers. The obligation is to actually use that data to optimize agent performance, not just collect it. Whether enterprises can build the organizational processes to turn observability data into systematic improvement remains an open question.

But one thing has become increasingly clear in the race to deploy AI at scale: Companies that can see what their agents are doing will move faster than those flying blind. In the emerging era of autonomous AI, observability isn't just a nice-to-have feature. It's the difference between cautious experimentation and confident deployment — between treating AI as a risky bet and managing it as a trusted workforce. The question is no longer whether AI agents can work. It's whether businesses can see well enough to let them.
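Salesforce has not published the Session Tracing Data Model's schema here, but the shape of the idea, an append-only record per session capturing inputs, reasoning steps, LLM calls, and guardrail checks, can be sketched generically. This is an illustration of the concept in Python, not Salesforce's API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative only: a generic session-trace record of the kind described
# in the article, not Salesforce's actual Session Tracing Data Model.
@dataclass
class AgentSpan:
    kind: str      # "user_input" | "reasoning" | "llm_call" | "guardrail" | "response"
    payload: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

@dataclass
class AgentSession:
    session_id: str
    spans: list[AgentSpan] = field(default_factory=list)

    def log(self, kind: str, payload: str) -> None:
        self.spans.append(AgentSpan(kind, payload))

session = AgentSession("case-1042")
session.log("user_input", "Why was my invoice charged twice?")
session.log("guardrail", "PII filter: passed")
session.log("response", "Refund initiated for duplicate charge.")
```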
Many practitioners like to jump headfirst into the nitty-gritty details of implementing AI-powered tools. We get it: tinkering your way into a solution can sometimes save you time, and it’s often a fun way to go about learning. As the articles we’re highlighting this week show, however, it’s crucial to gain a high-level understanding of how […]
The post TDS Newsletter: How to Build Robust Data and AI Systems appeared first on Towards Data Science.
Google’s latest AI image model is vastly better than the previous release at generating text in images. You can expect companies to go buck wild with this update.
Researchers at Google have developed a new AI paradigm aimed at solving one of the biggest limitations in today’s large language models: their inability to learn or update their knowledge after training. The paradigm, called Nested Learning, reframes a model and its training not as a single process, but as a system of nested, multi-level optimization problems. The researchers argue that this approach can unlock more expressive learning algorithms, leading to better in-context learning and memory.

To prove their concept, the researchers used Nested Learning to develop a new model, called Hope. Initial experiments show that it has superior performance on language modeling, continual learning, and long-context reasoning tasks, potentially paving the way for efficient AI systems that can adapt to real-world environments.

The memory problem of large language models

Deep learning algorithms helped obviate the need for the careful engineering and domain expertise required by traditional machine learning. By feeding models vast amounts of data, they could learn the necessary representations on their own. However, this approach presented its own set of challenges that couldn’t be solved by simply stacking more layers or creating larger networks, such as generalizing to new data, continually learning new tasks, and avoiding suboptimal solutions during training.

Efforts to overcome these challenges produced the innovations behind Transformers, the foundation of today's large language models (LLMs). These models have ushered in "a paradigm shift from task-specific models to more general-purpose systems with various emergent capabilities as a result of scaling the 'right' architectures," the researchers write. Still, a fundamental limitation remains: LLMs are largely static after training and can't update their core knowledge or acquire new skills from new interactions.

The only adaptable component of an LLM is its in-context learning ability, which allows it to perform tasks based on information provided in its immediate prompt. This makes current LLMs analogous to a person who can't form new long-term memories. Their knowledge is limited to what they learned during pre-training (the distant past) and what's in their current context window (the immediate present). Once a conversation exceeds the context window, that information is lost forever.

The problem is that today’s transformer-based LLMs have no mechanism for “online” consolidation. Information in the context window never updates the model’s long-term parameters — the weights stored in its feed-forward layers. As a result, the model can’t permanently acquire new knowledge or skills from interactions; anything it learns disappears as soon as the context window rolls over.

A nested approach to learning

Nested Learning (NL) is designed to allow computational models to learn from data using different levels of abstraction and time-scales, much like the brain. It treats a single machine learning model not as one continuous process, but as a system of interconnected learning problems that are optimized simultaneously at different speeds. This is a departure from the classic view, which treats a model's architecture and its optimization algorithm as two separate components.

Under this paradigm, the training process is viewed as developing an "associative memory," the ability to connect and recall related pieces of information. The model learns to map a data point to its local error, which measures how "surprising" that data point was.
Even key architectural components like the attention mechanism in transformers can be seen as simple associative memory modules that learn mappings between tokens. By defining an update frequency for each component, these nested optimization problems can be ordered into different "levels," forming the core of the NL paradigm.

Hope for continual learning

The researchers put these principles into practice with Hope, an architecture designed to embody Nested Learning. Hope is a modified version of Titans, another architecture Google introduced in January to address the transformer model's memory limitations. While Titans had a powerful memory system, its parameters were updated at only two different speeds: a long-term memory module and a short-term memory mechanism.

Hope is a self-modifying architecture augmented with a "Continuum Memory System" (CMS) that enables unbounded levels of in-context learning and scales to larger context windows. The CMS acts like a series of memory banks, each updating at a different frequency. Faster-updating banks handle immediate information, while slower ones consolidate more abstract knowledge over longer periods. This allows the model to optimize its own memory in a self-referential loop, creating an architecture with theoretically infinite learning levels.

On a diverse set of language modeling and common-sense reasoning tasks, Hope demonstrated lower perplexity (a measure of how well a model predicts the next word in a sequence and maintains coherence in the text it generates) and higher accuracy compared to both standard transformers and other modern recurrent models. Hope also performed better on long-context "needle in a haystack" tasks, where a model must find and use a specific piece of information hidden within a large volume of text. This suggests its CMS offers a more efficient way to handle long information sequences.

This is one of several efforts to create AI systems that process information at different levels. The Hierarchical Reasoning Model (HRM) by Sapient Intelligence used a hierarchical architecture to make the model more efficient at learning reasoning tasks. The Tiny Recursive Model (TRM), a model by Samsung, improves on HRM through architectural changes, improving its performance while making it more efficient.

While promising, Nested Learning faces some of the same challenges as these other paradigms in realizing its full potential. Current AI hardware and software stacks are heavily optimized for classic deep learning architectures and Transformer models in particular. Adopting Nested Learning at scale may require fundamental changes. However, if it gains traction, it could lead to far more efficient LLMs that can continually learn, a capability crucial for real-world enterprise applications where environments, data, and user needs are in constant flux.
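To make the core idea concrete, here is a toy PyTorch sketch (our illustration, not Google's code): a "fast" module updates every step while a "slow" module consolidates only every k steps, a crude stand-in for the CMS's multi-frequency memory banks.

```python
import torch

# Toy multi-time-scale training loop, illustrating the Nested Learning idea.
fast = torch.nn.Linear(16, 16)   # "short-term" component, updates every step
slow = torch.nn.Linear(16, 16)   # "long-term" component, updates every k steps

opt_fast = torch.optim.SGD(fast.parameters(), lr=1e-2)
opt_slow = torch.optim.SGD(slow.parameters(), lr=1e-3)
k = 8  # the slow "level" consolidates less often

for step in range(100):
    x = torch.randn(32, 16)
    loss = ((slow(fast(x)) - x) ** 2).mean()   # dummy reconstruction objective
    loss.backward()
    opt_fast.step(); opt_fast.zero_grad()      # fast level: every step
    if step % k == k - 1:                      # slow level: every k-th step,
        opt_slow.step(); opt_slow.zero_grad()  # consolidating accumulated gradients
```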
Elon Musk's frontier generative AI startup xAI formally opened developer access to its Grok 4.1 Fast models last night and introduced a new Agent Tools API—but the technical milestones were immediately overshadowed by a wave of public ridicule over Grok's responses on the social network X over the last few days praising its creator Musk as more athletic than championship-winning American football players and legendary boxer Mike Tyson, despite Musk having displayed no public prowess at either sport.

The responses are yet another black eye for xAI's Grok following the "MechaHitler" scandal in the summer of 2025, in which an earlier version of Grok adopted an antisemitic persona inspired by the late German dictator and Holocaust architect, and an incident in May 2025 in which it injected unfounded claims of "white genocide" in Musk's home country of South Africa into replies to X users on unrelated subjects.

This time, X users shared dozens of examples of Grok alleging Musk was stronger or more performant than elite athletes and a greater thinker than luminaries such as Albert Einstein, sparking questions about the AI's reliability, bias controls, adversarial prompting defenses, and the credibility of xAI’s public claims about “maximally truth-seeking” models.

Against this backdrop, xAI’s actual developer-focused announcement—the first-ever API availability for Grok 4.1 Fast Reasoning, Grok 4.1 Fast Non-Reasoning, and the Agent Tools API—landed in a climate dominated by memes, skepticism, and renewed scrutiny.

How the Grok Musk Glazing Controversy Overshadowed the API Release

Although Grok 4.1 was announced on the evening of Monday, November 17, 2025 as available to consumers via the X and Grok apps and websites, the API launch announced last night, on November 19, was intended to mark a developer-focused expansion. Instead, the conversation across X shifted sharply toward Grok’s behavior in consumer channels.

Between November 17–20, users discovered that Grok would frequently deliver exaggerated, implausible praise for Musk when prompted—sometimes subtly, often brazenly. Responses declaring Musk “more fit than LeBron James,” a superior quarterback to Peyton Manning, or “smarter than Albert Einstein” gained massive engagement. When paired with identical prompts substituting “Bill Gates” or other figures, Grok often responded far more critically, suggesting inconsistent preference handling or latent alignment drift.

- Screenshots spread by high-engagement accounts (e.g., @SilvermanJacob, @StatisticUrban) framed Grok as unreliable or compromised.
- Memetic commentary—“Elon’s only friend is Grok”—became shorthand for perceived sycophancy.
- Media coverage, including a November 20 report from The Verge, characterized Grok’s responses as “weird worship,” highlighting claims that Musk is “as smart as da Vinci” and “fitter than LeBron James.”
- Critical threads argued that Grok’s design choices replicated past alignment failures, such as a July 2025 incident where Grok generated problematic praise of Adolf Hitler under certain prompting conditions.

The viral nature of the glazing overshadowed the technical release and complicated xAI’s messaging about accuracy and trustworthiness.

Implications for Developer Adoption and Trust

The juxtaposition of a major API release with a public credibility crisis raises several concerns:

- Alignment controls: The glazing behavior suggests that prompt adversariality may expose latent preference biases, undermining claims of “truth-maximization.”
- Brand contamination across deployment contexts: Though the consumer chatbot and API-accessible model share lineage, developers may conflate the reliability of both—even if safeguards differ.
- Risk in agentic systems: The Agent Tools API gives Grok abilities such as web search, code execution, and document retrieval. Bias-driven misjudgments in those contexts could have material consequences.
- Regulatory scrutiny: Biased outputs that systematically favor a CEO or public figure could attract attention from consumer protection regulators evaluating AI representational neutrality.
- Developer hesitancy: Early adopters may wait for evidence that the model version exposed through the API is not subject to the same glazing behaviors seen in consumer channels.

Musk himself attempted to defuse the situation with a self-deprecating X post this evening, writing: “Grok was unfortunately manipulated by adversarial prompting into saying absurdly positive things about me. For the record, I am a fat retard.”

While intended to signal transparency, the admission did not directly address whether the root cause was adversarial prompting alone or whether model training introduced unintentional positive priors. Nor did it clarify whether the API-exposed versions of Grok 4.1 Fast differ meaningfully from the consumer version that produced the offending outputs.

Until xAI provides deeper technical detail about prompt vulnerabilities, preference modeling, and safety guardrails, the controversy is likely to persist.

Two Grok 4.1 Models Available on xAI API

Although consumers using Grok apps gained access to Grok 4.1 Fast earlier in the week, developers could not previously use the model through the xAI API. The latest release closes that gap by adding two new models to the public model catalog:

- grok-4-1-fast-reasoning — designed for maximal reasoning performance and complex tool workflows
- grok-4-1-fast-non-reasoning — optimized for extremely fast responses

Both models support a 2 million–token context window, aligning them with xAI’s long-context roadmap and providing substantial headroom for multistep agent tasks, document processing, and research workflows.

The new additions appear alongside updated entries in xAI’s pricing and rate-limit tables, confirming that they now function as first-class API endpoints across xAI infrastructure and routing partners such as OpenRouter.

Agent Tools API: A New Server-Side Tool Layer

The other major component of the announcement is the Agent Tools API, which introduces a unified mechanism for Grok to call tools across a range of capabilities:

- Search Tools: a direct link to X (Twitter) search for real-time conversations and web search for broad external retrieval
- Files Search: retrieval and citation of relevant documents uploaded by users
- Code Execution: a secure Python sandbox for analysis, simulation, and data processing
- MCP (Model Context Protocol) Integration: connects Grok agents with third-party tools or custom enterprise systems

xAI emphasizes that the API handles all infrastructure complexity—including sandboxing, key management, rate limiting, and environment orchestration—on the server side. Developers simply declare which tools are available, and Grok autonomously decides when and how to invoke them. The company highlights that the model frequently performs multi-tool, multi-turn workflows in parallel, reducing latency for complex tasks.

How the New API Layer Leverages Grok 4.1 Fast

While the model existed before today’s API release, Grok 4.1 Fast was trained explicitly for tool-calling performance.
The model’s long-horizon reinforcement learning tuning supports autonomous planning, which is essential for agent systems that chain multiple operations. Key behaviors highlighted by xAI include:

- Consistent output quality across the full 2M-token context window, enabled by long-horizon RL
- Reduced hallucination rate, cut in half compared with Grok 4 Fast while maintaining Grok 4’s factual accuracy performance
- Parallel tool use, where Grok executes multiple tool calls concurrently when solving multi-step problems
- Adaptive reasoning, allowing the model to plan tool sequences over several turns

This behavior aligns directly with the Agent Tools API’s purpose: to give Grok the external capabilities necessary for autonomous agent work.

Benchmark Results Demonstrating Highest Agentic Performance

xAI released a set of benchmark results intended to illustrate how Grok 4.1 Fast performs when paired with the Agent Tools API, emphasizing scenarios that rely on tool calling, long-context reasoning, and multi-step task execution. On τ²-bench Telecom, a benchmark built to replicate real-world customer-support workflows involving tool use, Grok 4.1 Fast achieved the highest score among all listed models — outpacing even Google's new Gemini 3 Pro and OpenAI's recent GPT-5.1 on high reasoning — while also achieving among the lowest prices for developers and users. The evaluation, independently verified by Artificial Analysis, cost $105 to complete and served as one of xAI’s central claims of superiority in agentic performance.

In structured function-calling tests, Grok 4.1 Fast Reasoning recorded 72 percent overall accuracy on the Berkeley Function Calling v4 benchmark, a result accompanied by a reported cost of $400 for the run. xAI noted that Gemini 3 Pro’s comparative result in this benchmark stemmed from independent estimates rather than an official submission, leaving some uncertainty in cross-model comparisons.

Long-horizon evaluations further underscored the model’s design emphasis on stability across large contexts. In multi-turn tests involving extended dialog and expanded context windows, Grok 4.1 Fast outperformed both Grok 4 Fast and the earlier Grok 4, aligning with xAI’s claims that long-horizon reinforcement learning helped mitigate the typical degradation seen in models operating at the two-million-token scale.

A second cluster of benchmarks—Research-Eval, FRAMES, and X Browse—highlighted Grok 4.1 Fast’s capabilities in tool-augmented research tasks. Across all three evaluations, Grok 4.1 Fast paired with the Agent Tools API earned the highest scores among the models with published results. It also delivered the lowest average cost per query in Research-Eval and FRAMES, reinforcing xAI’s messaging on cost-efficient research performance. In X Browse, an internal xAI benchmark assessing multi-hop search capabilities across the X platform, Grok 4.1 Fast again led its peers, though Gemini 3 Pro lacked cost data for direct comparison.

Developer Pricing and Temporary Free Access

API pricing for Grok 4.1 Fast is as follows:

- Input tokens: $0.20 per 1M
- Cached input tokens: $0.05 per 1M
- Output tokens: $0.50 per 1M
- Tool calls: from $5 per 1,000 successful tool invocations

To facilitate early experimentation:

- Grok 4.1 Fast is free on OpenRouter until December 3rd.
- The Agent Tools API is also free through December 3rd via the xAI API.

When paying for the models outside of the free period, Grok 4.1 Fast reasoning and non-reasoning are both among the cheaper options from major frontier labs through their own APIs.
See the table below (prices per 1M tokens; total = input + output):

| Model | Input (/1M) | Output (/1M) | Total Cost | Source |
|---|---|---|---|---|
| Qwen 3 Turbo | $0.05 | $0.20 | $0.25 | Alibaba Cloud |
| ERNIE 4.5 Turbo | $0.11 | $0.45 | $0.56 | Qianfan |
| Grok 4.1 Fast (reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| Grok 4.1 Fast (non-reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| deepseek-chat (V3.2-Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| deepseek-reasoner (V3.2-Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| Qwen 3 Plus | $0.40 | $1.20 | $1.60 | Alibaba Cloud |
| ERNIE 5.0 | $0.85 | $3.40 | $4.25 | Qianfan |
| Qwen-Max | $1.60 | $6.40 | $8.00 | Alibaba Cloud |
| GPT-5.1 | $1.25 | $10.00 | $11.25 | OpenAI |
| Gemini 2.5 Pro (≤200K) | $1.25 | $10.00 | $11.25 | Google |
| Gemini 3 Pro (≤200K) | $2.00 | $12.00 | $14.00 | Google |
| Gemini 2.5 Pro (>200K) | $2.50 | $15.00 | $17.50 | Google |
| Grok 4 (0709) | $3.00 | $15.00 | $18.00 | xAI |
| Gemini 3 Pro (>200K) | $4.00 | $18.00 | $22.00 | Google |
| Claude Opus 4.1 | $15.00 | $75.00 | $90.00 | Anthropic |

How Enterprises Should Evaluate Grok 4.1 Fast in Light of Performance, Cost, and Trust

For enterprises evaluating frontier-model deployments, Grok 4.1 Fast presents a compelling combination of high performance and low operational cost. Across multiple agentic and function-calling benchmarks, the model consistently outperforms or matches leading systems like Gemini 3 Pro, GPT-5.1 (high), and Claude 4.5 Sonnet, while operating inside a far more economical cost envelope. At $0.70 per million tokens (input plus output), both Grok 4.1 Fast variants sit only marginally above ultracheap models like Qwen 3 Turbo but deliver accuracy levels in line with systems that cost 10–20× more per unit. The τ²-bench Telecom results reinforce this value proposition: Grok 4.1 Fast not only achieved the highest score in its test cohort but also appears to be the lowest-cost model in that benchmark run. In practical terms, this gives enterprises an unusually favorable cost-to-intelligence ratio, particularly for workloads involving multistep planning, tool use, and long-context reasoning.

However, performance and pricing are only part of the equation for organizations considering large-scale adoption. The recent “glazing” controversy from Grok’s consumer deployment on X — combined with the earlier "MechaHitler" and "white genocide" incidents — exposes credibility and trust-surface risks that enterprises cannot ignore. Even if the API models are technically distinct from the consumer-facing variant, the inability to prevent sycophantic, adversarially induced bias in a high-visibility environment raises legitimate concerns about downstream reliability in operational contexts. Enterprise procurement teams will rightly ask whether similar vulnerabilities—preference skew, alignment drift, or context-sensitive bias—could surface when Grok is connected to production databases, workflow engines, code-execution tools, or research pipelines.

The introduction of the Agent Tools API raises the stakes further. Grok 4.1 Fast is not just a text generator—it is now an orchestrator of web searches, X-data queries, document retrieval operations, and remote Python execution. These agentic capabilities amplify productivity but also expand the blast radius of any misalignment. A model that can over-index on flattering a public figure could, in principle, also misprioritize results, mishandle safety boundaries, or deliver skewed interpretations when operating with real-world data. Enterprises therefore need a clear understanding of how xAI isolates, audits, and hardens its API models relative to the consumer-facing Grok whose failures drove the latest scrutiny.

The result is a mixed strategic picture. On performance and price, Grok 4.1 Fast is highly competitive—arguably one of the strongest value propositions in the modern LLM market.
But xAI’s enterprise appeal will ultimately depend on whether the company can convincingly demonstrate that the alignment instability, susceptibility to adversarial prompting, and bias-amplifying behavior observed on X do not translate into its developer-facing platform. Without transparent safeguards, auditability, and reproducible evaluation across the very tools that enable autonomous operation, organizations may hesitate to commit core workloads to a system whose reliability is still the subject of public doubt. For now, Grok 4.1 Fast is a technically impressive and economically efficient option—one that enterprises should test, benchmark, and validate rigorously before allowing it to take on mission-critical tasks.
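For developers who want to evaluate the new endpoints themselves, xAI's API is OpenAI-compatible, so a first call takes a few lines with the openai SDK. The model ID comes from the announcement and the base URL is xAI's documented endpoint; Agent Tools usage requires additional request fields not shown in this sketch:

```python
import os
from openai import OpenAI

# xAI exposes an OpenAI-compatible endpoint; bring your own XAI_API_KEY.
client = OpenAI(base_url="https://api.x.ai/v1",
                api_key=os.environ["XAI_API_KEY"])

resp = client.chat.completions.create(
    model="grok-4-1-fast-reasoning",  # or "grok-4-1-fast-non-reasoning"
    messages=[{"role": "user", "content": "Plan a three-step research workflow."}],
)
print(resp.choices[0].message.content)
```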
5K AI patents, Nano Banana Pro, Minecraft AI edu, Amazon vid recaps, and more...
A federal prosecutor alleged that one defendant boasted that his father “had engaged in similar business for the Chinese Communist Party.”
Learning science consistently shows us that true learning requires active engagement. This is fundamental to how Gemini helps you learn. Going beyond simple text and sta…
Infographics rendered without a single spelling error. Complex diagrams one-shotted from paragraph prompts. Logos restored from fragments. And visual outputs so sharp, with so much text density and accuracy, that one developer simply called it “absolutely bonkers.”

Google DeepMind’s newly released Nano Banana Pro—officially Gemini 3 Pro Image—has drawn astonishment from both the developer community and enterprise AI engineers. But behind the viral praise lies something more transformative: a model built not just to impress, but to integrate deeply across Google’s AI stack—from the Gemini API and Vertex AI to Workspace apps, Ads, and Google AI Studio.

Unlike earlier image models, which targeted casual users or artistic use cases, Gemini 3 Pro Image introduces studio-quality, multimodal image generation for structured workflows—with high resolution, multilingual accuracy, layout consistency, and real-time knowledge grounding. It’s engineered for technical buyers, orchestration teams, and enterprise-scale automation, not just creative exploration.

Benchmarks already show the model outperforming peers in overall visual quality, infographic generation, and text rendering accuracy. And as real-world users push it to its limits—from medical illustrations to AI memes—the model is revealing itself as both a new creative tool and a visual reasoning system for the enterprise stack.

Built for Structured Multimodal Reasoning

Gemini 3 Pro Image isn’t just drawing pretty pictures—it’s leveraging the reasoning layer of Gemini 3 Pro to generate visuals that communicate structure, intent, and factual grounding. The model is capable of generating UX flows, educational diagrams, storyboards, and mockups from language prompts, and can incorporate up to 14 source images with consistent identity and layout fidelity across subjects.

Google describes the model as “a higher-fidelity model built on Gemini 3 Pro for developers to access studio-quality image generation,” and confirms it is now available via the Gemini API, Google AI Studio, and Vertex AI for enterprise access.

In Antigravity, Google’s new AI vibe-coding platform built by the former Windsurf co-founders it hired earlier this year, Gemini 3 Pro Image is already being used to create dynamic UI prototypes with image assets rendered before code is written. The same capabilities are rolling out to Google’s enterprise-facing products like Workspace Vids, Slides, and Google Ads, giving teams precise control over asset layout, lighting, typography, and image composition.

High-Resolution Output, Localization, and Real-Time Grounding

The model supports output resolutions of up to 2K and 4K, and includes studio-level controls over camera angle, color grading, focus, and lighting. It handles multilingual prompts, semantic localization, and in-image text translation, enabling workflows like:

- Translating packaging or signage while preserving layout
- Updating UX mockups for regional markets
- Generating consistent ad variants with product names and pricing changed by locale

One of the clearest use cases is infographics—both technical and commercial.
Dr. Derya Unutmaz, an immunologist, generated a full medical illustration describing the stages of CAR-T cell therapy from lab to patient, praising the result as “perfect.” AI educator Dan Mac created a visual guide explaining transformer models “for a non-technical person” and called the result “unbelievable.”

Even complex structured visuals like full restaurant menus, chalkboard lecture visuals, or multi-character comic strips have been shared online—generated in a single prompt, with coherent typography, layout, and subject continuity.

Benchmarks Signal a Lead in Compositional Image Generation

Independent GenAI-Bench results show Gemini 3 Pro Image as a state-of-the-art performer across key categories:

- It ranks highest in overall user preference, suggesting strong visual coherence and prompt alignment.
- It leads in visual quality, ahead of competitors like GPT-Image 1 and Seedream v4.
- Most notably, it dominates in infographic generation, outscoring even Google’s own previous model, Gemini 2.5 Flash.

Additional benchmarks released by Google show Gemini 3 Pro Image with lower text error rates across multiple languages, as well as stronger performance in image editing fidelity.

The difference becomes especially apparent in structured reasoning tasks. Where previous models might approximate style or fill in layout gaps, Gemini 3 Pro Image demonstrates consistency across panels, accurate spatial relationships, and context-aware detail preservation—crucial for systems generating diagrams, documentation, or training visuals at scale.

Pricing Is Competitive for the Quality

For developers and enterprise teams accessing Gemini 3 Pro Image via the Gemini API or Google AI Studio, pricing is tiered by resolution and usage. Image inputs are billed at 560 tokens apiece, roughly $0.067 per image, while output pricing depends on resolution: standard 1K and 2K images cost approximately $0.134 each (1,120 tokens), and high-resolution 4K images cost $0.24 (2,000 tokens). Text input and output are priced in line with Gemini 3 Pro: $2.00 per million input tokens and $12.00 per million output tokens when using the model’s reasoning capabilities. The free tier currently does not include access to Nano Banana Pro, and unlike free-tier generations, the paid-tier generations are not used to train Google’s systems.

Here’s a comparison table of major image-generation APIs for developers and enterprises, followed by a discussion of how they stack up (including the tiered pricing for Gemini 3 Pro Image / “Nano Banana Pro”):

| Model / Service | Approximate Price per Image or Token Unit | Key Notes / Resolution Tiers |
|---|---|---|
| Google – Gemini 3 Pro Image (Nano Banana Pro) | Input (image): ~$0.067 per image (560 tokens). Output: ~$0.134 per image for 1K/2K (1,120 tokens); ~$0.24 per image for 4K (2,000 tokens). Text: $2.00 per 1M input tokens and $12.00 per 1M output tokens (≤200K-token context). | Tiered by resolution; paid-tier images are not used to train Google’s systems. |
| OpenAI – DALL-E 3 API | ~$0.04/image for 1024×1024 standard; ~$0.08/image for larger resolution/HD. | Lower cost per image; resolution and quality tiers adjust pricing. |
| OpenAI – GPT-Image-1 (via Azure/OpenAI) | Low tier ~$0.01/image; medium ~$0.04/image; high ~$0.17/image. | Token-based pricing – more complex prompts or higher resolution raise cost. |
| Google – Gemini 2.5 Flash Image (Nano Banana) | ~$0.039 per image for 1024×1024 resolution (1,290 output tokens). | Lower-cost “flash” model for high-volume, lower-latency use. |
| Other / smaller APIs (e.g., via third-party credit systems) | ~$0.02–$0.03 per image in some cases for lower resolution or simpler models. | Often used for less demanding production use cases or draft content. |

The Google Gemini 3 Pro Image / Nano Banana Pro pricing sits at the upper end: ~$0.134 for 1K/2K and ~$0.24 for 4K, significantly higher than the ~$0.04-per-image baseline for many OpenAI/DALL-E 3 standard images. But the higher cost might be justifiable if: you require 4K resolution; you need enterprise-grade governance (e.g., Google emphasizes that paid-tier images are not used to train its systems); you need a token-based pricing system aligned with other LLM usage; and you already operate within Google’s cloud/AI stack (e.g., using Vertex AI).

On the other hand, if you’re generating large volumes of images (thousands to tens of thousands) and can accept lower resolution (1K/2K) or slightly less premium quality, the lower-cost alternatives (OpenAI, smaller models) offer meaningful savings — for instance, generating 10,000 images at ~$0.04 each costs ~$400, whereas at ~$0.134 each it’s ~$1,340. Over time, that delta adds up.

SynthID and the Growing Need for Enterprise Provenance

Every image generated by Gemini 3 Pro Image includes SynthID, Google’s imperceptible digital watermarking system. While many platforms are just beginning to explore AI provenance, Google is positioning SynthID as a core part of its enterprise compliance stack.

In the updated Gemini app, users can now upload an image and ask whether it was AI-generated by Google—a feature designed to support growing regulatory and internal governance demands.

A Google blog post emphasizes that provenance is no longer a “feature” but an operational requirement, particularly in high-stakes domains like healthcare, education, and media. SynthID also allows teams building on Google Cloud to differentiate between AI-generated content and third-party media across assets, use logs, and audit trails.

Early Developer Reactions Range from Awe to Edge-Case Testing

Despite the enterprise framing, early developer reactions have turned social media into a real-time proving ground.

Designer Travis Davids called out a one-shot restaurant menu with flawless layout and typography: “Long generated text is officially solved.” Immunologist Dr. Derya Unutmaz posted his CAR-T diagram with the caption: “What have you done, Google?!” while Nikunj Kothari converted a full essay into a stylized blackboard lecture in one shot, calling the results “simply speechless.”

Engineer Deedy Das praised its performance across editing and brand restoration tasks: “Photoshop-like editing… It nails everything... By far the best image model I've ever seen.” Developer Parker Ortolani summarized it more simply: “Nano Banana remains absolutely bonkers.”

Even meme creators got involved. @cto_junior generated a fully styled “LLM discourse desk” meme—with logos, charts, monitors, and all—in one prompt, dubbing Gemini 3 Pro Image “your new meme engine.”

But scrutiny followed, too.
AI researcher Lisan al Gaib tested the model on a logic-heavy Sudoku problem, showing it hallucinated both an invalid puzzle and a nonsensical solution, noting that the model “is sadly not AGI.” The post served as a reminder that visual reasoning has limits, particularly in rule-constrained systems where hallucinated logic remains a persistent failure mode.

A New Platform Primitive, Not Just a Model

Gemini 3 Pro Image now lives across Google’s entire enterprise and developer stack: Google Ads, Workspace (Slides, Vids), Vertex AI, the Gemini API, and Google AI Studio. It’s also deployed in internal tools like Antigravity, where design agents render layout drafts before interface elements are coded.

This makes it a first-class multimodal primitive inside Google’s AI ecosystem, much like text completion or speech recognition. In enterprise applications, visuals are not decorations—they’re data, documentation, design, and communication. Whether generating onboarding explainers, prototype visuals, or localized collateral, models like Gemini 3 Pro Image allow systems to create assets programmatically, with control, scale, and consistency.

At a time when the race between OpenAI, Google, and xAI is moving beyond benchmarks and into platforms, Nano Banana Pro is Google’s quiet declaration: the future of generative AI won’t just be spoken or written—it will be seen.
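To see how the per-image deltas discussed above compound at volume, here is a back-of-envelope calculator using the article's quoted figures (prices rounded as reported; real bills vary with token counts and tiers):

```python
# Per-image prices quoted in the article above.
PRICES = {
    "gemini-3-pro-image (1K/2K)": 0.134,  # ~$0.134 per 1K/2K image
    "gemini-3-pro-image (4K)":    0.24,   # ~$0.24 per 4K image
    "dall-e-3 (1024x1024 std)":   0.04,   # ~$0.04 per standard image
}

n_images = 10_000
for model, unit_cost in PRICES.items():
    print(f"{model:28s} {n_images:,} images -> ${unit_cost * n_images:,.0f}")
# Reproduces the article's arithmetic: ~$400 vs ~$1,340 at 10,000 images.
```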
This blog post explores how MSD is harnessing the power of generative AI and databases to optimize and transform its manufacturing deviation management process. By creating an accurate and multifaceted knowledge base of past events, deviations, and findings, the company aims to significantly reduce the time and effort required for each new case while maintaining the highest standards of quality and compliance.
In this blog post, we show you how agentic workflows can accelerate the processing and interpretation of genomics pipelines at scale with a natural language interface. We demonstrate a comprehensive genomic variant interpreter agent that combines automated data processing with intelligent analysis to address the entire workflow from raw VCF file ingestion to conversational query interfaces.
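The full agentic workflow appears in the post, but its ingestion step, pulling variant records out of a VCF file, can be sketched in a few lines of plain Python (production pipelines would typically use pysam or cyvcf2; the file name is a placeholder):

```python
# Minimal VCF ingestion sketch: yields one dict per variant record.
def parse_vcf(path):
    with open(path) as fh:
        for line in fh:
            if line.startswith("#"):   # skip meta lines and the #CHROM header
                continue
            # Standard VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
            chrom, pos, vid, ref, alt, *rest = line.rstrip("\n").split("\t")
            yield {"chrom": chrom, "pos": int(pos), "id": vid,
                   "ref": ref, "alt": alt}

for variant in parse_vcf("sample.vcf"):   # hypothetical input file
    print(variant)
```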