Latest AI News & Updates

#research #computer science and technology #artificial intelligence #machine learning #human-computer interaction #data #technology and society #laboratory for information and decision systems (lids) #institute for medical engineering and science (imes) #electrical engineering and computer science (eecs) #school of engineering #mit schwarzman college of computing #national science foundation (nsf)

Large language models can learn to mistakenly link certain sentence patterns with specific topics — and may then repeat these patterns instead of reasoning.

#ai

Researchers at Alibaba’s Tongyi Lab have developed a new framework for self-evolving agents that create their own training data by exploring their application environments. The framework, AgentEvolver, uses the knowledge and reasoning capabilities of large language models for autonomous learning, addressing the high costs and manual effort typically required to gather task-specific datasets.

Experiments show that compared to traditional reinforcement learning–based frameworks, AgentEvolver is more efficient at exploring its environment, makes better use of data, and adapts faster to application environments. For the enterprise, this is significant because it lowers the barrier to training agents for bespoke applications, making powerful, custom AI assistants more accessible to a wider range of organizations.

The high cost of training AI agents

Reinforcement learning has become a major paradigm for training LLMs to act as agents that can interact with digital environments and learn from feedback. However, developing agents with RL faces fundamental challenges. First, gathering the necessary training datasets is often prohibitively expensive, requiring significant manual labor to create examples of tasks, especially in novel or proprietary software environments where there are no off-the-shelf datasets.

Second, the RL techniques commonly used for LLMs require the model to run through a massive number of trial-and-error attempts to learn effectively. This process is computationally costly and inefficient. As a result, training capable LLM agents through RL remains laborious and expensive, limiting their deployment in custom enterprise settings.

How AgentEvolver works

The main idea behind AgentEvolver is to give models greater autonomy in their own learning process. The researchers describe it as a “self-evolving agent system” designed to “achieve autonomous and efficient capability evolution through environmental interaction.” It uses the reasoning power of an LLM to create a self-training loop, allowing the agent to continuously improve by directly interacting with its target environment without needing predefined tasks or reward functions.

“We envision an agent system where the LLM actively guides exploration, task generation, and performance refinement,” the researchers wrote in their paper.

The self-evolution process is driven by three core mechanisms that work together.

The first is self-questioning, where the agent explores its environment to discover the boundaries of its functions and identify useful states. It’s like a new user clicking around an application to see what’s possible. Based on this exploration, the agent generates its own diverse set of tasks that align with a user’s general preferences. This reduces the need for handcrafted datasets and allows the agent and its tasks to co-evolve, progressively enabling it to handle more complex challenges. According to Yunpeng Zhai, a researcher at Alibaba and co-author of the paper, who spoke to VentureBeat, the self-questioning mechanism effectively turns the model from a “data consumer into a data producer,” dramatically reducing the time and cost required to deploy an agent in a proprietary environment.

The second mechanism is self-navigating, which improves exploration efficiency by reusing and generalizing from past experiences. AgentEvolver extracts insights from both successful and unsuccessful attempts and uses them to guide future actions.
For example, if an agent tries to use an API function that doesn't exist in an application, it registers this as an experience and learns to verify the existence of functions before attempting to use them in the future.

The third mechanism, self-attributing, enhances learning efficiency by providing more detailed feedback. Instead of just a final success or failure signal (a common practice in RL that can result in sparse rewards), this mechanism uses an LLM to assess the contribution of each individual action in a multi-step task. It retrospectively determines whether each step contributed positively or negatively to the final outcome, giving the agent fine-grained feedback that accelerates learning. This is crucial for regulated industries, where how an agent solves a problem is as important as the result. “Instead of rewarding a student only for the final answer, we also evaluate the clarity and correctness of each step in their reasoning,” Zhai explained. This improves transparency and encourages the agent to adopt more robust and auditable problem-solving patterns.
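To make the self-attributing idea concrete, here is a minimal sketch (our illustration, not Alibaba's code) of LLM-judged, per-step credit assignment; the data structures and prompt are assumptions, and `llm_judge` stands in for any chat-completion client:

```python
# A sketch of self-attributing-style credit assignment: an LLM judge labels
# each step of a finished trajectory as helpful or harmful, turning one
# sparse terminal reward into fine-grained, per-step feedback.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    action: str       # e.g. the API call the agent issued
    observation: str  # what the environment returned

def per_step_rewards(
    task: str,
    trajectory: list[Step],
    outcome: str,
    llm_judge: Callable[[str], str],  # any prompt -> text completion function
) -> list[float]:
    """Score each step +1/-1 by asking an LLM whether it helped the outcome."""
    rewards = []
    for step in trajectory:
        prompt = (
            f"Task: {task}\nFinal outcome: {outcome}\n"
            f"Step taken: {step.action}\nResult: {step.observation}\n"
            "Did this step contribute positively to the final outcome? "
            "Answer YES or NO."
        )
        rewards.append(1.0 if "YES" in llm_judge(prompt).upper() else -1.0)
    return rewards
```

These per-step rewards could then feed a standard policy-gradient update in place of a single end-of-episode signal.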
“By shifting the training initiative from human-engineered pipelines to LLM-guided self-improvement, AgentEvolver establishes a new paradigm that paves the way toward scalable, cost-effective, and continually improving intelligent systems,” the researchers state.

The team has also developed a practical, end-to-end training framework that integrates these three mechanisms. A key part of this foundation is the Context Manager, a component that controls the agent's memory and interaction history. While today's benchmarks test a limited number of tools, real enterprise environments can involve thousands of APIs. Zhai acknowledges this is a core challenge for the field, but notes that AgentEvolver was designed to be extended. “Retrieval over extremely large action spaces will always introduce computational challenges, but AgentEvolver’s architecture provides a clear path toward scalable tool reasoning in enterprise settings,” he said.

A more efficient path to agent training

To measure the effectiveness of their framework, the researchers tested it on AppWorld and BFCL v3, two benchmarks that require agents to perform long, multi-step tasks using external tools. They used models from Alibaba’s Qwen2.5 family (7B and 14B parameters) and compared their performance against a baseline model trained with GRPO, a popular RL technique used to develop reasoning models like DeepSeek-R1.

The results showed that integrating all three mechanisms in AgentEvolver led to substantial performance gains. For the 7B model, the average score improved by 29.4%, and for the 14B model, it increased by 27.8% over the baseline. The framework consistently enhanced the models' reasoning and task-execution capabilities across both benchmarks. The most significant improvement came from the self-questioning module, which autonomously generates diverse training tasks and directly addresses the data scarcity problem.

The experiments also demonstrated that AgentEvolver can efficiently synthesize a large volume of high-quality training data. The tasks generated by the self-questioning module proved diverse enough to achieve good training efficiency even with a small amount of data. For enterprises, this provides a path to creating agents for bespoke applications and internal workflows while minimizing the need for manual data annotation. By providing high-level goals and letting the agent generate its own training experiences, organizations can develop custom AI assistants more simply and cost-effectively.

“This combination of algorithmic design and engineering pragmatics positions AgentEvolver as both a research vehicle and a reusable foundation for building adaptive, tool-augmented agents,” the researchers conclude.

Looking ahead, the ultimate goal is much bigger. “A truly ‘singular model’ that can drop into any software environment and master it overnight is certainly the holy grail of agentic AI,” Zhai said. “We see AgentEvolver as a necessary step in that direction.” While that future still requires breakthroughs in model reasoning and infrastructure, self-evolving approaches are paving the way.

#gemini models #a message from our ceo #ai

Sundar Pichai sits down with Logan Kilpatrick to discuss Gemini 3 on the Google AI: Release Notes podcast.

#amazon sagemaker #amazon sagemaker ai #amazon sagemaker autopilot #amazon sagemaker ground truth #manufacturing #open source #technical how-to #ai/ml #amazon machine learning #computer vision #amazon lookout for vision

In this post, we demonstrate how to migrate computer vision workloads from Amazon Lookout for Vision to Amazon SageMaker AI by training custom defect detection models using pre-trained models available on AWS Marketplace. We provide step-by-step guidance on labeling datasets with SageMaker Ground Truth, training models with flexible hyperparameter configurations, and deploying them for real-time or batch inference—giving you greater control and flexibility for automated quality inspection use cases.

#business #business / artificial intelligence

In this episode of Uncanny Valley, we cover the news of the week and take a closer look at Gemini 3, Google’s latest AI model and chatbot.

#artificial intelligence #generative ai #thought leadership

The AWS Customer Success Center of Excellence (CS COE) helps customers get tangible value from their AWS investments. We've seen a pattern: customers who build AI strategies that address people, process, and technology together succeed more often. In this post, we share practical considerations that can help close the AI value gap.

#amazon sagemaker ai #artificial intelligence #technical how-to

We're introducing bidirectional streaming for Amazon SageMaker AI Inference, which transforms inference from a transactional exchange into a continuous conversation. This post shows you how to build and deploy a container with bidirectional streaming capability to a SageMaker AI endpoint. We also demonstrate how you can bring your own container or use our partner Deepgram's pre-built models and containers on SageMaker AI to enable the bidirectional streaming feature for real-time inference.

#agentic ai #deep dives #llm applications #multi agent systems #orchestration #crewai

A real-world analysis of why CrewAI’s hierarchical orchestration misfires—and a practical fix you can implement today.
The post Why CrewAI’s Manager-Worker Architecture Fails — and How to Fix It appeared first on Towards Data Science.

Don’t Miss DataCamp’s Black Friday Deal: Nov 12-Dec 4

#search #shopping #pixel #ai #maps

Learn more about using Google products like Gemini, Search, Shopping, Pixel and more over the holidays.

#google labs #ai

You’ll now get more creative control in Flow with new refinement and editing capabilities.

#amazon machine learning #amazon sagemaker ai #customer solutions #graviton #intermediate (200)

Warner Bros. Discovery (WBD) is a leading global media and entertainment company that creates and distributes the world’s most differentiated and complete portfolio of content and brands across television, film and streaming. In this post, we describe the scale of our offerings, the artificial intelligence (AI) and machine learning (ML) inference infrastructure requirements for our real-time recommender systems, and how we used AWS Graviton-based Amazon SageMaker AI instances for our ML inference workloads and achieved 60% cost savings and 7% to 60% latency improvements across different models.

#artificial intelligence #healthcare #robotics #thought leadership

In this post, we explore the complete development lifecycle of physical AI—from data collection and model training to edge deployment—and examine how these intelligent systems learn to understand, reason, and interact with the physical world through continuous feedback loops. We illustrate this workflow through Diligent Robotics' Moxi, a mobile manipulation robot that has completed over 1.2 million deliveries in hospitals, saving nearly 600,000 hours for clinical staff while transforming healthcare logistics and returning valuable time to patient care.

#research #special events and guest speakers #computer science and technology #health sciences and technology #artificial intelligence #machine learning #drug development #proteins #medicine #pharmaceuticals #jameel clinic #electrical engineering and computer science (eecs) #computer science and artificial intelligence laboratory (csail) #school of engineering #mit schwarzman college of computing

BoltzGen generates protein binders for any biological target from scratch, expanding AI’s reach from understanding biology toward engineering it.

#amazon sagemaker hyperpod #announcements #artificial intelligence #generative ai #launch #technical how-to

In this post, we explore how Amazon SageMaker HyperPod now supports NVIDIA Multi-Instance GPU (MIG) technology, enabling you to partition powerful GPUs into multiple isolated instances for running concurrent workloads like inference, research, and interactive development. By maximizing GPU utilization and reducing wasted resources, MIG helps organizations optimize costs while maintaining performance isolation and predictable quality of service across diverse machine learning tasks.

Explore how AlphaFold has accelerated science and fueled a global wave of biological discovery.

#google cloud #ai

Google’s seventh-gen Tensor Processing Unit is here! Learn what makes Ironwood our most powerful and energy-efficient custom silicon to date.

AlphaFold has revealed the structure of a key protein behind heart disease

#ai

President Donald Trump’s new “Genesis Mission,” unveiled Monday, November 24, 2025, is billed as a generational leap in how the United States does science, akin to the Manhattan Project that created the atomic bomb during World War II. The executive order directs the Department of Energy (DOE) to build a “closed-loop AI experimentation platform” that links the country’s 17 national laboratories, federal supercomputers, and decades of government scientific data into “one cooperative system for research.” The White House fact sheet casts the initiative as a way to “transform how scientific research is conducted” and “accelerate the speed of scientific discovery,” with priorities spanning biotechnology, critical materials, nuclear fission and fusion, quantum information science, and semiconductors. DOE’s own release calls it “the world’s most complex and powerful scientific instrument ever built” and quotes Under Secretary for Science Darío Gil describing it as a “closed-loop system” linking the nation’s most advanced facilities, data, and computing into “an engine for discovery that doubles R&D productivity.”

The text of the order outlines mandatory steps DOE must complete within 60, 90, 120, 240, and 270 days—including identifying all Federal and partner compute resources, cataloging datasets and model assets, assessing robotic laboratory infrastructure across national labs, and demonstrating an initial operating capability for at least one scientific challenge within nine months.

The DOE’s own Genesis Mission website adds important context: the initiative is launching with a broad coalition of private-sector, nonprofit, academic, and utility collaborators. The list spans multiple sectors—from advanced materials to aerospace to cloud computing—and includes participants such as Albemarle, Applied Materials, Collins Aerospace, GE Aerospace, Micron, PMT Critical Metals, and the Tennessee Valley Authority. That breadth signals DOE’s intent to position Genesis not just as an internal research overhaul but as a national industrial effort connected to manufacturing, energy infrastructure, and scientific supply chains.

The collaborator list also includes many of the most influential AI and compute firms in the United States: OpenAI for Government, Anthropic, Scale AI, Google, Microsoft, NVIDIA, AWS, IBM, Cerebras, HPE, Hugging Face, and Dell Technologies. The DOE frames Genesis as a national-scale instrument — a single “intelligent network” and “end-to-end discovery engine” intended to generate new classes of high-fidelity data, accelerate experimental cycles, and reduce research timelines from “years to months.” The agency casts the mission as foundational infrastructure for the next era of American science.

Taken together, the roster outlines the technical backbone likely to shape the mission’s early development—hardware vendors, hyperscale cloud providers, frontier-model developers, and orchestration-layer companies. DOE does not describe these entities as contractors or beneficiaries, but their inclusion demonstrates that private-sector technical capacity will play a defining role in building and operating the Genesis platform.

What the administration has not provided is just as striking: no public cost estimate, no explicit appropriation, and no breakdown of who will pay for what. Major news outlets, including Reuters, the Associated Press, and Politico, have noted that the order “does not specify new spending or a budget request,” or that funding will depend on future appropriations and previously passed legislation. That omission, combined with the initiative’s scope and timing, raises questions not only about how Genesis will be funded and to what extent, but about who it might quietly benefit.

“So is this just a subsidy for big labs or what?”

Soon after DOE promoted the mission on X, Teknium of the small U.S. AI lab Nous Research posted a blunt reaction: “So is this just a subsidy for big labs or what.” The line has become shorthand for a growing concern in the AI community: that the U.S. government could offer some form of public subsidy to large AI firms facing staggering and rising compute and data costs.

That concern is grounded in recent, well-sourced reporting on OpenAI’s finances and infrastructure commitments. Documents obtained and analyzed by tech public relations professional and AI critic Ed Zitron describe a cost structure that has exploded as the company has scaled models like GPT-4, GPT-4.1, and GPT-5.1. The Register has separately inferred from Microsoft quarterly earnings statements that OpenAI lost about $13.5 billion on $4.3 billion in revenue in the first half of 2025 alone. Other outlets and analysts have highlighted projections that show tens of billions in annual losses later this decade if spending and revenue follow current trajectories.

By contrast, Google DeepMind trained its recent Gemini 3 flagship LLM on the company’s own TPU hardware and in its own data centers, giving it a structural advantage in cost per training run and energy management, as covered in Google’s own technical blogs and subsequent financial reporting. Viewed against that backdrop, an ambitious federal project that promises to integrate “world-class supercomputers and datasets into a unified, closed-loop AI platform” and “power robotic laboratories” sounds, to some observers, like more than a pure science accelerator. It could, depending on how access is structured, also ease the capital bottlenecks facing private frontier-model labs.

The aggressive DOE deadlines and the order’s requirement to build a national AI compute-and-experimentation stack amplify those questions: the government is now constructing something strikingly similar to what private labs have been spending billions to build for themselves.

The order directs DOE to create standardized agreements governing model sharing, intellectual-property ownership, licensing rules, and commercialization pathways—effectively setting up the legal and governance infrastructure needed for private AI companies to plug into the federal platform. While access is not guaranteed and pricing is not specified, the framework for deep public-private integration is now fully established.

What the order does not do is guarantee those companies access, spell out subsidized pricing, or earmark public money for their training runs. Any claim that OpenAI, Anthropic, or Google “just got access” to federal supercomputing or national-lab data is, at this point, an interpretation of how the framework could be used, not something the text actually promises.

Furthermore, the executive order makes no mention of open-source model development — an omission that stands out in light of remarks last year from Vice President JD Vance, who, while still serving as a Senator from Ohio, warned in a hearing against regulations designed to protect incumbent tech firms and was widely praised by open-source advocates.

That silence is notable given Vance’s earlier testimony, which many in the AI community interpreted as support for open-source AI or, at minimum, skepticism of policies that entrench incumbent advantages. Genesis instead sketches a controlled-access ecosystem governed by classification rules, export controls, and federal vetting requirements—far from the open-source model some expected this administration to champion.

Closed-loop discovery and “autonomous scientific agents”

Another viral reaction came from AI influencer Chris (@chatgpt21 on X), who wrote in an X post that OpenAI, Anthropic, and Google have already “got access to petabytes of proprietary data” from national labs, and that DOE labs have been “hoarding experimental data for decades.” The public record supports a narrower claim.

The order and fact sheet describe “federal scientific datasets—the world’s largest collection of such datasets, developed over decades of Federal investments” and direct agencies to identify data that can be integrated into the platform “to the extent permitted by law.” DOE’s announcement similarly talks about unleashing “the full power of our National Laboratories, supercomputers, and data resources.” It is true that the national labs hold enormous troves of experimental data. Some of it is already public via the Office of Scientific and Technical Information (OSTI) and other repositories; some is classified or export-controlled; much is under-used because it sits in fragmented formats and systems. But there is no public document so far that states private AI companies have now been granted blanket access to this data, or that DOE characterizes past practice as “hoarding.”

What is clear is that the administration wants to unlock more of this data for AI-driven research and to do so in coordination with external partners. Section 5 of the order instructs DOE and the Assistant to the President for Science and Technology to create standardized partnership frameworks, define IP and licensing rules, and set “stringent data access and management processes and cybersecurity standards for non-Federal collaborators accessing datasets, models, and computing environments.”

Equally notable is the national-security framing woven throughout the order. Multiple sections invoke classification rules, export controls, supply-chain security, and vetting requirements that place Genesis at the junction of open scientific inquiry and restricted national-security operations. Access to the platform will be mediated through federal security norms rather than open-science principles.

A moonshot with an open question at the center

Taken at face value, the Genesis Mission is an ambitious attempt to use AI and high-performance computing to speed up everything from fusion research to materials discovery and pediatric cancer work, using decades of taxpayer-funded data and instruments that already exist inside the federal system. The executive order spends considerable space on governance: coordination through the National Science and Technology Council, new fellowship programs, and annual reporting on platform status, integration progress, partnerships, and scientific outcomes. The order also codifies, for the first time, the development of AI agents capable of generating hypotheses, designing experiments, interpreting results, and directing robotic laboratories—an explicit embrace of automated scientific discovery and a significant departure from prior U.S. science directives.

Yet the initiative also lands at a moment when frontline AI labs are buckling under their own compute bills, when one of them—OpenAI—is reported to be spending more on running models than it earns in revenue, and when investors are openly debating whether the current business model for proprietary frontier AI is sustainable without some form of outside support.

In that environment, a federally funded, closed-loop AI discovery platform that centralizes the country’s most powerful supercomputers and data is inevitably going to be read in more than one way. It may become a genuine engine for public science. It may also become a crucial piece of infrastructure for the very companies driving today’s AI arms race.

Standing up a platform of this scale—complete with robotic labs, synthetic data generation pipelines, multi-agency datasets, and industrial-grade AI agents—would typically require substantial, dedicated appropriations and a multi-year budget roadmap. Yet the order remains silent on cost, leaving observers to speculate whether the administration will repurpose existing resources, seek congressional appropriations later, or rely heavily on private-sector partnerships to build the platform.

For now, one fact is undeniable: the administration has launched a mission it compares to the Manhattan Project without telling the public what it will cost, how the money will flow, or exactly who will be allowed to plug into it.

How enterprise tech leaders should interpret the Genesis Mission

For enterprise teams already building or scaling AI systems, the Genesis Mission signals a shift in how national infrastructure, data governance, and high-performance compute will evolve in the U.S.—and those signals matter even before the government publishes a budget. The initiative outlines a federated, AI-driven scientific ecosystem where supercomputers, datasets, and automated experimentation loops operate as tightly integrated pipelines.
That direction mirrors the trajectory many companies are already moving toward: larger models, more experimentation, heavier orchestration, and a growing need for systems that can manage complex workloads with reliability and traceability. Even though Genesis is aimed at science, its architecture hints at what will become expected norms across American industries.

The specificity of the order’s deadlines also signals where enterprise expectations may shift next: toward standardized metadata, provenance tracking, multi-cloud interoperability, AI pipeline observability, and rigorous access controls. As DOE operationalizes Genesis, enterprises—particularly in regulated sectors such as biotech, energy, pharmaceuticals, and advanced manufacturing—may find themselves evaluated against emerging federal norms for data governance and AI-system integrity.

The lack of cost detail around Genesis does not directly alter enterprise roadmaps, but it does reinforce the broader reality that compute scarcity, escalating cloud costs, and rising standards for AI model governance will remain central challenges. Companies that already struggle with constrained budgets or tight headcount—particularly those responsible for deployment pipelines, data integrity, or AI security—should view Genesis as early confirmation that efficiency, observability, and modular AI infrastructure will remain essential. As the federal government formalizes frameworks for data access, experiment traceability, and AI agent oversight, enterprises may find that future compliance regimes or partnership expectations take cues from these federal standards.

Genesis also underscores the growing importance of unifying data sources and ensuring that models can operate across diverse, sometimes sensitive environments. Whether managing pipelines across multiple clouds, fine-tuning models with domain-specific datasets, or securing inference endpoints, enterprise technical leaders will likely see increased pressure to harden systems, standardize interfaces, and invest in complex orchestration that can scale safely. The mission’s emphasis on automation, robotic workflows, and closed-loop model refinement may shape how enterprises structure their internal AI R&D, encouraging them to adopt more repeatable, automated, and governable approaches to experimentation. In this sense, Genesis may serve as an early signal of how national-level AI infrastructure is likely to influence private-sector requirements, especially for companies operating in critical industries or scientific supply chains.

Here is what enterprise leaders should be doing now:

- Expect increased federal involvement in AI infrastructure and data governance. This may indirectly shape cloud availability, interoperability standards, and model-governance expectations.
- Track “closed-loop” AI experimentation models. These may preview future enterprise R&D workflows and reshape how ML teams build automated pipelines.
- Prepare for rising compute costs and consider efficiency strategies, including smaller models, retrieval-augmented systems, and mixed-precision training (see the sketch after this list).
- Strengthen AI-specific security practices. Genesis signals that the federal government is escalating expectations for AI system integrity and controlled access.
- Plan for potential public–private interoperability standards. Enterprises that align early may gain a competitive edge in partnerships and procurement.
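As one concrete illustration of the efficiency strategies named in the list above, here is a minimal mixed-precision training sketch using PyTorch's automatic mixed precision (AMP) utilities. The model, data shapes, and hyperparameters are placeholders, not anything the order prescribes:

```python
# Mixed-precision training sketch with PyTorch AMP (assumes a CUDA device).
# Running the forward pass in reduced precision cuts memory and compute,
# while gradient scaling avoids fp16 underflow in the backward pass.
import torch
from torch import nn

model = nn.Linear(512, 10).cuda()                 # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()              # rescales gradients for fp16
loss_fn = nn.CrossEntropyLoss()

def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():               # forward pass in fp16/bf16
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()                 # backprop on the scaled loss
    scaler.step(optimizer)                        # unscale, then optimizer step
    scaler.update()
    return loss.item()
```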
Overall, Genesis does not change day-to-day enterprise AI operations today. But it strongly signals where federal and scientific AI infrastructure is heading—and that direction will inevitably influence the expectations, constraints, and opportunities enterprises face as they scale their own AI capabilities.

Deploy an AI analyst fast by connecting any LLM to your SQL database with Bag of Words, allowing immediate, trustworthy data insights via natural language queries.

#power bi #data analysis #data science #dax #power bi tutorials

Starting with the September 2025 release of Power BI, Microsoft introduced the new Calendar-based Time Intelligence feature. Let’s see what it can do by implementing three use cases. The future looks very interesting with this new feature.
The post How to Implement Three Use Cases for the New Calendar-Based Time Intelligence appeared first on Towards Data Science.

#llm applications #artificial intelligence #deep dives #llm #machine learning #engineering

Practical field notes on workflows, structure, and evaluation from two years of building with engineering domain experts.
The post Ten Lessons of Building LLM Applications for Engineers appeared first on Towards Data Science.

Learn the theory, math, and engineering behind machine learning with these highly recommended free books.

#culture #culture / digital culture

In this episode of Uncanny Valley, we talk about some of the latest drug trends and all the ways drugs are changing as they continue to be intertwined with tech.

#llm applications #ai agent #cursor #llm #technical writing #latex

Learn how to rapidly create professional articles and presentations with LaTeX in Cursor
The post How to Create Professional Articles with LaTeX in Cursor appeared first on Towards Data Science.

Clustering models in machine learning must be assessed by how well they separate data into meaningful groups with distinctive characteristics.
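One standard way to quantify that separation is the silhouette score; here is a minimal scikit-learn sketch on synthetic data (the dataset and cluster count are invented for illustration):

```python
# Silhouette score: compares each point's distance to its own cluster
# versus the nearest other cluster; values near 1.0 mean tight,
# well-separated groups, values near 0 mean overlapping clusters.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=42)
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)
print(f"silhouette: {silhouette_score(X, labels):.3f}")
```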

#ai

OpenAI expanded its data residency regions for ChatGPT and its API, giving enterprise users the option to store and process their data closer to their business operations and better comply with local regulations. This expansion removes one of the biggest compliance blockers preventing global enterprises from deploying ChatGPT at scale.

Data residency, an often overlooked piece of the enterprise AI puzzle, governs data according to the laws and customs of the countries where it is stored. ChatGPT Enterprise and Edu subscribers can now choose to have their data processed in:

- Europe (European Economic Area and Switzerland)
- United Kingdom
- United States
- Canada
- Japan
- South Korea
- Singapore
- India
- Australia
- United Arab Emirates

OpenAI said in a blog post that it “plans to expand availability to additional regions over time.” Customers can store data such as conversations, uploaded files, custom GPTs, and image-generation artifacts. This applies only to data at rest, not while it moves through a system or when it is used for inference. OpenAI’s documentation notes that, for now, inference residency remains available only in the U.S.

ChatGPT Enterprise and Edu users can set up new workspaces with data residency. Enterprise customers on the API who have been approved for advanced data controls can enable data residency by creating a new project and selecting their preferred region.

OpenAI first began offering data residency in Europe in February this year. The European Union has some of the strictest data regulations globally, based on the GDPR.

The importance of data residency

Until now, enterprises had fewer choices for where data flowing through ChatGPT was processed. For example, some organizational data would be processed under U.S. law rather than under European rules. Enterprises risk violating data compliance rules if their data at rest is processed elsewhere and does not meet strict policies.

“With over 1 million business customers around the world directly using OpenAI, we have expanded where we offer data residency — allowing business customers to store data in certain regions, helping organizations meet local regulatory and data protection requirements,” the company said in its blog post.

However, enterprises must also understand that if they are using a connector or integration within ChatGPT, those applications have different data residency rules. When OpenAI launched company knowledge for ChatGPT, it warned users that, depending on the connector they use, data residency may be limited to the U.S.
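For API customers, residency follows the project: per the article, the region is selected when the project is created in the OpenAI dashboard, and requests scoped to that project inherit its residency settings. A minimal sketch with the official Python SDK, assuming a project already created with a European region (the project ID and model are placeholders):

```python
# Requests made through a project configured with EU data residency keep
# stored data (conversations, files, etc.) in that region; note the article
# says inference residency itself currently remains U.S.-only.
from openai import OpenAI

client = OpenAI(project="proj_example_eu")  # placeholder project ID

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize our Q3 compliance notes."}],
)
print(resp.choices[0].message.content)
```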

Screen-free AI, Claude 4.5 Opus, ChatGPT shopping, Nano Banana Pro, and more...

#amazon bedrock #amazon machine learning #artificial intelligence #generative ai

We are excited to announce that customers in Canada can now access advanced foundation models including Anthropic's Claude Sonnet 4.5 and Claude Haiku 4.5 on Amazon Bedrock through cross-Region inference (CRIS). This post explores how Canadian organizations can use cross-Region inference profiles from the Canada (Central) Region to access the latest foundation models to accelerate AI initiatives. We will demonstrate how to get started with these new capabilities, provide guidance for migrating from older models, and share recommended practices for quota management.

#ai

Anthropic released its most capable artificial intelligence model yet on Monday, slashing prices by roughly two-thirds while claiming state-of-the-art performance on software engineering tasks — a strategic move that intensifies the AI startup's competition with deep-pocketed rivals OpenAI and Google.

The new model, Claude Opus 4.5, scored higher on Anthropic's most challenging internal engineering assessment than any human job candidate in the company's history, according to materials reviewed by VentureBeat. The result underscores both the rapidly advancing capabilities of AI systems and growing questions about how the technology will reshape white-collar professions.

The Amazon-backed company is pricing Claude Opus 4.5 at $5 per million input tokens and $25 per million output tokens — a dramatic reduction from the $15 and $75 rates for its predecessor, Claude Opus 4.1, released earlier this year. The move makes frontier AI capabilities accessible to a broader swath of developers and enterprises while putting pressure on competitors to match both performance and pricing.
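Using the figures above, a quick back-of-envelope comparison shows where the "roughly two-thirds" reduction comes from (the workload numbers are invented for illustration):

```python
# Cost comparison of the published per-million-token prices.
OPUS_4_5 = {"input": 5.00, "output": 25.00}   # $ per million tokens
OPUS_4_1 = {"input": 15.00, "output": 75.00}

def cost(prices: dict, input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000

# Hypothetical daily workload: 10M input tokens, 2M output tokens.
new = cost(OPUS_4_5, 10_000_000, 2_000_000)   # $100.00
old = cost(OPUS_4_1, 10_000_000, 2_000_000)   # $300.00
print(f"${new:.2f} vs ${old:.2f} -> {1 - new / old:.0%} cheaper")  # 67% cheaper
```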
"We want to make sure this really works for people who want to work with these models," said Alex Albert, Anthropic's head of developer relations, in an exclusive interview with VentureBeat. "That is really our focus: How can we enable Claude to be better at helping you do the things that you don't necessarily want to do in your job?"

The announcement comes as Anthropic races to maintain its position in an increasingly crowded field. OpenAI recently released GPT-5.1 and a specialized coding model called Codex Max that can work autonomously for extended periods. Google unveiled Gemini 3 just last week, prompting concerns even from OpenAI about the search giant's progress, according to a recent report from The Information.

Opus 4.5 demonstrates improved judgment on real-world tasks, developers say

Anthropic's internal testing revealed what the company describes as a qualitative leap in Claude Opus 4.5's reasoning capabilities. The model achieved 80.9% accuracy on SWE-bench Verified, a benchmark measuring real-world software engineering tasks, outperforming OpenAI's GPT-5.1-Codex-Max (77.9%), Anthropic's own Sonnet 4.5 (77.2%), and Google's Gemini 3 Pro (76.2%), according to the company's data. The result marks a notable advance over OpenAI's current state-of-the-art model, which was released just five days earlier.

But the technical benchmarks tell only part of the story. Albert said employee testers consistently reported that the model demonstrates improved judgment and intuition across diverse tasks — a shift he described as the model developing a sense of what matters in real-world contexts.

"The model just kind of gets it," Albert said. "It just has developed this sort of intuition and judgment on a lot of real world things that feels qualitatively like a big jump up from past models."

He pointed to his own workflow as an example. Previously, Albert said, he would ask AI models to gather information but hesitated to trust their synthesis or prioritization. With Opus 4.5, he's delegating more complete tasks, connecting it to Slack and internal documents to produce coherent summaries that match his priorities.

Opus 4.5 outscores all human candidates on company's toughest engineering test

The model's performance on Anthropic's internal engineering assessment marks a notable milestone. The take-home exam, designed for prospective performance engineering candidates, is meant to evaluate technical ability and judgment under time pressure within a prescribed two-hour limit. Using a technique called parallel test-time compute — which aggregates multiple attempts from the model and selects the best result — Opus 4.5 scored higher than any human candidate who has taken the test, according to the company. Without a time limit, the model matched the performance of the best-ever human candidate when used within Claude Code, Anthropic's coding environment.

The company acknowledged that the test doesn't measure other crucial professional skills such as collaboration, communication, or the instincts that develop over years of experience. Still, Anthropic said the result "raises questions about how AI will change engineering as a profession."

Albert emphasized the significance of the finding. "I think this is kind of a sign, maybe, of what's to come around how useful these models can actually be in a work context and for our jobs," he said. "Of course, this was an engineering task, and I would say models are relatively ahead in engineering compared to other fields, but I think it's a really important signal to pay attention to."

Dramatic efficiency improvements cut token usage by up to 76% on key benchmarks

Beyond raw performance, Anthropic is betting that efficiency improvements will differentiate Claude Opus 4.5 in the market. The company says the model uses dramatically fewer tokens — the units of text that AI systems process — to achieve similar or better outcomes compared to predecessors.

At a medium effort level, Opus 4.5 matches the previous Sonnet 4.5 model's best score on SWE-bench Verified while using 76% fewer output tokens, according to Anthropic. At the highest effort level, Opus 4.5 exceeds Sonnet 4.5 performance by 4.3 percentage points while still using 48% fewer tokens.

To give developers more control, Anthropic introduced an "effort parameter" that allows users to adjust how much computational work the model applies to each task — balancing performance against latency and cost.
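The article names the effort parameter but does not show its API surface; the following is a hedged sketch only. The `messages.create` call is the Anthropic SDK's real method, while the model ID and the effort field (passed here through the SDK's `extra_body` escape hatch for undocumented parameters) are assumptions, not confirmed syntax:

```python
# Hedged sketch of requesting a lower-effort (cheaper, faster) completion.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-5",                 # model ID assumed for illustration
    max_tokens=1024,
    extra_body={"effort": "medium"},         # hypothetical effort setting
    messages=[{"role": "user", "content": "Refactor this function for clarity: ..."}],
)
print(response.content[0].text)
```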
"It was iteratively refining a skill for a task and seeing that it's trying to optimize the skill to get better performance so it could accomplish this task," he said.The capability extends beyond coding. Albert said Anthropic has observed significant improvements in creating professional documents, spreadsheets, and presentations. "They're saying that this has been the biggest jump they've seen between model generations," Albert said. "So going even from Sonnet 4.5 to Opus 4.5, bigger jump than any two models back to back in the past."Fundamental Research Labs, a financial modeling firm, reported that "accuracy on our internal evals improved 20%, efficiency rose 15%, and complex tasks that once seemed out of reach became achievable," according to co-founder Nico Christie.New features target Excel users, Chrome workflows and eliminate chat length limitsAlongside the model release, Anthropic rolled out a suite of product updates aimed at enterprise users. Claude for Excel became generally available for Max, Team, and Enterprise users with new support for pivot tables, charts, and file uploads. The Chrome browser extension is now available to all Max users.Perhaps most significantly, Anthropic introduced "infinite chats" — a feature that eliminates context window limitations by automatically summarizing earlier parts of conversations as they grow longer. "Within Claude AI, within the product itself, you effectively get this kind of infinite context window due to the compaction, plus some memory things that we're doing," Albert explained.For developers, Anthropic released "programmatic tool calling," which allows Claude to write and execute code that invokes functions directly. Claude Code gained an updated "Plan Mode" and became available on desktop in research preview, enabling developers to run multiple AI agent sessions in parallel.Market heats up as OpenAI, Google race to match performance and pricingAnthropic reached $2 billion in annualized revenue during the first quarter of 2025, more than doubling from $1 billion in the prior period. The number of customers spending more than $100,000 annually jumped eightfold year-over-year.The rapid release of Opus 4.5 — just weeks after Haiku 4.5 in October and Sonnet 4.5 in September — reflects broader industry dynamics. OpenAI released multiple GPT-5 variants throughout 2025, including a specialized Codex Max model in November that can work autonomously for up to 24 hours. Google shipped Gemini 3 in mid-November after months of development.Albert attributed Anthropic's accelerated pace partly to using Claude to speed its own development. "We're seeing a lot of assistance and speed-up by Claude itself, whether it's on the actual product building side or on the model research side," he said.The pricing reduction for Opus 4.5 could pressure margins while potentially expanding the addressable market. "I'm expecting to see a lot of startups start to incorporate this into their products much more and feature it prominently," Albert said.Yet profitability remains elusive for leading AI labs as they invest heavily in computing infrastructure and research talent. 
The AI market is projected to top $1 trillion in revenue within a decade, but no single provider has established a dominant market position—even as models reach a threshold where they can meaningfully automate complex knowledge work.

Michael Truell, CEO of Cursor, an AI-powered code editor, called Opus 4.5 "a notable improvement over the prior Claude models inside Cursor, with improved pricing and intelligence on difficult coding tasks." Scott Wu, CEO of Cognition, an AI coding startup, said the model delivers "stronger results on our hardest evaluations and consistent performance through 30-minute autonomous coding sessions."

For enterprises and developers, the competition translates to rapidly improving capabilities at falling prices. But as AI performance on technical tasks approaches—and sometimes exceeds—human expert levels, the technology's impact on professional work becomes less theoretical.

When asked about the engineering exam results and what they signal about AI's trajectory, Albert was direct: "I think it's a really important signal to pay attention to."
