Latest AI News & Updates

#amazon bedrock #amazon bedrock agentcore #artificial intelligence #launch

We are launching a new feature: gateway interceptors for Amazon Bedrock AgentCore Gateway. This powerful new capability provides fine-grained security, dynamic access control, and flexible schema management.

#amazon bedrock #amazon sagemaker ai #artificial intelligence #customer solutions #intermediate (200) #ai/ml #generative ai

In this post, we explore how Condé Nast used Amazon Bedrock and Anthropic’s Claude to accelerate their contract processing and rights analysis workstreams. The company’s extensive portfolio, spanning multiple brands and geographies, required managing an increasingly complex web of contracts, rights, and licensing agreements.

#amazon nova #best practices #intermediate (200)

Available through the Amazon Bedrock bidirectional streaming API, Amazon Nova Sonic can connect to your business data and external tools and can be integrated directly with telephony systems. This post will introduce sample implementations for the most common telephony scenarios.

#advanced (300) #amazon bedrock #amazon ec2 #amazon sagemaker ai #amazon simple storage service (s3) #aws lambda #customer solutions #higher education #amazon simple notification service (sns) #aws codebuild

In this post, we will walk through the performance constraints and design choices made by the OARC and REMAP teams at UCLA, including how AWS serverless infrastructure, AWS Managed Services, and generative AI services supported the rapid design and deployment of our solution. We will also describe our use of Amazon SageMaker AI and how it can be used reliably in immersive live experiences.

#business #business / artificial intelligence

Project Prometheus has raised over $6 billion in funding and hired over 100 employees, a handful of whom joined through its acquisition of General Agents, according to records and sources.

#advanced (300) #amazon ec2 #amazon machine learning #compute #customer solutions #graviton #amazon ec2 container service

In this post, we focus on one portion of the REM™ system: the automatic identification of changes to the road structure, which we will refer to as Change Detection. We will share our journey of architecting and deploying a solution for Change Detection, the core of which is a deep learning model called CDNet. We will also discuss real-life decisions and tradeoffs when building and deploying a high-scale, highly parallelized algorithmic pipeline based on a Deep Learning (DL) model, with an emphasis on efficiency and throughput.

#amazon nova #amazon sagemaker ai #technical how-to

This blog post introduces the new Amazon Nova model evaluation features in Amazon SageMaker AI. This release adds custom metrics support, LLM-based preference testing, log probability capture, metadata analysis, and multi-node scaling for large evaluations.

#data science #data analysis #data cleaning #data visualization #deep dives #pandas

Stop guessing at data cleaning. Use this repeatable 5-step Python workflow to diagnose and fix the most common data flaws.
The post I Cleaned a Messy CSV File Using Pandas. Here's the Exact Process I Follow Every Time. appeared first on Towards Data Science.
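The article's exact five steps are not reproduced in this excerpt; as a rough illustration of what a diagnose-then-fix pandas pass can look like, here is a minimal sketch. The file name and column names (messy.csv, order_date, price) are hypothetical.

```python
import pandas as pd

# Minimal, illustrative diagnose-then-fix pass; the post's actual five steps may differ.
# "messy.csv" and the column names below are hypothetical.
df = pd.read_csv("messy.csv")

# 1. Diagnose: shape, dtypes, missing values, duplicates
print(df.shape, df.dtypes, sep="\n")
print(df.isna().sum())
print("duplicates:", df.duplicated().sum())

# 2. Fix structural issues: duplicate rows and inconsistent column names
df = df.drop_duplicates()
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

# 3. Coerce types; invalid entries become NaN/NaT instead of raising
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# 4. Handle missing values explicitly
df = df.dropna(subset=["order_date"])
df["price"] = df["price"].fillna(df["price"].median())

# 5. Re-check, then save the cleaned file
assert df.duplicated().sum() == 0
df.to_csv("clean.csv", index=False)
```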

#business #business / artificial intelligence #business / big tech

Amazon Employees for Climate Justice says that over 1,000 workers have signed a petition raising “serious concerns” about the company’s “aggressive rollout” of artificial intelligence tools.

#artificial intelligence #thought leadership

In this post, we explore three essential strategies for successfully integrating AI into your organization: addressing organizational debt before it compounds, embracing distributed decision-making through the "octopus organization" model, and redefining management roles to align with AI-powered workflows. Organizations must invest in both technology and workforce preparation, focusing on streamlining processes, empowering teams with autonomous decision-making within defined parameters, and evolving each management layer from traditional oversight to mentorship, quality assurance, and strategic vision-setting.

#amazon bedrock #announcements

You can now achieve significant performance improvements when using Amazon Bedrock Custom Model Import, with reduced end-to-end latency, faster time-to-first-token, and improved throughput through advanced PyTorch compilation and CUDA graph optimizations. With Amazon Bedrock Custom Model Import, you can bring your own foundation models to Amazon Bedrock for deployment and inference at scale. In this post, we show how to take advantage of these improvements in Amazon Bedrock Custom Model Import.
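Bedrock applies these optimizations on the service side, so there is nothing to configure in your own code; purely as an illustration of the general technique the post names, this is roughly what PyTorch compilation with CUDA graph capture looks like when you apply it yourself. It is not Bedrock's internal implementation, and the model here is a stand-in.

```python
import torch

# Illustration of the general technique only: torch.compile with mode="reduce-overhead"
# uses CUDA graphs to cut kernel-launch overhead on repeated, fixed-shape calls.
# This is NOT how Bedrock Custom Model Import is configured internally.
model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True).cuda().eval()
compiled_model = torch.compile(model, mode="reduce-overhead")

x = torch.randn(32, 128, 512, device="cuda")  # (batch, seq, features) stand-in input
with torch.inference_mode():
    for _ in range(3):  # first calls compile and capture; later calls replay the graph
        out = compiled_model(x)
print(out.shape)
```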

Understanding the underlying technology helps explain why AI browsers exhibit such uneven performance.

Large language models (LLMs) are based on the transformer architecture, a complex deep neural network whose input is a sequence of token embeddings.
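As a toy illustration of that input stage (not tied to any particular browser or model), token IDs simply index rows of a learned embedding matrix:

```python
import torch

# Toy sketch of the transformer input stage: each token ID selects one row of a
# learned embedding table, producing the sequence of embeddings the network consumes.
vocab_size, d_model = 50_000, 768  # sizes are illustrative, not from any specific model
embedding = torch.nn.Embedding(vocab_size, d_model)

token_ids = torch.tensor([[101, 2023, 2003, 1037, 7953, 102]])  # hypothetical IDs for one short sentence
token_embeddings = embedding(token_ids)
print(token_embeddings.shape)  # torch.Size([1, 6, 768])
```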

#ai

This weekend, Andrej Karpathy, the former director of AI at Tesla and a founding member of OpenAI, decided he wanted to read a book. But he did not want to read it alone. He wanted to read it accompanied by a committee of artificial intelligences, each offering its own perspective, critiquing the others, and eventually synthesizing a final answer under the guidance of a "Chairman."

To make this happen, Karpathy wrote what he called a "vibe code project" — a piece of software written quickly, largely by AI assistants, intended for fun rather than function. He posted the result, a repository called "LLM Council," to GitHub with a stark disclaimer: "I'm not going to support it in any way... Code is ephemeral now and libraries are over."

Yet, for technical decision-makers across the enterprise landscape, looking past the casual disclaimer reveals something far more significant than a weekend toy. In a few hundred lines of Python and JavaScript, Karpathy has sketched a reference architecture for the most critical, undefined layer of the modern software stack: the orchestration middleware sitting between corporate applications and the volatile market of AI models.

As companies finalize their platform investments for 2026, LLM Council offers a stripped-down look at the "build vs. buy" reality of AI infrastructure. It demonstrates that while the logic of routing and aggregating AI models is surprisingly simple, the operational wrapper required to make it enterprise-ready is where the true complexity lies.

How the LLM Council works: Four AI models debate, critique, and synthesize answers

To the casual observer, the LLM Council web application looks almost identical to ChatGPT. A user types a query into a chat box. But behind the scenes, the application triggers a sophisticated, three-stage workflow that mirrors how human decision-making bodies operate.

First, the system dispatches the user's query to a panel of frontier models. In Karpathy's default configuration, this includes OpenAI's GPT-5.1, Google's Gemini 3.0 Pro, Anthropic's Claude Sonnet 4.5, and xAI's Grok 4. These models generate their initial responses in parallel.

In the second stage, the software performs a peer review. Each model is fed the anonymized responses of its counterparts and asked to evaluate them based on accuracy and insight. This step transforms the AI from a generator into a critic, forcing a layer of quality control that is rare in standard chatbot interactions.

Finally, a designated "Chairman LLM" — currently configured as Google's Gemini 3 — receives the original query, the individual responses, and the peer rankings. It synthesizes this mass of context into a single, authoritative answer for the user.

Karpathy noted that the results were often surprising. "Quite often, the models are surprisingly willing to select another LLM's response as superior to their own," he wrote on X (formerly Twitter). He described using the tool to read book chapters, observing that the models consistently praised GPT-5.1 as the most insightful while rating Claude the lowest. However, Karpathy's own qualitative assessment diverged from his digital council; he found GPT-5.1 "too wordy" and preferred the "condensed and processed" output of Gemini.

FastAPI, OpenRouter, and the case for treating frontier models as swappable components

For CTOs and platform architects, the value of LLM Council lies not in its literary criticism, but in its construction. The repository serves as a primary document showing exactly what a modern, minimal AI stack looks like in late 2025.

The application is built on a "thin" architecture. The backend uses FastAPI, a modern Python framework, while the frontend is a standard React application built with Vite. Data storage is handled not by a complex database, but by simple JSON files written to the local disk.

The linchpin of the entire operation is OpenRouter, an API aggregator that normalizes the differences between various model providers. By routing requests through this single broker, Karpathy avoided writing separate integration code for OpenAI, Google, and Anthropic. The application does not know or care which company provides the intelligence; it simply sends a prompt and awaits a response.

This design choice highlights a growing trend in enterprise architecture: the commoditization of the model layer. By treating frontier models as interchangeable components that can be swapped by editing a single line in a configuration file — specifically the COUNCIL_MODELS list in the backend code — the architecture protects the application from vendor lock-in. If a new model from Meta or Mistral tops the leaderboards next week, it can be added to the council in seconds.

What's missing from prototype to production: Authentication, PII redaction, and compliance

While the core logic of LLM Council is elegant, it also serves as a stark illustration of the gap between a "weekend hack" and a production system. For an enterprise platform team, cloning Karpathy's repository is merely step one of a marathon.

A technical audit of the code reveals the missing "boring" infrastructure that commercial vendors sell for premium prices. The system lacks authentication; anyone with access to the web interface can query the models. There is no concept of user roles, meaning a junior developer has the same access rights as the CIO.

Furthermore, the governance layer is nonexistent. In a corporate environment, sending data to four different external AI providers simultaneously triggers immediate compliance concerns. There is no mechanism here to redact Personally Identifiable Information (PII) before it leaves the local network, nor is there an audit log to track who asked what.

Reliability is another open question. The system assumes the OpenRouter API is always up and that the models will respond in a timely fashion. It lacks the circuit breakers, fallback strategies, and retry logic that keep business-critical applications running when a provider suffers an outage.

These absences are not flaws in Karpathy's code — he explicitly stated he does not intend to support or improve the project — but they define the value proposition for the commercial AI infrastructure market. Companies like LangChain, AWS Bedrock, and various AI gateway startups are essentially selling the "hardening" around the core logic that Karpathy demonstrated. They provide the security, observability, and compliance wrappers that turn a raw orchestration script into a viable enterprise platform.

Why Karpathy believes code is now "ephemeral" and traditional software libraries are obsolete

Perhaps the most provocative aspect of the project is the philosophy under which it was built. Karpathy described the development process as "99% vibe-coded," implying he relied heavily on AI assistants to generate the code rather than writing it line-by-line himself.

"Code is ephemeral now and libraries are over, ask your LLM to change it in whatever way you like," he wrote in the repository's documentation.

This statement marks a radical shift in software engineering capability. Traditionally, companies build internal libraries and abstractions to manage complexity, maintaining them for years. Karpathy is suggesting a future where code is treated as "promptable scaffolding" — disposable, easily rewritten by AI, and not meant to last.

For enterprise decision-makers, this poses a difficult strategic question. If internal tools can be "vibe coded" in a weekend, does it make sense to buy expensive, rigid software suites for internal workflows? Or should platform teams empower their engineers to generate custom, disposable tools that fit their exact needs for a fraction of the cost?

When AI models judge AI: The dangerous gap between machine preferences and human needs

Beyond the architecture, the LLM Council project inadvertently shines a light on a specific risk in automated AI deployment: the divergence between human and machine judgment.

Karpathy's observation that his models preferred GPT-5.1, while he preferred Gemini, suggests that AI models may have shared biases. They might favor verbosity, specific formatting, or rhetorical confidence that does not necessarily align with human business needs for brevity and accuracy.

As enterprises increasingly rely on "LLM-as-a-Judge" systems to evaluate the quality of their customer-facing bots, this discrepancy matters. If the automated evaluator consistently rewards "wordy and sprawled" answers while human customers want concise solutions, the metrics will show success while customer satisfaction plummets. Karpathy's experiment suggests that relying solely on AI to grade AI is a strategy fraught with hidden alignment issues.

What enterprise platform teams can learn from a weekend hack before building their 2026 stack

Ultimately, LLM Council acts as a Rorschach test for the AI industry. For the hobbyist, it is a fun way to read books. For the vendor, it is a threat, proving that the core functionality of their products can be replicated in a few hundred lines of code. But for the enterprise technology leader, it is a reference architecture. It demystifies the orchestration layer, showing that the technical challenge is not in routing the prompts, but in governing the data.

As platform teams head into 2026, many will likely find themselves staring at Karpathy's code, not to deploy it, but to understand it. It proves that a multi-model strategy is not technically out of reach. The question remains whether companies will build the governance layer themselves or pay someone else to wrap the "vibe code" in enterprise-grade armor.
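Karpathy's repository contains the actual implementation; the sketch below is only a hedged approximation of the same three-stage pattern (dispatch, peer review, chairman synthesis) against OpenRouter's OpenAI-compatible chat-completions endpoint. The model identifiers and prompts are placeholders, not values copied from the repo.

```python
import os
import requests

# Hedged sketch of the council pattern (dispatch -> peer review -> chairman synthesis),
# not Karpathy's actual code. OpenRouter exposes an OpenAI-compatible chat-completions
# endpoint; the model IDs below are placeholders.
API_URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

COUNCIL_MODELS = [  # placeholder identifiers, edit to taste
    "openai/gpt-5.1", "google/gemini-3-pro", "anthropic/claude-sonnet-4.5", "x-ai/grok-4",
]
CHAIRMAN_MODEL = "google/gemini-3-pro"  # placeholder

def ask(model: str, prompt: str) -> str:
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def council(query: str) -> str:
    # Stage 1: every council member answers the query independently.
    answers = [ask(m, query) for m in COUNCIL_MODELS]

    # Stage 2: each member ranks the anonymized answers of its peers.
    anonymized = "\n\n".join(f"Response {i + 1}:\n{a}" for i, a in enumerate(answers))
    review_prompt = f"Rank these anonymized responses to '{query}' by accuracy and insight:\n\n{anonymized}"
    rankings = [ask(m, review_prompt) for m in COUNCIL_MODELS]

    # Stage 3: the chairman synthesizes the query, answers, and rankings into one reply.
    chairman_prompt = (
        f"Question: {query}\n\nCouncil answers:\n{anonymized}\n\nPeer rankings:\n"
        + "\n\n".join(rankings)
    )
    return ask(CHAIRMAN_MODEL, chairman_prompt)

if __name__ == "__main__":
    print(council("Summarize the key argument of chapter 3 and its strongest counterargument."))
```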

#data science #artificial intelligence #climate change #geospatial #machine learning #remote sensing

The high-resolution physics turning microwave echoes into real-time flood intelligence
The post RISAT’s Silent Promise: Decoding Disasters with Synthetic Aperture Radar appeared first on Towards Data Science.

This article turns those unwelcome experiences into five comprehensive frameworks that will elevate your Excel-based machine learning work.

#large language models #artificial intelligence #editors pick #logistics #mcp #supply chain #sustainability

Discover how Claude can act as a Supply Chain Sustainability Analyst and guide companies toward greener, more efficient inventory management.
The post How I Use AI to Convince Companies to Adopt Sustainability appeared first on Towards Data Science.

#science / environment

The EPA is prioritizing review of new chemicals to be used in data centers. Experts say this could lead to the fast approval of new types of forever chemicals—with limited oversight.

#artificial intelligence #ai #ai hype index #app

Separating AI reality from hyped-up fiction isn’t always easy. That’s why we’ve created the AI Hype Index—a simple, at-a-glance summary of everything you need to know about the state of the industry. Last year, the fantasy author Joanna Maciejewska went viral (if such a thing is still possible on X) with a post saying “I…

This article is divided into three parts; they are:
• Creating a BERT Model the Easy Way
• Creating a BERT Model from Scratch with PyTorch
• Pre-training the BERT Model
If your goal is to create a BERT model so that you can train it on your own data, using the Hugging Face `transformers` library is the easiest way to get started.
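The "easy way" presumably follows the standard Hugging Face pattern; as a hedged sketch (the configuration sizes are illustrative, and the article's own code may differ), creating a small untrained BERT ready for pre-training on your own data looks roughly like this:

```python
from transformers import BertConfig, BertForMaskedLM, BertTokenizerFast

# Hedged sketch: build a fresh, untrained BERT from a config so it can be pre-trained
# on your own data. (Loading pretrained "bert-base-uncased" weights instead is the
# usual choice when you only want to fine-tune.) Sizes below are illustrative.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
config = BertConfig(
    vocab_size=tokenizer.vocab_size,
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
)
model = BertForMaskedLM(config)

# One tiny masked-language-modeling step just to confirm the pieces fit together.
batch = tokenizer("A small batch of training text.", return_tensors="pt")
outputs = model(**batch, labels=batch["input_ids"])
print(outputs.loss)
```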

US AI dominance plan, ChatGPT voice inline, FLUX.2, 3D AI, protein breakthrough, and more...

#ai

It's not just Google's Gemini 3, Nano Banana Pro, and Anthropic's Claude Opus 4.5 we have to be thankful for this year around the Thanksgiving holiday here in the U.S. No, today the German AI startup Black Forest Labs released FLUX.2, a new image generation and editing system complete with four different models designed to support production-grade creative workflows.

FLUX.2 introduces multi-reference conditioning, higher-fidelity outputs, and improved text rendering, and it expands the company's open-core ecosystem with both commercial endpoints and open-weight checkpoints. While Black Forest Labs previously launched with and made a name for itself on open source text-to-image models in its Flux family, today's release includes one fully open-source component: the Flux.2 VAE, available now under the Apache 2.0 license.

Three other models of varying sizes and uses — Flux.2 [Pro], Flux.2 [Flex], and Flux.2 [Dev] — are not open source; Pro and Flex remain proprietary hosted offerings, while Dev is an open-weight downloadable model that requires a commercial license obtained directly from Black Forest Labs for any commercial use. An upcoming open-source model is Flux.2 [Klein], which will also be released under Apache 2.0 when available.

But the open source Flux.2 VAE, or variational autoencoder, is important and useful to enterprises for several reasons. This is a module that compresses images into a latent space and reconstructs them back into high-resolution outputs; in Flux.2, it defines the latent representation used across the multiple (four total, see below) model variants, enabling higher-quality reconstructions, more efficient training, and 4-megapixel editing. Because this VAE is open and freely usable, enterprises can adopt the same latent space used by BFL's commercial models in their own self-hosted pipelines, gaining interoperability between internal systems and external providers while avoiding vendor lock-in.

The availability of a fully open, standardized latent space also enables practical benefits beyond media-focused organizations. Enterprises can use an open-source VAE as a stable, shared foundation for multiple image-generation models, allowing them to switch or mix generators without reworking downstream tools or workflows. Standardizing on a transparent, Apache-licensed VAE supports auditability and compliance requirements, ensures consistent reconstruction quality across internal assets, and allows future models trained for the same latent space to function as drop-in replacements.

This transparency also enables downstream customization such as lightweight fine-tuning for brand styles or internal visual templates — even for organizations that do not specialize in media but rely on consistent, controllable image generation for marketing materials, product imagery, documentation, or stock-style visuals. The announcement positions FLUX.2 as an evolution of the FLUX.1 family, with an emphasis on reliability, controllability, and integration into existing creative pipelines rather than one-off demos.

A Shift Toward Production-Centric Image Models

FLUX.2 extends the prior FLUX.1 architecture with more consistent character, layout, and style adherence across up to ten reference images. The system maintains coherence at 4-megapixel resolutions for both generation and editing tasks, enabling use cases such as product visualization, brand-aligned asset creation, and structured design workflows. The model also improves prompt following across multi-part instructions while reducing failure modes related to lighting, spatial logic, and world knowledge.

In parallel, Black Forest Labs continues to follow an open-core release strategy. The company provides hosted, performance-optimized versions of FLUX.2 for commercial deployments, while also publishing inspectable open-weight models that researchers and independent developers can run locally. This approach extends a track record begun with FLUX.1, which became the most widely used open image model globally.

Model Variants and Deployment Options

Flux.2 arrives with five variants:

Flux.2 [Pro]: This is the highest-performance tier, intended for applications that require minimal latency and maximal visual fidelity. It is available through the BFL Playground, the FLUX API, and partner platforms. The model aims to match leading closed-weight systems in prompt adherence and image quality while reducing compute demand.

Flux.2 [Flex]: This version exposes parameters such as the number of sampling steps and the guidance scale. The design enables developers to tune the trade-offs between speed, text accuracy, and detail fidelity. In practice, this enables workflows where low-step previews can be generated quickly before higher-step renders are invoked.

Flux.2 [Dev]: The most notable release for the open ecosystem is the 32-billion-parameter open-weight checkpoint, which integrates text-to-image generation and image editing into a single model. It supports multi-reference conditioning without requiring separate modules or pipelines. The model can run locally using BFL's reference inference code or optimized fp8 implementations developed in partnership with NVIDIA and ComfyUI. Hosted inference is also available via FAL, Replicate, Runware, Verda, TogetherAI, Cloudflare, and DeepInfra.

Flux.2 [Klein]: Coming soon, this size-distilled model is released under Apache 2.0 and is intended to offer improved performance relative to comparable models of the same size trained from scratch. A beta program is currently open.

Flux.2 VAE: Released under the enterprise-friendly (even for commercial use) Apache 2.0 license, the updated variational autoencoder provides the latent space that underpins all Flux.2 variants. The VAE emphasizes an optimized balance between reconstruction fidelity, learnability, and compression rate — a long-standing challenge for latent-space generative architectures.

Benchmark Performance

Black Forest Labs published two sets of evaluations highlighting FLUX.2's performance relative to other open-weight and hosted image-generation models. In head-to-head win-rate comparisons across three categories — text-to-image generation, single-reference editing, and multi-reference editing — FLUX.2 [Dev] led all open-weight alternatives by a substantial margin. It achieved a 66.6% win rate in text-to-image generation (vs. 51.3% for Qwen-Image and 48.1% for Hunyuan Image 3.0), 59.8% in single-reference editing (vs. 49.3% for Qwen-Image and 41.2% for FLUX.1 Kontext), and 63.6% in multi-reference editing (vs. 36.4% for Qwen-Image). These results reflect consistent gains over both earlier FLUX.1 models and contemporary open-weight systems.

A second benchmark compared model quality using ELO scores against approximate per-image cost. In this analysis, FLUX.2 [Pro], FLUX.2 [Flex], and FLUX.2 [Dev] cluster in the upper-quality, lower-cost region of the chart, with ELO scores in the ~1030–1050 band while operating in the 2–6 cent range. By contrast, earlier models such as FLUX.1 Kontext [max] and Hunyuan Image 3.0 appear significantly lower on the ELO axis despite similar or higher per-image costs. Only proprietary competitors like Nano Banana 2 reach higher ELO levels, but at noticeably elevated cost. According to BFL, this positions FLUX.2's variants as offering strong quality–cost efficiency across performance tiers, with FLUX.2 [Dev] in particular delivering near–top-tier quality while remaining one of the lowest-cost options in its class.

Pricing via API and Comparison to Nano Banana Pro

A pricing calculator on BFL's site indicates that FLUX.2 [Pro] is billed at roughly $0.03 per megapixel of combined input and output. A standard 1024×1024 (1 MP) generation costs $0.030, and higher resolutions scale proportionally. The calculator also counts input images toward total megapixels, suggesting that multi-image reference workflows will have higher per-call costs.

By contrast, Google's Gemini 3 Pro Image Preview, aka "Nano Banana Pro," currently prices image output at $120 per 1M tokens, resulting in a cost of $0.134 per 1K–2K image (up to 2048×2048) and $0.24 per 4K image. Image input is billed at $0.0011 per image, which is negligible compared to output costs. While Gemini's model uses token-based billing, its effective per-image pricing places 1K–2K images at more than 4× the cost of a 1 MP FLUX.2 [Pro] generation, and 4K outputs at roughly 8× the cost of a similar-resolution FLUX.2 output if scaled proportionally.

In practical terms, the available data suggests that FLUX.2 [Pro] currently offers significantly lower per-image pricing, particularly for high-resolution outputs or multi-image editing workflows, whereas Gemini 3 Pro's preview tier is positioned as a higher-cost, token-metered service with more variability depending on resolution.

Technical Design and the Latent Space Overhaul

FLUX.2 is built on a latent flow matching architecture, combining a rectified flow transformer with a vision-language model based on Mistral-3 (24B). The VLM contributes semantic grounding and contextual understanding, while the transformer handles spatial structure, material representation, and lighting behavior.

A major component of the update is the re-training of the model's latent space. The FLUX.2 VAE integrates advances in semantic alignment, reconstruction quality, and representational learnability drawn from recent research on autoencoder optimization. Earlier models often faced trade-offs in the learnability–quality–compression triad: highly compressed spaces increase training efficiency but degrade reconstructions, while wider bottlenecks can reduce the ability of generative models to learn consistent transformations.

According to BFL's research data, the FLUX.2 VAE achieves lower LPIPS distortion than the FLUX.1 and SD autoencoders while also improving generative FID. This balance allows FLUX.2 to support high-fidelity editing — an area that typically demands reconstruction accuracy — and still maintain competitive learnability for large-scale generative training.

Capabilities Across Creative Workflows

The most significant functional upgrade is multi-reference support. FLUX.2 can ingest up to ten reference images and maintain identity, product details, or stylistic elements across the output. This feature is relevant for commercial applications such as merchandising, virtual photography, storyboarding, and branded campaign development.

The system's typography improvements address a persistent challenge for diffusion- and flow-based architectures. FLUX.2 is able to generate legible fine text, structured layouts, UI elements, and infographic-style assets with greater reliability. This capability, combined with flexible aspect ratios and high-resolution editing, broadens the use cases where text and image jointly define the final output.

FLUX.2 enhances instruction following for multi-step, compositional prompts, enabling more predictable outcomes in constrained workflows. The model exhibits better grounding in physical attributes — such as lighting and material behavior — reducing inconsistencies in scenes requiring photoreal equilibrium.

Ecosystem and Open-Core Strategy

Black Forest Labs continues to position its models within an ecosystem that blends open research with commercial reliability. The FLUX.1 open models helped establish the company's reach across both the developer and enterprise markets, and FLUX.2 expands this structure: tightly optimized commercial endpoints for production deployments and open, composable checkpoints for research and community experimentation.

The company emphasizes transparency through published inference code, the open-weight VAE release, prompting guides, and detailed architectural documentation. It also continues to recruit talent in Freiburg and San Francisco as it pursues a longer-term roadmap toward multimodal models that unify perception, memory, reasoning, and generation.

Background: Flux and the Formation of Black Forest Labs

Black Forest Labs (BFL) was founded in 2024 by Robin Rombach, Patrick Esser, and Andreas Blattmann, the original creators of Stable Diffusion. Their move from Stability AI came at a moment of turbulence for the broader open-source generative AI community, and the launch of BFL signaled a renewed effort to build accessible, high-performance image models. The company secured $31 million in seed funding led by Andreessen Horowitz, with additional support from Brendan Iribe, Michael Ovitz, and Garry Tan, providing early validation for its technical direction.

BFL's first major release, FLUX.1, introduced a 12-billion-parameter architecture available in Pro, Dev, and Schnell variants. It quickly gained a reputation for output quality that matched or exceeded closed-source competitors such as Midjourney v6 and DALL·E 3, while the Dev and Schnell versions reinforced the company's commitment to open distribution. FLUX.1 also saw rapid adoption in downstream products, including xAI's Grok 2, and arrived amid ongoing industry discussions about dataset transparency, responsible model usage, and the role of open-source distribution. BFL published strict usage policies aimed at preventing misuse and non-consensual content generation.

In late 2024, BFL expanded the lineup with Flux 1.1 Pro, a proprietary high-speed model delivering sixfold generation speed improvements and achieving leading ELO scores on Artificial Analysis. The company launched a paid API alongside the release, enabling configurable integrations with adjustable resolution, model choice, and moderation settings at pricing that began at $0.04 per image. Partnerships with TogetherAI, Replicate, FAL, and Freepik broadened access and made the model available to users without the need for self-hosting, extending BFL's reach across commercial and creator-oriented platforms.

These developments unfolded against a backdrop of accelerating competition in generative media.

Implications for Enterprise Technical Decision Makers

The FLUX.2 release carries distinct operational implications for enterprise teams responsible for AI engineering, orchestration, data management, and security. For AI engineers responsible for model lifecycle management, the availability of both hosted endpoints and open-weight checkpoints enables flexible integration paths. FLUX.2's multi-reference capabilities and expanded resolution support reduce the need for bespoke fine-tuning pipelines when handling brand-specific or identity-consistent outputs, lowering development overhead and accelerating deployment timelines. The model's improved prompt adherence and typography performance also reduce iterative prompting cycles, which can have a measurable impact on production workload efficiency.

Teams focused on AI orchestration and operational scaling benefit from the structure of FLUX.2's product family. The Pro tier offers predictable latency characteristics suitable for pipeline-critical workloads, while the Flex tier enables direct control over sampling steps and guidance parameters, aligning with environments that require strict performance tuning. Open-weight access for the Dev model facilitates the creation of custom containerized deployments and allows orchestration platforms to manage the model under existing CI/CD practices. This is particularly relevant for organizations balancing cutting-edge tooling with budget constraints, as self-hosted deployments offer cost control at the expense of in-house optimization requirements.

Data engineering stakeholders gain advantages from the model's latent architecture and improved reconstruction fidelity. High-quality, predictable image representations reduce downstream data-cleaning burdens in workflows where generated assets feed into analytics systems, creative automation pipelines, or multimodal model development. Because FLUX.2 consolidates text-to-image and image-editing functions into a single model, it simplifies integration points and reduces the complexity of data flows across storage, versioning, and monitoring layers. For teams managing large volumes of reference imagery, the ability to incorporate up to ten inputs per generation may also streamline asset management processes by shifting more variation handling into the model rather than external tooling.

For security teams, FLUX.2's open-core approach introduces considerations related to access control, model governance, and API usage monitoring. Hosted FLUX.2 endpoints allow for centralized enforcement of security policies and reduce local exposure to model weights, which may be preferable for organizations with stricter compliance requirements. Conversely, open-weight deployments require internal controls for model integrity, version tracking, and inference-time monitoring to prevent misuse or unapproved modifications. The model's handling of typography and realistic compositions also reinforces the need for established content governance frameworks, particularly where generative systems interface with public-facing channels.

Across these roles, FLUX.2's design emphasizes predictable performance characteristics, modular deployment options, and reduced operational friction. For enterprises with lean teams or rapidly evolving requirements, the release offers a set of capabilities aligned with practical constraints around speed, quality, budget, and model governance.

FLUX.2 marks a substantial iterative improvement in Black Forest Labs' generative image stack, with notable gains in multi-reference consistency, text rendering, latent space quality, and structured prompt adherence. By pairing fully managed offerings with open-weight checkpoints, BFL maintains its open-core model while extending its relevance to commercial creative workflows. The release demonstrates a shift from experimental image generation toward more predictable, scalable, and controllable systems suited for operational use.
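Taking the per-megapixel figure quoted above at face value, a back-of-the-envelope cost estimate for FLUX.2 [Pro] can be sketched in a few lines; the rate and example numbers come from this post, and actual billing may round or meter differently.

```python
# Back-of-the-envelope FLUX.2 [Pro] estimate using the ~$0.03 per megapixel figure
# quoted above (combined input + output). Actual billing may round or meter differently.
RATE_PER_MEGAPIXEL = 0.03

def flux2_pro_cost(output_w, output_h, reference_sizes=()):
    megapixels = output_w * output_h / 1_000_000
    for w, h in reference_sizes:  # input images count toward total megapixels
        megapixels += w * h / 1_000_000
    return megapixels * RATE_PER_MEGAPIXEL

# A 1024x1024 generation lands near the $0.030 the post quotes for 1 MP.
print(round(flux2_pro_cost(1024, 1024), 4))
# A 4 MP edit with three 1 MP reference images costs proportionally more.
print(round(flux2_pro_cost(2048, 2048, [(1024, 1024)] * 3), 4))
```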

#amazon sagemaker ai #announcements #technical how-to

Amazon SageMaker AI now supports EAGLE-based adaptive speculative decoding, a technique that accelerates large language model inference by up to 2.5x while maintaining output quality. In this post, we explain how to use EAGLE 2 and EAGLE 3 speculative decoding in Amazon SageMaker AI, covering the solution architecture, optimization workflows using your own datasets or SageMaker's built-in data, and benchmark results demonstrating significant improvements in throughput and latency.

#research #computer science and technology #artificial intelligence #machine learning #human-computer interaction #data #technology and society #laboratory for information and decision systems (lids) #institute for medical engineering and science (imes) #electrical engineering and computer science (eecs) #school of engineering #mit schwarzman college of computing #national science foundation (nsf)

Large language models can learn to mistakenly link certain sentence patterns with specific topics — and may then repeat these patterns instead of reasoning.

#ai

Researchers at Alibaba's Tongyi Lab have developed a new framework for self-evolving agents that create their own training data by exploring their application environments. The framework, AgentEvolver, uses the knowledge and reasoning capabilities of large language models for autonomous learning, addressing the high costs and manual effort typically required to gather task-specific datasets.

Experiments show that compared to traditional reinforcement learning–based frameworks, AgentEvolver is more efficient at exploring its environment, makes better use of data, and adapts faster to application environments. For the enterprise, this is significant because it lowers the barrier to training agents for bespoke applications, making powerful, custom AI assistants more accessible to a wider range of organizations.

The high cost of training AI agents

Reinforcement learning has become a major paradigm for training LLMs to act as agents that can interact with digital environments and learn from feedback. However, developing agents with RL faces fundamental challenges. First, gathering the necessary training datasets is often prohibitively expensive, requiring significant manual labor to create examples of tasks, especially in novel or proprietary software environments where there are no available off-the-shelf datasets.

Second, the RL techniques commonly used for LLMs require the model to run through a massive number of trial-and-error attempts to learn effectively. This process is computationally costly and inefficient. As a result, training capable LLM agents through RL remains laborious and expensive, limiting their deployment in custom enterprise settings.

How AgentEvolver works

The main idea behind AgentEvolver is to give models greater autonomy in their own learning process. The researchers describe it as a "self-evolving agent system" designed to "achieve autonomous and efficient capability evolution through environmental interaction." It uses the reasoning power of an LLM to create a self-training loop, allowing the agent to continuously improve by directly interacting with its target environment without needing predefined tasks or reward functions.

"We envision an agent system where the LLM actively guides exploration, task generation, and performance refinement," the researchers wrote in their paper.

The self-evolution process is driven by three core mechanisms that work together.

The first is self-questioning, where the agent explores its environment to discover the boundaries of its functions and identify useful states. It's like a new user clicking around an application to see what's possible. Based on this exploration, the agent generates its own diverse set of tasks that align with a user's general preferences. This reduces the need for handcrafted datasets and allows the agent and its tasks to co-evolve, progressively enabling it to handle more complex challenges. According to Yunpeng Zhai, researcher at Alibaba and co-author of the paper, who spoke to VentureBeat, the self-questioning mechanism effectively turns the model from a "data consumer into a data producer," dramatically reducing the time and cost required to deploy an agent in a proprietary environment.

The second mechanism is self-navigating, which improves exploration efficiency by reusing and generalizing from past experiences. AgentEvolver extracts insights from both successful and unsuccessful attempts and uses them to guide future actions. For example, if an agent tries to use an API function that doesn't exist in an application, it registers this as an experience and learns to verify the existence of functions before attempting to use them in the future.

The third mechanism, self-attributing, enhances learning efficiency by providing more detailed feedback. Instead of just a final success or failure signal (a common practice in RL that can result in sparse rewards), this mechanism uses an LLM to assess the contribution of each individual action in a multi-step task. It retrospectively determines whether each step contributed positively or negatively to the final outcome, giving the agent fine-grained feedback that accelerates learning. This is crucial for regulated industries where how an agent solves a problem is as important as the result. "Instead of rewarding a student only for the final answer, we also evaluate the clarity and correctness of each step in their reasoning," Zhai explained. This improves transparency and encourages the agent to adopt more robust and auditable problem-solving patterns.

"By shifting the training initiative from human-engineered pipelines to LLM-guided self-improvement, AgentEvolver establishes a new paradigm that paves the way toward scalable, cost-effective, and continually improving intelligent systems," the researchers state.

The team has also developed a practical, end-to-end training framework that integrates these three mechanisms. A key part of this foundation is the Context Manager, a component that controls the agent's memory and interaction history. While today's benchmarks test a limited number of tools, real enterprise environments can involve thousands of APIs. Zhai acknowledges this is a core challenge for the field, but notes that AgentEvolver was designed to be extended. "Retrieval over extremely large action spaces will always introduce computational challenges, but AgentEvolver's architecture provides a clear path toward scalable tool reasoning in enterprise settings," he said.

A more efficient path to agent training

To measure the effectiveness of their framework, the researchers tested it on AppWorld and BFCL v3, two benchmarks that require agents to perform long, multi-step tasks using external tools. They used models from Alibaba's Qwen2.5 family (7B and 14B parameters) and compared their performance against a baseline model trained with GRPO, a popular RL technique used to develop reasoning models like DeepSeek-R1.

The results showed that integrating all three mechanisms in AgentEvolver led to substantial performance gains. For the 7B model, the average score improved by 29.4%, and for the 14B model, it increased by 27.8% over the baseline. The framework consistently enhanced the models' reasoning and task-execution capabilities across both benchmarks. The most significant improvement came from the self-questioning module, which autonomously generates diverse training tasks and directly addresses the data scarcity problem.

The experiments also demonstrated that AgentEvolver can efficiently synthesize a large volume of high-quality training data. The tasks generated by the self-questioning module proved diverse enough to achieve good training efficiency even with a small amount of data. For enterprises, this provides a path to creating agents for bespoke applications and internal workflows while minimizing the need for manual data annotation. By providing high-level goals and letting the agent generate its own training experiences, organizations can develop custom AI assistants more simply and cost-effectively.

"This combination of algorithmic design and engineering pragmatics positions AgentEvolver as both a research vehicle and a reusable foundation for building adaptive, tool-augmented agents," the researchers conclude.

Looking ahead, the ultimate goal is much bigger. "A truly 'singular model' that can drop into any software environment and master it overnight is certainly the holy grail of agentic AI," Zhai said. "We see AgentEvolver as a necessary step in that direction." While that future still requires breakthroughs in model reasoning and infrastructure, self-evolving approaches are paving the way.
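The article describes AgentEvolver's three mechanisms only at a conceptual level; the sketch below is a hedged, pseudocode-style rendering of how one self-evolution iteration might be organized. Every name in it (explore, generate_tasks, judge_steps, and so on) is hypothetical, not the paper's actual API.

```python
# Hedged, conceptual sketch of one AgentEvolver-style iteration, based only on the
# three mechanisms described above. All helpers named here are hypothetical.

def self_evolve_step(agent, environment, experience_bank, preferences):
    # Self-questioning: the agent explores the environment and proposes its own
    # training tasks aligned with high-level user preferences, instead of relying
    # on handcrafted datasets.
    observations = agent.explore(environment)
    tasks = agent.generate_tasks(observations, preferences)

    scored_trajectories = []
    for task in tasks:
        # Self-navigating: reuse insights distilled from past successes and
        # failures to guide the current attempt.
        insights = experience_bank.retrieve(task)
        trajectory = agent.attempt(task, environment, guidance=insights)
        experience_bank.add(task, trajectory)

        # Self-attributing: an LLM judge scores each individual step of the
        # trajectory rather than emitting a single sparse end-of-task reward.
        step_scores = agent.judge_steps(task, trajectory)
        scored_trajectories.append((trajectory, step_scores))

    # Fine-grained, per-step credit assignment feeds the policy update.
    agent.update_policy(scored_trajectories)
```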

#gemini models #a message from our ceo #ai

Sundar Pichai sits down with Logan Kilpatrick to discuss Gemini 3 on the Google AI: Release Notes podcast.

#amazon sagemaker #amazon sagemaker ai #amazon sagemaker autopilot #amazon sagemaker ground truth #manufacturing #open source #technical how-to #ai/ml #amazon machine learning #computer vision #amazon lookout for vision

In this post, we demonstrate how to migrate computer vision workloads from Amazon Lookout for Vision to Amazon SageMaker AI by training custom defect detection models using pre-trained models available on AWS Marketplace. We provide step-by-step guidance on labeling datasets with SageMaker Ground Truth, training models with flexible hyperparameter configurations, and deploying them for real-time or batch inference—giving you greater control and flexibility for automated quality inspection use cases.

#business #business / artificial intelligence

In this episode of Uncanny Valley, we cover the news of the week and take a closer look at Gemini 3, Google's latest AI model and chatbot.

#artificial intelligence #generative ai #thought leadership

The AWS Customer Success Center of Excellence (CS COE) helps customers get tangible value from their AWS investments. We've seen a pattern: customers who build AI strategies that address people, process, and technology together succeed more often. In this post, we share practical considerations that can help close the AI value gap.

#amazon sagemaker ai #artificial intelligence #technical how-to

We're introducing bidirectional streaming for Amazon SageMaker AI Inference, which transforms inference from a transactional exchange into a continuous conversation. This post shows you how to build and deploy a container with bidirectional streaming capability to a SageMaker AI endpoint. We also demonstrate how you can bring your own container or use our partner Deepgram's pre-built models and containers on SageMaker AI to enable the bidirectional streaming feature for real-time inference.
