Latest AI News & Updates

#announcements #responsible ai #aws well-architected

Today, we're announcing the AWS Well-Architected Responsible AI Lens—a set of thoughtful questions and corresponding best practices that help builders address responsible AI concerns throughout development and operation.

#amazon bedrock #amazon bedrock agentcore #artificial intelligence

Amazon's AI-powered Amazon Compliance Screening system tackles complex compliance challenges through autonomous agents that analyze, reason through, and resolve cases with precision. This blog post explores how Amazon’s Compliance team built its AI-powered investigation system through a series of AI agents built on AWS.

#ai #dev

OpenAI has introduced GPT‑5.1-Codex-Max, a new frontier agentic coding model now available in its Codex developer environment. The release marks a significant step forward in AI-assisted software engineering, offering improved long-horizon reasoning, efficiency, and real-time interactive capabilities. GPT‑5.1-Codex-Max will now replace GPT‑5.1-Codex as the default model across Codex-integrated surfaces.

The new model is designed to serve as a persistent, high-context software development agent, capable of managing complex refactors, debugging workflows, and project-scale tasks across multiple context windows.

It comes on the heels of Google releasing its powerful new Gemini 3 Pro model yesterday, yet still outperforms or matches it on key coding benchmarks: on SWE-Bench Verified, GPT‑5.1-Codex-Max achieved 77.9% accuracy at extra-high reasoning effort, edging past Gemini 3 Pro’s 76.2%. It also led on Terminal-Bench 2.0, with 58.1% accuracy versus Gemini’s 54.2%, and matched Gemini’s score of 2,439 on LiveCodeBench Pro, a competitive coding Elo benchmark. When measured against Gemini 3 Pro’s most advanced configuration — its Deep Thinking model — Codex-Max holds a slight edge in agentic coding benchmarks as well.

Performance Benchmarks: Incremental Gains Across Key Tasks

GPT‑5.1-Codex-Max demonstrates measurable improvements over GPT‑5.1-Codex across a range of standard software engineering benchmarks. On SWE-Lancer IC SWE, it achieved 79.9% accuracy, a significant increase from GPT‑5.1-Codex’s 66.3%. In SWE-Bench Verified (n=500), it reached 77.9% accuracy at extra-high reasoning effort, outperforming GPT‑5.1-Codex’s 73.7%. Performance on Terminal-Bench 2.0 (n=89) showed more modest improvements, with GPT‑5.1-Codex-Max achieving 58.1% accuracy compared to 52.8% for GPT‑5.1-Codex. All evaluations were run with compaction and extra-high reasoning effort enabled.

These results indicate that the new model offers a higher ceiling on both benchmarked correctness and real-world usability under extended reasoning loads.

Technical Architecture: Long-Horizon Reasoning via Compaction

A major architectural improvement in GPT‑5.1-Codex-Max is its ability to reason effectively over extended input-output sessions using a mechanism called compaction. This enables the model to retain key contextual information while discarding irrelevant details as it nears its context window limit — effectively allowing for continuous work across millions of tokens without performance degradation. The model has been internally observed to complete tasks lasting more than 24 hours, including multi-step refactors, test-driven iteration, and autonomous debugging.

Compaction also improves token efficiency. At medium reasoning effort, GPT‑5.1-Codex-Max used approximately 30% fewer thinking tokens than GPT‑5.1-Codex for comparable or better accuracy, which has implications for both cost and latency.

Platform Integration and Use Cases

GPT‑5.1-Codex-Max is currently available across multiple Codex-based environments, which refer to OpenAI’s own integrated tools and interfaces built specifically for code-focused AI agents. These include:

- Codex CLI, OpenAI’s official command-line tool (@openai/codex), where GPT‑5.1-Codex-Max is already live.
- IDE extensions, likely developed or maintained by OpenAI, though no specific third-party IDE integrations were named.
- Interactive coding environments, such as those used to demonstrate frontend simulation apps like CartPole or Snell’s Law Explorer.
- Internal code review tooling, used by OpenAI’s engineering teams.

For now, GPT‑5.1-Codex-Max is not yet available via public API, though OpenAI states this is coming soon. Users who wish to work with the model in terminal environments today can do so by installing and using the Codex CLI. It is not currently confirmed whether or how the model will integrate into third-party IDEs unless they are built on top of the CLI or future API.

The model is capable of interacting with live tools and simulations. Examples shown in the release include:

- An interactive CartPole policy gradient simulator, which visualizes reinforcement learning training and activations.
- A Snell’s Law optics explorer, supporting dynamic ray tracing across refractive indices.

These interfaces exemplify the model’s ability to reason in real time while maintaining an interactive development session — effectively bridging computation, visualization, and implementation within a single loop.

Cybersecurity and Safety Constraints

While GPT‑5.1-Codex-Max does not meet OpenAI’s “High” capability threshold for cybersecurity under its Preparedness Framework, it is currently the most capable cybersecurity model OpenAI has deployed. It supports use cases such as automated vulnerability detection and remediation, but with strict sandboxing and disabled network access by default. OpenAI reports no increase in scaled malicious use but has introduced enhanced monitoring systems, including activity routing and disruption mechanisms for suspicious behavior. Codex remains isolated to a local workspace unless developers opt in to broader access, mitigating risks like prompt injection from untrusted content.

Deployment Context and Developer Usage

GPT‑5.1-Codex-Max is currently available to users on ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. It will also become the new default in Codex-based environments, replacing GPT‑5.1-Codex, which was a more general-purpose model. OpenAI states that 95% of its internal engineers use Codex weekly, and since adoption, these engineers have shipped ~70% more pull requests on average — highlighting the tool’s impact on internal development velocity.

Despite its autonomy and persistence, OpenAI stresses that Codex-Max should be treated as a coding assistant, not a replacement for human review. The model produces terminal logs, test citations, and tool call outputs to support transparency in generated code.

Outlook

GPT‑5.1-Codex-Max represents a significant evolution in OpenAI’s strategy toward agentic development tools, offering greater reasoning depth, token efficiency, and interactive capabilities across software engineering tasks. By extending its context management and compaction strategies, the model is positioned to handle tasks at the scale of full repositories, rather than individual files or snippets.

With continued emphasis on agentic workflows, secure sandboxes, and real-world evaluation metrics, Codex-Max sets the stage for the next generation of AI-assisted programming environments — while underscoring the importance of oversight in increasingly autonomous systems.
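
The release does not spell out how compaction is implemented. As a rough mental model only, a compaction-style context manager might look like the minimal sketch below; the token budget, the crude token counter, and the summarize callback are all assumptions, not OpenAI's design.

```python
# Minimal sketch of a compaction-style context manager (hypothetical; OpenAI has not
# published Codex-Max internals). `summarize` stands in for whatever model call
# actually condenses older turns into a shorter running summary.
from dataclasses import dataclass, field

@dataclass
class CompactingContext:
    budget_tokens: int = 128_000           # assumed context-window budget
    keep_recent: int = 20                  # recent turns kept verbatim
    turns: list[str] = field(default_factory=list)
    summary: str = ""                      # compacted memory of older work

    def _tokens(self, text: str) -> int:
        return len(text.split())           # crude stand-in for a real tokenizer

    def add(self, turn: str, summarize) -> None:
        self.turns.append(turn)
        total = self._tokens(self.summary) + sum(self._tokens(t) for t in self.turns)
        if total > self.budget_tokens and len(self.turns) > self.keep_recent:
            # Compact: fold older turns into the running summary, keep recent turns intact.
            old, self.turns = self.turns[:-self.keep_recent], self.turns[-self.keep_recent:]
            self.summary = summarize(self.summary, old)

    def prompt(self) -> str:
        # What gets sent to the model: compacted summary plus the recent verbatim turns.
        return "\n".join(filter(None, [self.summary, *self.turns]))
```

The point of the sketch is the shape of the loop, not the specifics: work continues past the nominal window because older context is repeatedly distilled rather than dropped.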

#business #business / artificial intelligence

In a closed-door workshop led by Anthropic and Stanford, leading AI startups and researchers discussed guidelines for chatbot companions, especially for younger users.

#business #business / artificial intelligence

Depending on foreign-made open models is both a supply chain risk and an innovation problem, experts say.

In today’s data-saturated world, simply being “data-driven” isn’t enough. The most successful organizations are those that translate data, analytics, and AI into measurable business outcomes—creating real value for customers and shareholders alike.

#ai

Fetch AI, a startup founded and led by former DeepMind founding investor Humayun Sheikh, on Wednesday announced the release of three interconnected products designed to provide the trust, coordination, and interoperability needed for large-scale AI agent ecosystems. The launch includes ASI:One, a personal-AI orchestration platform; Fetch Business, a verification and discovery portal for brand agents; and Agentverse, an open directory hosting more than two million agents. Together, the system positions Fetch as an infrastructure provider for what it calls the “Agentic Web”—a layer where consumer AIs and brand AIs collaborate to complete tasks instead of merely suggesting them.

The company says the tools address a central limitation in current consumer AI: models can provide recommendations but cannot reliably execute multi-step actions that require coordination across businesses. Fetch’s approach centers on enabling agents from different organizations to interoperate securely, using verified identities and shared context to complete end-to-end workflows.

“We’re creating the same foundation for agents that Google created for websites,” said Humayun Sheikh, Founder and CEO of Fetch AI, and an early investor in DeepMind, in a press release provided to VentureBeat. “Instead of just finding information, your personal AI coordinates with verified brand agents to get things done.”

Fetch’s founding and DeepMind connection

Fetch AI was founded in 2017 by Humayun Sheikh, an entrepreneur whose early investment in DeepMind helped support the company’s commercial development before its acquisition by Google. “I was one of the first five people at DeepMind and its first investor. My check was the first one in,” Sheikh said, reflecting on the period when advanced machine learning research was still largely inaccessible outside major technology companies.

His early experience helped shape Fetch’s direction. “Even in 2013, it was clear to me that agentic systems were going to be the ones that worked. That’s where I focused—on the agentic web,” Sheikh noted. Fetch built on this thesis by developing infrastructure for autonomous software agents, focusing on verifiable identity, secure data exchange, and multi-agent coordination. Over the past several years, the company has expanded to a 70-person team across Cambridge and Menlo Park, raised approximately $60 million, and accumulated more than one million users interacting with its model—data that informed the design of the newly launched products.

Sheikh added that his decision to bootstrap the company initially came directly from the proceeds of the DeepMind exit, noting in the interview that while the sale to Google was “a good exit,” he believed the team could have held out for a higher valuation. The early self-funding period allowed Fetch to begin work in 2015—well before transformer architectures went mainstream—on the hypothesis that agentic infrastructure would become foundational to applied AI.

ASI:One is a platform for multi-agent orchestration

At the core of the launch is ASI:One, a language model interface designed specifically for coordinating multiple agents rather than addressing isolated queries. Fetch describes it as an “intelligence layer” that handles context sharing, task routing, and preference modeling. The system stores user-level signals such as favored airlines, dietary constraints, budget ranges, loyalty program identifiers, and calendar availability. When a user requests a complex task — such as planning a trip with flights, hotels, and restaurant reservations — ASI:One retrieves those preferences and delegates work to the appropriate verified agents. The agents then return actionable outputs, including inventory and booking options, rather than generic recommendations.

In practice, ASI:One functions as a workflow generator across organizational boundaries. By contrast with conventional LLM applications, which often rely on APIs or RAG techniques to surface information, ASI:One is built to coordinate autonomous agents that can complete transactions. Fetch notes that personalization improves over time as the model accumulates structured preference data.

Sheikh emphasized the distinction between orchestrated execution and traditional AI output. “This isn’t searching for options separately and hoping they work together,” he said. “It’s orchestration.” He added that Fetch’s architecture is intentionally modular: “Our architecture is a mix of agentic and expert models. One large model isn’t enough — you need specialists. That’s why we built ASI1, tuned specifically for agentic systems.”

The interview also revealed new details about ASI:One’s personalization systems: the platform uses multiple user-owned knowledge graphs to store preferences, travel history, social connections, and contextual constraints. These knowledge graphs are siloed per user and not co-mingled with any Fetch-operated data. Sheikh described this as a “deterministic backbone” that gives the personal AI a stable memory layer beyond the probabilistic output of a single large model.

ASI:One launches in Beta today, with a broader release planned for early 2026. Fetch also offers ASI:One Mobile, released earlier this year, giving users access to the same agent-orchestration capabilities on iOS and Android. The mobile app connects directly to Agentverse and the user’s knowledge graphs, enabling on-the-go task execution and real-time interaction with registered agents.

Fetch Business offers verified identity and brand control

To enable reliable coordination between consumers and companies, Fetch is introducing a verification and discovery portal called Fetch Business. The platform allows organizations to verify their identity and claim an official Brand Agent handle — for example, @Hilton or @Nike — regardless of which tools they use to build the underlying agent.

Fetch positions the product as an analogue to ICANN domain registration and SSL certificate systems for websites. Verified status is intended to protect consumers from interacting with counterfeit or untrusted agents, a problem the company describes as a major barrier to widespread agent adoption.

The system includes low-code tools for small businesses to create agents in a few steps and connect real-time APIs such as inventory, booking systems, or CRM platforms. “With Fetch, you can create an agent in one minute. It gets a handle, like a Twitter username, and you can personalize it completely—even give it your social media permissions to post on your behalf,” Sheikh said. Once a brand claims its namespace, its agent becomes discoverable to consumer AIs and other agents inside Agentverse.

The company has pre-reserved thousands of brand namespaces in anticipation of demand. Verification status persists across any platform that integrates with Agentverse, creating a portable identity layer for business agents. The interview highlighted that Fetch Business inherits web-trust primitives directly: domain owners verify their identity by inserting a short code snippet into their existing website backend, allowing the system to pass a cryptographic challenge and grant the agent an authenticity badge similar to a “blue check” for agent identities. Sheikh framed this as “reusing the trust layer the web already spent decades building.” Companies can begin claiming agents now at business.fetch.ai.

Agentverse is an open directory of more than 2 million agents

The final component of the release is Agentverse, an open directory and cloud platform that hosts agents and enables cross-ecosystem discoverability. Fetch states that millions of agents have already registered, spanning travel, retail, entertainment, food service, and enterprise categories.

Agentverse provides metadata, capability descriptions, and routing logic that ASI:One uses to identify appropriate agents for specific tasks. It also supports secure communication and data exchange between agents. The company notes that the directory is platform-agnostic: agents built with any framework can join and interoperate.

According to Sheikh, the lack of a discovery layer is one reason most AI agents see little or no usage. “Ninety percent of AI agents never get used because there’s no discovery layer,” he said. He framed the role of Agentverse in more technical terms: “Right now, if you build an agent, there’s no universal way for others to discover it. That’s what AgentVerse solves—it’s like DNS for agents.” He also described the system as an essential component of the emerging agent economy: “Fetch is building the Google of agents. Just like websites needed search, agents need discovery, trust, and interaction—Fetch provides all of that.”

The interview further underscored that Agentverse is cloud-agnostic by design. Sheikh contrasted this with competing agent ecosystems tied to specific cloud providers, arguing that a universal registry is only viable if independent of proprietary cloud environments. He said the open architecture enables an LLM to query any agent “within one minute of deployment,” turning agent publication into a near-instantaneous process similar to registering a domain.

Agentverse also integrates payment pathways, enabling agents to execute purchases using partners such as Visa, Skyfire, and supported stablecoins. Consumers can configure spending limits or require explicit approval for transactions.

Industry context and implications

Fetch’s launch comes at a time when consumer AI platforms are exploring the shift from static chat interfaces toward autonomous agents capable of completing actions. However, most agent systems remain limited by siloed architectures, limited interoperability, and weak verification standards. Fetch positions its infrastructure as a response to these limitations by providing a cross-platform coordination layer, identity system, and directory service. The company argues that an agent ecosystem requires consistent verification mechanisms to ensure that consumers interact with authentic brand representatives rather than imitations. By establishing namespace control and portable trust indicators, Fetch Business aims to fill a gap similar to early web domain verification.

At the same time, ASI:One attempts to centralize user preference data in a way that enables more efficient personalization and multi-agent coordination. This approach differs from generalist LLM applications, which often lack persistent preference architectures or direct access to brand-controlled agents.

The interview also made clear that micropayments and digital transaction infrastructure are central to Fetch’s long-term vision. Sheikh referenced integrations with protocols such as Coinbase’s 402 and AP2, positioning these capabilities as essential for autonomous agents to complete end-to-end tasks that include financial execution.

Fetch’s combined release of ASI:One, Fetch Business, and Agentverse introduces an interconnected stack designed to support large-scale deployment and usage of AI agents. The company frames the system as foundational infrastructure for an agentic ecosystem, where consumer AIs can coordinate with verified brand agents to complete tasks reliably and securely. The additions to its identity, discovery, and orchestration layers reflect Fetch’s long-standing thesis — rooted partly in lessons from DeepMind’s early development — that intelligence becomes meaningful only when paired with the capacity to act.
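
Fetch has not published the protocol behind the website-snippet verification described above. The sketch below is a hypothetical challenge-response flow in that spirit; the .well-known path, the shared secret, and the HMAC scheme are illustrative assumptions, not Fetch's actual implementation.

```python
# Hypothetical challenge-response domain verification, in the spirit of the
# "code snippet + cryptographic challenge" flow described above. The URL path,
# token format, and use of HMAC are assumptions; Fetch's real protocol is not public.
import hashlib
import hmac
import secrets
import urllib.request

def expected_response(shared_secret: str, nonce: str) -> str:
    """What the brand's installed snippet should answer for a given nonce."""
    return hmac.new(shared_secret.encode(), nonce.encode(), hashlib.sha256).hexdigest()

def verify_domain(domain: str, shared_secret: str) -> bool:
    """Registrar side: send a fresh nonce to the site's snippet and check the answer."""
    nonce = secrets.token_hex(16)
    url = f"https://{domain}/.well-known/agent-verification?nonce={nonce}"  # assumed path
    with urllib.request.urlopen(url, timeout=10) as resp:
        answer = resp.read().decode().strip()
    return hmac.compare_digest(answer, expected_response(shared_secret, nonce))
```

Only a domain owner who can deploy the snippet (and holds the shared secret) can answer the challenge, which is what lets a directory hand out a portable "verified" badge.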

#agentic ai #ai agent #information retrieval #llm #rag

Learn how to utilize AI agents to find information in your document corpus
The post How to Perform Agentic Information Retrieval appeared first on Towards Data Science.

#ai & ml #research

One of the principles in our upcoming book Architecture as Code is the ability for architects to design automated governance checks for important architectural concerns, creating fast feedback loops when things go awry. This idea isn’t new—Neal and his coauthors Rebecca Parsons and Patrick Kua espoused this idea back in 2017 in the first edition […]

#business #business / artificial intelligence #business / startups

Sunday Robotics has a new way to train robots to do common household tasks. The startup plans to put its fully autonomous robots in homes next year.

#business #business / artificial intelligence

In this episode of Uncanny Valley, we get to know Palantir CEO Alex Karp and dive into what his answers in a recent WIRED interview reveal about the larger beliefs driving the tech industry today.

#research #brain and cognitive sciences #neuroscience #artificial intelligence #machine learning #computer science and technology #mcgovern institute #school of science

MIT neuroscientists find a surprising parallel in the ways humans and new AI models solve complex problems.

#amazon bedrock #amazon nova #artificial intelligence #best practices #partner solutions #technical how-to

In this post, we cover how you can use tools from Snowflake AI Data Cloud and Amazon Web Services (AWS) to build generative AI solutions that organizations can use to make data-driven decisions, increase operational efficiency, and ultimately gain a competitive edge.

#artificial intelligence #ai ethics #deep dives #generative ai #sexuality #social science

How we learn is changing with generative AI — what does that mean for sex education, consent, and responsibility?
The post Developing Human Sexuality in the Age of AI appeared first on Towards Data Science.

#advanced (300) #amazon sagemaker ai #artificial intelligence #best practices

In this post you will learn how to use Spectrum to optimize resource use and shorten training times without sacrificing quality, as well as how to implement Spectrum fine-tuning with Amazon SageMaker AI training jobs. We will also discuss the tradeoff between QLoRA and Spectrum fine-tuning, showing that while QLoRA is more resource efficient, Spectrum results in higher performance overall.
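
As background for the post above: Spectrum-style fine-tuning freezes most of a model and trains only the layers judged most informative. The sketch below illustrates that general pattern in PyTorch; the scoring proxy and the 25% selection ratio are illustrative stand-ins, not Spectrum's actual signal-to-noise analysis.

```python
# Rough sketch of Spectrum-style selective fine-tuning: freeze everything, then
# unfreeze only the highest-scoring weight matrices. The score below is a simple
# proxy chosen for illustration, not Spectrum's exact SNR computation.
import torch

def apply_selective_freezing(model: torch.nn.Module, train_fraction: float = 0.25):
    scores = {}
    for name, p in model.named_parameters():
        p.requires_grad = False                              # start fully frozen
        if p.ndim == 2:                                       # score weight matrices only
            scores[name] = (p.abs().mean() / (p.std() + 1e-8)).item()
    k = max(1, int(len(scores) * train_fraction))
    chosen = set(sorted(scores, key=scores.get, reverse=True)[:k])
    for name, p in model.named_parameters():
        if name in chosen:
            p.requires_grad = True                            # train only the selected layers
    return chosen
```

Because only a fraction of parameters receive gradients, optimizer state and activation memory shrink accordingly, which is the resource saving the post compares against QLoRA.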

Zapier Automations connect your favorite tools and services so that routine tasks don't eat into your day.

#ai

A new artificial intelligence startup founded by the creators of the world's most widely used computer vision library has emerged from stealth with technology that generates realistic human-centric videos up to five minutes long — a dramatic leap beyond the capabilities of rivals including OpenAI's Sora and Google's Veo.

CraftStory, which launched Tuesday with $2 million in funding, is introducing Model 2.0, a video generation system that addresses one of the most significant limitations plaguing the nascent AI video industry: duration. While OpenAI's Sora 2 tops out at 25 seconds and most competing models generate clips of 10 seconds or less, CraftStory's system can produce continuous, coherent video performances that run as long as a typical YouTube tutorial or product demonstration.

The breakthrough could unlock substantial commercial value for enterprises struggling to scale video production for training, marketing, and customer education — markets where brief AI-generated clips have proven inadequate despite their visual polish.

"If you really try to create a video with one of these video generation systems, you find that a lot of the times you want to implement a certain creative vision, and regardless of how detailed the instructions are, the systems basically ignore a part of your instructions," said Victor Erukhimov, CraftStory's founder and CEO, in an exclusive interview with VentureBeat. "We developed a system that can generate videos basically as long as you need them."

How parallel processing solves the long-form video problem

CraftStory's advance rests on what the company describes as a parallelized diffusion architecture — a fundamentally different approach to how AI models generate video compared to the sequential methods employed by most competitors. Traditional video generation models work by running diffusion algorithms on increasingly large three-dimensional volumes where time represents the third axis. To generate a longer video, these models require proportionally larger networks, more training data, and significantly more computational resources.

CraftStory instead runs multiple smaller diffusion algorithms simultaneously across the entire duration of the video, with bidirectional constraints connecting them. "The latter part of the video can influence the former part of the video too," Erukhimov explained. "And this is pretty important, because if you do it one by one, then an artifact that appears in the first part propagates to the second one, and then it accumulates." Rather than generating eight seconds and then stitching on additional segments, CraftStory's system processes all five minutes concurrently through interconnected diffusion processes.

Crucially, CraftStory trained its model on proprietary footage rather than relying solely on internet-scraped videos. The company hired studios to shoot actors using high-frame-rate camera systems that capture crisp detail even in fast-moving elements like fingers — avoiding the motion blur inherent in standard 30-frames-per-second YouTube clips. "What we showed is that you don't need a lot of data and you don't need a lot of training budget to create high quality videos," Erukhimov said. "You just need high quality data."

Model 2.0 currently operates as a video-to-video system: users upload a still image to animate and a "driving video" containing a person whose movements the AI will replicate. CraftStory provides preset driving videos shot with professional actors, who receive revenue shares when their motion data is used, or users can upload their own footage. The system generates 30-second clips at low resolution in approximately 15 minutes. An advanced lip-sync system synchronizes mouth movements to scripts or audio tracks, while gesture alignment algorithms ensure body language matches speech rhythm and emotional tone.

Fighting a war chest battle with $2 million against billions

CraftStory's funding comes almost entirely from Andrew Filev, who sold his project management software company Wrike to Citrix for $2.25 billion in 2021 and now runs Zencoder, an AI coding company. The modest raise stands in stark contrast to the billions flowing into competing efforts — OpenAI has raised over $6 billion in its latest funding round alone.

Erukhimov pushed back on the notion that massive capital is a prerequisite for success. "I don't necessarily buy the thesis that compute is the path to success," he said. "It definitely helps if you have compute. But if you raise a billion dollars on a PowerPoint, in the end, no one is happy, neither the founders nor the investors."

Filev defended the David-versus-Goliath approach. "When you invest in startups, you're fundamentally betting on people," he said in an interview with VentureBeat. "To paraphrase Margaret Mead: never underestimate what a small group of thoughtful, committed engineers and scientists can build." He argued that CraftStory benefits from a focused strategy. "The big labs are in an arms race to build general-purpose video foundation models," Filev said. "CraftStory is riding that wave and going very deep into a specific format: long-form, engaging, human-centric video."

Why computer vision expertise matters in generative AI video

Erukhimov's credibility stems from his deep roots in computer vision rather than the transformer architectures that have dominated recent AI advances. He was an early contributor to OpenCV — the Open Source Computer Vision Library that has become the de facto standard for computer vision applications, with over 84,000 stars on GitHub. When Intel reduced its support for OpenCV in the mid-2000s, Erukhimov co-founded Itseez with the explicit goal of maintaining and advancing the library. The company expanded OpenCV significantly and pivoted toward automotive safety systems before Intel acquired it in 2016.

Filev said this background is precisely what makes Erukhimov well-positioned for video generation. "What people sometimes miss is that generative AI video isn't just about the generative part. It's about understanding motion, facial dynamics, temporal coherence, and how humans actually move," Filev said. "Victor has spent his career mastering exactly those problems."

Enterprise focus targets training videos and product demos

While much of the public excitement around AI video generation has centered on creative tools for consumers, CraftStory is pursuing a decidedly enterprise-focused strategy. "We are definitely thinking about B2B more than consumer," Erukhimov said. "We're thinking about companies, specifically software companies, being able to make cool training videos and product videos and launch videos."

The logic is straightforward: corporate training, product tutorials, and customer education videos often run several minutes and require consistent quality throughout. A 10-second AI clip cannot effectively demonstrate how to use enterprise software or explain a complex product feature. "If you need a longer-form video, then you should go with us," Erukhimov said. "We can create up to five minutes, consistent video, high quality."

Filev echoed this assessment. "One huge gap in this market is the lack of models that can generate consistent videos over longer sequences — and that's extremely important for real-world use," he said. "If you're creating a commercial for your company, a 10-second video, no matter how good it looks, just isn't enough. You need 30 seconds, you need two minutes — you need more."

The company anticipates cost savings for customers. Filev suggested that "a small business owner could create content in minutes that previously would have cost $20,000 and taken two months to produce." CraftStory is also courting creative agencies that produce video content for corporate clients, with the value proposition centered on cost and speed: agencies can record an actor on camera and transform that footage into a finished AI video, rather than managing expensive multi-day shoots.

The next major development on CraftStory's roadmap is a text-to-video model that would allow users to generate long-form content directly from scripts. The team is also developing support for moving-camera scenarios, including the popular "walk-and-talk" format common in high-end advertising.

Where CraftStory fits in a fragmented competitive landscape

CraftStory enters a crowded and rapidly evolving market. OpenAI's Sora 2, while not yet publicly available, has generated significant buzz. Google's Veo models are advancing quickly. Runway, Pika, and Stability AI all offer video generation tools with different capabilities. Erukhimov acknowledged the competitive pressure but emphasized that CraftStory serves a distinct niche focused on human-centric videos. He positioned rapid innovation and market capture as the company's primary strategy rather than relying on technical moats.

Filev sees the market fragmenting into distinct layers, with large tech companies serving as "API providers of powerful, general-purpose generation models" while specialized players like CraftStory focus on specific use cases. "If the big players are building the engines, CraftStory is building the production studio and assembly line on top," he said.

Model 2.0 is available now at app.craftstory.com/model-2.0, with the company offering early access to users and enterprises interested in testing the technology. Whether a lightly funded startup can capture meaningful market share against deep-pocketed incumbents remains uncertain, but Erukhimov is characteristically confident about the opportunity ahead. "AI-generated video will soon become the primary way companies communicate their stories," he said.
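
CraftStory has not released its architecture, so the following is only a toy illustration of the general idea described above: all segments are denoised in lockstep and exchange information with their neighbors at every step, so later segments can influence earlier ones. The denoiser and blending weights are placeholders, not the company's model.

```python
# Toy illustration of "parallel diffusion segments with bidirectional constraints".
# This is NOT CraftStory's model; the denoiser is a placeholder and the neighbor
# blending is a crude stand-in for learned cross-segment constraints.
import numpy as np

def denoise_step(segment: np.ndarray) -> np.ndarray:
    return segment * 0.95                      # placeholder for a learned denoiser

def blend_with_neighbors(segments: list[np.ndarray], i: int) -> np.ndarray:
    neighbors = [segments[j] for j in (i - 1, i + 1) if 0 <= j < len(segments)]
    if not neighbors:
        return segments[i]
    # Bidirectional coupling: each segment is nudged toward its neighbors, so
    # information flows both forward and backward along the timeline.
    return 0.8 * segments[i] + 0.2 * np.mean(neighbors, axis=0)

def generate(num_segments: int = 10, frames_per_segment: int = 128, steps: int = 50):
    segments = [np.random.randn(frames_per_segment, 64) for _ in range(num_segments)]
    for _ in range(steps):
        segments = [denoise_step(s) for s in segments]                         # all segments at once
        segments = [blend_with_neighbors(segments, i) for i in range(num_segments)]
    return np.concatenate(segments, axis=0)    # one long, jointly-refined sequence
```

Contrast this with sequential generation, where each new chunk is conditioned only on the previous one and any artifact in an early chunk is carried forward and amplified.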

#deep learning #data science #deep dives #linear regression #machine learning #pytorch

Hands-on PyTorch: Building a 3-layer neural network for multiple regression
The post PyTorch Tutorial for Beginners: Build a Multiple Regression Model from Scratch appeared first on Towards Data Science.
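
For readers who want a quick reference, here is a minimal version of the kind of model such a tutorial builds; the layer sizes, toy data, and training settings are arbitrary choices, not the article's exact code.

```python
# Minimal 3-layer multiple-regression network in PyTorch (illustrative only).
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),      # input layer: 8 features
    nn.Linear(64, 32), nn.ReLU(),     # hidden layer
    nn.Linear(32, 1),                 # output: one continuous target
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.randn(256, 8)               # toy data standing in for a real dataset
y = torch.randn(256, 1)

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
```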

Writing readable Python functions doesn’t have to be difficult. This guide shows simple beginner-friendly techniques to make your code clear, consistent, and easy for others to understand.
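
As a small illustration of the kind of practice such guides typically recommend (a descriptive name, type hints, a short docstring, and an early return):

```python
# Example of a readable function: descriptive name, type hints, docstring, early return.
def average_order_value(order_totals: list[float]) -> float:
    """Return the mean order total, or 0.0 if there are no orders."""
    if not order_totals:
        return 0.0
    return sum(order_totals) / len(order_totals)
```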

#artificial intelligence #ai product management #ai strategy #decision making #probabilistic thinking #valuation

Practical guidance on identifying opportunities, managing product portfolios, and overcoming behavioral biases
The post Making Smarter Bets: Towards a Winning AI Strategy with Probabilistic Thinking appeared first on Towards Data Science.

#culture #culture / digital culture

The Relay app allows users to track their porn-free streaks and get group support. Its creators say they’re taking a stand against porn and AI erotica.

Decision tree-based models for predictive machine learning tasks like classification and regression are undoubtedly rich in advantages — such as their ability to capture nonlinear relationships among features and their intuitive interpretability that makes it easy to trace decisions.
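
A minimal example of that interpretability: fit a shallow tree with scikit-learn and print the decision rules it learned (the dataset and depth here are chosen arbitrarily for illustration).

```python
# Fit a small decision tree and print its human-readable decision rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=load_iris().feature_names))  # traceable if/else splits
```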

#ai & ml #commentary

The following originally appears on fast.ai and is reposted here with the author’s permission. I’ve spent decades teaching people to code, building tools that help developers work more effectively, and championing the idea that programming should be accessible to everyone. Through fast.ai, I’ve helped millions learn not just to use AI but to understand it […]

#artificial intelligence #app #summary #why it matters

A group of quantum physicists claims to have created a version of the powerful reasoning AI model DeepSeek R1 that strips out the censorship built into the original by its Chinese creators.  The scientists at Multiverse Computing, a Spanish firm specializing in quantum-inspired AI techniques, created DeepSeek R1 Slim, a model that is 55% smaller…

#research #ai

From AI education to disaster response, see how collaboration is at the heart of the work at Google Research.

#ai

Premiering November 19

VentureBeat is proud to announce the launch of its new flagship podcast, Beyond the Pilot: Enterprise AI in Action, premiering November 19 and brought to you by our anchor sponsor, Outshift by Cisco.

Enterprise AI has reached a new inflection point: workloads are going live, and the constraints are getting real. The challenge for enterprise technical leaders isn’t understanding AI’s potential — it’s navigating the messy, complex work of making it run reliably at scale. Beyond the Pilot goes inside that reality with candid conversations from executives who’ve moved past experiments and into production — scaling AI and agentic systems that deliver measurable business value.

“Enterprise technical leaders keep telling us the same thing: the hype cycle is loud, but what they really need are credible stories about what works in production,” said Matt Marshall, Founder and Editor-in-Chief of VentureBeat. “With Beyond the Pilot, we’re doubling down on VentureBeat’s mission to serve the practitioners — the people actually responsible for delivering AI outcomes. We’re excited to bring a new level of depth, honesty, and technical insight to this conversation.”

Through in-depth, technically rigorous conversations, Beyond the Pilot unpacks the decisions, infrastructure, and organizational changes behind real-world AI deployments. The result: a practical look at what it takes to turn ambition into action.

Season 1: Real stories from the frontlines of enterprise AI

The debut episode features Notion, where VP AI Ryan Nystrom discusses how the company built Notion 3.0 for agents — and what it means to create an AI-native product inside a platform used by millions. Upcoming episodes will feature leaders from LinkedIn, Booking.com, JPMorgan, Mastercard, and LexisNexis, each sharing the inside story of scaling production AI systems inside complex global enterprises.

A podcast for enterprise AI executors

Beyond the Pilot is designed for senior managers, directors, VPs, and lead engineers responsible for turning AI strategy into tangible results. These are the decision-makers navigating the hardest problems in enterprise AI — model governance, infrastructure choices, security constraints, scaling challenges, and real ROI.

"At Outshift, we take emerging technologies from experimentation to production readiness," said Vijoy Pandey, SVP/GM of emerging tech incubator Outshift by Cisco. "That's why we're supporting VentureBeat on Beyond the Pilot. The podcast features tech leaders sharing how they built and deployed AI and agentic systems inside complex enterprises — the real decisions, the tradeoffs, what actually worked. It's the conversation the industry needs right now."

Hosted by Matt Marshall and Sam Witteveen, the show delivers authentic, technically grounded conversations. No hype. No hand-waving. Just actionable lessons from those who’ve done it.

Subscribe and listen

The first episode of Beyond the Pilot drops November 19 on VentureBeat.com and your favorite podcast platforms. Subscribe now to hear how top enterprises are turning AI ambition into action.

Subscribe: Apple Podcasts | Spotify | YouTube

Gemini 3 released, Grok 4.1, AI fights superbugs, tech titans + Claude, and more...

#3-d #software #design #artificial intelligence #automation #computer modeling #machine learning #mechanical engineering #research #school of engineering

The virtual VideoCAD tool could boost designers’ productivity and help train engineers learning computer-aided design.

#ai

Researchers at Meta, the University of Chicago, and UC Berkeley have developed a new framework that addresses the high costs, infrastructure complexity, and unreliable feedback associated with using reinforcement learning (RL) to train large language model (LLM) agents. The framework, DreamGym, simulates an RL environment to train agents for complex applications. As it progresses through the training process, the framework dynamically adjusts task difficulty, ensuring the agent gradually learns to solve more challenging problems as it improves.

Experiments by the research team show that DreamGym substantially improves RL training in both fully synthetic settings and scenarios where the model must apply its simulated learning to the real world. In settings where RL is possible but expensive, it matches the performance of popular algorithms using only synthetic interactions, significantly cutting the costs of data gathering and environment interaction. This approach could be vital for enterprises, allowing them to train agents for bespoke applications while avoiding the complexities of setting up and running live RL environments.

The challenge of training LLM agents

Reinforcement learning is a key technique for training LLMs to handle complex tasks in agentic environments, such as web navigation, tool use, and robotics. It allows models to learn from direct interaction and experience, moving beyond the static datasets used in pre-training. However, RL for agent training remains difficult. Real-world applications often involve long action sequences with sparse signals, meaning the agent only receives a positive signal after a long and correct sequence of actions. Gathering enough diverse and validated data is also expensive, frequently requiring human experts to verify tasks and annotate outcomes. And the infrastructure required to create the live environments for large-scale RL training can be prohibitively complex and costly. Not to mention that interacting with live systems carries risks, as wrong actions (like deleting a file) can cause irreparable damage. “These limitations make building general-purpose and scalable systems for training agents with RL an open and pressing challenge,” the researchers write.

DreamGym directly challenges that model by delivering comparable performance entirely in simulation, removing the infrastructure burden that has kept most enterprises from adopting RL — and giving teams a practical path to train agents without touching costly or risky live environments.

How DreamGym works

The researchers describe DreamGym as a “unified and scalable RL framework that synthesizes diverse experience data in an online manner to enable efficient and effective training of LLM agents.” It is built around three core components that work together to create a controlled and effective training loop.

The first component is a “reasoning-based experience model” that translates the dynamics of a target environment into a textual space. This model acts as the simulator of the application environment. Instead of interacting with a costly real environment, the agent interacts with this model, which generates consistent state transitions and feedback based on the agent’s actions. The researchers argue that agent training doesn't need perfectly realistic environments, but rather data that is "sufficiently diverse, informative, and causally grounded." For example, in a web shopping task, the model synthesizes clean listings of on-page elements rather than processing raw HTML code. This abstract approach makes training the experience model highly efficient, requiring only a small amount of public data.

The second component is an “experience replay buffer,” which acts as a dynamic memory. At the beginning of the training process, the buffer is seeded with offline data to provide essential context and is continuously updated with new synthetic trajectories generated during training. This buffer helps guide the experience model's predictions, ensuring the synthetic experiences remain diverse and factually grounded.

The third component, a “curriculum task generator,” works in tandem with the experience model to adaptively create new tasks that are progressively more challenging. It identifies tasks where the agent's performance is mixed (signaling they are difficult but solvable) and generates variations to push the agent's capabilities.

Together, these components create a closed-loop system for scalable agent training. “By unifying interaction, memory, and adaptive online task generation, DreamGym addresses the persistent challenges that have limited RL for LLM agents training: prohibitive cost, scarcity of diverse tasks, unstable reward signals, and heavy infrastructure demands,” according to the researchers.

DreamGym in action

The researchers evaluated DreamGym across several agent benchmarks, including WebShop (e-commerce), ALFWorld (embodied control), and WebArena (realistic web interaction). They used Llama 3 and Qwen 2.5 models as agent backbones and compared DreamGym against several traditional training strategies. These included offline methods like supervised fine-tuning (SFT) and direct preference optimization (DPO), as well as online RL algorithms like Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), which improve agents through live environment interaction.

DreamGym showed its most significant advantage in environments like WebArena, where setting up a large-scale RL infrastructure is difficult. Agents trained entirely inside DreamGym achieved success rates over 30% higher than baseline methods, which struggled with the sparse rewards and limited exploration in the real environment. The researchers said this shows DreamGym is a mechanism that makes RL training “feasible in domains that were previously intractable due to inherent task and engineering constraints.”

In environments where RL is supported but costly, agents trained with DreamGym performed on par with those trained using GRPO and PPO, but without any costly interactions with the external environment. The team also introduced a sim-to-real approach, DreamGym-S2R, where an agent is first trained in the synthetic environment and then fine-tuned on a small amount of real-world data. This strategy yielded over a 40% performance improvement compared to training from scratch in the real environment while using less than 10% of the external data. This provides a scalable "warm-start" for training general-purpose agents.

Finally, the framework demonstrated strong generalization. An agent trained on tasks in one domain, such as WebShop, could successfully transfer its learned skills to another, like WebArena. The researchers suggest this is because DreamGym agents learn in an "abstract meta-representation space, enabling the agent to learn domain-agnostic behavioral priors rather than memorizing task-specific patterns."

While still in its early stages, DreamGym shows that simulated environments can provide great gains in training agents. In practice, an enterprise could gather a small set of trajectories and task descriptions for the workflows it wants to automate, then use that seed data to bootstrap the DreamGym framework for scalable, sample-efficient agent training.
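
As a conceptual aid only, the toy loop below mirrors the three components described above (experience model, replay buffer, curriculum generator) on a trivial synthetic task. Every class, reward, and update rule here is an illustrative stand-in, not code from the paper.

```python
# Toy DreamGym-style loop: a synthetic experience model replaces a live environment,
# a replay buffer accumulates synthetic trajectories, and a curriculum generator
# raises task difficulty as the agent improves. All details are illustrative.
import random

class ExperienceModel:
    """Stand-in for the reasoning-based simulator of the target environment."""
    def step(self, state, action):
        reward = 1.0 if action == state["target"] else 0.0
        return {"target": random.randint(0, 3)}, reward

class ReplayBuffer:
    def __init__(self):
        self.trajectories = []            # in the paper, seeded with a little offline data
    def add(self, traj):
        self.trajectories.append(traj)

class CurriculumGenerator:
    """Makes tasks longer once the agent mostly solves the current ones."""
    def __init__(self):
        self.horizon = 2
    def update(self, success_rate):
        if success_rate > 0.6:
            self.horizon += 1
        return self.horizon

class Agent:
    def __init__(self):
        self.bias = [0.25] * 4
    def act(self, state):
        return random.choices(range(4), weights=self.bias)[0]
    def update(self, trajectories):
        # Stand-in for a real RL update (the paper's experiments use PPO/GRPO).
        for traj in trajectories[-10:]:
            for _, action, reward in traj:
                self.bias[action] += 0.01 * reward

env, buffer, curriculum, agent = ExperienceModel(), ReplayBuffer(), CurriculumGenerator(), Agent()
for _ in range(20):
    wins, horizon = 0.0, curriculum.horizon
    for _ in range(10):                   # synthetic rollouts only; no live environment
        state, traj = {"target": random.randint(0, 3)}, []
        for _ in range(horizon):
            action = agent.act(state)
            state, reward = env.step(state, action)
            traj.append((state, action, reward))
            wins += reward
        buffer.add(traj)
    agent.update(buffer.trajectories)
    curriculum.update(wins / (10 * horizon))
```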

#amazon bedrock #amazon bedrock agentcore #amazon machine learning #amazon sagemaker ai #artificial intelligence #technical how-to #aws iot core

RoboTic-Tac-Toe is an interactive game where two physical robots move around a tic-tac-toe board, with both the gameplay and robots’ movements orchestrated by LLMs. Players can control the robots using natural language commands, directing them to place their markers on the game board. In this post, we explore the architecture and prompt engineering techniques used to reason about a tic-tac-toe game and decide the next best game strategy and movement plan for the current player.
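
The post's exact prompts and Bedrock calls are not reproduced here; the sketch below only illustrates the general shape of asking an LLM for the next move, with the prompt wording and the call_llm helper as hypothetical placeholders.

```python
# Illustrative sketch of prompting an LLM for a tic-tac-toe move. The prompt text
# and call_llm() are placeholders, not the post's actual prompts or Bedrock calls.
import json

def next_move_prompt(board: list[list[str]], player: str) -> str:
    rows = "\n".join(" | ".join(cell or " " for cell in row) for row in board)
    return (
        f"You are playing tic-tac-toe as '{player}'. Current board:\n{rows}\n"
        "Blank cells are empty. Reply with JSON only: "
        '{"row": <0-2>, "col": <0-2>, "reason": "<one sentence>"}'
    )

def decide_move(board, player, call_llm) -> dict:
    """call_llm is any text-in/text-out function, e.g. a wrapper around an LLM API."""
    reply = call_llm(next_move_prompt(board, player))
    return json.loads(reply)   # e.g. {"row": 1, "col": 1, "reason": "take the center"}
```

The structured JSON reply is what a downstream planner could translate into a movement plan for the physical robot.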
