AI Studio has released new updates that give developers even more control.
Vector databases (DBs), once specialist research instruments, have become widely used infrastructure in just a few years. They power today's semantic search, recommendation engines, anti-fraud measures and gen AI applications across industries. There is a deluge of options: PostgreSQL with pgvector, MySQL HeatWave, DuckDB VSS, SQLite VSS, Pinecone, Weaviate, Milvus and several others.

This wealth of choice sounds like a boon to companies. But just beneath it, a growing problem looms: stack instability. New vector DBs appear each quarter, with disparate APIs, indexing schemes and performance trade-offs. Today's ideal choice may look dated or limiting tomorrow.

For business AI teams, that volatility translates into lock-in risk and migration pain. Most projects begin life with lightweight engines like DuckDB or SQLite for prototyping, then move to Postgres, MySQL or a cloud-native service in production. Each switch involves rewriting queries, reshaping pipelines and slowing down deployments. This re-engineering merry-go-round undermines the very speed and agility that AI adoption is supposed to bring.

Why portability matters now

Companies face a tricky balancing act: experiment quickly with minimal overhead, in hopes of capturing early value; scale safely on stable, production-quality infrastructure without months of refactoring; and stay nimble in a world where new and better backends arrive nearly every month.

Without portability, organizations stagnate. They carry technical debt from duplicated code paths, hesitate to adopt new technology and cannot move prototypes to production at pace. In effect, the database becomes a bottleneck rather than an accelerator. Portability, or the ability to swap the underlying infrastructure without rewriting the application, is increasingly a strategic requirement for enterprises rolling out AI at scale.

Abstraction as infrastructure

The solution is not to pick the "perfect" vector database (there isn't one), but to change how enterprises think about the problem. In software engineering, the adapter pattern provides a stable interface while hiding underlying complexity. Historically, this principle has reshaped entire industries:

ODBC/JDBC gave enterprises a single way to query relational databases, reducing the risk of being tied to Oracle, MySQL or SQL Server.
Apache Arrow standardized columnar data formats, so data systems could interoperate.
ONNX created a vendor-agnostic format for machine learning (ML) models, bridging TensorFlow, PyTorch and others.
Kubernetes abstracted infrastructure details, so workloads could run the same way across clouds.
any-llm (Mozilla AI) now provides one API across many large language model (LLM) vendors, making it safer to experiment with AI.

All these abstractions drove adoption by lowering switching costs. They turned fragmented ecosystems into solid, enterprise-grade infrastructure. Vector databases are at the same tipping point.

The adapter approach to vectors

Instead of binding application code directly to a specific vector backend, companies can build against an abstraction layer that normalizes operations like inserts, queries and filtering. This doesn't necessarily eliminate the need to choose a backend; it makes that choice less rigid.
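As a minimal sketch of that adapter idea (illustrative only, not Vectorwrap's actual API, with a toy in-memory backend standing in for a real database), application code can target one stable interface while backends stay swappable:

```python
from abc import ABC, abstractmethod
import numpy as np

class VectorStore(ABC):
    """Stable interface the application codes against."""

    @abstractmethod
    def insert(self, key: str, vector: list[float]) -> None: ...

    @abstractmethod
    def query(self, vector: list[float], top_k: int = 3) -> list[str]: ...

class InMemoryStore(VectorStore):
    """Toy backend; a Postgres/pgvector or DuckDB adapter would expose the same methods."""

    def __init__(self):
        self._keys, self._vecs = [], []

    def insert(self, key, vector):
        self._keys.append(key)
        self._vecs.append(np.asarray(vector, dtype=float))

    def query(self, vector, top_k=3):
        q = np.asarray(vector, dtype=float)
        # Cosine similarity against every stored vector (fine for a demo, not for scale).
        sims = [float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v))) for v in self._vecs]
        order = np.argsort(sims)[::-1][:top_k]
        return [self._keys[i] for i in order]

# Application code only ever sees the VectorStore interface.
store: VectorStore = InMemoryStore()
store.insert("doc-1", [0.1, 0.9, 0.0])
store.insert("doc-2", [0.8, 0.1, 0.1])
print(store.query([0.1, 0.8, 0.05], top_k=1))  # -> ['doc-1']
```

Swapping the backend then means writing a new adapter rather than re-architecting the application, which is the portability argument in a nutshell.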
Development teams can start with DuckDB or SQLite in the lab, then scale up to Postgres or MySQL for production and ultimately adopt a special-purpose cloud vector DB without having to re-architect the application.

Open-source efforts like Vectorwrap are early examples of this approach, presenting a single Python API over Postgres, MySQL, DuckDB and SQLite. They demonstrate how abstraction can accelerate prototyping, reduce lock-in risk and support hybrid architectures that employ multiple backends.

Why businesses should care

For data infrastructure leaders and AI decision-makers, abstraction offers three benefits:

Speed from prototype to production: Teams can prototype on lightweight local environments and scale without expensive rewrites.
Reduced vendor risk: By decoupling application code from specific databases, organizations can adopt new backends as they emerge without long migration projects.
Hybrid flexibility: Companies can mix transactional, analytical and specialized vector DBs under one architecture, all behind a unified interface.

The result is data-layer agility, which is increasingly the difference between fast and slow companies.

A broader movement in open source

What's happening in the vector space is one example of a bigger trend: open-source abstractions as critical infrastructure.

In data formats: Apache Arrow
In ML models: ONNX
In orchestration: Kubernetes
In AI APIs: any-llm and similar frameworks

These projects succeed not by adding new capability, but by removing friction. They let enterprises move more quickly, hedge their bets and evolve along with the ecosystem. Vector DB adapters continue this legacy, transforming a fast-moving, fragmented space into infrastructure that enterprises can truly depend on.

The future of vector DB portability

The vector DB landscape will not converge anytime soon. Instead, the number of options will grow, and each vendor will tune for different use cases, scale, latency, hybrid search, compliance or cloud platform integration. Abstraction becomes strategy here. Companies that adopt portable approaches will be able to prototype boldly, deploy flexibly and move rapidly to new technology.

It's possible we'll eventually see a "JDBC for vectors," a universal standard that codifies queries and operations across backends. Until then, open-source abstractions are laying the groundwork.

Conclusion

Enterprises adopting AI cannot afford to be slowed by database lock-in. As the vector ecosystem evolves, the winners will be those who treat abstraction as infrastructure, building against portable interfaces rather than binding themselves to any single backend. The decades-long lesson of software engineering is simple: standards and abstractions drive adoption. For vector DBs, that shift has already begun.

Mihir Ahuja is an AI/ML engineer and open-source contributor based in San Francisco.
Superintelligence mapped, Claude Skills, AI travel booking, bubble warning, and more...
Google is adding a new feature for third-party developers building atop its Gemini AI models that rivals like OpenAI's ChatGPT, Anthropic's Claude, and the growing array of Chinese open source options are unlikely to get anytime soon: grounding with Google Maps.

This addition allows developers to connect Google's Gemini AI models' reasoning capabilities with live geospatial data from Google Maps, enabling applications to deliver detailed, location-relevant responses to user queries—such as business hours, reviews, or the atmosphere of a specific venue. By tapping into data from over 250 million places, developers can now build more intelligent and responsive location-aware experiences.

This is particularly useful for applications where proximity, real-time availability, or location-specific personalization matter—such as local search, delivery services, real estate, and travel planning. When the user's location is known, developers can pass latitude and longitude into the request to enhance the response quality.

By tightly integrating real-time and historical Maps data into the Gemini API, Google enables applications to generate grounded, location-specific responses with factual accuracy and contextual depth that are uniquely possible through its mapping infrastructure.

Merging AI and Geospatial Intelligence

The new feature is accessible in Google AI Studio, where developers can try a live demo powered by the Gemini Live API. Models that support grounding with Google Maps include:

Gemini 2.5 Pro
Gemini 2.5 Flash
Gemini 2.5 Flash-Lite
Gemini 2.0 Flash

In one demonstration, a user asked for Italian restaurant recommendations in Chicago. The assistant, leveraging Maps data, retrieved top-rated options and clarified a misspelled restaurant name before locating the correct venue with accurate business details.

Developers can also retrieve a context token to embed a Google Maps widget in their app's user interface. This interactive component displays photos, reviews, and other familiar content typically found in Google Maps.

Integration is handled via the generateContent method in the Gemini API, where developers include googleMaps as a tool. They can also enable a Maps widget by setting a parameter in the request. The widget, rendered using a returned context token, can provide a visual layer alongside the AI-generated text.

Use Cases Across Industries

The Maps grounding tool is designed to support a wide range of practical use cases:

Itinerary generation: Travel apps can create detailed daily plans with routing, timing, and venue information.
Personalized local recommendations: Real estate platforms can highlight listings near kid-friendly amenities like schools and parks.
Detailed location queries: Applications can provide specific information, such as whether a cafe offers outdoor seating, using community reviews and Maps metadata.

Developers are encouraged to only enable the tool when geographic context is relevant, to optimize both performance and cost.
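Based on the description above of the generateContent integration, here is a hedged sketch of what a Maps-grounded request might look like. The tool and config field names (googleMaps, toolConfig, latLng) and the model string are assumptions drawn from the article and Google's public API conventions, so treat the official documentation as authoritative:

```python
import os
import requests

# Assumed endpoint shape for the Gemini API's generateContent method.
MODEL = "gemini-2.5-flash"  # one of the models listed above
URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

payload = {
    "contents": [
        {"role": "user",
         "parts": [{"text": "Find a quiet Italian restaurant near me that's open now."}]}
    ],
    # The article says developers "include googleMaps as a tool"; the exact schema is an assumption.
    "tools": [{"googleMaps": {}}],
    # Passing latitude/longitude when the user's location is known, per the guidance above;
    # the nesting under toolConfig is likewise an assumption.
    "toolConfig": {"retrievalConfig": {"latLng": {"latitude": 41.8781, "longitude": -87.6298}}},
}

resp = requests.post(
    URL,
    headers={"x-goog-api-key": os.environ["GEMINI_API_KEY"]},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
data = resp.json()
# Grounded text (plus source metadata) comes back in the candidates.
print(data["candidates"][0]["content"]["parts"][0]["text"])
```

If the Maps widget is wanted, the article notes that a separate request parameter returns a context token that can be rendered alongside the generated text.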
According to the developer documentation, pricing starts at $25 per 1,000 grounded prompts — a steep sum for applications handling large volumes of queries.

Combining Search and Maps for Enhanced Context

Developers can use Grounding with Google Maps alongside Grounding with Google Search in the same request. While the Maps tool contributes factual data—like addresses, hours, and ratings—the Search tool adds broader context from web content, such as news or event listings. For example, when asked about live music on Beale Street, the combined tools provide venue details from Maps and event times from Search. According to Google, internal testing shows that using both tools together leads to significantly improved response quality.

Unfortunately, it doesn't appear that Google Maps grounding includes live vehicular traffic data — at least not yet.

Customization and Developer Flexibility

The experience is built for customization. Developers can tweak system prompts, choose from different Gemini models, and configure voice settings to tailor interactions. The demo app in Google AI Studio is also remixable, enabling developers to test ideas, add features, and iterate on designs within a flexible development environment.

The API returns structured metadata—including source links, place IDs, and citation spans—that developers can use to build inline citations or verify the AI-generated outputs. This supports transparency and enhances trust in user-facing applications. Google also requires that Maps-based sources be attributed clearly and linked back to the source using their URI.

Implementation Considerations for AI Builders

For technical teams integrating this capability, Google recommends:

Passing user location context when known, for better results.
Displaying Google Maps source links directly beneath the relevant content.
Only enabling the tool when the query clearly involves geographic context.
Monitoring latency and disabling grounding when performance is critical.

Grounding with Google Maps is currently available globally, though prohibited in several territories (including China, Iran, North Korea, and Cuba), and not permitted for emergency response use cases.

Availability and Access

Grounding with Google Maps is now generally available through the Gemini API. With this release, Google continues to expand the capabilities of the Gemini API, empowering developers to build AI-driven applications that understand and respond to the world around them.
Exciting news for BigQuery ML (BQML) users.
For many software developers using generative AI, vibe coding is a double-edged sword. The process delivers rapid prototypes but often leaves a trail of brittle, undocumented code that creates significant technical debt. A new open-source platform, Codev, addresses this by proposing a fundamental shift: treating the natural language conversation with an AI as part of the actual source code. Codev is based on SP(IDE)R, a framework designed to turn vibe-coding conversations into structured, versioned, and auditable assets that become part of the code repository.

What is Codev?

At its core, Codev is a methodology that treats natural language context as an integral part of the development lifecycle, as opposed to a disposable artifact as is the case with vanilla vibe coding. According to co-founder Waleed Kadous, the goal is to invert the typical engineering workflow. "A key principle of Codev is that documents like the specification are the actual code of the system," he told VentureBeat. "It's almost like natural language is compiled down into Typescript by our agents."

This approach avoids the common pitfall where documentation is created after the fact, if at all.

Its flagship protocol, SP(IDE)R, provides a lightweight but formal structure for building software. The process begins with Specify, where a human and multiple AI agents collaborate to turn a high-level request into concrete acceptance criteria. Next, in the Plan stage, an AI proposes a phased implementation, which is again reviewed. For each phase, the AI enters an IDE loop: it Implements the code, Defends it against bugs and regressions with comprehensive tests, and Evaluates the result against the specification. The final step is Review, where the team documents lessons learned to update and improve the SP(IDE)R protocol itself for future projects.

The framework's key differentiator is its use of multiple agents and explicit human review at different stages. Kadous notes that each agent brings unique strengths to the review process. "Gemini is extremely good at catching security issues," he said, citing a critical cross-site scripting (XSS) flaw and another bug that "would have shared an OpenAI API key with the client, which could cost thousands of dollars." Meanwhile, "GPT-5 is very good at understanding how to simplify a design." This structured review, with a human providing final approval at each stage, prevents the kind of runaway automation that leads to flawed code.

The platform's AI-native philosophy extends to its installation. There is no complex installer; instead, a user instructs their AI agent to apply the Codev GitHub repository to set up the project. The developers "dogfooded" their framework, using Codev to build Codev.

"The key point here is that natural language is executable now, with the agent being the interpreter," Kadous said. "This is great because it means it's not a 'blind' integration of Codev, the agent gets to choose the best way to integrate it and can intelligently make decisions."

Codev case study

To test the framework's effectiveness, its creators ran a direct comparison between vanilla vibe-coding and Codev. They gave Claude Opus 4.1 a request to build a modern web-based todo manager. The first attempt used a conversational, vibe-coding approach. The result was a plausible-looking demo.
However, an automated analysis conducted by three independent AI agents found that it had implemented 0% of the required functionality, contained no tests, and lacked a database or API.

The second attempt used the same AI model and prompt but applied the SP(IDE)R protocol. This time, the AI produced a production-ready application with 32 source files, 100% of the specified functionality, five test suites, a SQLite database, and a complete RESTful API. Throughout this process, the human developers reported they never directly edited a single line of source code. While this was a single experiment, Kadous estimates the impact is substantial. "Subjectively, it feels like I'm about three times as productive with Codev as without," he says. The quality also speaks for itself. "I used LLMs as a judge, and one of them described the output like what a well-oiled engineering team would produce. That was exactly what I was aiming for."

While the process is powerful, it redefines the developer's role from a hands-on coder to a system architect and reviewer. According to Kadous, the initial spec and plan stages can each take between 45 minutes and two hours of focused collaboration. This is in contrast to the impression given by many vibe-coding platforms, where a single prompt and a few minutes of processing gives you a fully functional and scalable application.

"All of the value I add is in the background knowledge I apply to the specs and plans," he explains. He emphasizes that the framework is designed to augment, not replace, experienced talent. "The people who will do the best... are senior engineers and above because they know the pitfalls... It just takes the senior engineer you already have and makes them much more productive."

A future of human and AI collaboration

Frameworks like Codev signal a shift where the primary creative act of software development moves from writing code to crafting precise, machine-readable specifications and plans. For enterprise teams, this means AI-generated code can become auditable, maintainable, and reliable. By capturing the entire development conversation in version control and enforcing it with CI, the process turns ephemeral chats into durable engineering assets.

Codev proposes a future where the AI acts not as a chaotic assistant, but as a disciplined collaborator in a structured, human-led workflow. However, Kadous acknowledges this shift creates new challenges for the workforce. "Senior engineers that reject AI outright will be outpaced by senior engineers who embrace it," he predicts. He also expresses concern for junior developers who may not get the chance "to build their architectural chops," a skill that becomes even more critical when guiding AI. This highlights a central challenge for the industry: ensuring that as AI elevates top performers, it also creates pathways to develop the next generation of talent.
Vector databases have become essential in most modern AI applications.
This post shows how TP ICAP used Amazon Bedrock Knowledge Bases and Amazon Bedrock Evaluations to build ClientIQ, an enterprise-grade solution with enhanced security features for extracting CRM insights using AI, delivering immediate business value.
In the post Principal Financial Group increases Voice Virtual Assistant performance using Genesys, Amazon Lex, and Amazon QuickSight, we discussed the overall Principal Virtual Assistant solution using Genesys Cloud, Amazon Lex V2, multiple AWS services, and a custom reporting and analytics solution using Amazon QuickSight.
In this post, we discuss an approach that guides you in building comprehensive, empirically driven evaluations to help you make better decisions when selecting the right model for your task.
In this post, we show how Splash Music is setting a new standard for AI-powered music creation by using its advanced HummingLM model with AWS Trainium on Amazon SageMaker HyperPod. As a selected startup in the 2024 AWS Generative AI Accelerator, Splash Music collaborated closely with AWS Startups and the AWS Generative AI Innovation Center (GenAIIC) to fast-track innovation and accelerate their music generation FM development lifecycle.
To reduce waste, the Refashion program helps users create outlines for adaptable clothing, such as pants that can be reconfigured into a dress. Each component of these pieces can be replaced, rearranged, or restyled.
In this era of AI slop, the idea that generative AI tools like Midjourney and Runway could be used to make art can seem absurd: What possible artistic value is there to be found in the likes of Shrimp Jesus and Ballerina Cappuccina? But amid all the muck, there are people using AI tools with…
One of the coolest things about generative AI models — both large language models (LLMs) and diffusion-based image generators — is that they are "non-deterministic." That is, despite their reputation among some critics as being "fancy autocorrect," generative AI models actually generate their outputs by choosing from a distribution of the most probable next tokens (units of information) to fill out their response.

Asking an LLM "What is the capital of France?" will have it sample its probability distribution over France, capitals, cities, and so on to arrive at the answer "Paris." But that answer could come in the format of "The capital of France is Paris," or simply "Paris," or "Paris, though it was Versailles at one point."

Still, those of us who use these models frequently day-to-day will note that sometimes, their answers can feel annoyingly repetitive or similar. A common joke about coffee is recycled across generations of queries. Story prompts generate similar arcs. Even tasks that should yield many plausible answers—like naming U.S. states—tend to collapse into only a few. This phenomenon, known as mode collapse, arises during post-training alignment and limits the usefulness of otherwise powerful models.

Especially when using LLMs to generate new creative works in writing, communications, strategy, or illustrations, we actually want their outputs to be even more varied than they already are. Now a team of researchers at Northeastern University, Stanford University and West Virginia University has come up with an ingeniously simple method to get language and image models to generate a wider variety of responses to nearly any user prompt by adding a single, simple sentence: "Generate 5 responses with their corresponding probabilities, sampled from the full distribution."

The method, called Verbalized Sampling (VS), helps models like GPT-4, Claude, and Gemini produce more diverse and human-like outputs—without retraining or access to internal parameters. It is described in a paper posted to the open-access preprint server arxiv.org in early October 2025.

When prompted in this way, the model no longer defaults to its safest, most typical output. Instead, it verbalizes its internal distribution over potential completions and samples across a wider spectrum of possibilities. This one-line change leads to substantial gains in output diversity across multiple domains.

As Weiyan Shi, an assistant professor at Northeastern University and co-author of the paper, wrote on X: "LLMs' potentials are not fully unlocked yet! As shown in our paper, prompt optimization can be guided by thinking about how LLMs are trained and aligned, and can be proved theoretically."

Why Models Collapse—and How VS Reverses It

According to the research team, the root cause of mode collapse lies not just in algorithms like reinforcement learning from human feedback (RLHF), but in the structure of human preferences. People tend to rate more familiar or typical answers as better, which nudges LLMs toward "safe" choices over diverse ones during fine-tuning.

However, this bias doesn't erase the model's underlying knowledge—it just suppresses it. VS works by bypassing this suppression. Instead of asking for the single most likely output, it invites the model to reveal a set of plausible responses and their relative probabilities.
This distribution-level prompting restores access to the richer diversity present in the base pretraining model.

Real-World Performance Across Tasks

The research team tested Verbalized Sampling across several common use cases:

Creative Writing: In story generation, VS increased diversity scores by up to 2.1× compared to standard prompting, while maintaining quality. One story prompt—"Without a goodbye"—produced formulaic breakup scenes under direct prompting, but yielded narratives involving cosmic events, silent emails, and music stopping mid-dance when prompted via VS.
Dialogue Simulation: In persuasive dialogue tasks, VS enabled models to simulate human-like patterns, such as hesitation, resistance, and changes of mind. Donation behavior distributions under VS better aligned with real human data compared to baseline methods.
Open-ended QA: When asked to enumerate valid answers (e.g., naming U.S. states), models using VS generated responses that more closely matched the diversity of real-world data. They covered a broader set of answers without sacrificing factual accuracy.
Synthetic Data Generation: When used to generate math problems for model training, VS created more varied datasets. These, in turn, improved downstream performance in competitive math benchmarks, outperforming synthetic data generated via direct prompting.

Tunable Diversity and Better Use of Larger Models

A notable advantage of VS is its tunability. Users can set a probability threshold in the prompt to sample from lower-probability "tails" of the model's distribution. Lower thresholds correspond to higher diversity. This tuning can be done via prompt text alone, without changing any decoding settings like temperature or top-p.

In one test using the Gemini-2.5-Flash model, diversity in story writing increased steadily as the probability threshold dropped from 1 to 0.001. The chart accompanying the study showed VS outperforming both direct and sequence-based prompting across all thresholds.

Interestingly, the method scales well with model size. Larger models like GPT-4.1 and Claude-4 showed even greater gains from VS compared to smaller ones. While smaller models benefitted, the improvement in diversity was roughly 1.5–2× stronger in larger counterparts—suggesting VS helps unlock more of the latent capabilities in advanced models.

Deployment and Availability

The Verbalized Sampling method is available now as a Python package:

pip install verbalized-sampling

The package includes integration with LangChain and supports a simple interface for sampling from the verbalized distribution. Users can also adjust parameters like k (number of responses), thresholds, and temperature to suit their applications. A live Colab notebook and documentation are available under an enterprise-friendly Apache 2.0 license on GitHub at: https://github.com/CHATS-lab/verbalized-sampling

Practical Tips and Common Issues

While the method works across all major LLMs, some users may initially encounter refusals or errors. In these cases, the authors suggest using the system prompt version of the template or referring to alternative formats listed on the GitHub page. Some models interpret complex instructions as jailbreak attempts and refuse to comply unless the structure is clearer.

For example, prompting via a system-level instruction like this improves reliability:

You are a helpful assistant.
For each query, generate five responses within separate tags, each with a probability below 0.10.

This small change typically resolves any issues.

A Lightweight Fix for a Big Problem

Verbalized Sampling represents a practical, inference-time fix to a deep limitation in how modern language models behave. It doesn't require model retraining or internal access. It is not dependent on any one model family. And it improves not only the diversity of outputs, but their quality—as judged by both human evaluation and benchmark scores.

With growing interest in tools that enhance model creativity, VS is likely to see rapid adoption in domains like writing, design, simulation, education, and synthetic data generation.

For users and developers frustrated by the sameness of LLM responses, the fix may be as simple as changing the question.
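As a concrete illustration of the one-line prompt change described above, here is a minimal sketch using the OpenAI Python client. The model name is a placeholder, and this bypasses the verbalized-sampling package entirely rather than demonstrating its API; it simply appends the prompt sentence from the paper:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The Verbalized Sampling trick: ask for a distribution, not a single answer.
vs_instruction = (
    "Generate 5 responses with their corresponding probabilities, "
    "sampled from the full distribution."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; substitute any chat-capable model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": f"Tell me a joke about coffee. {vs_instruction}"},
    ],
)

# The model verbalizes several candidate responses with probabilities;
# downstream code can parse them and sample, or simply pick a less typical one.
print(response.choices[0].message.content)
```

Per the paper, a threshold can also be stated in the prompt (for example, asking only for responses with probability below 0.10) to push the model further into the tails of its distribution.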
Organizations often face challenges when implementing single-shot fine-tuning approaches for their generative AI models. The single-shot fine-tuning method involves selecting training data, configuring hyperparameters, and hoping the results meet expectations without the ability to make incremental adjustments. Single-shot fine-tuning frequently leads to suboptimal results and requires starting the entire process from scratch when improvements are […]
The AI blueprint, new OpenAI features, Hinton warning, building AI apps, and more...
In this post, we'll demonstrate how to implement a quick service restaurant (QSR) drive-thru solution using Amazon Nova Sonic and AWS services. We'll walk through building an intelligent system that combines voice AI with interactive menu displays, providing technical insights and implementation guidance to help restaurants modernize their drive-thru operations.
This post provides a comprehensive hands-on guide to fine-tune Amazon Nova Lite for document processing tasks, with a focus on tax form data extraction. Using our open-source GitHub repository code sample, we demonstrate the complete workflow from data preparation to model deployment.
In this article, you will learn three proven ways to speed up model training by optimizing precision, memory, and data flow — without adding any...
An overview of DeepSomatic, a new AI tool that helps identify complex genetic variants in cancer cells.
An overview of ten years of milestones and breakthroughs in Google’s work on genomics.
Anthropic launched a new capability on Thursday that allows its Claude AI assistant to tap into specialized expertise on demand, marking the company's latest effort to make artificial intelligence more practical for enterprise workflows as it chases rival OpenAI in the intensifying competition over AI-powered software development.

The feature, called Skills, enables users to create folders containing instructions, code scripts, and reference materials that Claude can automatically load when relevant to a task. The system marks a fundamental shift in how organizations can customize AI assistants, moving beyond one-off prompts to reusable packages of domain expertise that work consistently across an entire company.

"Skills are based on our belief and vision that as model intelligence continues to improve, we'll continue moving towards general-purpose agents that often have access to their own filesystem and computing environment," said Mahesh Murag, a member of Anthropic's technical staff, in an exclusive interview with VentureBeat. "The agent is initially made aware only of the names and descriptions of each available skill and can choose to load more information about a particular skill when relevant to the task at hand."

The launch comes as Anthropic, valued at $183 billion after a recent $13 billion funding round, projects its annual revenue could nearly triple to as much as $26 billion in 2026, according to a recent Reuters report. The company is currently approaching a $7 billion annual revenue run rate, up from $5 billion in August, fueled largely by enterprise adoption of its AI coding tools — a market where it faces fierce competition from OpenAI's recently upgraded Codex platform.

How 'progressive disclosure' solves the context window problem

Skills differ fundamentally from existing approaches to customizing AI assistants, such as prompt engineering or retrieval-augmented generation (RAG), Murag explained. The architecture relies on what Anthropic calls "progressive disclosure" — Claude initially sees only skill names and brief descriptions, then autonomously decides which skills to load based on the task at hand, accessing only the specific files and information needed at that moment.

"Unlike RAG, this relies on simple tools that let Claude manage and read files from a filesystem," Murag told VentureBeat. "Skills can contain an unbounded amount of context to teach Claude how to complete a task or series of tasks. This is because Skills are based on the premise of an agent being able to autonomously and intelligently navigate a filesystem and execute code."

This approach allows organizations to bundle far more information than traditional context windows permit, while maintaining the speed and efficiency that enterprise users demand. A single skill can include step-by-step procedures, code templates, reference documents, brand guidelines, compliance checklists, and executable scripts — all organized in a folder structure that Claude navigates intelligently.

The system's composability provides another technical advantage. Multiple skills automatically stack together when needed for complex workflows.
For instance, Claude might simultaneously invoke a company's brand guidelines skill, a financial reporting skill, and a presentation formatting skill to generate a quarterly investor deck — coordinating between all three without manual intervention.

What makes Skills different from OpenAI's Custom GPTs and Microsoft's Copilot

Anthropic is positioning Skills as distinct from competing offerings like OpenAI's Custom GPTs and Microsoft's Copilot Studio, though the features address similar enterprise needs around AI customization and consistency.

"Skills' combination of progressive disclosure, composability, and executable code bundling is unique in the market," Murag said. "While other platforms require developers to build custom scaffolding, Skills let anyone — technical or not — create specialized agents by organizing procedural knowledge into files."

The cross-platform portability also sets Skills apart. The same skill works identically across Claude.ai, Claude Code (Anthropic's AI coding environment), the company's API, and the Claude Agent SDK for building custom AI agents. Organizations can develop a skill once and deploy it everywhere their teams use Claude, a significant advantage for enterprises seeking consistency.

The feature supports any programming language compatible with the underlying container environment, and Anthropic provides sandboxing for security — though the company acknowledges that allowing AI to execute code requires users to carefully vet which skills they trust.

Early customers report 8x productivity gains on finance workflows

Early customer implementations reveal how organizations are applying Skills to automate complex knowledge work. At Japanese e-commerce giant Rakuten, the AI team is using Skills to transform finance operations that previously required manual coordination across multiple departments.

"Skills streamline our management accounting and finance workflows," said Yusuke Kaji, general manager of AI at Rakuten, in a statement. "Claude processes multiple spreadsheets, catches critical anomalies, and generates reports using our procedures. What once took a day, we can now accomplish in an hour."

That's an 8x improvement in productivity for specific workflows — the kind of measurable return on investment that enterprises increasingly demand from AI implementations. Mike Krieger, Anthropic's chief product officer and Instagram co-founder, recently noted that companies have moved past "AI FOMO" to requiring concrete success metrics.

Design platform Canva plans to integrate Skills into its own AI agent workflows. "Canva plans to leverage Skills to customize agents and expand what they can do," said Anwar Haneef, general manager and head of ecosystem at Canva, in a statement. "This unlocks new ways to bring Canva deeper into agentic workflows—helping teams capture their unique context and create stunning, high-quality designs effortlessly."

Cloud storage provider Box sees Skills as a way to make corporate content repositories more actionable. "Skills teaches Claude how to work with Box content," said Yashodha Bhavnani, head of AI at Box. "Users can transform stored files into PowerPoint presentations, Excel spreadsheets, and Word documents that follow their organization's standards—saving hours of effort."

The enterprise security question: Who controls which AI skills employees can use?

For enterprise IT departments, Skills raise important questions about governance and control—particularly since the feature allows AI to execute arbitrary code in sandboxed environments.
Anthropic has built administrative controls that allow enterprise customers to manage access at the organizational level.

"Enterprise admins control access to the Skills capability via admin settings, where they can enable or disable access and monitor usage patterns," Murag said. "Once enabled at the organizational level, individual users still need to opt in."

That two-layer consent model — organizational enablement plus individual opt-in — reflects lessons learned from previous enterprise AI deployments where blanket rollouts created compliance concerns. However, Anthropic's governance tools appear more limited than some enterprise customers might expect. The company doesn't currently offer granular controls over which specific skills employees can use, or detailed audit trails of custom skill content.

Organizations concerned about data security should note that Skills require Claude's code execution environment, which runs in isolated containers. Anthropic advises users to "stick to trusted sources" when installing skills and provides security documentation, but the company acknowledges this is an inherently higher-risk capability than traditional AI interactions.

From API to no-code: How Anthropic is making Skills accessible to everyone

Anthropic is taking several approaches to make Skills accessible to users with varying technical sophistication. For non-technical users on Claude.ai, the company provides a "skill-creator" skill that interactively guides users through building new skills by asking questions about their workflow, then automatically generating the folder structure and documentation.

Developers working with Anthropic's API get programmatic control through a new /skills endpoint and can manage skill versions through the Claude Console web interface. The feature requires enabling the Code Execution Tool beta in API requests. For Claude Code users, skills can be installed via plugins from the anthropics/skills GitHub marketplace, and teams can share skills through version control systems.

"Skills are included in Max, Pro, Teams, and Enterprise plans at no additional cost," Murag confirmed. "API usage follows standard API pricing," meaning organizations pay only for the tokens consumed during skill execution, not for the skills themselves.

Anthropic provides several pre-built skills for common business tasks, including professional generation of Excel spreadsheets with formulas, PowerPoint presentations, Word documents, and fillable PDFs. These Anthropic-created skills will remain free.

Why the Skills launch matters in the AI coding wars with OpenAI

The Skills announcement arrives during a pivotal moment in Anthropic's competition with OpenAI, particularly around AI-assisted software development. Just one day before releasing Skills, Anthropic launched Claude Haiku 4.5, a smaller and cheaper model that nonetheless matches the coding performance of Claude Sonnet 4 — which was state-of-the-art when released just five months ago.

That rapid improvement curve reflects the breakneck pace of AI development, where today's frontier capabilities become tomorrow's commodity offerings. OpenAI has been pushing hard on coding tools as well, recently upgrading its Codex platform with GPT-5 and expanding GitHub Copilot's capabilities.

Anthropic's revenue trajectory — potentially reaching $26 billion in 2026 from an estimated $9 billion by year-end 2025 — suggests the company is successfully converting enterprise interest into paying customers.
The timing also follows Salesforce's announcement this week that it's deepening AI partnerships with both OpenAI and Anthropic to power its Agentforce platform, signaling that enterprises are adopting a multi-vendor approach rather than standardizing on a single provider.

Skills addresses a real pain point: the "prompt engineering" problem where effective AI usage depends on individual employees crafting elaborate instructions for routine tasks, with no way to share that expertise across teams. Skills transforms implicit knowledge into explicit, shareable assets. For startups and developers, the feature could accelerate product development significantly — adding sophisticated document generation capabilities that previously required dedicated engineering teams and weeks of development.

The composability aspect hints at a future where organizations build libraries of specialized skills that can be mixed and matched for increasingly complex workflows. A pharmaceutical company might develop skills for regulatory compliance, clinical trial analysis, molecular modeling, and patient data privacy that work together seamlessly — creating a customized AI assistant with deep domain expertise across multiple specialties.

Anthropic indicates it's working on simplified skill creation workflows and enterprise-wide deployment capabilities to make it easier for organizations to distribute skills across large teams. As the feature rolls out to Anthropic's more than 300,000 business customers, the true test will be whether organizations find Skills substantively more useful than existing customization approaches.

For now, Skills offers Anthropic's clearest articulation yet of its vision for AI agents: not generalists that try to do everything reasonably well, but intelligent systems that know when to access specialized expertise and can coordinate multiple domains of knowledge to accomplish complex tasks. If that vision catches on, the question won't be whether your company uses AI — it will be whether your AI knows how your company actually works.
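To make the progressive-disclosure idea from earlier in the article concrete, here is a small conceptual sketch. This is not Anthropic's implementation; the skill names, descriptions, and in-memory "library" are illustrative assumptions standing in for the folder-of-files design the company describes, where the agent sees only names and descriptions up front and loads a skill's full instructions on demand:

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    description: str   # always visible to the agent (cheap, few tokens)
    instructions: str   # loaded only on demand (can be arbitrarily large)

# Illustrative skill library; in Anthropic's design these live as folders of files.
SKILLS = [
    Skill("brand-guidelines", "House style for decks and docs", "<full style guide here>"),
    Skill("financial-reporting", "How we build quarterly reports", "<step-by-step procedures here>"),
]

def skill_index() -> str:
    """What the agent sees initially: names and one-line descriptions only."""
    return "\n".join(f"- {s.name}: {s.description}" for s in SKILLS)

def load_skill(name: str) -> str:
    """Invoked only when the agent decides a skill is relevant to the task."""
    for s in SKILLS:
        if s.name == name:
            return s.instructions
    raise KeyError(name)

print(skill_index())                       # tiny context cost up front
print(load_skill("financial-reporting"))   # full detail pulled in just in time
```

The point of the pattern is that the context window only pays for what the task actually needs, which is how a skill can bundle what Murag calls "an unbounded amount of context" without overwhelming the model.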
An increasing number of AI and machine learning-based systems feed on text data — language models are a notable example today.
One year after emerging from stealth, Strella has raised $14 million in Series A funding to expand its AI-powered customer research platform, the company announced Thursday. The round, led by Bessemer Venture Partners with participation from Decibel Partners, Bain Future Back Ventures, MVP Ventures and 645 Ventures, comes as enterprises increasingly turn to artificial intelligence to understand customers faster and more deeply than traditional methods allow.

The investment marks a sharp acceleration for the startup founded by Lydia Hylton and Priya Krishnan, two former consultants and product managers who watched companies struggle with a customer research process that could take eight weeks from start to finish. Since October, Strella has grown revenue tenfold, quadrupled its customer base to more than 40 paying enterprises, and tripled its average contract values by moving upmarket to serve Fortune 500 companies.

"Research tends to be bookended by two very strategic steps: first, we have a problem—what research should we do? And second, we've done the research—now what are we going to do with it?" said Hylton, Strella's CEO, in an exclusive interview with VentureBeat. "All the stuff in the middle tends to be execution and lower-skill work. We view Strella as doing that middle 90% of the work."

The platform now serves Amazon, Duolingo, Apollo GraphQL, and Chobani, collectively conducting thousands of AI-moderated interviews that deliver what the company claims is a 90% average time savings on manual research work. The company is approaching $1 million in revenue after beginning monetization only in January, with month-over-month growth of 50% and zero customer churn to date.

How AI-powered interviews compress eight-week research projects into days

Strella's technology addresses a workflow that has frustrated product teams, marketers, and designers for decades. Traditional customer research requires writing interview guides, recruiting participants, scheduling calls, conducting interviews, taking notes, synthesizing findings, and creating presentations — a process that consumes weeks of highly skilled labor and often delays critical product decisions.

The platform compresses that timeline to days by using AI to moderate voice-based interviews that run like Zoom calls, but with an artificial intelligence agent asking questions, following up on interesting responses, and detecting when participants are being evasive or fraudulent. The system then synthesizes findings automatically, creating highlight reels and charts from unstructured qualitative data.

"It used to take eight weeks. Now you can do it in the span of a couple days," Hylton told VentureBeat. "The primary technology is through an AI-moderated interview. It's like being in a Zoom call with an AI instead of a human — it's completely free form and voice based."

Critically, the platform also supports human moderators joining the same calls, reflecting the founders' belief that humans won't disappear from the research process. "Human moderation won't go away, which is why we've supported human moderation from our genesis," Hylton said. "Research tends to be bookended by two very strategic steps: we have a problem, what's the research that we should do? And we've done the research, now what are we going to do with it? All the stuff in the middle tends to be execution and lower skill work.
We view Strella as doing that middle 90% of the work."

Why customers tell AI moderators the truth they won't share with humans

One of Strella's most surprising findings challenges assumptions about AI in qualitative research: participants appear more honest with AI moderators than with humans. The founders discovered this pattern repeatedly as customers ran head-to-head comparisons between traditional human-moderated studies and Strella's AI approach.

"If you're a designer and you get on a Zoom call with a customer and you say, 'Do you like my design?' they're always gonna say yes. They don't want to hurt your feelings," Hylton explained. "But it's not a problem at all for Strella. They would tell you exactly what they think about it, which is really valuable. It's very hard to get honest feedback."

Krishnan, Strella's COO, said companies initially worried about using AI and "eroding quality," but the platform has "actually found the opposite to be true. People are much more open and honest with an AI moderator, and so the level of insight that you get is much richer because people are giving their unfiltered feedback."

This dynamic has practical business implications. Brian Santiago, Senior Product Design Manager at Apollo GraphQL, said in a statement: "Before Strella, studies took weeks. Now we get insights in a day — sometimes in just a few hours. And because participants open up more with the AI moderator, the feedback is deeper and more honest."

The platform also addresses endemic fraud in online surveys, particularly when participants are compensated. Because Strella interviews happen on camera in real time, the AI moderator can detect when someone pauses suspiciously long — perhaps to consult ChatGPT — and flags them as potentially fraudulent. "We are fraud resistant," Hylton said, contrasting this with traditional surveys where fraud rates can be substantial.

Solving mobile app research with persistent screen sharing technology

A major focus of the Series A funding will be expanding Strella's recently launched mobile application, which Krishnan identified as critical competitive differentiation. The mobile app enables persistent screen sharing during interviews — allowing researchers to watch users navigate mobile applications in real time while the AI moderator asks about their experience.

"We are the only player in the market that supports screen sharing on mobile," Hylton said. "You know, I want to understand what are the pain points with my app? Why do people not seem to be able to find the checkout flow? Well, in order to do that effectively, you'd like to see the user screen while they're doing an interview."

For consumer-facing companies where mobile represents the primary customer interface, this capability opens entirely new use cases. The founders noted that "several of our customers didn't do research before" but have now built research practices around Strella because the platform finally made mobile research accessible at scale.

The platform also supports embedding traditional survey question types directly into the conversational interview, approaching what Hylton called "feature parity with a survey" while maintaining the engagement advantages of a natural conversation.
Strella interviews regularly run 60 to 90 minutes with nearly 100% completion rates — a duration that would see 60-70% drop-off in a traditional survey format.

How Strella differentiated in a market crowded with AI research startups

Strella enters a market that appears crowded at first glance, with established players like Qualtrics and a wave of AI-powered startups promising to transform customer research. The founders themselves initially pursued a different approach — synthetic respondents, or "digital twins" that simulate customer perspectives using large language models.

"We actually pivoted from that. That was our initial idea," Hylton revealed, referring to synthetic respondents. "People are very intrigued by that concept, but found in practice, no willingness to pay right now."

Recent research suggesting companies could use language models as digital twins for customer feedback has reignited interest in that approach. But Hylton remains skeptical: "The capabilities of the LLMs as they are today are not good enough, in my opinion, to justify a standalone company. Right now you could just ask ChatGPT, 'What would new users of Duolingo think about this ad copy?' You can do that. Adding the standalone idea of a synthetic panel is sort of just putting a wrapper on that."

Instead, Strella's bet is that the real value lies in collecting proprietary qualitative data at scale — building what could become "the system of truth for all qualitative insights" within enterprises, as Lindsey Li, Vice President at Bessemer Venture Partners, described it.

Li, who led the investment just one year after Strella emerged from stealth, said the firm was convinced by both the technology and the team. "Strella has built highly differentiated technology that enables a continuous interview rather than a survey," Li said. "We heard time and time again that customers loved this product experience relative to other offerings."

On the defensibility question that concerns many AI investors, Li emphasized product execution over patents: "We think the long game here will be won with a million small product decisions, all of which must be driven by deep empathy for customer pain and an understanding of how best to address their needs. Lydia and Priya exhibit that in spades."

The founders point to technical depth that's difficult to replicate. Most competitors started with adaptive surveys — text-based interfaces where users type responses and wait for the next question. Some have added voice, but typically as uploaded audio clips rather than free-flowing conversation.

"Our approach is fundamentally better, which is the fact that it is a free form conversation," Hylton said. "You never have to control anything. You're never typing, there's no buttons, there's no upload and wait for the next question. It's completely free form, and that has been an extraordinarily hard product to build. There's a tremendous amount of IP in the way that we prompt our moderator, the way that we run analysis."

The platform also improves with use, learning from each customer's research patterns to fine-tune future interview guides and questions. "Our product gets better for our customers as they continue to use us," Hylton said.
All research accumulates in a central repository where teams can generate new insights by chatting with the data or creating visualizations from previously unstructured qualitative feedback.

Creating new research budgets instead of just automating existing ones

Perhaps more important than displacing existing research is expanding the total market. Krishnan said growth has been "fundamentally related to our product" creating new research that wouldn't have happened otherwise.

"We have expanded the use cases in which people would conduct research," Krishnan explained. "Several of our customers didn't do research before, have always wanted to do research, but didn't have a dedicated researcher or team at their company that was devoted to it, and have purchased Strella to kick off and enable their research practice. That's been really cool where we've seen this market just opening up."

This expansion comes as enterprises face mounting pressure to improve customer experience amid declining satisfaction scores. According to Forrester Research's 2024 Customer Experience Index, customer experience quality has declined for three consecutive years — an unprecedented trend. The report found that 39% of brands saw CX quality deteriorate, with declines across effectiveness, ease, and emotional connection.

Meanwhile, Deloitte's 2025 Technology, Media & Telecommunications Predictions report forecasts that 25% of enterprises using generative AI will deploy AI agents by 2025, growing to 50% by 2027. The report specifically highlighted AI's potential to enhance customer satisfaction by 15-20% while reducing cost to serve by 20-30% when properly implemented.

Gartner identified conversational user interfaces — the category Strella inhabits — as one of three technologies poised to transform customer service by 2028, noting that "customers increasingly expect to be able to interact with the applications they use in a natural way."

Against this backdrop, Li sees substantial room for growth. "UX Research is a sub-sector of the $140B+ global market-research industry," Li said. "This includes both the software layer historically (~$430M) and professional services spend on UX research, design, product strategy, etc. which is conservatively estimated to be ~$6.4B+ annually. As software in this vertical, led by Strella, becomes more powerful, we believe the TAM will continue to expand meaningfully."

Making customer feedback accessible across the enterprise, not just research teams

The founders describe their mission as "democratizing access to the customer" — making it possible for anyone in an organization to understand customer perspectives without waiting for dedicated research teams to complete months-long studies.

"Many, many, many positions in the organization would like to get customer feedback, but it's so hard right now," Hylton said. With Strella, she explained, someone can "log into Strella and through a chat, create any highlight reel that you want and actually see customers in their own words answering the question that you have based on the research that's already been done."

This video-first approach to research repositories changes organizational dynamics around customer feedback. "Then you can say, 'Okay, engineering team, we need to build this feature. And here's the customer actually saying it,'" Hylton continued. "'This is not me. This isn't politics. Here are seven customers saying they can't find the Checkout button.'
The fact that we are a very video-based platform really allows us to do that quickly and painlessly."

The company has moved decisively upmarket, with contract values now typically in the five-figure range and "several six figure contracts" signed, according to Krishnan. The pricing strategy reflects a premium positioning: "Our product is very good, it's very premium. We're charging based on the value it provides to customers," Krishnan said, rather than competing on cost alone.

This approach appears to be working. The company reports 100% conversion from pilot programs to paid contracts and zero churn among its 40-45 customers, with month-over-month revenue growth of 50%.

The roadmap: Computer vision, agentic AI, and human-machine collaboration

The Series A funding will primarily support scaling product and go-to-market teams. "We're really confident that we have product-market fit," Hylton said. "And now the question is execution, and we want to hire a lot of really talented people to help us execute."

On the product roadmap, Hylton emphasized continued focus on the participant experience as the key to winning the market. "Everything else is downstream of a joyful participant experience," she said, including "the quality of insights, the amount you have to pay people to do the interviews, and the way that your customers feel about a company."

Near-term priorities include adding visual capabilities so the AI moderator can respond to facial expressions and other nonverbal cues, and building more sophisticated collaboration features between human researchers and AI moderators. "Maybe you want to listen while an AI moderator is running a call and you might want to be able to jump in with specific questions," Hylton said. "Or you want to run an interview yourself, but you want the moderator to be there as backup or to help you."

These features move toward what the industry calls "agentic AI" — systems that can act more autonomously while still collaborating with humans. The founders see this human-AI collaboration, rather than full automation, as the sustainable path forward.

"We believe that a lot of the really strategic work that companies do will continue to be human moderated," Hylton said. "And you can still do that through Strella and just use us for synthesis in those cases."

For Li and Bessemer, the bet is on founders who understand this nuance. "Lydia and Priya exhibit the exact archetype of founders we are excited to partner with for the long term — customer-obsessed, transparent, thoughtful, and singularly driven towards the home-run scenario," she said.

The company declined to disclose specific revenue figures or valuation. With the new funding, Strella has now raised $18 million total, including a $4 million seed round led by Decibel Partners announced in October.

As Strella scales, the founders remain focused on a vision where technology enhances rather than eliminates human judgment — where an engineering team doesn't just read a research report, but watches seven customers struggle to find the same button. Where a product manager can query months of accumulated interviews in seconds. Where companies don't choose between speed and depth, but get both.

"The interesting part of the business is actually collecting that proprietary dataset, collecting qualitative research at scale," Hylton said, describing what she sees as Strella's long-term moat. Not replacing the researcher, but making everyone in the company one.
Microsoft is fundamentally reimagining how people interact with their computers, announcing Thursday a sweeping transformation of Windows 11 that brings voice-activated AI assistants, autonomous software agents, and contextual intelligence to every PC running the operating system — not just premium devices with specialized chips.

The announcement represents Microsoft's most aggressive push yet to integrate generative artificial intelligence into the desktop computing experience, moving beyond the chatbot interfaces that have defined the first wave of consumer AI products toward a more ambient, conversational model where users can simply talk to their computers and have AI agents complete complex tasks on their behalf.

"When we think about what the promise of an AI PC is, it should be capable of three things," Yusuf Mehdi, Microsoft's Executive Vice President and Consumer Chief Marketing Officer, told reporters at a press conference last week. "First, you should be able to interact with it naturally, in text or voice, and have it understand you. Second, it should be able to see what you see and be able to offer guided support. And third, it should be able to take action on your behalf."

The shift could prove consequential for an industry searching for the "killer app" for generative AI. While hundreds of millions of people have experimented with ChatGPT and similar chatbots, integrating AI directly into the operating system that powers the vast majority of workplace computers could dramatically accelerate mainstream adoption — or create new security and privacy headaches for organizations already struggling to govern employee use of AI tools.

How 'Hey Copilot' aims to replace typing with talking on Windows PCs

At the heart of Microsoft's vision is voice interaction, which the company is positioning as the third fundamental input method for PCs after the mouse and keyboard — a comparison that underscores Microsoft's ambitions for reshaping human-computer interaction nearly four decades after the graphical user interface became standard.

Starting this week, any Windows 11 user can enable the "Hey Copilot" wake word with a single click, allowing them to summon Microsoft's AI assistant by voice from anywhere in the operating system. The feature, which had been in limited testing, is now being rolled out to hundreds of millions of devices globally.

"It's been almost four decades since the PC has changed the way you interact with it, which is primarily mouse and keyboard," Mehdi said. "When you think about it, we find that people type on a given day up to 14,000 words on their keyboard, which is really kind of mind-boggling. But what if now you can go beyond that and talk to it?"

The emphasis on voice reflects internal Microsoft data showing that users engage with Copilot twice as much when using voice compared to text input — a finding the company attributes to the lower cognitive barrier of speaking versus crafting precise written prompts.

"The magic unlock with Copilot Voice and Copilot Vision is the ease of interaction," according to the company's announcement. "Using the new wake word, 'Hey Copilot,' getting something done is as easy as just asking for it."

But Microsoft's bet on voice computing faces real-world constraints that Mehdi acknowledged during the briefing.
When asked whether workers in shared office environments would use voice features, potentially compromising privacy, Mehdi noted that millions already conduct voice calls through their PCs with headphones, and predicted users would adapt: "Just like when the mouse came out, people have to figure out when to use it, what's the right way, how to make it happen."

Crucially, Microsoft is hedging its voice-first strategy by making all features accessible through traditional text input as well, recognizing that voice isn't always appropriate or accessible.

AI that sees your screen: Copilot Vision expands worldwide with new capabilities

Perhaps more transformative than voice control is the expansion of Copilot Vision, a feature Microsoft introduced earlier this year that allows the AI to analyze what's displayed on a user's screen and provide contextual assistance.

Previously limited to voice interaction, Copilot Vision is now rolling out worldwide with a new text-based interface, allowing users to type questions about what they're viewing rather than speaking them aloud. The feature can now access full document context in Microsoft Office applications — meaning it can analyze an entire PowerPoint presentation or Excel spreadsheet without the user needing to scroll through every page.

"With 68 percent of consumers reporting using AI to support their decision making, voice is making this easier," Microsoft explained in its announcement. "The magic unlock with Copilot Voice and Copilot Vision is the ease of interaction."

During the press briefing, Microsoft demonstrated Copilot Vision helping users navigate Spotify's settings to enable lossless audio streaming, coaching an artist through writing a professional bio based on their visual portfolio, and providing shopping recommendations based on products visible in YouTube videos.

"What brings AI to life is when you can give it rich context, when you can type great prompts," Mehdi explained. "The big challenge for the majority of people is we've been trained with search to do the opposite. We've been trained to essentially type in fewer keywords, because it turns out the less keywords you type on search, the better your answers are."

He noted that average search queries remain just 2.3 keywords, while AI systems perform better with detailed prompts — creating a disconnect between user habits and AI capabilities.
Copilot Vision aims to bridge that gap by automatically gathering visual context.

"With Copilot Vision, you can simply share your screen and Copilot in literally milliseconds can understand everything on the screen and then provide intelligence," Mehdi said.

The vision capabilities work with any application without requiring developers to build specific integrations, using computer vision to interpret on-screen content — a powerful capability that also raises questions about what the AI can access and when.

Software robots take control: Inside Copilot Actions' controversial autonomy

The most ambitious—and potentially controversial—new capability is Copilot Actions, an experimental feature that allows AI to take control of a user's computer to complete tasks autonomously.

Coming first to Windows Insiders enrolled in Copilot Labs, the feature builds on Microsoft's May announcement of Copilot Actions on the web, extending the capability to manipulate local files and applications on Windows PCs.

During demonstrations, Microsoft showed the AI agent organizing photo libraries, extracting data from documents, and working through multi-step tasks while users attended to other work. The agent operates in a separate, sandboxed environment and provides running commentary on its actions, with users able to take control at any time.

"As a general-purpose agent — simply describe the task you want to complete in your own words, and the agent will attempt to complete it by interacting with desktop and web applications," according to the announcement. "While this is happening, you can choose to focus on other tasks. At any time, you can take over the task or check in on the progress of the action, including reviewing what actions have been taken."

Navjot Virk, Microsoft's Windows Experience Leader, acknowledged the technology's current limitations during the briefing. "We'll be starting with a narrow set of use cases while we optimize model performance and learn," Virk said. "You may see the agent make mistakes or encounter challenges with complex interfaces, which is why real-world testing of this experience is so critical."

The experimental nature of Copilot Actions reflects broader industry challenges with agentic AI — systems that can take actions rather than simply providing information. While the potential productivity gains are substantial, AI systems still occasionally "hallucinate" incorrect information and can be vulnerable to novel attacks.

Can AI agents be trusted? Microsoft's new security framework explained

Recognizing the security implications of giving AI control over users' computers and files, Microsoft introduced a new security framework built on four core principles: user control, operational transparency, limited privileges, and privacy-preserving design.

Central to this approach is the concept of "agent accounts" — separate Windows user accounts under which AI agents operate, distinct from the human user's account. Combined with a new "agent workspace" that provides a sandboxed desktop environment, the architecture aims to create clear boundaries around what agents can access and modify.

Peter Waxman, Microsoft's Windows Security Engineering Leader, emphasized that Copilot Actions is disabled by default and requires explicit user opt-in. "You're always in control of what Copilot Actions can do," Waxman said.
"Copilot Actions is turned off by default and you're able to pause, take control, or disable it at any time."During operation, users can monitor the agent's progress in real-time, and the system requests additional approval before taking "sensitive or important" actions. All agent activity occurs under the dedicated agent account, creating an audit trail that distinguishes AI actions from human ones.However, the agent will have default access to users' Documents, Downloads, Desktop, and Pictures folders—a broad permission grant that could concern enterprise IT administrators.Dana Huang, Corporate Vice President for Windows Security, acknowledged in a blog post that "agentic AI applications introduce novel security risks, such as cross-prompt injection (XPIA), where malicious content embedded in UI elements or documents can override agent instructions, leading to unintended actions like data exfiltration or malware installation."Microsoft promises more details about enterprise controls at its Ignite conference in November.Gaming, taskbar redesign, and deeper Office integration round out updatesBeyond voice and autonomous agents, Microsoft introduced changes across Windows 11's core interfaces and extended AI to new domains.A new "Ask Copilot" feature integrates AI directly into the Windows taskbar, providing one-click access to start conversations, activate vision capabilities, or search for files and settings with "lightning-fast" results. The opt-in feature doesn't replace traditional Windows search.File Explorer gains AI capabilities through integration with third-party services. A partnership with Manus AI allows users to right-click on local image files and generate complete websites without manual uploading or coding. Integration with Filmora enables quick jumps into video editing workflows.Microsoft also introduced Copilot Connectors, allowing users to link cloud services like OneDrive, Outlook, Google Drive, Gmail, and Google Calendar directly to Copilot on Windows. Once connected, users can query personal content across platforms using natural language.In a notable expansion beyond productivity, Microsoft and Xbox introduced Gaming Copilot for the ROG Xbox Ally handheld gaming devices developed with ASUS. The feature, accessible via a dedicated hardware button, provides an AI assistant that can answer gameplay questions, offer strategic advice, and help navigate game interfaces through natural voice conversation.Why Microsoft is racing to embed AI everywhere before Apple and GoogleMicrosoft's announcement comes as technology giants race to embed generative AI into their core products following the November 2022 launch of ChatGPT. While Microsoft moved quickly to integrate OpenAI's technology into Bing search and introduce Copilot across its product line, the company has faced questions about whether AI features are driving meaningful engagement. Recent data shows Bing's search market share remaining largely flat despite AI integration.The Windows integration represents a different approach: rather than charging separately for AI features, Microsoft is building them into the operating system itself, betting that embedded AI will drive Windows 11 adoption and competitive differentiation against Apple and Google.Apple has taken a more cautious approach with Apple Intelligence, introducing AI features gradually and emphasizing privacy through on-device processing. 
Google has integrated AI across its services but has faced challenges with accuracy and reliability.

Crucially, while Microsoft highlighted new Copilot+ PC models from partners with prices ranging from $649.99 to $1,499.99, the core AI features announced today work on any Windows 11 PC — a significant departure from earlier positioning that suggested AI capabilities required new hardware with specialized neural processing units.

"Everything we showed you here is for all Windows 11 PCs. You don't need to run it on a copilot plus PC. It works on any Windows 11 PC," Mehdi clarified.

This democratization of AI features across the Windows 11 installed base potentially accelerates adoption but also complicates Microsoft's hardware sales pitch for premium devices.

What Microsoft's AI bet means for the future of computing

Mehdi framed the announcement in sweeping terms, describing Microsoft's goal as fundamentally reimagining the operating system for the AI era.

"We're taking kind of a bold view of it. We really feel that the vision that we have is, let's rewrite the entire operating system around AI and build essentially what becomes truly the AI PC," he said.

For Microsoft, the success of AI-powered Windows 11 could help drive the company's next phase of growth as PC sales have matured and cloud growth faces increased competition.

For users and organizations, the announcement represents a potential inflection point in how humans interact with computers — one that could significantly boost productivity if executed well, or create new security headaches if the AI proves unreliable or difficult to control.

The technology industry will be watching closely to see whether Microsoft's bet on conversational computing and agentic AI marks the beginning of a genuine paradigm shift, or proves to be another ambitious interface reimagining that fails to gain mainstream traction.

What's clear is that Microsoft is moving aggressively to stake its claim as the leader in AI-powered personal computing, leveraging its dominant position in desktop operating systems to bring generative AI directly into the daily workflows of potentially a billion users.

Copilot Voice and Vision are available today to Windows 11 users worldwide, with experimental capabilities coming to Windows Insiders in the coming weeks.
We’re partnering with Commonwealth Fusion Systems (CFS) to bring clean, safe, limitless fusion energy closer to reality.
Databricks is one of the leading platforms for building and running machine learning notebooks at scale. It combines Apache Spark capabilities with a notebook-first interface, experiment tracking, and integrated data tooling. In this article, I’ll walk you through the process of hosting your ML notebook in Databricks step by step. Databricks offers several plans, but […]
A new framework from Stanford University and SambaNova addresses a critical challenge in building robust AI agents: context engineering. Called Agentic Context Engineering (ACE), the framework automatically populates and modifies the context window of large language model (LLM) applications by treating it as an "evolving playbook" that creates and refines strategies as the agent gains experience in its environment.

ACE is designed to overcome key limitations of other context-engineering frameworks, preventing the model's context from degrading as it accumulates more information. Experiments show that ACE works for both optimizing system prompts and managing an agent's memory, outperforming other methods while also being significantly more efficient.

The challenge of context engineering

Advanced AI applications that use LLMs largely rely on "context adaptation," or context engineering, to guide their behavior. Instead of the costly process of retraining or fine-tuning the model, developers use the LLM's in-context learning abilities, steering the model by modifying the input prompts with specific instructions, reasoning steps, or domain-specific knowledge. This additional information is usually obtained as the agent interacts with its environment and gathers new data and experience. The key goal of context engineering is to organize this new information in a way that improves the model's performance and avoids confusing it. This approach is becoming a central paradigm for building capable, scalable, and self-improving AI systems.

Context engineering has several advantages for enterprise applications. Contexts are interpretable for both users and developers, can be updated with new knowledge at runtime, and can be shared across different models. Context engineering also benefits from ongoing hardware and software advances, such as the growing context windows of LLMs and efficient inference techniques like prompt and context caching.

There are various automated context-engineering techniques, but most of them face two key limitations. The first is a "brevity bias," where prompt optimization methods tend to favor concise, generic instructions over comprehensive, detailed ones. This can undermine performance in complex domains. The second, more severe issue is "context collapse." When an LLM is tasked with repeatedly rewriting its entire accumulated context, it can suffer from a kind of digital amnesia.

"What we call 'context collapse' happens when an AI tries to rewrite or compress everything it has learned into a single new version of its prompt or memory," the researchers said in written comments to VentureBeat. "Over time, that rewriting process erases important details—like overwriting a document so many times that key notes disappear. In customer-facing systems, this could mean a support agent suddenly losing awareness of past interactions... causing erratic or inconsistent behavior."

The researchers argue that "contexts should function not as concise summaries, but as comprehensive, evolving playbooks—detailed, inclusive, and rich with domain insights." This approach leans into the strength of modern LLMs, which can effectively distill relevance from long and detailed contexts.

How Agentic Context Engineering (ACE) works

ACE is a framework for comprehensive context adaptation designed for both offline tasks, like system prompt optimization, and online scenarios, such as real-time memory updates for agents.
Rather than compressing information, ACE treats the context like a dynamic playbook that gathers and organizes strategies over time.

The framework divides the labor across three specialized roles: a Generator, a Reflector, and a Curator. This modular design is inspired by "how humans learn—experimenting, reflecting, and consolidating—while avoiding the bottleneck of overloading a single model with all responsibilities," according to the paper.

The workflow starts with the Generator, which produces reasoning paths for input prompts, highlighting both effective strategies and common mistakes. The Reflector then analyzes these paths to extract key lessons. Finally, the Curator synthesizes these lessons into compact updates and merges them into the existing playbook.

To prevent context collapse and brevity bias, ACE incorporates two key design principles. First, it uses incremental updates. The context is represented as a collection of structured, itemized bullets instead of a single block of text. This allows ACE to make granular changes and retrieve the most relevant information without rewriting the entire context.

Second, ACE uses a "grow-and-refine" mechanism. As new experiences are gathered, new bullets are appended to the playbook and existing ones are updated. A de-duplication step regularly removes redundant entries, ensuring the context remains comprehensive yet relevant and compact over time.

ACE in action

The researchers evaluated ACE on two types of tasks that benefit from evolving context: agent benchmarks requiring multi-turn reasoning and tool use, and domain-specific financial analysis benchmarks demanding specialized knowledge. For high-stakes industries like finance, the benefits extend beyond pure performance. As the researchers said, the framework is "far more transparent: a compliance officer can literally read what the AI learned, since it's stored in human-readable text rather than hidden in billions of parameters."

The results showed that ACE consistently outperformed strong baselines such as GEPA and classic in-context learning, achieving average performance gains of 10.6% on agent tasks and 8.6% on domain-specific benchmarks in both offline and online settings.

Critically, ACE can build effective contexts by analyzing the feedback from its actions and environment instead of requiring manually labeled data. The researchers note that this ability is a "key ingredient for self-improving LLMs and agents." On the public AppWorld benchmark, designed to evaluate agentic systems, an agent using ACE with a smaller open-source model (DeepSeek-V3.1) matched the performance of the top-ranked, GPT-4.1-powered agent on average and surpassed it on the more difficult test set.

The takeaway for businesses is significant. "This means companies don't have to depend on massive proprietary models to stay competitive," the research team said. "They can deploy local models, protect sensitive data, and still get top-tier results by continuously refining context instead of retraining weights."

Beyond accuracy, ACE proved to be highly efficient. It adapts to new tasks with an average 86.9% lower latency than existing methods and requires fewer steps and tokens. The researchers point out that this efficiency demonstrates that "scalable self-improvement can be achieved with both higher accuracy and lower overhead."

For enterprises concerned about inference costs, the researchers point out that the longer contexts produced by ACE do not translate to proportionally higher costs.
Modern serving infrastructures are increasingly optimized for long-context workloads with techniques like KV cache reuse, compression, and offloading, which amortize the cost of handling extensive context.

Ultimately, ACE points toward a future where AI systems are dynamic and continuously improving. "Today, only AI engineers can update models, but context engineering opens the door for domain experts—lawyers, analysts, doctors—to directly shape what the AI knows by editing its contextual playbook," the researchers said. This also makes governance more practical. "Selective unlearning becomes much more tractable: if a piece of information is outdated or legally sensitive, it can simply be removed or replaced in the context, without retraining the model."
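To make the Generator, Reflector, and Curator loop described above more concrete, here is a minimal sketch of how an ACE-style playbook update could be wired together. It is not the authors' implementation: the call_llm helper, the prompt wording, and the de-duplication rule are illustrative assumptions standing in for whatever models and prompts a real deployment would use.

# Minimal, illustrative sketch of an ACE-style update loop (not the paper's code).
# `call_llm` is a hypothetical helper standing in for any LLM client.

from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in your own model client here."""
    raise NotImplementedError

@dataclass
class Playbook:
    """Context kept as itemized bullets so updates stay incremental."""
    bullets: list[str] = field(default_factory=list)

    def as_context(self) -> str:
        return "\n".join(f"- {b}" for b in self.bullets)

    def merge(self, new_bullets: list[str]) -> None:
        # Grow: append new lessons. Refine: skip duplicates instead of
        # rewriting the whole context, which is what causes context collapse.
        seen = {b.lower() for b in self.bullets}
        for b in (x.strip() for x in new_bullets if x.strip()):
            if b.lower() not in seen:
                self.bullets.append(b)
                seen.add(b.lower())

def ace_step(playbook: Playbook, task: str, feedback: str) -> None:
    # Generator: attempt the task with the current playbook as context.
    trajectory = call_llm(
        f"Playbook:\n{playbook.as_context()}\n\nTask: {task}\n"
        "Show your reasoning and the actions you take."
    )
    # Reflector: distill the trajectory and environment feedback into lessons.
    lessons = call_llm(
        f"Trajectory:\n{trajectory}\n\nFeedback:\n{feedback}\n"
        "List the key lessons (what worked, what failed) as short bullet points."
    )
    # Curator: merge the lessons into the playbook as granular bullet updates.
    playbook.merge([line.lstrip("-• ").strip() for line in lessons.splitlines()])

The detail that matters in this sketch is the shape of the loop: the context only ever grows by small, itemized additions, which is how ACE avoids the wholesale rewrites that lead to context collapse.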
We’re launching a new 27 billion parameter foundation model for single-cell analysis built on the Gemma family of open models.
When Walmart and OpenAI announced that the retailer would integrate with ChatGPT, the question became how quickly OpenAI could deliver on the promise of agents buying things for people. In the battle of AI-enabled commerce, getting agents to securely complete transactions is one of the biggest hurdles.

More and more, chat platforms like ChatGPT are replacing browsers and getting very good at surfacing the information people search for. Users will ask ChatGPT for the best humidifiers on the market, and when the model returns results, people have no choice but to click the item link and complete the purchase online. AI agents, as of now, don't have the ability or the trust infrastructure to make people and banking institutions feel safe enough to let them loose on someone's cash.

Enterprises and other industry players understand that, to allow agents to pay for purchases, there must be a common language shared among the model and agent providers, the bank, the merchant, and, to a lesser extent, the buyer. And so, over the past few weeks, three competing agentic commerce standards have emerged: Google announced the Agent Payments Protocol (AP2) with partners including PayPal, American Express, Mastercard, Salesforce and ServiceNow. Soon after, OpenAI and Stripe debuted the Agentic Commerce Protocol (ACP), and just this week, Visa launched the Trusted Agent Protocol (TAP).

All these protocols aim to give agents the trust layer they need to convince banks and their customers that their money is safe in the hands of an AI agent. But they may also create walled gardens, showing just how immature agentic commerce really is. This could push enterprises to bet on one chat platform and the agentic payment protocol it runs on, instead of on interoperability.

How are they different

It's not new for players to propose several standards. It usually takes years for the industry to coalesce around a single standard, or even to use different protocols and figure out a way to harmonize them. However, the pace of innovation in enterprise AI has moved the needle on that. Fairly quickly, MCP became the de facto channel for tool use, and most companies began setting up MCP servers or connecting to one. (To be clear, it is not a formal standard yet.) But having three different potential standards might slow that process down, because it's harder to coalesce on a single standard when there are so many to choose from.

These protocols all aim to prove authorization. Both AP2 and TAP rely on cryptographic proofs to show an agent is acting on an individual's behalf. For TAP, agents are added to an approved list and get a digital key identifying them. AP2 uses a digital contract that serves as a proxy for human approval for the agent. OpenAI's ACP requires less of an infrastructure change: the agent essentially acts as a courier, relaying order information to the merchant.

Walled gardens

These three protocols ideally work across different chat platforms, but that is never guaranteed, especially when your biggest chat platform competitor has its own protocol. A danger with competing protocols is that they can create walled gardens, where they only work on specific platforms. Enterprises face the problem of getting stuck in a platform and an agentic payment standard that will not interoperate with another.
Organizations are not only the ones whose products an agent recommends; they are also most often the merchants of record, and they need to trust that the agent contacting them is acting on behalf of a customer.

Louis Amira, cofounder and CEO of agent commerce startup Circuit and Chisel, told VentureBeat that while this creates an opportunity for companies like his that sit at the interoperability layer, it could create confusion for enterprises. "The better the protocol proposals get, the more likely they are to end up being walled gardens and very hard to interoperate," Amira said. "We suspect that they're going to be fighting it out for the next few years, and the more they fight it out, the more you actually need somebody that sits underneath all of them."

Unlike the internet, where anyone can use any browser to access a website, thanks in large part to the TCP/IP standard, chat platforms tend to remain very separate. I mostly use ChatGPT (because it's installed on my laptop and I don't need to open a new tab), so when I want to see how Gemini will handle my query, I actually have to open Gemini to do so — and the same goes for anyone shopping via chatbot.

The number of protocol proposals underscores just how far we are from enabling shopping agents. The industry still needs to decide which standard to get behind, and no matter how many Walmarts integrate with ChatGPT, it's all moot if people don't trust the model or agent to handle their cash.

Take the best features, hopefully
The best thing for enterprises to do for now is to experiment with all the protocols and hope that a winner emerges. Eventually, there could be one agentic commerce protocol that takes the best of each proposal. For Wayne Liu, chief growth officer and president for Americas at Perfect Corp., having multiple protocol proposals just means there's more to learn from.

"This is where the importance of open source exists, because it will be the driving force to put everything together," Liu said.

Of course, it will be interesting to see over the next couple of weeks whether these remain the only three competing agentic commerce protocols. After all, there are some large retailers and chat platforms that can still throw a wrench into the whole thing.
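For readers wondering what "cryptographic proof that an agent is acting on a person's behalf" looks like in practice, here is a deliberately simplified sketch of the mandate idea described under "How are they different." It is not the AP2, ACP, or TAP wire format: the field names are invented, and it uses a shared-secret HMAC purely to keep the example dependency-free, whereas the real protocols rely on asymmetric keys and registered agent identities.

# Illustrative only: a toy "payment mandate" showing the delegation idea behind
# AP2/TAP-style authorization. Real protocols use asymmetric keys, certificate
# chains, and their own message formats; HMAC and these field names are stand-ins.

import hmac, hashlib, json, time

USER_SECRET = b"user-held-key"  # in reality, a key the user's wallet controls

def create_mandate(agent_id: str, max_amount: float, currency: str = "USD") -> dict:
    """User side: sign a time-limited spending authorization for a specific agent."""
    mandate = {
        "agent_id": agent_id,
        "max_amount": max_amount,
        "currency": currency,
        "issued_at": int(time.time()),
        "expires_at": int(time.time()) + 3600,
    }
    payload = json.dumps(mandate, sort_keys=True).encode()
    mandate["signature"] = hmac.new(USER_SECRET, payload, hashlib.sha256).hexdigest()
    return mandate

def verify_mandate(mandate: dict, charge_amount: float) -> bool:
    """Merchant/bank side: check the signature and the limits before charging."""
    claimed = mandate.get("signature", "")
    unsigned = {k: v for k, v in mandate.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(USER_SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(claimed, expected):
        return False  # tampered with, or not signed by the user
    if time.time() > mandate["expires_at"]:
        return False  # stale authorization
    return charge_amount <= mandate["max_amount"]

# Usage: the agent presents the mandate along with its purchase request.
m = create_mandate(agent_id="shopping-agent-01", max_amount=150.00)
print(verify_mandate(m, charge_amount=89.99))   # True: within the authorized limit
print(verify_mandate(m, charge_amount=499.00))  # False: over the authorized limit

Whatever the final formats look like, the shape is the common thread across the proposals: the user authorizes once, the agent carries that authorization with each request, and the merchant or bank can check it without having to trust the agent itself.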