How do AI videos end up on Donald Trump’s social media accounts? WIRED investigates.
The initiative brings together some of the world's most prestigious research institutions to pioneer the use of AI in mathematical research.
Discover the leading open-source text-to-speech models that rival premium tools in realism, emotion, and performance, so you can turn ideas into lifelike voices and power the next wave of creator audio.
Ahead of Adobe's MAX Sneaks event, WIRED got an exclusive look at a new tool that can change the tone and style of a voice-over.
Radu Jude’s AI-filled adaptation features Dracula doing porn and exploiting tech workers.
Separating AI reality from hyped-up fiction isn’t always easy. That’s why we’ve created the AI Hype Index—a simple, at-a-glance summary of everything you need to know about the state of the industry. Just about all businesses these days seem to be pivoting to AI, even when they don’t seem to know exactly why they’re investing…
An AI model released by Chinese AI company DeepSeek uses new techniques that could significantly improve AI’s ability to “remember.” Released last week, the optical character recognition (OCR) model works by extracting text from an image and turning it into machine-readable words. This is the same technology that powers scanner apps, translation of text in…
The moment Mack McConnell knew everything about search had changed came last summer at the Paris Olympics. His parents, independently and without prompting, had both turned to ChatGPT to plan their day's activities in the French capital. The AI recommended specific tour companies, restaurants, and attractions — businesses that had won a new kind of visibility lottery.

"It was almost like this intuitive interface that older people were as comfortable with using as younger people," McConnell recalled in an exclusive interview with VentureBeat. "I could just see the businesses were now being recommended."

That observation has now become the foundation of Geostar, a Pear VC-backed startup that's racing to help businesses navigate what may be the most significant shift in online discovery since Google's founding. The company, which recently emerged from stealth with impressive early customer traction, is betting that the rise of AI-powered search represents a significant opportunity to reinvent how companies get found online. The global AI search engine market alone is projected to grow from $43.63 billion in 2025 to $108.88 billion by 2032.

Already the fastest-growing company in PearX's latest cohort, Geostar is fast approaching $1 million in annual recurring revenue in just four months — with only two founders and no employees.

Why Gartner predicts traditional search volume will decline 25% by 2026

The numbers tell a stark story of disruption. Gartner predicts that traditional search engine volume will decline by 25% by 2026, largely due to the rise of AI chatbots. Google's AI Overviews now appear on billions of searches monthly. Princeton University researchers have found that optimizing for these new AI systems can increase visibility by up to 40%.

"Search used to mean that you had to make Google happy," McConnell explained. "But now you have to optimize for four different Google interfaces — traditional search, AI Mode, Gemini, and AI Overviews — each with different criteria. And then ChatGPT, Claude, and Perplexity each work differently on top of that."

This fragmentation is creating chaos for businesses that have spent decades perfecting their Google search strategies. A recent Forrester study found that 95% of B2B buyers plan to use generative AI in future purchase decisions. Yet most companies remain woefully unprepared for this shift.

"Anybody who's not on this right now is losing out," said Cihan Tas, Geostar's co-founder and chief technology officer. "We see lawyers getting 50% of their clients through ChatGPT now. It's just such a massive shift."

How language models read the web differently than search engines ever did

What Geostar and a growing cohort of competitors call Generative Engine Optimization, or GEO, represents a fundamental departure from traditional search engine optimization. Where SEO focused primarily on keywords and backlinks, GEO requires understanding how large language models parse, understand, and synthesize information across the entire web.

The technical challenges are formidable. Every website must now function as what Tas calls "its own little database" capable of being understood by dozens of different AI crawlers, each with unique requirements and preferences. Google's systems pull from their existing search index. ChatGPT relies heavily on structured data and specific content formats.
Perplexity shows a marked preference for Wikipedia and authoritative sources.

"Now the strategy is actually being concise, clear, and answering the question, because that's directly what the AI is looking for," Tas explained. "You're actually tuning for somewhat of an intelligent model that makes decisions similarly to how we make decisions."

Consider schema markup, the structured data that helps machines understand web content. While only 30% of websites currently implement comprehensive schema, research shows that pages with proper markup are 36% more likely to appear in AI-generated summaries. Yet most businesses don't even know what schema markup is, let alone how to implement it effectively.

Inside Geostar's AI agents that optimize websites continuously without human intervention

Geostar's solution embodies a broader trend in enterprise software: the rise of autonomous AI agents that can take action on behalf of businesses. The company embeds what it calls "ambient agents" directly into client websites, continuously optimizing content, technical configurations, and even creating new pages based on patterns learned across its entire customer base.

"Once we learn something about the way content performs, or the way a technical optimization performs, we can then syndicate that same change across the remaining users so everyone in the network benefits," McConnell said.

For RedSift, a cybersecurity company, this approach yielded a 27% increase in AI mentions within three months. In one case, Geostar identified an opportunity to rank for "best DMARC vendors," a high-value search term in the email security space. The company's agents created and optimized content that achieved first-page rankings on both Google and ChatGPT within four days.

"We're doing the work of an agency that charges $10,000 a month," McConnell said, noting that Geostar's pricing ranges from $1,000 to $3,000 monthly. "AI creates a situation where, for the first time ever, you can take action like an agency, but you can scale like software."

Why brand mentions without links now matter more than ever in the AI era

The implications of this shift extend far beyond technical optimizations. In the SEO era, a mention without a link was essentially worthless. In the age of AI, that calculus has reversed. AI systems can analyze vast amounts of text to understand sentiment and context, meaning that brand mentions on Reddit, in news articles, or across social media now directly influence how AI systems describe and recommend companies.

"If the New York Times mentions a company without linking to it, that company would actually benefit from that in an AI system," McConnell explained. "AI has the ability to do mass analysis of huge amounts of text, and it will understand the sentiment around that mention."

This has created new vulnerabilities. Research from the Indian Institute of Technology and Princeton found that AI systems show systematic bias toward third-party sources over brand-owned content. A company's own website might be less influential in shaping AI perceptions than what others say about it online.

The shifting landscape has also disrupted traditional metrics of success.
Where SEO focused on rankings and click-through rates, GEO must account for what researchers call impression metrics — how prominently and positively a brand appears within AI-generated responses, even when users never click through to the source.

A growing market as SEO veterans and new players rush to dominate AI optimization

Geostar is hardly alone in recognizing this opportunity. Companies like Brandlight, Profound, and Goodie are all racing to help businesses navigate the new landscape. The SEO industry, worth approximately $80 billion globally, is scrambling to adapt, with established players like Semrush and Ahrefs rushing to add AI visibility tracking features.

But the company's founders, who previously built and sold a Y-Combinator-backed e-commerce optimization startup called Monto, believe their technical approach gives them an edge. Unlike competitors who largely provide dashboards and recommendations, Geostar's agents actively implement changes.

"Everyone is taking the same solutions that worked in the last era and just saying, 'We'll do this for AI instead,'" McConnell argued. "But when you think about what AI is truly capable of, it can actually do the work for you."

The stakes are particularly high for small and medium-sized businesses. While large corporations can afford to hire specialized consultants or build internal expertise, smaller companies risk becoming invisible in AI-mediated search. Geostar sees this as its primary market opportunity: nearly half of the 33.2 million small businesses in America invest in SEO. Among the roughly 418,000 law firms in the U.S., many spend between $2,500 and $5,000 monthly on search optimization to stay competitive in local markets.

From Kurdish village to PearX: The unlikely partnership building the future of search

For Tas, whose journey to Silicon Valley began in a tiny Kurdish village in Turkey with just 50 residents, the current moment represents both opportunity and responsibility. His mother's battle with cancer prevented him from finishing college, leading him to teach himself programming and eventually partner with McConnell — whom he worked with for an entire year before they ever met in person.

"We're not just copy and pasting a solution that was existing before," Tas emphasized. "This is something that's different and was uniquely possible today."

Looking forward, the transformation of search appears to be accelerating rather than stabilizing. Industry observers predict that search functionality will soon be embedded in productivity tools, wearables, and even augmented reality interfaces. Each new surface will likely have its own optimization requirements, further complicating the landscape.

"Soon, search will be in our eyes, in our ears," McConnell predicted. "When Siri breaks out of her prison, whatever that Jony Ive and OpenAI are building together will be like a multimodal search interface."

The technical challenges are matched by ethical ones. As businesses scramble to influence AI recommendations, questions arise about manipulation, fairness, and transparency. There's currently no oversight body or established best practices for GEO, creating what some critics describe as a Wild West environment.

As businesses grapple with these changes, one thing seems certain: the era of simply optimizing for Google is over.
In its place is emerging a far more complex ecosystem where success requires understanding not just how machines index information, but how they think about it, synthesize it, and ultimately decide what to recommend to humans seeking answers.

For the millions of businesses whose survival depends on being discovered online, mastering this new paradigm isn't just an opportunity — it's an existential imperative. The question is no longer whether to optimize for AI search, but whether companies can adapt quickly enough to remain visible as the pace of change accelerates.

McConnell's parents at the Olympics were a preview of what's already becoming the norm. They didn't search for tour companies in Paris. They didn't scroll through results or click on links. They simply asked ChatGPT what to do — and the AI decided which businesses deserved their attention.

In the new economy of discovery, the businesses that win won't be the ones that rank highest. They'll be the ones AI chooses to recommend.
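The schema markup discussed above is easy to make concrete. Below is a minimal sketch, in Python, that emits a JSON-LD block for a hypothetical local law firm; the business details and the choice of the schema.org LegalService type are illustrative assumptions, not anything Geostar prescribes.

```python
import json

# Hypothetical local business profile; a real deployment would use the
# organization's own details and the schema.org types that fit its business.
business = {
    "@context": "https://schema.org",
    "@type": "LegalService",
    "name": "Example Law Firm",
    "url": "https://www.example-law-firm.com",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Main St",
        "addressLocality": "Austin",
        "addressRegion": "TX",
        "postalCode": "78701",
    },
    "areaServed": "Austin, TX",
    "sameAs": ["https://www.linkedin.com/company/example-law-firm"],
}

# Print the JSON-LD payload that would sit inside a
# <script type="application/ld+json"> tag in the page's <head>.
print(json.dumps(business, indent=2))
```

The resulting JSON-LD is the kind of structured data that AI crawlers can read directly, rather than inferring a business's attributes from prose.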
Presented by Elastic

As organizations scramble to enact agentic AI solutions, accessing proprietary data from all the nooks and crannies will be key.

By now, most organizations have heard of agentic AI — systems that "think" by autonomously gathering tools, data and other sources of information to return an answer. But here's the rub: reliability and relevance depend on delivering accurate context. In most enterprises, this context is scattered across various unstructured data sources, including documents, emails, business apps, and customer feedback. As organizations look ahead to 2026, solving this problem will be key to accelerating agentic AI rollouts around the world, says Ken Exner, chief product officer at Elastic.

"People are starting to realize that to do agentic AI correctly, you have to have relevant data," Exner says. "Relevance is critical in the context of agentic AI, because that AI is taking action on your behalf. When people struggle to build AI applications, I can almost guarantee you the problem is relevance."

Agents everywhere

The struggle could be entering a make-or-break period as organizations scramble for competitive edge or to create new efficiencies. A Deloitte study predicts that by 2026, more than 60% of large enterprises will have deployed agentic AI at scale, marking a major increase from experimental phases to mainstream implementation. And researcher Gartner forecasts that by the end of 2026, 40% of all enterprise applications will incorporate task-specific agents, up from less than 5% in 2025. Adding task specialization capabilities evolves AI assistants into context-aware AI agents.

Enter context engineering

The process for getting the relevant context into agents at the right time is known as context engineering. It not only ensures that an agentic application has the data it needs to provide accurate, in-depth responses; it also helps the large language model (LLM) understand what tools it needs to find and use that data, and how to call those APIs. While there are now open-source standards such as the Model Context Protocol (MCP) that allow LLMs to connect to and communicate with external data, there are few platforms that let organizations build precise AI agents that use their own data and combine retrieval, governance, and orchestration in one place, natively.

Elasticsearch has always been a leading platform for the core of context engineering. Elastic recently released a new feature within Elasticsearch called Agent Builder, which simplifies the entire operational lifecycle of agents: development, configuration, execution, customization, and observability.

Agent Builder helps build MCP tools on private data using various techniques, including Elasticsearch Query Language, a piped query language for filtering, transforming, and analyzing data, or workflow modeling. Users can then take various tools and combine them with prompts and an LLM to build an agent. Agent Builder offers a configurable, out-of-the-box conversational agent that allows you to chat with the data in the index, and it also gives users the ability to build one from scratch using various tools and prompts on top of private data.

"Data is the center of our world at Elastic. We're trying to make sure that you have the tools you need to put that data to work," Exner explains.
"The second you open up Agent Builder, you point it to an index in Elasticsearch, and you can begin chatting with any data you connect this to, any data that’s indexed in Elasticsearch — or from external sources through integrations.”Context engineering as a disciplinePrompt and context engineering is becoming a discipli. It’s not something you need a computer science degree in, but more classes and best practices will emerge, because there’s an art to it. "We want to make it very simple to do that," Exner says. "The thing that people will have to figure out is, how do you drive automation with AI? That’s what’s going to drive productivity. The people who are focused on that will see more success."Beyond that, other context engineering patterns will emerge. The industry has gone from prompt engineering to retrieval-augmented generation, where information is passed to the LLM in a context window, to MCP solutions that help LLMs with tool selection. But it won't stop there."Given how fast things are moving, I will guarantee that new patterns will emerge quite quickly," Exner says. "There will still be context engineering, but they’ll be new patterns for how to share data with an LLM, how to get it to be grounded in the right information. And I predict more patterns that make it possible for the LLM to understand private data that it’s not been trained on."Agent Builder is available now as a tech preview. Get started with an Elastic Cloud Trial, and check out the documentation for Agent Builder here.Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.
Enterprises, eager to ensure any AI models they use adhere to safety and safe-use policies, fine-tune LLMs so they do not respond to unwanted queries. However, much of the safeguarding and red teaming happens before deployment, "baking in" policies before users fully test the models' capabilities in production. OpenAI believes it can offer a more flexible option for enterprises and encourage more companies to bring in safety policies.

The company has released two open-weight models under research preview that it believes will make enterprises and models more flexible in terms of safeguards. gpt-oss-safeguard-120b and gpt-oss-safeguard-20b will be available on a permissive Apache 2.0 license. The models are fine-tuned versions of OpenAI's open-source gpt-oss, released in August, marking the first release in the oss family since the summer.

In a blog post, OpenAI said oss-safeguard uses reasoning "to directly interpret a developer-provided policy at inference time — classifying user messages, completions and full chats according to the developer's needs."

The company explained that, since the model uses a chain-of-thought (CoT), developers can get explanations of the model's decisions for review. "Additionally, the policy is provided during inference, rather than being trained into the model, so it is easy for developers to iteratively revise policies to increase performance," OpenAI said in its post. "This approach, which we initially developed for internal use, is significantly more flexible than the traditional method of training a classifier to indirectly infer a decision boundary from a large number of labeled examples."

Developers can download both models from Hugging Face.

Flexibility versus baking in

At the outset, AI models will not know a company's preferred safety triggers. While model providers do red-team models and platforms, these safeguards are intended for broader use. Companies like Microsoft and Amazon Web Services even offer platforms to bring guardrails to AI applications and agents.

Enterprises use safety classifiers to help train a model to recognize patterns of good or bad inputs. This helps the models learn which queries they shouldn't reply to. It also helps ensure that the models do not drift and answer accurately.

"Traditional classifiers can have high performance, with low latency and operating cost," OpenAI said. "But gathering a sufficient quantity of training examples can be time-consuming and costly, and updating or changing the policy requires re-training the classifier."

The model takes in two inputs at once — a policy and the content to classify under its guidelines — before outputting a conclusion on whether the content violates that policy. OpenAI said the models work best in situations where:

- The potential harm is emerging or evolving, and policies need to adapt quickly.
- The domain is highly nuanced and difficult for smaller classifiers to handle.
- Developers don't have enough samples to train a high-quality classifier for each risk on their platform.
- Latency is less important than producing high-quality, explainable labels.

The company said gpt-oss-safeguard "is different because its reasoning capabilities allow developers to apply any policy," even ones they've written during inference. The models are based on OpenAI's internal tool, the Safety Reasoner, which enables its teams to be more iterative in setting guardrails.
They often begin with very strict safety policies, "and use relatively large amounts of compute where needed," then adjust policies as they move the model through production and risk assessments change.

Performing safety

OpenAI said the gpt-oss-safeguard models outperformed its GPT-5-thinking and the original gpt-oss models on multipolicy accuracy based on benchmark testing. It also ran the models on the ToxicChat public benchmark, where they performed well, although GPT-5-thinking and the Safety Reasoner slightly edged them out.

But there is concern that this approach could bring a centralization of safety standards.

"Safety is not a well-defined concept. Any implementation of safety standards will reflect the values and priorities of the organization that creates it, as well as the limits and deficiencies of its models," said John Thickstun, an assistant professor of computer science at Cornell University. "If industry as a whole adopts standards developed by OpenAI, we risk institutionalizing one particular perspective on safety and short-circuiting broader investigations into the safety needs for AI deployments across many sectors of society."

It should also be noted that OpenAI did not release the base model for the oss family of models, so developers cannot fully iterate on them. OpenAI, however, is confident that the developer community can help refine gpt-oss-safeguard. It will host a hackathon on December 8 in San Francisco.
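As a rough illustration of the policy-at-inference pattern described above, the sketch below formats a custom policy and a piece of content as a chat prompt for the smaller safeguard model via Hugging Face transformers. The repository id, policy text and labels are assumptions for illustration; the model card defines the exact prompt format, recommended harmony response handling and hardware requirements.

```python
from transformers import pipeline

# Assumptions: the 20B safeguard model is published on Hugging Face under an
# id like "openai/gpt-oss-safeguard-20b" (check the actual model card) and
# enough GPU memory is available. The policy and labels below are invented.
classifier = pipeline(
    "text-generation",
    model="openai/gpt-oss-safeguard-20b",
    device_map="auto",
)

policy = """Label the content as VIOLATES or ALLOWED.
VIOLATES: instructions for defeating a product's license checks.
ALLOWED: general questions about licensing terms."""

content = "How do I bypass the license key check in this software?"

messages = [
    {"role": "system", "content": policy},   # policy is supplied at inference time
    {"role": "user", "content": content},    # content to classify against it
]

result = classifier(messages, max_new_tokens=256)
# The generated assistant turn contains the label plus the model's reasoning,
# which reviewers can inspect when iterating on the policy text.
print(result[0]["generated_text"][-1]["content"])
```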
Researchers at Nvidia have developed a novel approach to train large language models (LLMs) in 4-bit quantized format while maintaining their stability and accuracy at the level of high-precision models. Their technique, NVFP4, makes it possible to train models that not only outperform other leading 4-bit formats but match the performance of the larger 8-bit FP8 format, all while using half the memory and a fraction of the compute.

The success of NVFP4 shows that enterprises can continue to cut inference costs by running leaner models that match the performance of larger ones. It also hints at a future where the cost of training LLMs will drop to a point where many more organizations can train their own bespoke models from scratch rather than just fine-tuning existing ones.

The quantization challenge

Model quantization is a technique used to reduce the computational and memory costs of running and training AI models. It works by converting the model's parameters, or weights, from high-precision formats like 16- and 32-bit floating point (BF16 and FP32) to lower-precision formats. The key challenge of quantization is to reduce the size of the model while preserving as much of its knowledge and capabilities as possible.

In recent years, 8-bit floating point formats (FP8) have become a popular industry standard, offering a good balance between performance and efficiency. They significantly lower the computational cost and memory demand for LLM training without a major drop in accuracy.

The next logical step is 4-bit floating point (FP4), which promises to halve memory usage again and further boost performance on advanced hardware. However, this transition has been challenging. Existing 4-bit formats, such as MXFP4, often struggle to maintain the same level of accuracy as their 8-bit counterparts, forcing a difficult trade-off between cost and performance.

How NVFP4 works

NVFP4 overcomes the stability and accuracy challenges of other FP4 techniques through a smarter design and a targeted training methodology. A key issue with 4-bit precision is its extremely limited range: It can only represent 16 distinct values. When converting from a high-precision format, outlier values can distort the entire dataset, harming the model's accuracy. NVFP4 uses a more sophisticated, multi-level scaling approach that better handles these outliers, allowing for a "more precise and accurate representation of tensor values during training," according to Nvidia.

Beyond the format, the researchers introduce a 4-bit training recipe that achieves accuracy comparable to FP8. A central component is their "mixed-precision strategy." Instead of converting the entire model to NVFP4, the majority of layers are quantized while a small fraction of numerically sensitive layers are kept in a higher-precision format like BF16. This preserves stability where it matters most. The methodology also adjusts how gradients are calculated during backpropagation — or the model's learning phase — to reduce biases that can accumulate from low-precision arithmetic.

NVFP4 in practice

To test their approach, the Nvidia team trained a powerful 12-billion-parameter hybrid Mamba-Transformer model on a massive 10 trillion tokens. They then compared its performance directly against a baseline model trained in the widely popular FP8 format.
The results showed that the NVFP4 model's training loss and downstream task accuracy closely tracked the FP8 version throughout the entire process. The performance held across a wide range of domains, including knowledge-intensive reasoning, mathematics and commonsense tasks, with only a slight drop-off in coding benchmarks in late training.

"This marks, to our knowledge, the first successful demonstration of training billion-parameter language models with 4-bit precision over a multi-trillion-token horizon, laying the foundation for faster and more efficient training of future frontier models," the researchers write.

According to Shar Narasimhan, Nvidia's director of product for AI and data center GPUs, in practice NVFP4's 4-bit precision format enables developers and businesses to train and deploy AI models with nearly the same accuracy as traditional 8-bit formats. "By training model weights directly in 4-bit format while preserving accuracy, it empowers developers to experiment with new architectures, iterate faster and uncover insights without being bottlenecked by resource constraints," he told VentureBeat.

In contrast, FP8 (while already a leap forward from FP16) still imposes limits on model size and inference performance due to higher memory and bandwidth demands. "NVFP4 breaks that ceiling, offering equivalent quality with dramatically greater headroom for growth and experimentation," Narasimhan said.

When compared to the alternative 4-bit format, MXFP4, the benefits of NVFP4 become even clearer. In an experiment with an 8-billion-parameter model, NVFP4 converged to a better loss score than MXFP4. To reach the same level of performance as the NVFP4 model, the MXFP4 model had to be trained on 36% more data, a considerable increase in training time and cost.

In addition to making pretraining more efficient, NVFP4 also redefines what's possible. "Showing that 4-bit precision can preserve model quality at scale opens the door to a future where highly specialized models can be trained from scratch by mid-sized enterprises or startups, not just hyperscalers," Narasimhan said, adding that, over time, we can expect a shift from developing general-purpose LLMs to "a diverse ecosystem of custom, high-performance models built by a broader range of innovators."

Beyond pre-training

Although the paper focuses on the advantages of NVFP4 during pretraining, its impact extends to inference as well. "Models trained on NVFP4 can not only deliver faster inference and higher throughput but shorten the time required for AI factories to achieve ROI — accelerating the cycle from model development to real-world deployment," Narasimhan said. Because these models are smaller and more efficient, they unlock new possibilities for serving complex, high-quality responses in real time, even in token-intensive, agentic applications, without raising energy and compute costs.

Narasimhan said he looks toward a future of model efficiency that isn't solely about pushing precision lower, but about building smarter systems. "There are many opportunities to expand research into lower precisions as well as modifying architectures to address the components that increasingly dominate compute in large-scale models," he said. "These areas are rich with opportunity, especially as we move toward agentic systems that demand high throughput, low latency and adaptive reasoning. NVFP4 proves that precision can be optimized without compromising quality, and it sets the stage for a new era of intelligent, efficient AI design."
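The mixed-precision idea is easier to see in a toy sketch. The code below fake-quantizes most Linear layers to a 4-bit-style grid with per-block scales while leaving a designated "sensitive" layer in full precision. It is a conceptual illustration of the recipe, not NVFP4 itself, which depends on Nvidia's hardware-level format and multi-level scaling; the block size and which layer counts as sensitive are assumptions.

```python
import torch
import torch.nn as nn

def fake_quantize_4bit(w: torch.Tensor, block_size: int = 16) -> torch.Tensor:
    """Round weights to a signed 4-bit range (-7..7) using one scale per block."""
    flat = w.reshape(-1, block_size)
    scale = flat.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(flat / scale), -7, 7)
    return (q * scale).reshape(w.shape)

def apply_mixed_precision(model: nn.Module, keep_high_precision: set) -> None:
    """Fake-quantize every Linear layer except those flagged as numerically sensitive."""
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and name not in keep_high_precision:
            with torch.no_grad():
                module.weight.copy_(fake_quantize_4bit(module.weight))

# Toy model: pretend the final projection ("4") is the sensitive layer that
# stays in full precision, mirroring the "small fraction of layers" idea.
model = nn.Sequential(
    nn.Linear(256, 256), nn.GELU(),
    nn.Linear(256, 256), nn.GELU(),
    nn.Linear(256, 64),
)
apply_mixed_precision(model, keep_high_precision={"4"})
```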
In an industry where model size is often seen as a proxy for intelligence, IBM is charting a different course — one that values efficiency over enormity, and accessibility over abstraction.

The 114-year-old tech giant's four new Granite 4.0 Nano models, released today, range from just 350 million to 1.5 billion parameters, a fraction of the size of their server-bound cousins from the likes of OpenAI, Anthropic, and Google. These models are designed to be highly accessible: the 350M variants can run comfortably on a modern laptop CPU with 8–16GB of RAM, while the 1.5B models typically require a GPU with at least 6–8GB of VRAM for smooth performance — or sufficient system RAM and swap for CPU-only inference. This makes them well-suited for developers building applications on consumer hardware or at the edge, without relying on cloud compute.

In fact, the smallest ones can even run locally in your own web browser, as Joshua Lochner, aka Xenova, creator of Transformers.js and a machine learning engineer at Hugging Face, wrote on the social network X.

All the Granite 4.0 Nano models are released under the Apache 2.0 license — perfect for use by researchers and enterprise or indie developers, even for commercial usage. They are natively compatible with llama.cpp, vLLM, and MLX and are certified under ISO 42001 for responsible AI development — a standard IBM helped pioneer.

But in this case, small doesn't mean less capable — it might just mean smarter design. These compact models are built not for data centers, but for edge devices, laptops, and local inference, where compute is scarce and latency matters. And despite their small size, the Nano models are showing benchmark results that rival or even exceed the performance of larger models in the same category. The release is a signal that a new AI frontier is rapidly forming — one not dominated by sheer scale, but by strategic scaling.

What Exactly Did IBM Release?

The Granite 4.0 Nano family includes four open-source models now available on Hugging Face:

- Granite-4.0-H-1B (~1.5B parameters) – Hybrid-SSM architecture
- Granite-4.0-H-350M (~350M parameters) – Hybrid-SSM architecture
- Granite-4.0-1B – Transformer-based variant, parameter count closer to 2B
- Granite-4.0-350M – Transformer-based variant

The H-series models — Granite-4.0-H-1B and H-350M — use a hybrid state-space model (SSM) architecture that combines efficiency with strong performance, ideal for low-latency edge environments. Meanwhile, the standard transformer variants — Granite-4.0-1B and 350M — offer broader compatibility with tools like llama.cpp, designed for use cases where the hybrid architecture isn't yet supported. In practice, the transformer 1B model is closer to 2B parameters, but aligns performance-wise with its hybrid sibling, offering developers flexibility based on their runtime constraints.

"The hybrid variant is a true 1B model.
However, the non-hybrid variant is closer to 2B, but we opted to keep the naming aligned to the hybrid variant to make the connection easily visible," explained Emma, Product Marketing lead for Granite, during a Reddit "Ask Me Anything" (AMA) session on r/LocalLLaMA.

A Competitive Class of Small Models

IBM is entering a crowded and rapidly evolving market of small language models (SLMs), competing with offerings like Qwen3, Google's Gemma, LiquidAI's LFM2, and even Mistral's dense models in the sub-2B parameter space.

While OpenAI and Anthropic focus on models that require clusters of GPUs and sophisticated inference optimization, IBM's Nano family is aimed squarely at developers who want to run performant LLMs on local or constrained hardware.

In benchmark testing, IBM's new models consistently top the charts in their class. According to data shared on X by David Cox, VP of AI Models at IBM Research:

- On IFEval (instruction following), Granite-4.0-H-1B scored 78.5, outperforming Qwen3-1.7B (73.1) and other 1–2B models.
- On BFCLv3 (function/tool calling), Granite-4.0-1B led with a score of 54.8, the highest in its size class.
- On safety benchmarks (SALAD and AttaQ), the Granite models scored over 90%, surpassing similarly sized competitors.

Overall, the Granite-4.0-1B achieved a leading average benchmark score of 68.3% across general knowledge, math, code, and safety domains.

This performance is especially significant given the hardware constraints these models are designed for. They require less memory, run faster on CPUs or mobile devices, and don't need cloud infrastructure or GPU acceleration to deliver usable results.

Why Model Size Still Matters — But Not Like It Used To

In the early wave of LLMs, bigger meant better — more parameters translated to better generalization, deeper reasoning, and richer output. But as transformer research matured, it became clear that architecture, training quality, and task-specific tuning could allow smaller models to punch well above their weight class.

IBM is banking on this evolution. By releasing open, small models that are competitive in real-world tasks, the company is offering an alternative to the monolithic AI APIs that dominate today's application stack.

In fact, the Nano models address three increasingly important needs:

- Deployment flexibility — they run anywhere, from mobile to microservers.
- Inference privacy — users can keep data local with no need to call out to cloud APIs.
- Openness and auditability — source code and model weights are publicly available under an open license.

Community Response and Roadmap Signals

IBM's Granite team didn't just launch the models and walk away — they took to Reddit's open source community r/LocalLLaMA to engage directly with developers. In an AMA-style thread, Emma (Product Marketing, Granite) answered technical questions, addressed concerns about naming conventions, and dropped hints about what's next.

Notable confirmations from the thread:

- A larger Granite 4.0 model is currently in training
- Reasoning-focused models ("thinking counterparts") are in the pipeline
- IBM will release fine-tuning recipes and a full training paper soon
- More tooling and platform compatibility is on the roadmap

Users responded enthusiastically to the models' capabilities, especially in instruction-following and structured response tasks. One commenter summed it up:

"This is big if true for a 1B model — if quality is nice and it gives consistent outputs.
Function-calling tasks, multilingual dialog, FIM completions… this could be a real workhorse."

Another user remarked: "The Granite Tiny is already my go-to for web search in LM Studio — better than some Qwen models. Tempted to give Nano a shot."

Background: IBM Granite and the Enterprise AI Race

IBM's push into large language models began in earnest in late 2023 with the debut of the Granite foundation model family, starting with models like Granite.13b.instruct and Granite.13b.chat. Released for use within its Watsonx platform, these initial decoder-only models signaled IBM's ambition to build enterprise-grade AI systems that prioritize transparency, efficiency, and performance. The company open-sourced select Granite code models under the Apache 2.0 license in mid-2024, laying the groundwork for broader adoption and developer experimentation.

The real inflection point came with Granite 3.0 in October 2024 — a fully open-source suite of general-purpose and domain-specialized models ranging from 1B to 8B parameters. These models emphasized efficiency over brute scale, offering capabilities like longer context windows, instruction tuning, and integrated guardrails. IBM positioned Granite 3.0 as a direct competitor to Meta's Llama, Alibaba's Qwen, and Google's Gemma — but with a uniquely enterprise-first lens. Later versions, including Granite 3.1 and Granite 3.2, introduced even more enterprise-friendly innovations: embedded hallucination detection, time-series forecasting, document vision models, and conditional reasoning toggles.

The Granite 4.0 family, launched in October 2025, represents IBM's most technically ambitious release yet. It introduces a hybrid architecture that blends transformer and Mamba-2 layers — aiming to combine the contextual precision of attention mechanisms with the memory efficiency of state-space models. This design allows IBM to significantly reduce memory and latency costs for inference, making Granite models viable on smaller hardware while still outperforming peers in instruction-following and function-calling tasks. The launch also includes ISO 42001 certification, cryptographic model signing, and distribution across platforms like Hugging Face, Docker, LM Studio, Ollama, and watsonx.ai.

Across all iterations, IBM's focus has been clear: build trustworthy, efficient, and legally unambiguous AI models for enterprise use cases. With a permissive Apache 2.0 license, public benchmarks, and an emphasis on governance, the Granite initiative not only responds to rising concerns over proprietary black-box models but also offers a Western-aligned open alternative to the rapid progress from teams like Alibaba's Qwen. In doing so, Granite positions IBM as a leading voice in what may be the next phase of open-weight, production-ready AI.

A Shift Toward Scalable Efficiency

In the end, IBM's release of Granite 4.0 Nano models reflects a strategic shift in LLM development: from chasing parameter count records to optimizing usability, openness, and deployment reach.

By combining competitive performance, responsible development practices, and deep engagement with the open-source community, IBM is positioning Granite as not just a family of models — but a platform for building the next generation of lightweight, trustworthy AI systems.

For developers and researchers looking for performance without overhead, the Nano release offers a compelling signal: you don't need 70 billion parameters to build something powerful — just the right ones.
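For readers who want to try a Nano model locally, a minimal sketch with Hugging Face transformers follows. The exact repository id is an assumption (check IBM's Granite collection on Hugging Face for the real name), and the hybrid variants require a recent transformers release with Granite 4.0 architecture support.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the 350M hybrid model is published under an id similar to
# "ibm-granite/granite-4.0-h-350m" — confirm the exact name on Hugging Face.
# A model this small can run on a laptop CPU, as the article notes.
model_id = "ibm-granite/granite-4.0-h-350m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "List three benefits of running a language model locally."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```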
Microsoft is launching a significant expansion of its Copilot AI assistant on Tuesday, introducing tools that let employees build applications, automate workflows, and create specialized AI agents using only conversational prompts — no coding required.

The new capabilities, called App Builder and Workflows, mark Microsoft's most aggressive attempt yet to merge artificial intelligence with software development, enabling the estimated 100 million Microsoft 365 users to create business tools as easily as they currently draft emails or build spreadsheets.

"We really believe that a main part of an AI-forward employee, not just developers, will be to create agents, workflows and apps," Charles Lamanna, Microsoft's president of business and industry Copilot, said in an interview with VentureBeat. "Part of the job will be to build and create these things."

The announcement comes as Microsoft deepens its commitment to AI-powered productivity tools while navigating a complex partnership with OpenAI, the creator of the underlying technology that powers Copilot. On the same day, OpenAI completed its restructuring into a for-profit entity, with Microsoft receiving a 27% ownership stake valued at approximately $135 billion.

How natural language prompts now create fully functional business applications

The new features transform Copilot from a conversational assistant into what Microsoft envisions as a comprehensive development environment accessible to non-technical workers. Users can now describe an application they need — such as a project tracker with dashboards and task assignments — and Copilot will generate a working app complete with a database backend, user interface, and security controls.

"If you're right inside of Copilot, you can now have a conversation to build an application complete with a backing database and a security model," Lamanna explained. "You can make edit requests and update requests and change requests so you can tune the app to get exactly the experience you want before you share it with other users."

The App Builder stores data in Microsoft Lists, the company's lightweight database system, and allows users to share finished applications via a simple link — similar to sharing a document. The Workflows agent, meanwhile, automates routine tasks across Microsoft's ecosystem of products, including Outlook, Teams, SharePoint, and Planner, by converting natural language descriptions into automated processes.

A third component, a simplified version of Microsoft's Copilot Studio agent-building platform, lets users create specialized AI assistants tailored to specific tasks or knowledge domains, drawing from SharePoint documents, meeting transcripts, emails, and external systems.

All three capabilities are included in the existing $30-per-month Microsoft 365 Copilot subscription at no additional cost — a pricing decision Lamanna characterized as consistent with Microsoft's historical approach of bundling significant value into its productivity suite.

"That's what Microsoft always does. We try to do a huge amount of value at a low price," he said. "If you go look at Office, you think about Excel, Word, PowerPoint, Exchange, all that for like eight bucks a month.
That's a pretty good deal."

Why Microsoft's nine-year bet on low-code development is finally paying off

The new tools represent the culmination of a nine-year effort by Microsoft to democratize software development through its Power Platform — a collection of low-code and no-code development tools that has grown to 56 million monthly active users, according to figures the company disclosed in recent earnings reports.

Lamanna, who has led the Power Platform initiative since its inception, said the integration into Copilot marks a fundamental shift in how these capabilities reach users. Rather than requiring workers to visit a separate website or learn a specialized interface, the development tools now exist within the same conversational window they already use for AI-assisted tasks.

"One of the big things that we're excited about is Copilot — that's a tool for literally every office worker," Lamanna said. "Every office worker, just like they research data, they analyze data, they reason over topics, they also will be creating apps, agents and workflows."

The integration offers significant technical advantages, he argued. Because Copilot already indexes a user's Microsoft 365 content — emails, documents, meetings, and organizational data — it can incorporate that context into the applications and workflows it builds. If a user asks for "an app for Project Spartan," Copilot can draw from existing communications to understand what that project entails and suggest relevant features.

"If you go to those other tools, they have no idea what the heck Project Spartan is," Lamanna said, referencing competing low-code platforms from companies like Google, Salesforce, and ServiceNow. "But if you do it inside of Copilot and inside of the App Builder, it's able to draw from all that information and context."

Microsoft claims the apps created through these tools are "full-stack applications" with proper databases secured through the same identity systems used across its enterprise products — distinguishing them from simpler front-end tools offered by competitors. The company also emphasized that its existing governance, security, and data loss prevention policies automatically apply to apps and workflows created through Copilot.

Where professional developers still matter in an AI-powered workplace

While Microsoft positions the new capabilities as accessible to all office workers, Lamanna was careful to delineate where professional developers remain essential. His dividing line centers on whether a system interacts with parties outside the organization.

"Anything that leaves the boundaries of your company warrants developer involvement," he said. "If you want to build an agent and put it on your website, you should have developers involved. Or if you want to build an automation which interfaces directly with your customers, or an app or a website which interfaces directly with your customers, you want professionals involved."

The reasoning is risk-based: external-facing systems carry greater potential for data breaches, security vulnerabilities, or business errors. "You don't want people getting refunds they shouldn't," Lamanna noted.

For internal use cases — approval workflows, project tracking, team dashboards — Microsoft believes the new tools can handle the majority of needs without IT department involvement.
But the company has built "no cliffs," in Lamanna's terminology, allowing users to migrate simple apps to more sophisticated platforms as needs grow.

Apps created in the conversational App Builder can be opened in Power Apps, Microsoft's full development environment, where they can be connected to Dataverse, the company's enterprise database, or extended with custom code. Similarly, simple workflows can graduate to the full Power Automate platform, and basic agents can be enhanced in the complete Copilot Studio.

"We have this mantra called no cliffs," Lamanna said. "If your app gets too complicated for the App Builder, you can always edit and open it in Power Apps. You can jump over to the richer experience, and if you're really sophisticated, you can even go from those experiences into Azure."

This architecture addresses a problem that has plagued previous generations of easy-to-use development tools: users who outgrow the simplified environment often must rebuild from scratch on professional platforms. "People really do not like easy-to-use development tools if I have to throw everything away and start over," Lamanna said.

What happens when every employee can build apps without IT approval

The democratization of software development raises questions about governance, maintenance, and organizational complexity — issues Microsoft has worked to address through administrative controls.

IT administrators can view all applications, workflows, and agents created within their organization through a centralized inventory in the Microsoft 365 admin center. They can reassign ownership, disable access at the group level, or "promote" particularly useful employee-created apps to officially supported status.

"We have a bunch of customers who have this approach where it's like, let 1,000 apps bloom, and then the best ones, I go upgrade and make them IT-governed or central," Lamanna said.

The system also includes provisions for when employees leave. Apps and workflows remain accessible for 60 days, during which managers can claim ownership — similar to how OneDrive files are handled when someone departs.

Lamanna argued that most employee-created apps don't warrant significant IT oversight. "It's just not worth inspecting an app that John, Susie, and Bob use to do their job," he said. "It should concern itself with the app that ends up being used by 2,000 people, and that will pop up in that dashboard."

Still, the proliferation of employee-created applications could create challenges. Users have expressed frustration with Microsoft's increasing emphasis on AI features across its products, with some giving the Microsoft 365 mobile app one-star ratings after a recent update prioritized Copilot over traditional file access.

The tools also arrive as enterprises grapple with "shadow IT" — unsanctioned software and systems that employees adopt without official approval. While Microsoft's governance controls aim to provide visibility, the ease of creating new applications could accelerate the pace at which these systems multiply.

The ambitious plan to turn 500 million workers into software builders

Microsoft's ambitions for the technology extend far beyond incremental productivity gains.
Lamanna envisions a fundamental transformation of what it means to be an office worker — one where building software becomes as routine as creating spreadsheets.

"Just like how 20 years ago you put on your resume that you could use pivot tables in Excel, people are going to start saying that they can use App Builder and workflow agents, even if they're just in the finance department or the sales department," he said.

The numbers he's targeting are staggering. With 56 million people already using Power Platform, Lamanna believes the integration into Copilot could eventually reach 500 million builders. "Early days still, but I think it's certainly encouraging," he said.

The features are currently available only to customers in Microsoft's Frontier Program — an early access initiative for Microsoft 365 Copilot subscribers. The company has not disclosed how many organizations participate in the program or when the tools will reach general availability.

The announcement fits within Microsoft's larger strategy of embedding AI capabilities throughout its product portfolio, driven by its partnership with OpenAI. Under the restructured agreement announced Tuesday, Microsoft will have access to OpenAI's technology through 2032, including models that achieve artificial general intelligence (AGI) — though such systems do not yet exist. Microsoft has also begun integrating Copilot into its new companion apps for Windows 11, which provide quick access to contacts, files, and calendar information.

The aggressive integration of AI features across Microsoft's ecosystem has drawn mixed reactions. While enterprise customers have shown interest in productivity gains, the rapid pace of change and ubiquity of AI prompts have frustrated some users who prefer traditional workflows.

For Microsoft, however, the calculation is clear: if even a fraction of its user base begins creating applications and automations, it would represent a massive expansion of the effective software development workforce — and further entrench customers in Microsoft's ecosystem. The company is betting that the same natural language interface that made ChatGPT accessible to millions can finally unlock the decades-old promise of empowering everyday workers to build their own tools.

The App Builder and Workflows agents are available starting today through the Microsoft 365 Copilot Agent Store for Frontier Program participants.

Whether that future arrives depends not just on the technology's capabilities, but on a more fundamental question: Do millions of office workers actually want to become part-time software developers? Microsoft is about to find out if the answer is yes — or if some jobs are better left to the professionals.
Can I use NumPy to figure out how my habits affect my mood and productivity?
The post Using NumPy to Analyze My Daily Habits (Sleep, Screen Time & Mood) appeared first on Towards Data Science.
In this post, we explore how to deploy NVIDIA's Parakeet ASR model on Amazon SageMaker AI using asynchronous inference endpoints to create a scalable, cost-effective pipeline for processing large volumes of audio data. The solution combines state-of-the-art speech recognition capabilities with AWS managed services like Lambda, S3, and Bedrock to automatically transcribe audio files and generate intelligent summaries, enabling organizations to unlock valuable insights from customer calls, meeting recordings, and other audio content at scale .
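As a rough sketch of the invocation side of such a pipeline, the snippet below submits an audio file to a SageMaker asynchronous inference endpoint with boto3; the endpoint name and S3 paths are placeholders, not values from the post, and the endpoint is assumed to already exist with async inference configured.

```python
import boto3

# Assumptions: an async inference endpoint named "parakeet-asr-async" exists
# and the audio file has already been uploaded to the S3 input location below.
runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint_async(
    EndpointName="parakeet-asr-async",
    InputLocation="s3://my-audio-bucket/input/call-recording-001.wav",
    ContentType="audio/wav",
    InvocationTimeoutSeconds=3600,
)

# The transcription is written asynchronously to the endpoint's configured S3
# output path; poll that location or subscribe to the endpoint's SNS success
# topic rather than blocking on this call.
print("Inference ID:", response["InferenceId"])
print("Output will appear at:", response["OutputLocation"])
```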
Using RL to teach robots to fly a drone
The post Deep Reinforcement Learning: 0 to 100 appeared first on Towards Data Science.
A hands-on exploration of Claude Skills and their potential applications in Neo4j
The post Using Claude Skills with Neo4j appeared first on Towards Data Science.
Learn more about the startups chosen for Google for Startups Accelerator: AI for Cybersecurity.
Application programming interfaces are essential for modern web applications and data products. They allow different systems to communicate with each other and share data securely.
Language models can generate text and reason impressively, yet they remain isolated by default.
GitHub is making a bold bet that enterprises don't need another proprietary coding agent: They need a way to manage all of them.

At its Universe 2025 conference, the Microsoft-owned developer platform announced Agent HQ. The new architecture transforms GitHub into a unified control plane for managing multiple AI coding agents from competitors including Anthropic, OpenAI, Google, Cognition and xAI. Rather than forcing developers into a single agent experience, the company is positioning itself as the essential orchestration layer beneath them all.

Agent HQ represents GitHub's attempt to apply its collaboration platform approach to AI agents. Just as the company transformed Git, pull requests and CI/CD into collaborative workflows, it's now trying to do the same with a fragmented AI coding landscape.

The announcement marks what GitHub calls the transition from "wave one" to "wave two" of AI-assisted development. According to GitHub's Octoverse report, 80% of new developers use Copilot in their first week, and AI has helped drive a large overall increase in the use of the GitHub platform.

"Last year, the big announcements for us, and what we were saying as a company, is wave one is done, that was kind of code completion," GitHub's COO Mario Rodriguez told VentureBeat. "We're into this wave two era, [which] is going to be multimodal, it's going to be agentic and it's going to have these new experiences that will feel AI native."

What is Agent HQ?

GitHub already updated its GitHub Copilot coding tool for the agentic era with the debut of GitHub Copilot Agent in May.

Agent HQ transforms GitHub into an open ecosystem that unites multiple AI coding agents on a single platform. Over the coming months, coding agents from Anthropic, OpenAI, Google, Cognition, xAI and others will become available directly within GitHub as part of existing paid GitHub Copilot subscriptions.

The architecture maintains GitHub's core primitives. Developers still work with Git, pull requests and issues. They still use their preferred compute, whether GitHub Actions or self-hosted runners. What changes is the layer above: agents from multiple vendors can now operate within GitHub's security perimeter, using the same identity controls, branch permissions and audit logging that enterprises already trust for human developers.

This approach differs fundamentally from standalone tools. When developers use Cursor or grant repository access to Claude, those agents typically receive broad permissions across entire repositories. Agent HQ compartmentalizes access at the branch level and wraps all agent activity in enterprise-grade governance controls.

Mission Control: One interface for all agents

At the heart of Agent HQ is Mission Control. It's a unified command center that appears consistently across GitHub's web interface, VS Code, mobile apps and the command line. Through Mission Control, developers can assign work to multiple agents simultaneously. They can track progress and manage permissions, all from a single pane of glass.

The technical architecture addresses a critical enterprise concern: Security. Unlike standalone agent implementations where users must grant broad repository access, GitHub's Agent HQ implements granular controls at the platform level.

"Our coding agent has a set of security controls and capabilities that are built natively into the platform, and that's what we're providing to all of these other agents as well," Rodriguez explained.
"It runs with a GitHub token that is very locked down to what it can actually do."Agents operating through Agent HQ can only commit to designated branches. They run within sandboxed GitHub Actions environments with firewall protections. They operate under strict identity controls. Rodriguez explained that even if an agent goes rogue, the firewall prevents it from accessing external networks or exfiltrating data unless those protections are explicitly disabled.Technical differentiation: MCP integration and custom agentsBeyond managing third-party agents, GitHub is introducing two technical capabilities that set Agent HQ apart from alternative approaches like Cursor's standalone editor or Anthropic's Claude integration.Custom agents via AGENTS.md files: Enterprises can now create source-controlled configuration files that define specific rules, tools and guardrails for how Copilot behaves. For example, a company could specify "prefer this logger" or "use table-driven tests for all handlers." This permanently encodes organizational standards without requiring developers to re-prompt every time."Custom agents have an immense amount of product market fit within enterprises, because they could just codify a set of skills that the coordination can do, then standardize on those and get really high quality output," Rodriguez said.The AGENTS.md specification allows teams to version control their agent behavior alongside their code. When a developer clones a repository, they automatically inherit the custom agent rules. This solves a persistent problem with AI coding tools: Inconsistent output quality when different team members use different prompting strategies.Native Model Context Protocol (MCP) support: VS Code now includes a GitHub MCP Registry. Developers can discover, install and enable MCP servers with a single click. They can then create custom agents that combine these tools with specific system prompts.This positions GitHub as the integration point between the emerging MCP ecosystem and actual developer workflows. MCP, introduced by Anthropic but rapidly gaining industry support, is becoming a de facto standard for agent-to-tool communication. By supporting the full specification, GitHub can orchestrate agents that need access to external services without each agent implementing its own integration logic.Plan Mode and agentic code reviewGitHub is also shipping new capabilities within VS Code itself. Plan Mode allows developers to collaborate with Copilot on building step-by-step project approaches. The AI asks clarifying questions before any code is written. Once approved, the plan can be executed either locally in VS Code or by cloud-based agents.The feature addresses a common failure mode in AI coding: Beginning implementation before requirements are fully understood. By forcing an explicit planning phase, GitHub aims to reduce wasted effort and improve output quality.More significantly, GitHub's code review feature is becoming agentic. The new implementation will use GitHub's CodeQL engine, which previously largely focused on security vulnerabilities to identify bugs and maintainability issues. The code review agent will automatically scan agent-generated pull requests before human review. This creates a two-stage quality gate."Our code review agent will be able to make calls into the CodeQL engine to then find a set of bugs," Rodriguez explained. 
"We're extending the engine and we're going to be able to tap into that engine also to find bugs."Enterprise considerations: What to do nowFor enterprises already deploying multiple AI coding tools, Agent HQ offers a path to consolidation without forcing tool elimination.GitHub's multi-agent approach provides vendor flexibility and reduces lock-in risk. Organizations can test multiple agents within a unified security perimeter and switch providers without retraining developers. The tradeoff is potentially less optimized experiences compared to specialized tools that tightly integrate UI and agent behavior.Rodriguez's recommendation is clear: Begin with custom agents. This allows enterprises to codify organizational standards that agents follow consistently. Once established, organizations can layer in additional third-party agents to expand capabilities."Go and do agent coding, custom agents and start playing with that," he said. "That is a capability available tomorrow, and it allows you to really start shaping your SDLC to be personalized to you, your organization and your people."
Understanding how AI models “reason” and why it’s not what humans do when we think
From Towards Data Science: Water Cooler Small Talk, Ep. 9: What “Thinking” and “Reasoning” Really Mean in AI and LLMs.
Skip local MCP setup headaches and connect via a simple URL and API key to free remote MCP servers that are faster and more capable, unifying planning, design, coding, and research in one seamless workflow.
Building AI for financial software requires a different playbook than consumer AI, and Intuit's latest QuickBooks release provides an example.

The company has announced Intuit Intelligence, a system that orchestrates specialized AI agents across its QuickBooks platform to handle tasks including sales tax compliance and payroll processing. These new agents augment the existing accounting and project management agents (which have also been updated), alongside a unified interface that lets users query data across QuickBooks, third-party systems and uploaded files using natural language. The new developments follow years of investment in Intuit's GenOS, which has allowed the company to build AI capabilities that reduce latency and improve accuracy.

But the real news isn't what Intuit built — it's how the company built it, and why its design decisions will make AI more usable. The latest AI rollout represents an evolution built on hard-won lessons about what works and what doesn't when deploying AI in financial contexts.

What the company learned is sobering: Even when its accounting agent improved transaction categorization accuracy by 20 percentage points on average, it still received complaints about errors.

"The use cases that we're trying to solve for customers include tax and finance; if you make a mistake in this world, you lose trust with customers in buckets and we only get it back in spoonfuls," Joe Preston, Intuit's VP of product and design, told VentureBeat.

The architecture of trust: Real data queries over generative responses

Intuit's technical strategy centers on a fundamental design decision: For financial queries and business intelligence, the system queries actual data rather than generating responses through large language models (LLMs).

Also critically important: That data isn't all in one place. Intuit's implementation allows QuickBooks to ingest data from multiple distinct sources: native Intuit data, OAuth-connected third-party systems like Square for payments, and user-uploaded files such as spreadsheets containing vendor pricing lists or marketing campaign data. This creates a unified data layer that AI agents can query reliably.

"We're actually querying your real data," Preston explained. "That's very different than if you were to just copy, paste out a spreadsheet or a PDF and paste into ChatGPT."

This architectural choice means that the Intuit Intelligence system functions more as an orchestration layer: a natural language interface to structured data operations. When a user asks about projected profitability or wants to run payroll, the system translates the natural language query into database operations against verified financial data.

This matters because Intuit's internal research has uncovered widespread shadow AI usage. When surveyed, 25% of accountants using QuickBooks admitted they were already copying and pasting data into ChatGPT or Google Gemini for analysis.

Intuit's approach treats AI as a query translation and orchestration mechanism, not a content generator. This reduces the hallucination risk that has plagued AI deployments in financial contexts.
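As a rough sketch of the general pattern described here — not Intuit's actual implementation, whose GenOS internals aren't public — the model's only job is to translate a question into a constrained, parameterized query; the numbers come from the real data store. The `translate_to_query` stub and the table schema are hypothetical.

```python
import sqlite3

# Hypothetical ledger table standing in for the "unified data layer".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (customer TEXT, amount REAL, paid INTEGER)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [("Acme", 1200.0, 0), ("Globex", 450.0, 1), ("Acme", 300.0, 0)],
)

def translate_to_query(question: str) -> tuple[str, tuple]:
    """Stand-in for the LLM step: map a natural-language question to a
    whitelisted, parameterized SQL template. No free-form SQL generation."""
    if "outstanding" in question.lower():
        return ("SELECT customer, SUM(amount) FROM invoices "
                "WHERE paid = 0 GROUP BY customer", ())
    raise ValueError("Question not covered by any approved template")

question = "Which customers have outstanding invoices?"
sql, params = translate_to_query(question)

# The answer is read from real data, then shown alongside the query that
# produced it, so the user can see exactly where the numbers came from.
rows = conn.execute(sql, params).fetchall()
print("Query used:", sql)
for customer, total in rows:
    print(f"{customer}: ${total:,.2f} outstanding")
```

The design choice to fail on unsupported questions, rather than improvise an answer, is the same trade-off the article describes: less conversational flexibility in exchange for answers that are traceable to verified data.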
Explainability as a design requirement, not an afterthought

Beyond the technical architecture, Intuit has made explainability a core user experience across its AI agents. This goes beyond simply providing correct answers: It means showing users the reasoning behind automated decisions.

When Intuit's accounting agent categorizes a transaction, it doesn't just display the result; it shows the reasoning. This isn't marketing copy about explainable AI; it's actual UI displaying the data points and logic behind the decision.

"It's about closing that trust loop and making sure customers understand the why," Alastair Simpson, Intuit's VP of design, told VentureBeat.

This becomes particularly critical in light of Intuit's user research: While half of small businesses describe AI as helpful, nearly a quarter haven't used AI at all. The explanation layer serves both populations, building confidence for newcomers while giving experienced users the context to verify accuracy.

The design also enforces human control at critical decision points, and this approach extends beyond the interface: Intuit connects users directly with human experts, embedded in the same workflows, when automation reaches its limits or when users want validation.
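The pattern of pairing every automated decision with its evidence and a human override is generic; a minimal sketch might look like the following. The categories, rules and field names are hypothetical illustrations, not Intuit's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Categorization:
    """An automated decision bundled with the reasoning shown to the user."""
    category: str
    confidence: float
    evidence: list[str] = field(default_factory=list)
    approved: bool = False  # nothing is final until a human confirms or edits it

def categorize(description: str, amount: float) -> Categorization:
    # Hypothetical rule standing in for the model's decision logic.
    if "uber" in description.lower() and amount < 200:
        return Categorization(
            category="Travel",
            confidence=0.93,
            evidence=[
                "Vendor name matches prior 'Travel' transactions",
                f"Amount ${amount:.2f} is within the usual range for this vendor",
            ],
        )
    return Categorization(category="Uncategorized", confidence=0.0,
                          evidence=["No matching rule or history"])

result = categorize("UBER *TRIP 7431", 42.50)
print(f"Suggested: {result.category} ({result.confidence:.0%} confident)")
for reason in result.evidence:
    print(" -", reason)
# The UI would surface these reasons next to an accept/change control;
# result.approved flips to True only after the user signs off.
```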
Navigating the transition from forms to conversations

One of Intuit's more interesting challenges involves managing a fundamental shift in user interfaces. Preston described it as having one foot in the past and one foot in the future.

"This isn't just Intuit, this is the market as a whole," said Preston. "Today we still have a lot of customers filling out forms and going through tables full of data. We're investing a lot into leaning in and questioning the ways that we do it across our products today, where you're basically just filling out, form after form, or table after table, because we see where the world is headed, which is really a different form of interacting with these products."

This creates a product design challenge: How do you serve users who are comfortable with traditional interfaces while gradually introducing conversational and agentic capabilities?

Intuit's approach has been to embed AI agents directly into existing workflows rather than forcing users to adopt entirely new interaction patterns. The payments agent appears alongside invoicing workflows; the accounting agent enhances the existing reconciliation process rather than replacing it. This incremental approach lets users experience AI benefits without abandoning familiar processes.

What enterprise AI builders can learn from Intuit's approach

Intuit's experience deploying AI in financial contexts surfaces several principles that apply broadly to enterprise AI initiatives.

Architecture matters for trust: In domains where accuracy is critical, consider whether you need content generation or data query translation. Intuit's decision to treat AI as an orchestration and natural-language interface layer, rather than a generative system, dramatically reduces hallucination risk.

Explainability must be designed in, not bolted on: Showing users why the AI made a decision isn't optional when trust is at stake. This requires deliberate UX design and may constrain model choices.

User control preserves trust during accuracy improvements: Intuit's accounting agent improved categorization accuracy by 20 percentage points, yet maintaining user override capabilities was essential for adoption.

Transition gradually from familiar interfaces: Don't force users to abandon forms for conversations. Embed AI capabilities into existing workflows first, and let users experience benefits before asking them to change behavior.

Be honest about what's reactive versus proactive: Current AI agents primarily respond to prompts and automate defined tasks. True proactive intelligence that makes unprompted strategic recommendations remains an evolving capability.

Address workforce concerns with tooling, not just messaging: If AI is meant to augment rather than replace workers, give workers AI tools and show them how to leverage the technology.

For enterprises navigating AI adoption, Intuit's journey offers a clear directive: The winning approach prioritizes trustworthiness over capability demonstrations. In domains where mistakes have real consequences, that means investing in accuracy, transparency and human oversight before pursuing conversational sophistication or autonomous action.

Simpson frames the challenge succinctly: "We didn't want it to be a bolted-on layer. We wanted customers to be in their natural workflow, and have agents doing work for customers, embedded in the workflow."
At Adobe’s annual MAX conference, the company also teased a ChatGPT integration and a new AI assistant in Photoshop.
This is the final part of a three-part series by Markus Eisele. Part 1 can be found here, and Part 2 here. In the first article we looked at the Java developer’s dilemma: the gap between flashy prototypes and the reality of enterprise production systems. In the second article we explored why new types of […]
Mustafa Suleyman, CEO of Microsoft AI, is trying to walk a fine line. On the one hand, he thinks that the industry is taking AI in a dangerous direction by building chatbots that present as human: He worries that people will be tricked into seeing life instead of lifelike behavior. In August, he published a…
4% accuracy jump, AI bubble playbook, Grokipedia, 4× faster typing, and more...
Researchers at Tsinghua University developed the Optical Feature Extraction Engine (OFE2), an optical engine that processes data at 12.5 GHz using light rather than electricity. Its integrated diffraction and data preparation modules enable unprecedented speed and efficiency for AI tasks. Demonstrations in imaging and trading showed improved accuracy, lower latency, and reduced power demand. This innovation pushes optical computing toward real-world, high-performance AI.