Latest AI News & Updates

#ai #data infrastructure #enterprise #cloud

China is on track to dominate consumer artificial intelligence applications and robotics manufacturing within years, but the United States will maintain its substantial lead in enterprise AI adoption and cutting-edge research, according to Kai-Fu Lee, one of the world's most prominent AI scientists and investors.

In a rare, unvarnished assessment delivered via video link from Beijing to the TED AI conference in San Francisco Tuesday, Lee — a former executive at Apple, Microsoft, and Google who now runs both a major venture capital firm and his own AI company — laid out a technology landscape splitting along geographic and economic lines, with profound implications for both commercial competition and national security.

"China's robotics has the advantage of having integrated AI into much lower costs, better supply chain and fast turnaround, so companies like Unitree are actually the farthest ahead in the world in terms of building affordable, embodied humanoid AI," Lee said, referring to a Chinese robotics manufacturer that has undercut Western competitors on price while advancing capabilities.

The comments, made to a room filled with Silicon Valley executives, investors, and researchers, represented one of the most detailed public assessments from Lee about the comparative strengths and weaknesses of the world's two AI superpowers — and suggested that the race for artificial intelligence leadership is becoming less a single contest than a series of parallel competitions with different winners.

Why venture capital is flowing in opposite directions in the U.S. and China

At the heart of Lee's analysis lies a fundamental difference in how capital flows in the two countries' innovation ecosystems. American venture capitalists, Lee said, are pouring money into generative AI companies building large language models and enterprise software, while Chinese investors are betting heavily on robotics and hardware.

"The VCs in the US don't fund robotics the way the VCs do in China," Lee said. "Just like the VCs in China don't fund generative AI the way the VCs do in the US."

This investment divergence reflects different economic incentives and market structures. In the United States, where companies have grown accustomed to paying for software subscriptions and where labor costs are high, enterprise AI tools that boost white-collar productivity command premium prices. In China, where software subscription models have historically struggled to gain traction but manufacturing dominates the economy, robotics offers a clearer path to commercialization.

The result, Lee suggested, is that each country is pulling ahead in different domains — and may continue to do so.

"China's got some challenges to overcome in getting a company funded as well as OpenAI or Anthropic," Lee acknowledged, referring to the leading American AI labs. "But I think U.S., on the flip side, will have trouble developing the investment interest and value creation in the robotics" sector.

Why American companies dominate enterprise AI while Chinese firms struggle with subscriptions

Lee was explicit about one area where the United States maintains what appears to be a durable advantage: getting businesses to actually adopt and pay for AI software.

"The enterprise adoption will clearly be led by the United States," Lee said.
"The Chinese companies have not yet developed a habit of paying for software on a subscription."This seemingly mundane difference in business culture — whether companies will pay monthly fees for software — has become a critical factor in the AI race. The explosion of spending on tools like GitHub Copilot, ChatGPT Enterprise, and other AI-powered productivity software has fueled American companies' ability to invest billions in further research and development.Lee noted that China has historically overcome similar challenges in consumer technology by developing alternative business models. "In the early days of internet software, China was also well behind because people weren't willing to pay for software," he said. "But then advertising models, e-commerce models really propelled China forward."Still, he suggested, someone will need to "find a new business model that isn't just pay per software per use or per month basis. That's going to not happen in China anytime soon."The implication: American companies building enterprise AI tools have a window — perhaps a substantial one — where they can generate revenue and reinvest in R&D without facing serious Chinese competition in their core market.How ByteDance, Alibaba and Tencent will outpace Meta and Google in consumer AIWhere Lee sees China pulling ahead decisively is in consumer-facing AI applications — the kind embedded in social media, e-commerce, and entertainment platforms that billions of people use daily."In terms of consumer usage, that's likely to happen," Lee said, referring to China matching or surpassing the United States in AI deployment. "The Chinese giants, like ByteDance and Alibaba and Tencent, will definitely move a lot faster than their equivalent in the United States, companies like Meta, YouTube and so on."Lee pointed to a cultural advantage: Chinese technology companies have spent the past decade obsessively optimizing for user engagement and product-market fit in brutally competitive markets. "The Chinese giants really work tenaciously, and they have mastered the art of figuring out product market fit," he said. "Now they have to add technology to it. So that is inevitably going to happen."This assessment aligns with recent industry observations. ByteDance's TikTok became the world's most downloaded app through sophisticated AI-driven content recommendation, and Chinese companies have pioneered AI-powered features in areas like live-streaming commerce and short-form video that Western companies later copied.Lee also noted that China has already deployed AI more widely in certain domains. "There are a lot of areas where China has also done a great job, such as using computer vision, speech recognition, and translation more widely," he said.The surprising open-source shift that has Chinese models beating Meta's LlamaPerhaps Lee's most striking data point concerned open-source AI development — an area where China appears to have seized leadership from American companies in a remarkably short time."The 10 highest rated open source [models] are from China," Lee said. "These companies have now eclipsed Meta's Llama, which used to be number one."This represents a significant shift. Meta's Llama models were widely viewed as the gold standard for open-source large language models as recently as early 2024. 
But Chinese companies — including Lee's own firm, 01.AI, along with Alibaba, Baidu, and others — have released a flood of open-source models that, according to various benchmarks, now outperform their American counterparts.

The open-source question has become a flashpoint in AI development. Lee made an extensive case for why open-source models will prove essential to the technology's future, even as closed models from companies like OpenAI command higher prices and, often, superior performance.

"I think open source has a number of major advantages," Lee argued. With open-source models, "you can examine it, tune it, improve it. It's yours, and it's free, and it's important for building if you want to build an application or tune the model to do something specific."

He drew an analogy to operating systems: "People who work in operating systems loved Linux, and that's why its adoption went through the roof. And I think in the future, open source will also allow people to tune a sovereign model for a country, make it work better for a particular language."

Still, Lee predicted both approaches will coexist. "I don't think open source models will win," he said. "I think just like we have Apple, which is closed, but provides a somewhat better experience than Android... I think we're going to see more apps using open-source models, more engineers wanting to build open-source models, but I think more money will remain in the closed model."

Why China's manufacturing advantage makes the robotics race 'not over, but' nearly decided

On robotics, Lee's message was blunt: the combination of China's manufacturing prowess, lower costs, and aggressive investment has created an advantage that will be difficult for American companies to overcome.

When asked directly whether the robotics race was already over with China victorious, Lee hedged only slightly. "It's not over, but I think the U.S. is still capable of coming up with the best robotic research ideas," he said. "But the VCs in the U.S. don't fund robotics the way the VCs do in China."

The challenge is structural. Building robots requires not just software and AI, but hardware manufacturing at scale — precisely the kind of integrated supply chain and low-cost production that China has spent decades perfecting. While American labs at universities and companies like Boston Dynamics continue to produce impressive research prototypes, turning those prototypes into affordable commercial products requires the manufacturing ecosystem that China possesses.

Companies like Unitree have demonstrated this advantage concretely. The company's humanoid and quadrupedal robots cost a fraction of their American-made equivalents while offering comparable or superior capabilities — a price-to-performance ratio that could prove decisive in commercial markets.

What worries Lee most: not AGI, but the race itself

Despite his generally measured tone about China's AI development, Lee expressed concern about one area where he believes the global AI community faces real danger — not the far-future risk of superintelligent AI, but the near-term consequences of moving too fast.

When asked about AGI risks, Lee reframed the question.
"I'm less afraid of AI becoming self-aware and causing danger for humans in the short term," he said, "but more worried about it being used by bad people to do terrible things, or by the AI race pushing people to work so hard, so fast and furious and move fast and break things that they build products that have problems and holes to be exploited."He continued: "I'm very worried about that. In fact, I think some terrible event will happen that will be a wake up call from this sort of problem."Lee's perspective carries unusual weight because of his unique vantage point spanning both Chinese and American AI development. Over a career spanning more than three decades, he has held senior positions at Apple, Microsoft, and Google, while also founding Sinovation Ventures, which has invested in more than 400 companies across both countries. His AI company, 01.AI, founded in 2023, has released several open-source models that rank among the most capable in the world.For American companies and policymakers, Lee's analysis presents a complex strategic picture. The United States appears to have clear advantages in enterprise AI software, fundamental research, and computing infrastructure. But China is moving faster in consumer applications, manufacturing robotics at lower costs, and potentially pulling ahead in open-source model development.The bifurcation suggests that rather than a single "winner" in AI, the world may be heading toward a technology landscape where different countries excel in different domains — with all the economic and geopolitical complications that implies.As the TED AI conference continued Wednesday, Lee's assessment hung over subsequent discussions. His message seemed clear: the AI race is not one contest, but many — and the United States and China are each winning different races.Standing in the conference hall afterward, one venture capitalist, who asked not to be named, summed up the mood in the room: "We're not competing with China anymore. We're competing on parallel tracks." Whether those tracks eventually converge — or diverge into entirely separate technology ecosystems — may be the defining question of the next decade.

#ai & ml #commentary

This article is part of a series on the Sens-AI Framework—practical habits for learning and coding with AI. Read the original framework introduction and explore the complete methodology in Andrew Stellman’s O’Reilly report Critical Thinking Habits for Coding with AI. A few decades ago, I worked with a developer who was respected by everyone on our team. Much […]

A wireless eye implant developed at Stanford Medicine has restored reading ability to people with advanced macular degeneration. The PRIMA chip works with smart glasses to replace lost photoreceptors using infrared light. Most trial participants regained functional vision, reading books and recognizing signs. Researchers are now developing higher-resolution versions that could eventually provide near-normal sight.

#artificial intelligence #app

It’s late August in Rwanda’s capital, Kigali, and people are filling a large hall at one of Africa’s biggest gatherings of minds in AI and machine learning. The room is draped in white curtains, and a giant screen blinks with videos created with generative AI. A classic East African folk song by the Tanzanian singer…

Researchers at the University of Surrey developed an AI that predicts what a person’s knee X-ray will look like in a year, helping track osteoarthritis progression. The tool provides both a visual forecast and a risk score, offering doctors and patients a clearer understanding of the disease. Faster and more interpretable than earlier systems, it could soon expand to predict other conditions like lung or heart disease.

#ai

Presented by Arm

A simpler software stack is the key to portable, scalable AI across cloud and edge.

AI is now powering real-world applications, yet fragmented software stacks are holding it back. Developers routinely rebuild the same models for different hardware targets, losing time to glue code instead of shipping features. The good news is that a shift is underway. Unified toolchains and optimized libraries are making it possible to deploy models across platforms without compromising performance.

Yet one critical hurdle remains: software complexity. Disparate tools, hardware-specific optimizations, and layered tech stacks continue to bottleneck progress. To unlock the next wave of AI innovation, the industry must pivot decisively away from siloed development and toward streamlined, end-to-end platforms.

This transformation is already taking shape. Major cloud providers, edge platform vendors, and open-source communities are converging on unified toolchains that simplify development and accelerate deployment, from cloud to edge. In this article, we'll explore why simplification is the key to scalable AI, what's driving this momentum, and how next-gen platforms are turning that vision into real-world results.

The bottleneck: fragmentation, complexity, and inefficiency

The issue isn't just hardware variety; it's duplicated effort across frameworks and targets that slows time-to-value.

- Diverse hardware targets: GPUs, NPUs, CPU-only devices, mobile SoCs, and custom accelerators.
- Tooling and framework fragmentation: TensorFlow, PyTorch, ONNX, MediaPipe, and others.
- Edge constraints: Devices require real-time, energy-efficient performance with minimal overhead.

According to Gartner Research, these mismatches create a key hurdle: over 60% of AI initiatives stall before production, driven by integration complexity and performance variability.

What software simplification looks like

Simplification is coalescing around five moves that cut re-engineering cost and risk:

- Cross-platform abstraction layers that minimize re-engineering when porting models.
- Performance-tuned libraries integrated into major ML frameworks.
- Unified architectural designs that scale from datacenter to mobile.
- Open standards and runtimes (e.g., ONNX, MLIR) reducing lock-in and improving compatibility.
- Developer-first ecosystems emphasizing speed, reproducibility, and scalability.

These shifts are making AI more accessible, especially for startups and academic teams that previously lacked the resources for bespoke optimization. Projects like Hugging Face's Optimum and MLPerf benchmarks are also helping standardize and validate cross-hardware performance.

Ecosystem momentum and real-world signals

Simplification is no longer aspirational; it's happening now. Across the industry, software considerations are influencing decisions at the IP and silicon design level, resulting in solutions that are production-ready from day one. Major ecosystem players are driving this shift by aligning hardware and software development efforts, delivering tighter integration across the stack.

A key catalyst is the rapid rise of edge inference, where AI models are deployed directly on devices rather than in the cloud. This has intensified demand for streamlined software stacks that support end-to-end optimization, from silicon to system to application. Companies like Arm are responding by enabling tighter coupling between their compute platforms and software toolchains, helping developers accelerate time-to-deployment without sacrificing performance or portability.
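The open standards and runtimes mentioned above, ONNX in particular, are what make this portability concrete in practice. Below is a minimal sketch, assuming PyTorch and ONNX Runtime are installed, that exports a toy model to the ONNX format and runs it through a hardware-agnostic runtime; the model and file names are illustrative and not taken from Arm's tooling.

```python
# Minimal sketch: export a PyTorch model to ONNX and run it with ONNX Runtime.
# The toy model and file names are illustrative placeholders.
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

# A tiny model standing in for a real network.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
model.eval()

# Export once to the ONNX open standard.
dummy = torch.randn(1, 16)
torch.onnx.export(model, dummy, "tiny_model.onnx",
                  input_names=["input"], output_names=["logits"])

# The same .onnx artifact can now run on x86 or Arm hosts via ONNX Runtime,
# which selects an execution provider appropriate to the local hardware.
session = ort.InferenceSession("tiny_model.onnx", providers=["CPUExecutionProvider"])
outputs = session.run(["logits"], {"input": np.random.rand(1, 16).astype(np.float32)})
print(outputs[0].shape)  # (1, 4)
```

The exported artifact can be copied unchanged to an x86 server or an Arm-based edge device, which is the kind of write-once portability the unified-toolchain argument is about.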
The emergence of multi-modal and general-purpose foundation models (e.g., LLaMA, Gemini, Claude) has also added urgency. These models require flexible runtimes that can scale across cloud and edge environments. AI agents, which interact, adapt, and perform tasks autonomously, further drive the need for high-efficiency, cross-platform software.

MLPerf Inference v3.1 included over 13,500 performance results from 26 submitters, validating multi-platform benchmarking of AI workloads. Results spanned both data center and edge devices, demonstrating the diversity of optimized deployments now being tested and shared.

Taken together, these signals make clear that the market's demand and incentives are aligning around a common set of priorities: maximizing performance-per-watt, ensuring portability, minimizing latency, and delivering security and consistency at scale.

What must happen for successful simplification

To realize the promise of simplified AI platforms, several things must occur:

- Strong hardware/software co-design: hardware features that are exposed in software frameworks (e.g., matrix multipliers, accelerator instructions), and conversely, software that is designed to take advantage of the underlying hardware.
- Consistent, robust toolchains and libraries: developers need reliable, well-documented libraries that work across devices. Performance portability is only useful if the tools are stable and well supported.
- Open ecosystem: hardware vendors, software framework maintainers, and model developers need to cooperate. Standards and shared projects help avoid reinventing the wheel for every new device or use case.
- Abstractions that don't obscure performance: while high-level abstractions help developers, they must still allow tuning or visibility where needed. The right balance between abstraction and control is key.
- Security, privacy, and trust built in: especially as more compute shifts to devices (edge/mobile), issues like data protection, safe execution, model integrity, and privacy matter.

Arm as one example of ecosystem-led simplification

Simplifying AI at scale now hinges on system-wide design, where silicon, software, and developer tools evolve in lockstep. This approach enables AI workloads to run efficiently across diverse environments, from cloud inference clusters to battery-constrained edge devices. It also reduces the overhead of bespoke optimization, making it easier to bring new products to market faster.

Arm (Nasdaq: ARM) is advancing this model with a platform-centric focus that pushes hardware-software optimizations up through the software stack. At COMPUTEX 2025, Arm demonstrated how its latest Armv9 CPUs, combined with AI-specific ISA extensions and the Kleidi libraries, enable tighter integration with widely used frameworks like PyTorch, ExecuTorch, ONNX Runtime, and MediaPipe. This alignment reduces the need for custom kernels or hand-tuned operators, allowing developers to unlock hardware performance without abandoning familiar toolchains.

The real-world implications are significant. In the data center, Arm-based platforms are delivering improved performance-per-watt, critical for scaling AI workloads sustainably.
On consumer devices, these optimizations enable ultra-responsive user experiences and background intelligence that's always on, yet power efficient.

More broadly, the industry is coalescing around simplification as a design imperative, embedding AI support directly into hardware roadmaps, optimizing for software portability, and standardizing support for mainstream AI runtimes. Arm's approach illustrates how deep integration across the compute stack can make scalable AI a practical reality.

Market validation and momentum

In 2025, nearly half of the compute shipped to major hyperscalers will run on Arm-based architectures, a milestone that underscores a significant shift in cloud infrastructure. As AI workloads become more resource-intensive, cloud providers are prioritizing architectures that deliver superior performance-per-watt and support seamless software portability. This evolution marks a strategic pivot toward energy-efficient, scalable infrastructure optimized for the performance demands of modern AI.

At the edge, Arm-compatible inference engines are enabling real-time experiences, such as live translation and always-on voice assistants, on battery-powered devices. These advancements bring powerful AI capabilities directly to users, without sacrificing energy efficiency.

Developer momentum is accelerating as well. In a recent collaboration, GitHub and Arm introduced native Arm Linux and Windows runners for GitHub Actions, streamlining CI workflows for Arm-based platforms. These tools lower the barrier to entry for developers and enable more efficient, cross-platform development at scale.

What comes next

Simplification doesn't mean removing complexity entirely; it means managing it in ways that empower innovation. As the AI stack stabilizes, the winners will be those who deliver seamless performance across a fragmented landscape. Looking ahead, expect:

- Benchmarks as guardrails: MLPerf and open-source suites guide where to optimize next.
- More upstream, fewer forks: Hardware features land in mainstream tools, not custom branches.
- Convergence of research and production: Faster handoff from papers to product via shared runtimes.

Conclusion

AI's next phase isn't about exotic hardware; it's about software that travels well. When the same model lands efficiently on cloud, client, and edge, teams ship faster and spend less time rebuilding the stack.

Ecosystem-wide simplification, not brand-led slogans, will separate the winners. The practical playbook is clear: unify platforms, upstream optimizations, and measure with open benchmarks. Explore how Arm AI software platforms are enabling this future — efficiently, securely, and at scale.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they're always clearly marked. For more information, contact sales@venturebeat.com.

ChatGPT Atlas arrives, the magic prompt, the ultimate coding tool, and more...

#amazon sagemaker canvas #intermediate (200) #technical how-to

In this post, we walk through how to take an ML model built in SageMaker Canvas and deploy it using SageMaker Serverless Inference, helping you go from model creation to production-ready predictions quickly and efficiently without managing any infrastructure. This solution demonstrates a complete workflow, from adding your trained model to the SageMaker Model Registry through creating serverless endpoint configurations and deploying endpoints that automatically scale based on demand.
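As a rough sketch of what those steps look like in code (a hedged illustration only: the names, ARNs, and sizing values below are placeholders rather than values from the post), the registry-to-serverless-endpoint flow maps to a few boto3 calls:

```python
# Hedged sketch: wrap a registered model package in a SageMaker model, then
# create a serverless endpoint config and endpoint. Names/ARNs are placeholders.
import boto3

sm = boto3.client("sagemaker")

# 1) Create a model object from a (hypothetical) registered model package.
sm.create_model(
    ModelName="canvas-model",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    Containers=[{
        "ModelPackageName": "arn:aws:sagemaker:us-east-1:123456789012:model-package/canvas-pkg/1"
    }],
)

# 2) Endpoint config with a ServerlessConfig instead of instance counts.
sm.create_endpoint_config(
    EndpointConfigName="canvas-serverless-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "canvas-model",
        "ServerlessConfig": {"MemorySizeInMB": 2048, "MaxConcurrency": 5},
    }],
)

# 3) The resulting endpoint scales out with demand and down to zero when idle.
sm.create_endpoint(
    EndpointName="canvas-serverless-endpoint",
    EndpointConfigName="canvas-serverless-config",
)
```

Once the endpoint is in service, predictions go through the usual `invoke_endpoint` call on the SageMaker runtime client; no instances need to be provisioned or managed.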

#ai

Chinese e-commerce giant Alibaba's famously prolific Qwen Team of AI model researchers and engineers has introduced a major expansion to its Qwen Deep Research tool, which is available as an optional modality the user can activate on the web-based Qwen Chat (a competitor to ChatGPT).

The update lets users generate not only comprehensive research reports with well-organized citations, but also interactive web pages and multi-speaker podcasts — all within 1-2 clicks.

This functionality is part of a proprietary release, distinct from many of Qwen's previous open-source model offerings. While the feature relies on the open-source models Qwen3-Coder, Qwen-Image, and Qwen3-TTS to power its core capabilities, the end-to-end experience — including research execution, web deployment, and audio generation — is hosted and operated by Qwen. This means users benefit from a managed, integrated workflow without needing to configure infrastructure. That said, developers with access to the open-source models could theoretically replicate similar functionality on private or commercial systems.

The update was announced via the team's official X account (@Alibaba_Qwen) today, October 21, 2025, stating:

"Qwen Deep Research just got a major upgrade. It now creates not only the report, but also a live webpage and a podcast — powered by Qwen3-Coder, Qwen-Image, and Qwen3-TTS. Your insights, now visual and audible."

Multi-Format Research Output

The core workflow begins with a user request inside the Qwen Chat interface. From there, Qwen collaborates by asking clarifying questions to shape the research scope, pulls data from the web and official sources, and analyzes or resolves any inconsistencies it finds — even generating custom code when needed.

A demo video posted by Qwen on X walks through this process on Qwen Chat using the U.S. SaaS market as an example. In it, Qwen retrieves data from multiple industry sources, identifies discrepancies in market size estimates (e.g., $206 billion vs. $253 billion), and highlights ambiguities in the U.S. share of global figures. The assistant comments on differences in scope between sources and calculates a compound annual growth rate (CAGR) of 19.8% from 2020 to 2023, providing contextual analysis to back up the raw numbers.

Once the research is complete, users can click the "eyeball" icon below the output (see screenshot), which brings up a PDF-style report in the right-hand pane.

Then, when viewing the report in the right-hand pane, the user can click the "Create" button in the upper-right corner and select from two options:

- "Web Dev," which produces a live, professional-grade web page, automatically deployed and hosted by Qwen, using Qwen3-Coder for structure and Qwen-Image for visuals.
- "Podcast," which produces an audio podcast featuring dynamic, multi-speaker narration generated by Qwen3-TTS, also hosted by Qwen for easy sharing and playback.

This enables users to quickly convert a single research project into multiple forms of content — written, visual, and audible — with minimal extra input.

The website includes inline graphics generated by Qwen-Image, making it suitable for use in public presentations, classrooms, or publishing. The podcast feature allows users to select between 17 different speaker names for the host and 7 for the co-host, though I wasn't able to find a way to preview the voice outputs before selecting them. It appears designed for deep listening on the go.
There was no way to change the language output that I could see, so mine came out in English, like my reports and initial prompts, though the Qwen LLMs are multilingual. The voices were slightly more robotic than those of other AI tools I've used.

Here's an example of a web page I generated on commonalities in authoritarian regimes throughout history, another on UFO or UAP sightings, and, below this paragraph, a podcast on UFO or UAP sightings. While the website is hosted via a public link, the podcast must be downloaded by the user and can't be linked to publicly, from what I could tell in my brief usage so far.

Note that the podcast is quite different from the actual report — not a straight read-through audio version of it, but rather a new format in which two hosts discuss and banter about the subject, using the report as the jumping-off point. The web page versions of the report also include new graphics not found in the PDF report.

Comparisons to Google's NotebookLM

While the new capabilities have been well received by many early users, comparisons to other research assistants have surfaced — particularly Google's NotebookLM, which recently exited beta.

AI commentator and newsletter writer Chubby (@kimmonismus) noted on X:

"I am really grateful that Qwen provides regular updates. That's great. But the attempt to build a NotebookLM clone inside Qwen-3-max doesn't sound very promising compared to Google's version."

While NotebookLM is built around organizing and querying existing documents and web pages, Qwen Deep Research focuses more on generating new research content from scratch, aggregating sources from the open web, and presenting it across multiple modalities. The comparison suggests that while the two tools overlap in general concept — AI-assisted research — they diverge in approach and target user experience.

Availability

Qwen Deep Research is now live and available through the Qwen Chat app. No pricing details have been provided for Qwen3-Max or the specific Deep Research capabilities as of this writing.

What's Next for Qwen Deep Research?

By combining research guidance, data analysis, and multi-format content creation into a single tool, Qwen Deep Research aims to streamline the path from idea to publishable output. The integration of code, visuals, and voice makes it especially attractive to content creators, educators, and independent analysts who want to scale their research into web- or podcast-friendly forms without switching platforms.

Still, comparisons to more specialized offerings like NotebookLM raise questions about how Qwen's generalized approach stacks up on depth, precision, and refinement. Whether the strength of its multi-format execution outweighs those concerns may come down to user priorities — and whether they value single-click publishing over tight integration with existing notes and materials.

For now, Qwen is signaling that research doesn't end with a document — it begins with one.

#ai

DeepSeek, the Chinese artificial intelligence research company that has repeatedly challenged assumptions about AI development costs, has released a new model that fundamentally reimagines how large language models process information — and the implications extend far beyond its modest branding as an optical character recognition tool.

The company's DeepSeek-OCR model, released Monday with full open-source code and weights, achieves what researchers describe as a paradigm inversion: compressing text through visual representation up to 10 times more efficiently than traditional text tokens. The finding challenges a core assumption in AI development and could pave the way for language models with dramatically expanded context windows, potentially reaching tens of millions of tokens.

"We present DeepSeek-OCR as an initial investigation into the feasibility of compressing long contexts via optical 2D mapping," the research team wrote in their technical paper. "Experiments show that when the number of text tokens is within 10 times that of vision tokens (i.e., a compression ratio < 10×), the model can achieve decoding (OCR) precision of 97%."

The implications have resonated across the AI research community. Andrej Karpathy, co-founder of OpenAI and former director of AI at Tesla, said in a post that the work raises fundamental questions about how AI systems should process information. "Maybe it makes more sense that all inputs to LLMs should only ever be images," Karpathy wrote. "Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in."

How DeepSeek achieved 10x compression by treating text as images

While DeepSeek marketed the release as an OCR model — a technology for converting images of text into digital characters — the research paper reveals more ambitious goals. The model demonstrates that visual representations can serve as a superior compression medium for textual information, inverting the conventional hierarchy where text tokens were considered more efficient than vision tokens.

"Traditionally, vision LLM tokens almost seemed like an afterthought or 'bolt on' to the LLM paradigm," wrote Jeffrey Emanuel, an AI researcher, in a detailed analysis of the paper. "And 10k words of English would take up far more space in a multimodal LLM when expressed as intelligible pixels than when expressed as tokens... But that gets inverted now from the ideas in this paper."

The model's architecture consists of two primary components: DeepEncoder, a novel 380-million-parameter vision encoder, and a 3-billion-parameter mixture-of-experts language decoder with 570 million activated parameters. DeepEncoder combines Meta's Segment Anything Model (SAM) for local visual perception with OpenAI's CLIP model for global visual understanding, connected through a 16x compression module.

To validate their compression claims, DeepSeek researchers tested the model on the Fox benchmark, a dataset of diverse document layouts. The results were striking: using just 100 vision tokens, the model achieved 97.3% accuracy on documents containing 700-800 text tokens — representing an effective compression ratio of 7.5x. Even at compression ratios approaching 20x, accuracy remained around 60%.

The practical impact: Processing 200,000 pages per day on a single GPU

The efficiency gains translate directly to production capabilities. According to the company, a single Nvidia A100-40G GPU can process more than 200,000 pages per day using DeepSeek-OCR.
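As a quick sanity check on those ratios, the arithmetic is straightforward; the small sketch below uses the figures quoted above, which are the paper's reported claims rather than independent measurements.

```python
# Worked example: effective compression ratio on the Fox benchmark,
# using the token counts reported in the paper (claims, not re-measured here).
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many text tokens each vision token effectively stands in for."""
    return text_tokens / vision_tokens

# ~700-800 text tokens decoded from just 100 vision tokens at 97.3% accuracy.
print(compression_ratio(750, 100))   # 7.5  -> the reported ~7.5x ratio

# At ratios approaching 20x (e.g., ~2,000 text tokens into 100 vision tokens),
# the paper reports accuracy falling to roughly 60%.
print(compression_ratio(2000, 100))  # 20.0
```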
Scaling to a cluster of 20 servers with eight GPUs each, throughput reaches 33 million pages daily — sufficient to rapidly construct training datasets for other AI models.

On OmniDocBench, a comprehensive document parsing benchmark, DeepSeek-OCR outperformed GOT-OCR2.0 (which uses 256 tokens per page) while using only 100 vision tokens. More dramatically, it surpassed MinerU2.0 — which requires more than 6,000 tokens per page on average — while using fewer than 800 vision tokens.

DeepSeek designed the model to support five distinct resolution modes, each optimized for different compression ratios and use cases. The "Tiny" mode operates at 512×512 resolution with just 64 vision tokens, while "Gundam" mode combines multiple resolutions dynamically for complex documents. "Gundam mode consists of n×640×640 tiles (local views) and a 1024×1024 global view," the researchers wrote.

Why this breakthrough could unlock 10 million token context windows

The compression breakthrough has immediate implications for one of the most pressing challenges in AI development: expanding the context windows that determine how much information language models can actively consider. Current state-of-the-art models typically handle context windows measured in hundreds of thousands of tokens. DeepSeek's approach suggests a path to windows ten times larger.

"The potential of getting a frontier LLM with a 10 or 20 million token context window is pretty exciting," Emanuel wrote. "You could basically cram all of a company's key internal documents into a prompt preamble and cache this with OpenAI and then just add your specific query or prompt on top of that and not have to deal with search tools and still have it be fast and cost-effective."

The researchers explicitly frame their work in terms of context compression for language models. "Through DeepSeek-OCR, we demonstrate that vision-text compression can achieve significant token reduction (7-20×) for different historical context stages, offering a promising direction for addressing long-context challenges in large language models," they wrote.

The paper includes a speculative but intriguing diagram illustrating how the approach could implement memory decay mechanisms similar to human cognition. Older conversation rounds could be progressively downsampled to lower resolutions, consuming fewer tokens while maintaining key information — a form of computational forgetting that mirrors biological memory.

How visual processing could eliminate the 'ugly' tokenizer problem

Beyond compression, Karpathy highlighted how the approach challenges fundamental assumptions about how language models should process text. Traditional tokenizers — the systems that break text into units for processing — have long been criticized for their complexity and limitations.

"I already ranted about how much I dislike the tokenizer," Karpathy wrote. "Tokenizers are ugly, separate, not end-to-end stage. It 'imports' all the ugliness of Unicode, byte encodings, it inherits a lot of historical baggage, security/jailbreak risk (e.g. continuation bytes). It makes two characters that look identical to the eye look as two completely different tokens internally in the network."

Visual processing of text could eliminate these issues while enabling new capabilities. The approach naturally handles formatting information lost in pure text representations: bold text, colors, layout, embedded images.
"Input can now be processed with bidirectional attention easily and as default, not autoregressive attention - a lot more powerful," Karpathy noted.The implications resonate with human cognitive science. Emanuel drew a parallel to Hans Bethe, the renowned physicist who memorized vast amounts of reference data: "Having vast amounts of task-specific knowledge in your working memory is extremely useful. This seems like a very clever and additive approach to potentially expanding that memory bank by 10x or more."The model's training: 30 million PDF pages across 100 languagesThe model's capabilities rest on an extensive training regimen using diverse data sources. DeepSeek collected 30 million PDF pages covering approximately 100 languages, with Chinese and English accounting for 25 million pages. The training data spans nine document types — academic papers, financial reports, textbooks, newspapers, handwritten notes, and others.Beyond document OCR, the training incorporated what the researchers call "OCR 2.0" data: 10 million synthetic charts, 5 million chemical formulas, and 1 million geometric figures. The model also received 20% general vision data for tasks like image captioning and object detection, plus 10% text-only data to maintain language capabilities.The training process employed pipeline parallelism across 160 Nvidia A100-40G GPUs (20 nodes with 8 GPUs each), with the vision encoder divided between two pipeline stages and the language model split across two others. "For multimodal data, the training speed is 70B tokens/day," the researchers reported.Open source release accelerates research and raises competitive questionsTrue to DeepSeek's pattern of open development, the company released the complete model weights, training code, and inference scripts on GitHub and Hugging Face. The GitHub repository gained over 4,000 stars within 24 hours of release, according to Dataconomy.The breakthrough raises questions about whether other AI labs have developed similar techniques but kept them proprietary. Emanuel speculated that Google's Gemini models, which feature large context windows and strong OCR performance, might employ comparable approaches. "For all we know, Google could have already figured out something like this, which could explain why Gemini has such a huge context size and is so good and fast at OCR tasks," Emanuel wrote.Google's Gemini 2.5 Pro offers a 1-million-token context window, with plans to expand to 2 million, though the company has not publicly detailed the technical approaches enabling this capability. OpenAI's GPT-5 supports 400,000 tokens, while Anthropic's Claude 4.5 offers 200,000 tokens, with a 1-million-token window available in beta for eligible organizations.The unanswered question: Can AI reason over compressed visual tokens?While the compression results are impressive, researchers acknowledge important open questions. "It's not clear how exactly this interacts with the other downstream cognitive functioning of an LLM," Emanuel noted. "Can the model reason as intelligently over those compressed visual tokens as it can using regular text tokens? Does it make the model less articulate by forcing it into a more vision-oriented modality?"The DeepSeek paper focuses primarily on the compression-decompression capability, measured through OCR accuracy, rather than downstream reasoning performance. 
This leaves open whether language models could reason effectively over large contexts represented primarily as compressed visual tokens.

The researchers acknowledge their work represents "an initial exploration into the boundaries of vision-text compression." They note that "OCR alone is insufficient to fully validate true context optical compression" and plan future work including "digital-optical text interleaved pretraining, needle-in-a-haystack testing, and other evaluations."

DeepSeek has established a pattern of achieving competitive results with dramatically lower computational resources than Western AI labs. The company's earlier DeepSeek-V3 model reportedly cost just $5.6 million to train — though this figure represents only the final training run and excludes R&D and infrastructure costs — compared to hundreds of millions for comparable models from OpenAI and Anthropic.

Industry analysts have questioned the $5.6 million figure, with some estimates placing the company's total infrastructure and operational costs closer to $1.3 billion, though still lower than American competitors' spending.

The bigger picture: Should language models process text as images?

DeepSeek-OCR poses a fundamental question for AI development: should language models process text as text, or as images of text? The research demonstrates that, at least for compression purposes, visual representation offers significant advantages. Whether this translates to effective reasoning over vast contexts remains to be determined.

"From another perspective, optical contexts compression still offers substantial room for research and improvement, representing a promising new direction," the researchers concluded in their paper.

For the AI industry, the work adds another dimension to the race for longer context windows — a competition that has intensified as language models are applied to increasingly complex tasks requiring vast amounts of information. The open-source release ensures the technique will be widely explored, tested, and potentially integrated into future AI systems.

As Karpathy framed the deeper implication: "OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made to be vision -> text tasks. Not vice versa." In other words, the path forward for AI might not run through better tokenizers — it might bypass text tokens altogether.

#ai

Google AI Studio has gotten a big vibe coding upgrade, with a new interface, buttons, suggestions, and community features that allow anyone with an idea for an app — even complete novices, laypeople, or non-developers like yours truly — to bring it into existence and deploy it live, on the web, for anyone to use, within minutes.

The updated Build tab is available now at ai.studio/build, and it's free to start. Users can experiment with building applications without needing to enter payment information upfront, though certain advanced features like Veo 3.1 and Cloud Run deployment require a paid API key.

The new features appear to make Google's AI models and offerings even more competitive, and perhaps preferable, for many general users compared with dedicated AI startup rivals like Anthropic's Claude Code and OpenAI's Codex, two "vibe coding" focused products that are beloved by developers but seem to have a higher barrier to entry or require more technical know-how.

A Fresh Start: Redesigned Build Mode

The updated Build tab serves as the entry point to vibe coding. It introduces a new layout and workflow where users can select from Google's suite of AI models and features to power their applications. The default is Gemini 2.5 Pro, which is great for most cases.

Once selections are made, users simply describe what they want to build, and the system automatically assembles the necessary components using Gemini's APIs.

This mode supports mixing capabilities like Nano Banana (a lightweight AI model), Veo (for video generation), Imagine (for image generation), Flashlight (for performance-optimized inference), and Google Search.

Patrick Löber, Developer Relations at Google DeepMind, highlighted that the experience is meant to help users "supercharge your apps with AI" using a simple prompt-to-app pipeline.

In a video demo he posted on X and LinkedIn, he showed how just a few clicks led to the automatic generation of a garden planning assistant app, complete with layouts, visuals, and a conversational interface.

From Prompt to Production: Building and Editing in Real Time

Once an app is generated, users land in a fully interactive editor. On the left, there's a traditional code-assist interface where developers can chat with the AI model for help or suggestions. On the right, a code editor displays the full source of the app.

Each component — such as React entry points, API calls, or styling files — can be edited directly. Tooltips help users understand what each file does, which is especially useful for those less familiar with TypeScript or frontend frameworks.

Apps can be saved to GitHub, downloaded locally, or shared directly. Deployment is possible within the Studio environment or via Cloud Run if advanced scaling or hosting is needed.

Inspiration on Demand: The 'I'm Feeling Lucky' Button

One standout feature in this update is the "I'm Feeling Lucky" button. Designed for users who need a creative jumpstart, it generates randomized app concepts and configures the app setup accordingly.
Each press yields a different idea, complete with suggested AI features and components. Examples produced during demos include:

- An interactive map-based chatbot powered by Google Search and conversational AI.
- A dream garden designer using image generation and advanced planning tools.
- A trivia game app with an AI host whose personality users can define, integrating both Imagine and Flashlight with Gemini 2.5 Pro for conversation and reasoning.

Logan Kilpatrick, Lead of Product for Google AI Studio and Gemini AI, noted in a demo video of his own that this feature encourages discovery and experimentation. "You get some really, really cool, different experiences," he said, emphasizing its role in helping users find novel ideas quickly.

Hands-On Test: From Prompt to App in 65 Seconds

To test the new workflow, I prompted Gemini with:

A randomized dice rolling web application where the user can select between common dice sizes (6 sides, 10 sides, etc) and then see an animated die rolling and choose the color of their die as well.

Within 65 seconds (just over a minute), AI Studio returned a fully working web app featuring:

- A dice size selector (d4, d6, d8, d10, d12, d20)
- Color customization options for the die
- An animated rolling effect with randomized results
- A clean, modern UI built with React, TypeScript, and Tailwind CSS

The platform also generated a complete set of structured files, including App.tsx, constants.ts, and separate components for dice logic and controls. After generation, it was easy to iterate: adding sound effects for each interaction (rolling, choosing a die, changing color) required only a single follow-up prompt to the built-in assistant, an enhancement Gemini itself had suggested. From there, the app can be previewed live or exported using built-in controls to:

- Save to GitHub
- Download the full codebase
- Copy the project for remixing
- Deploy via integrated tools

My brief hands-on test showed just how quickly even small utility apps can go from idea to interactive prototype — without leaving the browser or writing boilerplate code manually.

AI-Suggested Enhancements and Feature Refinement

In addition to code generation, Google AI Studio now offers context-aware feature suggestions. These recommendations, generated by Gemini's Flashlight capability, analyze the current app and propose relevant improvements.

In one example, the system suggested implementing a feature that displays the history of previously generated images in an image studio tab. These iterative enhancements allow builders to expand app functionality over time without starting from scratch.

Kilpatrick emphasized that users can continue to refine their projects as they go, combining both automatic generation and manual adjustments. "You can go in and continue to edit and sort of refine the experience that you want iteratively," he said.

Free to Start, Flexible to Grow

The new experience is available at no cost for users who want to experiment, prototype, or build lightweight apps. There's no requirement to enter credit card information to begin using vibe coding.

However, more powerful capabilities — such as using models like Veo 3.1 or deploying through Cloud Run — do require switching to a paid API key.

This pricing structure is intended to lower the barrier to entry for experimentation while providing a clear path to scale when needed.

Built for All Skill Levels

One of the central goals of the vibe coding launch is to make AI app development accessible to more people.
The system supports both high-level visual builders and low-level code editing, creating a workflow that works for developers across experience levels.

Kilpatrick mentioned that while he's more familiar with Python than TypeScript, he still found the editor useful because of the helpful file descriptions and intuitive layout. This focus on usability could make AI Studio a compelling option for developers exploring AI for the first time.

More to Come: A Week of Launches

The launch of vibe coding is the first in a series of announcements expected throughout the week. While specific future features haven't been revealed yet, both Kilpatrick and Löber hinted that additional updates are on the way.

With this update, Google AI Studio positions itself as a flexible, user-friendly environment for building AI-powered applications — whether for fun, prototyping, or production deployment. The focus is clear: make the power of Gemini's APIs accessible without unnecessary complexity.

#amazon bedrock #amazon machine learning #amazon nova #artificial intelligence #generative ai #technical how-to #amazon bedrock agentcore

In this post, we explore how Amazon Nova Sonic's speech-to-speech capabilities can be combined with Amazon Bedrock AgentCore to create sophisticated multi-agent voice assistants that break complex tasks into specialized, manageable components. The approach demonstrates how to build modular, scalable voice applications using a banking assistant example with dedicated sub-agents for authentication, banking inquiries, and mortgage services, offering a more maintainable alternative to monolithic voice assistant designs.

#amazon elastic kubernetes service #amazon sagemaker hyperpod #technical how-to

In this post, we demonstrate how to deploy and manage machine learning training workloads using the Amazon SageMaker HyperPod training operator, which enhances training resilience for Kubernetes workloads through pinpoint recovery and customizable monitoring capabilities. The Amazon SageMaker HyperPod training operator helps accelerate generative AI model development by efficiently managing distributed training across large GPU clusters, offering benefits like centralized training process monitoring, granular process recovery, and hanging job detection that can reduce recovery times from tens of minutes to seconds.

#research #mit-ibm watson ai lab #collaboration #industry #computer science and technology #artificial intelligence #machine learning #algorithms #data #business and management #technology and society #electrical engineering and computer science (eecs) #computer science and artificial intelligence laboratory (csail) #idss #mit schwarzman college of computing #school of engineering

How the MIT-IBM Watson AI Lab is shaping AI-sociotechnical systems for the future.

Building AI agents that work in production requires more than powerful models.

#learning & education #google cloud #ai

Google Skills is a new home for building skills in AI, and learning about other topics like data analytics and security.

#ai & ml #commentary

This is the second of a three-part series by Markus Eisele. Part 1 can be found here. Stay tuned for part 3. Many AI projects fail. The reason is often simple. Teams try to rebuild last decade’s applications but add AI on top: A CRM system with AI. A chatbot with AI. A search engine […]

#artificial intelligence #app #the algorithm

Chatbots today are everything machines. If it can be put into words—relationship advice, work documents, code—AI will produce it, however imperfectly. But the one thing that almost no chatbot will ever do is stop talking to you.  That might seem reasonable. Why should a tech company build a feature that reduces the time people spend…

#ai

Researchers at Mila have proposed a new technique that makes large language models (LLMs) vastly more efficient when performing complex reasoning. Called Markovian Thinking, the approach allows LLMs to engage in lengthy reasoning without incurring the prohibitive computational costs that currently limit such tasks.

The team's implementation, an environment named Delethink, structures the reasoning chain into fixed-size chunks, breaking the scaling problem that plagues very long LLM responses. Initial estimates show that for a 1.5B-parameter model, this method can cut training costs by more than two-thirds compared to standard approaches.

The quadratic curse of long-chain reasoning

For an LLM to solve a complex problem, it often needs to generate a long series of intermediate "thinking" tokens, often referred to as chain-of-thought (CoT). In recent years, researchers have found that using reinforcement learning (RL) to train models to produce longer CoTs (sometimes referred to as LongCoT) has significantly improved their reasoning capabilities.

However, the standard method for this has a critical flaw: the AI's "state" (the prompt plus all the reasoning tokens it has generated thus far) grows with every new reasoning token. For modern transformer-based models, this means the computational cost explodes quadratically as the reasoning chain gets longer, making it prohibitively expensive to train models for very complex tasks.

Most current attempts to manage this cost focus on limiting how much thinking the model does, implicitly preferring shorter solutions or terminating the process early. While these methods offer some relief, they still operate within the LongCoT framework and are thus fundamentally bound by its quadratic nature, the Mila researchers note.

Instead of trying to control the computational growth, Mila created an RL environment that avoids the quadratic problem altogether. As co-author Amirhossein Kazemnejad explained, the goal is to enable capabilities like multi-week reasoning and scientific discovery. "That regime (and the RL needed to enable such capabilities) is not supported by the current LongCoT paradigm, because of quadratic compute cost," he said.

Thinking in chunks with Delethink

The researchers' solution is a paradigm they call the "Markovian Thinker," in which the model reasons while keeping the size of its reasoning context window constant. The core idea is to change the RL setup to separate "how long the model thinks" from "how much context it must process." Done correctly, a Markovian Thinker turns the quadratic growth problem into linear compute and fixed memory requirements for LLM reasoning.

The researchers put this paradigm into practice through Delethink, which forces the model to reason in a sequence of fixed-size chunks, such as 8,000 tokens at a time. Within each chunk, the model reasons as it normally would, using the classic attention mechanism. But when it reaches the limit of the chunk, the environment resets the context, creating a new prompt that includes the original query plus a short "carryover" from the previous chunk. For example, the carryover could be the last few tokens of the previous chunk of CoT or a summary of its most important results.

This rearrangement of the problem forces the model to learn how to embed a summary of its progress, or a "textual Markovian state," into this carryover so it can continue its reasoning in the next chunk.
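A minimal sketch of that chunked loop at inference time, assuming a generic `generate()` helper and illustrative chunk and carryover sizes (the actual Delethink environment is an RL training setup with learned carryover behavior, not this simple wrapper), might look like the following:

```python
# Minimal sketch of Delethink-style chunked ("Markovian") reasoning at inference time.
# `generate` is a stand-in for any LLM call returning up to `max_tokens` of text;
# chunk and carryover sizes are illustrative, not the paper's exact hyperparameters.

CHUNK_TOKENS = 8000      # reasoning budget per chunk
CARRYOVER_TOKENS = 512   # tail kept as the "textual Markovian state"
MAX_CHUNKS = 12          # total thinking budget = MAX_CHUNKS * CHUNK_TOKENS

def markovian_think(query: str, generate, is_final_answer) -> str:
    carryover = ""
    for _ in range(MAX_CHUNKS):
        # The prompt never grows beyond the query plus a bounded carryover, so
        # per-chunk attention cost stays constant and total compute grows linearly.
        prompt = query if not carryover else f"{query}\n\n[Progress so far]\n{carryover}"
        chunk = generate(prompt, max_tokens=CHUNK_TOKENS)
        if is_final_answer(chunk):
            return chunk
        # Reset the context: keep only a short tail of the chunk as the carryover.
        carryover = chunk[-CARRYOVER_TOKENS * 4:]  # rough chars-per-token heuristic
    return carryover  # budget exhausted; return the latest state
```

Because each call sees only the query plus a fixed-size carryover, the loop trades the quadratic cost of an ever-growing context for linear compute over the number of chunks, which is the property the Mila team exploits during RL training.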
This addresses a common concern: can the model remember important details from earlier steps? According to Kazemnejad, the model learns what to remember. "With training... the model is forced to learn to carry forward the task-critical state," he explained. He added a crucial clarification for practical use: the original input prompt is not modified, including any documents or contextual data added to it. “Our approach is aimed at the reasoning phase and does not modify the prompt," he said.

Delethink in action

To test their approach, the researchers trained R1-Distill-1.5B with Delethink on a dataset of competition-level math problems, then evaluated it against several benchmarks. The model was trained to reason for up to 24,000 tokens, but in fixed 8,000-token chunks. The researchers compared this to models trained with the standard LongCoT-RL method. Their findings indicate that the Delethink-trained model could reason up to 24,000 tokens and matched or surpassed a LongCoT model trained with the same 24,000-token budget on math benchmarks. On other tasks, such as coding and PhD-level questions, Delethink also matched or slightly beat its LongCoT counterpart. “Overall, these results indicate that Delethink uses its thinking tokens as effectively as LongCoT-RL with reduced compute,” the researchers write.

The benefits become even more pronounced when scaling beyond the training budget. While models trained with LongCoT quickly plateaued at their training limits, the Delethink-trained model continued to improve. For instance, some math problems were only solved after the model reasoned for up to 140,000 tokens, far beyond its 24,000-token training budget. This linear-compute advantage is substantial for enterprise applications: the researchers estimate that training a model to an average thinking length of 96,000 tokens would require 27 H100-GPU-months with LongCoT, versus just 7 with Delethink.

This efficiency extends directly to inference, the primary operational cost for most enterprises. "Models trained in Markovian Thinking use the same inference style (delethink-tracing) during test time, which provides the same advantages of linear compute and constant memory after training," said Kazemnejad. He offered a practical example: an AI agent could "debug a large codebase and think for a long time... which of course reduces the cost significantly compared to the conventional LongCoT approach."

Interestingly, the researchers found that off-the-shelf reasoning models, even without any specific training, already exhibit some ability to think in a Markovian way. This finding has immediate practical implications for developers. "In practice, this means that — without Delethink-RL — these models can already run a delethink-tracing wrapper and perform competitively with LongCoT on our benchmarked tasks," Kazemnejad said. Their experiments with larger models such as GPT-OSS 120B showed robust performance with Delethink across a range of complex tasks. This latent ability provides a strong starting point for RL training and helps explain why the method is so effective. “Together, these results suggest that Delethink is compatible and scales with state-of-the-art models,” the researchers conclude.

The success of Markovian Thinking shows it may be possible for "next-generation reasoning models to think for millions of tokens," the researchers note, opening the door to fundamentally new AI capabilities beyond current constraints. "Markovian Thinking... opens the path for models that can 'think' for very long horizons, which we view as a necessary step toward eventual scientific discovery," Kazemnejad said. "Our approach removes a key bottleneck and can allow training for much longer horizon tasks, which enables next-gen capabilities."
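The chunked carryover loop is simple enough to sketch. The following is a minimal, hypothetical illustration of a Delethink-style tracing loop, not the Mila implementation: generate(prompt, max_tokens) stands in for any LLM completion call, and the chunk and carryover sizes are arbitrary.

```python
# Minimal sketch of a Delethink-style "tracing" loop (hypothetical helper names;
# not the Mila implementation). The model reasons in fixed-size chunks; when a
# chunk fills up, the context is reset to the original query plus a short
# carryover, so per-chunk compute stays bounded instead of growing quadratically.

from typing import Callable

CHUNK_TOKENS = 8_000      # reasoning budget per chunk
CARRYOVER_CHARS = 2_000   # how much of each chunk is carried forward (rough proxy for tokens)
MAX_CHUNKS = 16           # overall thinking budget = MAX_CHUNKS * CHUNK_TOKENS tokens


def markovian_reason(query: str, generate: Callable[[str, int], str]) -> str:
    """Run chunked reasoning; `generate(prompt, max_tokens)` is any LLM call."""
    carryover = ""
    for _ in range(MAX_CHUNKS):
        # Each chunk sees only the original query plus the carried-over state,
        # never the full reasoning trace, so memory stays roughly constant.
        prompt = (
            f"{query}\n\n"
            f"Previous progress (carry forward what matters):\n{carryover}\n\n"
            "Continue reasoning. End with FINAL: <answer> when done."
        )
        chunk = generate(prompt, CHUNK_TOKENS)
        if "FINAL:" in chunk:
            return chunk.split("FINAL:", 1)[1].strip()
        # Naive carryover: keep the tail of the chunk as the "textual Markovian
        # state". A trained model learns to pack task-critical facts in here.
        carryover = chunk[-CARRYOVER_CHARS:]
    return "No final answer within the thinking budget."
```

In the paper's trained setting the model itself learns what to put in the carryover; this sketch only shows the shape of the loop that keeps context size fixed.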

#ai

OpenAI is entering the browser world with the launch of ChatGPT Atlas, an AI-enabled browser. Atlas, now available globally, can be accessed on Apple’s macOS, with support for Windows, iOS and Android coming soon. The announcement comes several months after rumors in July that OpenAI would release a web browser to challenge the dominance of Google’s Chrome.

In a livestream, CEO Sam Altman said he hopes Atlas will help bring about a new way of interacting with and using the web, one where people chat with the browser rather than typing a URL. “We think AI represents a rare once-in-a-decade opportunity to rethink what a browser can be about and how to use one, and how to most productively and pleasantly use the web,” Altman said. “Tabs were great, but we haven’t seen a lot of innovation since then, so we got very excited to really rethink what this could be.”

Atlas is meant to offer users a more seamless way to browse the web and ask chat agents questions. It invites users to either search for information via a prompt or question, or just type a URL. Part of Atlas’s value proposition is the ability to call on agents to do tasks directly in the browser. However, agents will only be available to ChatGPT Business, Plus and Pro users for now. Users can download Atlas from its dedicated site, but must log in to their ChatGPT account to begin using it.

Chatting with a browser about your memories

Atlas differentiates itself from browsers like Chrome or Apple’s Safari with its chat feature. The home page is essentially ChatGPT, with a prompt box and several suggested questions. During the livestream, OpenAI said that the more people use Atlas, the more personalized the suggestions will be. The chat box “follows” the user, meaning people can chat with ChatGPT on any website. The model will read what’s on the page and answer any questions users might have.

When you first open Atlas, it prompts you to import data from other browsers you may be using. When I set up mine, it only asked about Chrome and Safari, the two browsers I mainly use. Importing browser data creates a memory base for Atlas that ChatGPT will reference. So far, Atlas’s memory is hit or miss. I connected my Chrome history, and when I asked about a recent travel destination search (one I have run every day for a month), Atlas claimed I had never searched for that information.

The in-browser chat also reduces the copy-pasting that users often resort to when, say, writing an email. People can open their Gmail, then ask ChatGPT in the browser to help tidy up the message. Of course, Gmail or any other Google Workspace product already offers Gemini-powered capabilities, such as email rewriting. OpenAI’s CEO of Applications, Fidji Simo, said in a blog post that users can toggle browser memory on or off and control what it can see.

Agent mode on the browser

In the past few months, OpenAI has shored up its agent infrastructure in the expectation that individuals and enterprises will rely more and more on agents. Agents on Atlas can use the browser if needed to accomplish a task. For example, you could be looking at a recipe and ask chat to build a grocery list. The agent can then begin shopping on your preferred grocery site. OpenAI has already added a buy button to ChatGPT and proposed an agentic commerce protocol, which could be helpful for Atlas. However, during the demo, OpenAI staff opted not to let the agent proceed to purchase products.
Having the agent directly in the browser goes a step beyond pointing an external agent at a browser like Chrome: ideally, Atlas already knows what you were looking at and has the information it needs to act on the page.

A new browser war

With more people using AI models and chat platforms for web searches, launching an AI-enabled browser has become another battleground for model providers. As Chrome has grown more popular, it has slowly added AI capabilities thanks to Google's Gemini models, and Google has also been experimenting with other AI-powered search features, such as generative image search. Companies like Perplexity, with its Comet browser, are hoping to take on Chrome, and Opera, long a Chrome competitor, has repositioned itself as an AI-powered browser by embedding AI features into its platform.

For some, Atlas represents a fresh new way to use a web browser. However, many have pointed out that Atlas does not exactly reinvent the wheel, as it shares some features with Comet. What is interesting about Atlas is how familiar it is: it looks just like ChatGPT, but it also has tabs like Chrome. OpenAI emphasized that this is the first version of Atlas, implying that this may not be its final form. What is for sure is that Atlas is OpenAI’s first volley in the AI browser wars.

#ai

Presented by Apptio, an IBM company

When a technology with revolutionary potential like AI emerges, it’s easy for companies to let enthusiasm outrun fiscal discipline. In the race to transform operations and outpace competitors, cost control can feel like a distraction. But with AI, costs can escalate quickly — so financial discipline remains essential. Long-term success depends on one thing: understanding the link between AI’s value and its true cost, so its promise translates into measurable business impact.

The hidden financial risks of AI

While AI is helping to transform business operations, its own financial footprint often remains obscure. If you can’t connect costs to impact, how can you be sure your AI investments will drive meaningful ROI? Gaining visibility into AI’s financial blind spot is especially urgent given the breakneck speed of AI investment. When it’s easy for DevOps teams and business units to procure their own resources on an OpEx basis, costs and inefficiencies can quickly spiral. The decentralized nature of spend across cloud infrastructure, data platforms, engineering resources, and query tokens makes it difficult to attribute costs to business outcomes. And because budgets are finite, every dollar spent represents an unconscious tradeoff with other strategic priorities. Without transparency into AI costs, companies risk overspending, under-delivering, and missing out on better opportunities to drive value.

Why traditional financial planning falls short for AI

As we learned with cloud, traditional static budget models are poorly suited to dynamic workloads and rapidly scaling resources. The key to cloud cost management has been tagging and telemetry, which help companies attribute each dollar of cloud spend to specific business outcomes. AI cost management will require the same discipline, but on a broader scale. On top of costs for storage, compute, and data transfer, each AI project brings its own requirements, ranging from prompt optimization and model routing to data preparation, regulatory compliance, governance, security, and personnel. This complexity leaves finance and IT teams struggling to reconcile AI-related spend with business outcomes — and without these connections, it’s impossible to measure ROI.

The strategic value of cost transparency

Cost transparency empowers smarter decisions — from resource allocation to talent deployment. Connecting specific AI resources with the projects they support helps technology decision-makers ensure that the highest-value projects get what they need to succeed. Setting the right priorities is especially critical when top talent is in short supply: if your highly compensated engineers and data scientists are spread across too many interesting but unessential pilots, it will be hard to staff the next strategic — and perhaps pressing — pivot.

FinOps best practices apply equally to AI. Businesses can use cost insights to optimize infrastructure and address waste — such as ensuring teams aren’t provisioning higher performance or lower latency than a given workload really needs, or paying for a huge LLM when a smaller model would suffice. As work proceeds, tracking can flag rising costs so leaders can pivot quickly in more-promising directions; a project that makes sense at X cost might not be worthwhile at 2X cost. Companies that adopt a structured, transparent, and well-governed approach to AI costs are more likely to spend the right money in the right ways and see optimal ROI from their investment.
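As a rough illustration of the tagging-and-telemetry discipline described above (not an Apptio or TBM API), the sketch below rolls up hypothetical tagged spend records by project — the kind of attribution that makes untagged or unattributable AI spend visible immediately.

```python
# Minimal sketch of tag-based cost attribution (hypothetical tags and records).
# Each spend record carries tags identifying the AI project and business unit,
# so dollars can be rolled up against the outcomes they are meant to support.

from collections import defaultdict

spend_records = [
    {"service": "gpu_compute", "cost": 1240.50, "tags": {"project": "support-copilot", "bu": "customer-service"}},
    {"service": "vector_db",   "cost": 310.00,  "tags": {"project": "support-copilot", "bu": "customer-service"}},
    {"service": "llm_tokens",  "cost": 980.75,  "tags": {"project": "sales-assistant", "bu": "sales"}},
    {"service": "gpu_compute", "cost": 2115.00, "tags": {}},  # untagged spend surfaces below
]


def rollup(records, tag_key):
    """Aggregate spend by one tag dimension; missing tags land in 'untagged'."""
    totals = defaultdict(float)
    for record in records:
        totals[record["tags"].get(tag_key, "untagged")] += record["cost"]
    return dict(totals)


print(rollup(spend_records, "project"))
# {'support-copilot': 1550.5, 'sales-assistant': 980.75, 'untagged': 2115.0}
```

In practice the records would come from cloud billing exports and internal telemetry rather than a hard-coded list, but the rollup logic is the same.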
TBM: An enterprise framework for AI cost management

Technology Business Management (TBM) provides the foundation for AI cost transparency. It brings together three practices — IT Financial Management (ITFM), FinOps, and Strategic Portfolio Management (SPM) — to align technology investments with business outcomes.

IT financial management (ITFM): ITFM focuses on managing IT finances in alignment with business priorities. ITFM teams analyze comprehensive data on IT costs and investments to track spending against budgets and forecasts, trim excess spending, and ensure financial transparency. The insights that ITFM teams gain can help businesses form more-strategic partnerships between IT and the business. Collaboration with IT leaders can help business leaders understand how to best meet their technology needs, adjust expenses and behaviors to fit the budget, and keep a data-driven eye on business impact.

FinOps: The goal of FinOps is to help optimize cloud costs and ROI through financial accountability and operational efficiency. FinOps teams work with management, financial, and engineering stakeholders to understand the interplay between the applications being built, the cloud resources that power them, their cost, and the value they generate. While FinOps has traditionally operated as a reactive function — identifying waste and optimization opportunities in the production environment — the practice is becoming more proactive. Providing engineers with cost insights and guardrails before deployment helps them make the best decisions about cloud resources from the start, rather than navigating a growing list of issues post-launch.

Strategic portfolio management (SPM): SPM helps leaders ensure that investments in people and technology — like AI initiatives — are aligned with the company’s changing strategic needs. Holistic visibility and insights into organization-wide portfolios, programs, and processes show leaders which initiatives deliver value, where and how to apply course corrections, and when to reallocate budget and resources. SPM encompasses the entire project lifecycle, including strategic planning and alignment, scenario modeling, capacity and resource management, and financial analysis. Its overarching goal is to move more quickly from insights to action, helping organizations respond with agility to changing conditions or opportunities.

By uniting the three practice areas into a structured framework, TBM enables technology, business, and finance leaders to connect technology investments to business outcomes for better financial transparency and decision-making. Most companies are already on the road to TBM, whether they realize it or not. They may have adopted some form of FinOps or cloud cost management. Or they might be developing strong financial expertise for IT. Or they may rely on Enterprise Agile Planning or SPM project management to deliver initiatives more successfully. AI draws on — and impacts — all of these areas. By unifying them under one umbrella with a common model and vocabulary, TBM brings essential clarity to the cost of AI investments and the business impact they enable.

AI success depends on value — not just velocity. The cost transparency that TBM provides offers a road map that helps business and IT leaders make the right investments, deliver them cost-effectively, scale them responsibly, and turn AI from a risky bet into a measurable business asset and strategic driver.
Whether you begin with ITFM, FinOps, or SPM, each practice can be a path toward TBM — and together they create a clear roadmap to AI value.

Ajay Patel is General Manager, Apptio and IT Automation at IBM.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

#ai

Presented by HP

Creativity is quickly becoming the new measure of productivity. While AI is often framed as a tool for efficiency and automation, new research from MIT Sloan School of Management shows that generative AI enhances human creativity — when employees have the right tools and skills to use it effectively. That’s where AI PCs come in. These next-generation laptops combine local AI processing with powerful Neural Processing Units (NPUs), delivering the speed and security that knowledge workers expect while also unlocking new creative possibilities. By handling AI tasks directly on the device, AI PCs minimize latency, protect sensitive data, and lower energy consumption.

Teams are already proving the impact. Marketing teams are using AI PCs to generate campaign assets in hours instead of weeks. Engineers are shortening design and prototyping cycles. Sales reps are creating personalized proposals onsite, even without cloud access. In each case, AI PCs are not just accelerating workflows — they’re sparking fresh ideas, faster iteration, and more engaged teams.

The payoff is clear: creativity that translates into measurable business outcomes, from faster time-to-market and stronger compliance to deeper customer engagement. Still, adoption is uneven, and the benefits aren’t yet reaching the wider workforce.

Early creative benefits, but a divide remains

New Morning Consult and HP research shows nearly half of IT decision makers (45%) already use AI PCs for creative assistance, with almost a third (29%) using them for tasks like image generation and editing. That’s not just about efficiency — it’s about bringing imagination into everyday workflows.

According to HP’s 2025 Work Relationship Index, fulfillment is the single biggest driver of a healthy work relationship, outranking even leadership. Give employees tools that let them create, not just execute tasks, and you unlock productivity, satisfaction, retention, and optimism. The same instinct that drives workers to build outside the office is the one companies can harness inside it.

The challenge is that among broader knowledge workers, adoption is still low: just 29% for creative assistance and just 19% for image generation. This creative divide means the full potential of AI PCs hasn’t reached the wider workforce. For CIOs, the opportunity isn’t just deploying faster machines — it’s fostering a workplace culture where creativity drives measurable business value.

Creative benefits of AI PCs

So what does it look like in practice when you put AI PCs in front of employees who embrace the possibilities? Early adopters are already seeing AI PCs reshape how creative work gets done. Teams dream up fresh ideas, faster. AI PCs can spark new perspectives and out-of-the-box solutions, enhancing human creativity rather than replacing it. With dedicated NPUs handling AI workloads, employees stay in flow without interruptions. Battery life is extended, latency drops, and performance improves — allowing teams to focus on ideas, not wait times.

On-device AI is opening new creative mediums, from visual design to video production to music editing; videos, photos, and presentations can be generated, edited, and refined in real time. Plus, AI workloads like summarization, transcription, and code generation run instantly without relying on cloud APIs.
That means employees can work productively in low-bandwidth or disconnected environments, removing downtime risks, especially for mobile workforces and global deployments. And across the organization, AI PCs mean real-world, measurable business outcomes.

Marketing: AI PCs enable creative teams to generate ad variations, social content, and campaign assets in minutes instead of days, reducing dependence on external agencies. That leads to faster campaign launches, reduced external vendor spend, and increased pipeline velocity.

Product and engineering: Designers and engineers can prototype in CAD, generate 3D mockups, or run simulations locally with on-device AI accelerators, shortening feedback loops. That means reduced iteration cycles, faster prototyping, and faster time-to-market.

Sales and customer engagement: Reps can use AI PCs to generate real-time proposals and personalized presentations, or analyze contracts offline at client sites, even without a cloud connection. This drives faster deal cycles, higher client engagement, and a shorter sales turnaround.

From efficiency to fulfillment

AI PCs are more than just a performance upgrade. They’re reshaping how people approach and experience work. By giving employees tools that spark creativity as well as productivity, organizations can unlock faster innovation, deeper engagement, and stronger retention. For CIOs, the opportunity goes beyond efficiency gains. The true value of AI PCs won’t be measured in speed or specs, but in how they open new possibilities for creation, collaboration, and competition — helping teams not just work faster, but work more creatively and productively.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

Claude Code web, AI poisoning, DoctorGPT, WhatsApp ban, and more...

#ai

Vibe coding is evolving, and with it the leading AI-powered coding services and tools, including Anthropic’s Claude Code. As of today, the service is available via the web and, in preview, on the Claude iOS app, giving developers access to additional asynchronous capabilities. Previously, it was available through the terminal on developers' PCs, with support for Git, Docker, Kubernetes, npm, pip, the AWS CLI and more, and as an extension for Microsoft's open-source VS Code editor and JetBrains IDEs via Claude Agent.

“Claude Code on the web lets you kick off coding sessions without opening your terminal,” Anthropic said in a blog post. “Connect your GitHub repositories, describe what you need, and Claude handles the implementation. Each session runs in its own isolated environment with real-time progress tracking, and you can actively steer Claude to adjust course as it’s working through tasks.”

This allows users to run coding projects asynchronously, a capability many enterprises are looking for. The web version of Claude Code, currently in research preview, will be available to Pro and Max users. However, web Claude Code will be subject to the same rate limits as other versions. Anthropic throttled rate limits for Claude and Claude Code after the unexpected popularity of the coding tool in July, which had enabled some users to run Claude Code overnight. Anthropic is now ensuring Claude Code comes closer to matching the availability of rival OpenAI's Codex AI coding platform, powered by a variant of GPT-5, which launched on mobile and the web in mid-September 2025.

Parallel usage

Anthropic said running Claude Code in the cloud means teams can “now run multiple tasks in parallel across different repositories from a single interface and ship faster with automatic PR creation and clear change summaries.” One of the big draws of coding agents is giving developers the ability to run multiple coding projects, such as bugfixes, at the same time. Google’s two coding agents, Jules and Code Assist, both offer asynchronous code generation and checks. Codex from OpenAI also lets people work in parallel.

Anthropic said bringing Claude Code to the web won’t disrupt workflows, but noted that cloud-run sessions work best for tasks such as answering questions about projects and how repositories are mapped, bugfixes and other routine, well-defined work, and backend changes that verify adjustments. While most developers will likely prefer to use Claude Code on a desktop, Anthropic said the mobile version could encourage more users to “explore coding with Claude on the go.”

Isolated environments

Anthropic insisted that Claude Code tasks in the cloud will have the same level of security as the earlier version. Each task runs in an “isolated sandbox environment with network and filesystem restrictions.” Interactions go through a secure proxy service, which the company said ensures the model only accesses authorized repositories. Enterprise users can customize which domains Claude Code can connect to.

Claude Code is powered by Claude Sonnet 4.5, which Anthropic claims is the best coding model around. The company recently made Claude Haiku 4.5, a smaller version of Claude that also has strong coding capabilities, available to all Claude subscribers, including free users.
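For readers who want a feel for the parallel, asynchronous pattern described above, here is a minimal sketch using Python's asyncio. The run_coding_task function is a hypothetical stub standing in for whatever agent backend kicks off each isolated session; it is not an actual Claude Code API.

```python
# Minimal sketch of dispatching several coding tasks in parallel, in the spirit
# of the asynchronous workflow described above. `run_coding_task` is a
# hypothetical stub, not an Anthropic API; a real integration would call the
# agent backend you actually use.

import asyncio


async def run_coding_task(repo: str, instruction: str) -> str:
    """Stand-in for kicking off an isolated, asynchronous coding session."""
    await asyncio.sleep(1)  # pretend the agent is working
    return f"[{repo}] PR opened: {instruction}"


async def main() -> None:
    tasks = [
        run_coding_task("org/web-frontend", "Fix null check in checkout form"),
        run_coding_task("org/api-service", "Add retry logic to payment client"),
        run_coding_task("org/docs", "Update README for new auth flow"),
    ]
    # Sessions run concurrently; results arrive once each one finishes.
    for result in await asyncio.gather(*tasks):
        print(result)


if __name__ == "__main__":
    asyncio.run(main())
```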

#ai

Hoping to attract more enterprise teams to its ecosystem, Adobe has launched a new model customization service called Adobe AI Foundry, which creates bespoke versions of its flagship AI model, Firefly. Adobe AI Foundry will work with enterprise customers to rearchitect and retrain Firefly models for each client. Foundry models differ from custom Firefly models in that they can understand multiple concepts, whereas custom models handle only a single one. Foundry models will also be multimodal, offering a wider range of use cases than custom Firefly models, which can only ingest and respond with images.

Adobe AI Foundry models, with Firefly at their base, will know a company’s brand tone, image and video style, products and services, and all of its IP, and will generate content based on this information for any use case the company wants. Hannah Elsakr, vice president of GenAI New Business Ventures at Adobe, told VentureBeat that the idea for AI Foundry arose because enterprise customers wanted more sophisticated custom versions of Firefly. But given how complex enterprise needs are, Adobe will be doing the rearchitecting rather than handing the reins over to customers.

“We will retrain our own Firefly commercially safe models with the enterprise IP. We keep that IP separate. We never take that back into the base model, and the enterprise itself owns that output,” Elsakr said. Adobe will deploy the Foundry version of Firefly through its API solution, Firefly Services. Elsakr likened AI Foundry to an advisory service, since Adobe will have teams working directly with enterprise customers to retrain the model.

Deep tuning

Elsakr refers to Foundry as a deep tuning method because it goes further than simply fine-tuning a model. “The way we think about it, maybe more layman's terms, is that we're surgically reopening the Firefly-based models,” Elsakr said. “So you get the benefit of all the world's knowledge from our image model or a video model. We're going back in time and are bringing in the IP from the enterprise, like a brand. It could be footage from a shot style, whatever they have a license to contribute. We then retrain. We call this continuous pre-training, where we overweigh the model to dial some things differently. So we're literally retraining our base model, and that's why we call it deep tuning instead of fine-tuning.”

Part of the training pipeline involves Adobe’s embedded teams working with the company to identify the data they will need. The data is then securely transferred, ingested and tagged, fed to the base model, and Adobe begins a pre-training run. Elsakr maintains that the Foundry versions of Firefly will not be small or distilled models; often, the additional data from companies expands Firefly's parameters.

Two early customers of Adobe AI Foundry are Home Depot and Walt Disney Imagineering, the research and development arm of Disney for its theme parks. “We are always exploring innovative ways to enhance our customer experience and streamline our creative workflows. Adobe’s AI Foundry represents an exciting step forward in embracing cutting-edge technologies to deepen customer engagement and deliver impactful content across our digital channels,” said Molly Battin, senior vice president and chief marketing officer at The Home Depot.

More customization

Enterprises often turn to fine-tuning and model customization to bring large language models, with their vast external knowledge, closer to their company’s needs.
Fine-tuning also enables enterprise users to utilize models only in the context of their organization’s data, so the model doesn’t respond with text wholly unrelated to the business. Most organizations, however, do the fine-tuning themselves. They connect to the model’s API and begin retraining it to answer based on their ground truth or their preferences. Several methods for fine-tuning exist, including some that can be done with just a prompt. Other model providers also try to make it easier for their customers to fine-tune models, such as OpenAI with its o4-mini reasoning model.

Elsakr said she expects some companies will have three versions of Firefly: the Foundry version for most projects, a custom Firefly for specific single-concept use cases, and the base Firefly, because some teams want a model less encumbered by corporate knowledge.
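As a hedged illustration of the self-serve, API-based fine-tuning the article contrasts with Foundry's deep tuning, here is roughly what that workflow looks like with the OpenAI Python SDK; the file name and model snapshot are assumptions, and this is in no way Adobe's process.

```python
# Rough illustration of API-based fine-tuning with the OpenAI Python SDK — the
# generic workflow the article alludes to, not Adobe's Foundry deep tuning.
# The file name and model snapshot are illustrative; check your provider's docs
# for which models currently support fine-tuning.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1) Upload chat-formatted training examples (JSONL, one {"messages": [...]} object per line).
training_file = client.files.create(
    file=open("brand_ground_truth.jsonl", "rb"),
    purpose="fine-tune",
)

# 2) Kick off a fine-tuning job on a base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)

print(job.id, job.status)  # poll until the job finishes, then call the tuned model by its new ID
```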

#ai & ml #commentary

In the modern enterprise, information is the new capital. While companies pour resources into artificial intelligence, many discover that technology, standing alone, delivers only expense, not transformation. The true engine of change lies not in the algorithm but in the hands and minds of the people who use it. The greatest asset an organization possesses […]

AI engineering has shifted from a futuristic niche to one of the most in-demand tech careers on the planet.

The takeover of AI, vibe working, transcription glasses, job impact, and more...

#ai #datadecisionmakers

As more companies quickly begin using gen AI, it’s important to avoid a big mistake that could undermine its effectiveness: skipping proper onboarding. Companies spend time and money training new human workers to succeed, but when they deploy large language model (LLM) helpers, many treat them like simple tools that need no explanation. This isn't just a waste of resources; it's risky. Research shows that AI advanced quickly from testing to actual use between 2024 and 2025, with almost a third of companies reporting a sharp increase in usage and acceptance over the previous year.

Probabilistic systems need governance, not wishful thinking

Unlike traditional software, gen AI is probabilistic and adaptive. It learns from interaction, can drift as data or usage changes and operates in the gray zone between automation and agency. Treating it like static software ignores reality: without monitoring and updates, models degrade and produce faulty outputs, a phenomenon widely known as model drift. Gen AI also lacks built-in organizational intelligence. A model trained on internet data may write a Shakespearean sonnet, but it won’t know your escalation paths and compliance constraints unless you teach it. Regulators and standards bodies have begun pushing guidance precisely because these systems behave dynamically and can hallucinate, mislead or leak data if left unchecked.

The real-world costs of skipping onboarding

When LLMs hallucinate, misinterpret tone, leak sensitive information or amplify bias, the costs are tangible.

Misinformation and liability: A Canadian tribunal held Air Canada liable after its website chatbot gave a passenger incorrect policy information. The ruling made it clear that companies remain responsible for their AI agents’ statements.

Embarrassing hallucinations: In 2025, a syndicated “summer reading list” carried by the Chicago Sun-Times and Philadelphia Inquirer recommended books that didn’t exist; the writer had used AI without adequate verification, prompting retractions and firings.

Bias at scale: The Equal Employment Opportunity Commission’s (EEOC’s) first AI-discrimination settlement involved a recruiting algorithm that auto-rejected older applicants, underscoring how unmonitored systems can amplify bias and create legal risk.

Data leakage: After employees pasted sensitive code into ChatGPT, Samsung temporarily banned public gen AI tools on corporate devices — an avoidable misstep with better policy and training.

The message is simple: un-onboarded AI and ungoverned usage create legal, security and reputational exposure.

Treat AI agents like new hires

Enterprises should onboard AI agents as deliberately as they onboard people — with job descriptions, training curricula, feedback loops and performance reviews. This is a cross-functional effort across data science, security, compliance, design, HR and the end users who will work with the system daily.

1) Role definition: Spell out scope, inputs/outputs, escalation paths and acceptable failure modes. A legal copilot, for instance, can summarize contracts and surface risky clauses, but should avoid final legal judgments and must escalate edge cases.

2) Contextual training: Fine-tuning has its place, but for many teams, retrieval-augmented generation (RAG) and tool adapters are safer, cheaper and more auditable. RAG keeps models grounded in your latest, vetted knowledge (docs, policies, knowledge bases), reducing hallucinations and improving traceability, as sketched below.
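A minimal sketch of that grounding pattern, assuming a tiny in-memory knowledge base and TF-IDF retrieval as a stand-in for a production vector store; the assembled prompt would then be passed to whatever model the team uses.

```python
# Minimal sketch of retrieval-augmented grounding: fetch the most relevant vetted
# passages and put them in the prompt so the model answers from approved sources.
# TF-IDF here stands in for a production vector database with access controls.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

knowledge_base = [
    "Refunds are available within 30 days of purchase with a valid receipt.",
    "Escalate any request involving legal threats to the compliance team.",
    "Enterprise SLAs guarantee a first response within four business hours.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(knowledge_base)


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k knowledge-base passages most similar to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    top_indices = scores.argsort()[::-1][:k]
    return [knowledge_base[i] for i in top_indices]


def grounded_prompt(query: str) -> str:
    """Assemble a prompt that constrains the model to the retrieved context."""
    context = "\n".join(f"- {passage}" for passage in retrieve(query))
    return (
        "Answer using only the context below. If the context is insufficient, "
        "say so and escalate.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


print(grounded_prompt("What is the refund window?"))
```

In production the retrieval layer would typically be a vector database behind access controls, but the shape of the loop — retrieve, assemble, constrain — is the same.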
Emerging Model Context Protocol (MCP) integrations make it easier to connect copilots to enterprise systems in a controlled way — bridging models with tools and data while preserving separation of concerns. Salesforce’s Einstein Trust Layer illustrates how vendors are formalizing secure grounding, masking, and audit controls for enterprise AI.

3) Simulation before production: Don’t let your AI’s first “training” be with real customers. Build high-fidelity sandboxes and stress-test tone, reasoning and edge cases — then evaluate with human graders. Morgan Stanley built an evaluation regimen for its GPT-4 assistant, having advisors and prompt engineers grade answers and refine prompts before broad rollout. The result: more than 98% adoption among advisor teams once quality thresholds were met. Vendors are also moving to simulation: Salesforce recently highlighted digital-twin testing to rehearse agents safely against realistic scenarios.

4) Cross-functional mentorship: Treat early usage as a two-way learning loop. Domain experts and front-line users give feedback on tone, correctness and usefulness; security and compliance teams enforce boundaries and red lines; designers shape frictionless UIs that encourage proper use.

Feedback loops and performance reviews — forever

Onboarding doesn’t end at go-live. The most meaningful learning begins after deployment.

Monitoring and observability: Log outputs, track KPIs (accuracy, satisfaction, escalation rates) and watch for degradation. Cloud providers now ship observability and evaluation tooling to help teams detect drift and regressions in production, especially for RAG systems whose knowledge changes over time.

User feedback channels: Provide in-product flagging and structured review queues so humans can coach the model — then close the loop by feeding these signals into prompts, RAG sources or fine-tuning sets.

Regular audits: Schedule alignment checks, factual audits and safety evaluations. Microsoft’s enterprise responsible-AI playbooks, for instance, emphasize governance and staged rollouts with executive visibility and clear guardrails.

Succession planning for models: As laws, products and models evolve, plan upgrades and retirement the way you would plan people transitions — run overlap tests and port institutional knowledge (prompts, eval sets, retrieval sources).

Why this is urgent now

Gen AI is no longer an “innovation shelf” project — it’s embedded in CRMs, support desks, analytics pipelines and executive workflows. Banks like Morgan Stanley and Bank of America are focusing AI on internal copilot use cases to boost employee efficiency while constraining customer-facing risk, an approach that hinges on structured onboarding and careful scoping. Meanwhile, security leaders say gen AI is everywhere, yet one-third of adopters haven’t implemented basic risk mitigations, a gap that invites shadow AI and data exposure.

The AI-native workforce also expects better: transparency, traceability and the ability to shape the tools they use. Organizations that provide this — through training, clear UX affordances and responsive product teams — see faster adoption and fewer workarounds. When users trust a copilot, they use it; when they don’t, they bypass it.

As onboarding matures, expect to see AI enablement managers and PromptOps specialists in more org charts, curating prompts, managing retrieval sources, running eval suites and coordinating cross-functional updates.
Microsoft’s internal Copilot rollout points to this operational discipline: centers of excellence, governance templates and executive-ready deployment playbooks. These practitioners are the “teachers” who keep AI aligned with fast-moving business goals.

A practical onboarding checklist

If you’re introducing (or rescuing) an enterprise copilot, start here:

Write the job description: Scope, inputs/outputs, tone, red lines, escalation rules.

Ground the model: Implement RAG (and/or MCP-style adapters) to connect to authoritative, access-controlled sources; prefer dynamic grounding over broad fine-tuning where possible.

Build the simulator: Create scripted and seeded scenarios; measure accuracy, coverage, tone and safety; require human sign-offs to graduate stages.

Ship with guardrails: DLP, data masking, content filters and audit trails (see vendor trust layers and responsible-AI standards).

Instrument feedback: In-product flagging, analytics and dashboards; schedule weekly triage.

Review and retrain: Monthly alignment checks, quarterly factual audits and planned model upgrades — with side-by-side A/Bs to prevent regressions.

In a future where every employee has an AI teammate, the organizations that take onboarding seriously will move faster, safer and with greater purpose. Gen AI doesn’t just need data or compute; it needs guidance, goals and growth plans. Treating AI systems as teachable, improvable and accountable team members turns hype into habitual value.

Dhyey Mavani is accelerating generative AI at LinkedIn.

Smartphones reimagined, OpenAI vid pause, cancer AI, history of AI, and more...
