Google launched yet another coding agent platform, this time focusing on developer teams collaborating to create agents that can execute complex tasks automatically. The platform, called Antigravity, is powered by Gemini 3 and is now available in public preview with “generous rate limits on Gemini 3 Pro usage.” Antigravity is an agentic coding platform that aims to “evolve the IDE towards an agent-first future with browser control capabilities, asynchronous interaction patterns, and an agent-first product form factor.” For the public preview, Antigravity users can build agents using Gemini 3, as well as other leading models such as Anthropic’s Claude Sonnet 4.5 and OpenAI’s open-source gpt-oss. It will be compatible with developer environments running on major operating systems, including macOS, Linux, and Windows. “We want Antigravity to be the home base for software development in the era of agents,” Google said in a blog post. “Our vision is to ultimately enable anyone with an idea to experience liftoff and build that idea into reality.”
AI decodes whale sounds, silent speech wearable, AI teddy danger, and more...
On this episode of Uncanny Valley, guest Brian Merchant walks us through a historical framework he used to analyze whether AI fits the classic signs of an economic bubble—and what that means for all of us.
Google’s deep investments in American technical infrastructure, R&D and the workforce will help the U.S. continue to lead the world in AI.
Deep learning is often seen as a black box. We know that it learns from data, but we rarely stop to ask how it truly learns.
What if we could open that box and watch each step happen right before our eyes?
With Excel, we can do exactly that: see how numbers turn into patterns, and how simple calculations become the foundation of what we call “deep learning.”
In this article, we will build a tiny Convolutional Neural Network (CNN) directly in Excel to understand, step by step, how machines detect shapes, patterns, and meaning in images.
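As a preview of the arithmetic involved, here is a minimal sketch in Python of the multiply-and-sum step that a single convolution performs, the same calculation the article lays out cell by cell in Excel (the 3x3 edge-detector kernel is an illustrative choice, not one taken from the article):

```python
# A minimal sketch of the multiply-and-sum at the heart of a convolution,
# the same arithmetic a block of Excel formulas would compute.
# The kernel is an illustrative vertical-edge detector.
import numpy as np

image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
])

# Slide the 3x3 kernel over the image; each output cell is a sum of
# element-wise products over one 3x3 window.
out = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(out)  # large values mark where the dark-to-light edge sits
```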
The post Understanding Convolutional Neural Networks (CNNs) Through Excel appeared first on Towards Data Science.
In part 1, we showed how we could leverage HTMX to add interactivity to our HTML elements. In other words, Javascript without Javascript. To illustrate that, we began building a simple chat that would return a simulated LLM response. In this article, we will extend the capabilities of our chatbot and add several features, among […]
The post Javascript Fatigue: HTMX Is All You Need to Build ChatGPT — Part 2 appeared first on Towards Data Science.
This year, re:Invent will be held in Las Vegas, Nevada, from December 1 to December 5, 2025, and this guide will help you navigate our comprehensive session catalog and plan your week. The sessions cater to business and technology leaders, product and engineering teams, and data and analytics teams interested in incorporating agentic AI capabilities across their teams and organization.
I'm excited to announce that AWS Professional Services now offers specialized AI agents, including the AWS Professional Services Delivery Agent. This represents a transformation of the consulting experience, embedding intelligent agents throughout the consulting life cycle to deliver better value for customers.
In this post, we explore how Amazon Bedrock AgentCore and Claude are enabling enterprises like Cox Automotive and Druva to deploy production-ready agentic AI systems that deliver measurable business value, with results including up to 63% autonomous issue resolution and 58% faster response times. We examine the technical foundation combining Claude's frontier AI capabilities with AgentCore's enterprise-grade infrastructure that allows organizations to focus on agent logic rather than building complex operational systems from scratch.
Why you should not explain your time-series data with tabular Shapley methods
The post Introducing ShaTS: A Shapley-Based Method for Time-Series Models appeared first on Towards Data Science.
Use new Google AI tools to get help with itineraries, finding flight deals and booking reservations.
Welcome back to The State of AI, a new collaboration between the Financial Times and MIT Technology Review. Every Monday, writers from both publications debate one aspect of the generative AI revolution reshaping global power. In this conversation, Helen Warrell, FT investigations reporter and former defense and security editor, and James O’Donnell, MIT Technology Review’s…
The new AI model delivers more efficient, more accurate and higher-resolution global weather predictions.
This step-by-step tutorial walks you through building your own RAG system.
Learn how to initialize dataframes from dictionaries, lists, and NumPy arrays
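As a quick preview of what the guide covers, here is a minimal sketch of all three constructions (column names and values are illustrative):

```python
# A minimal sketch of the three DataFrame constructions the guide covers;
# the column names and values here are illustrative.
import numpy as np
import pandas as pd

# From a dictionary: keys become column names.
df_dict = pd.DataFrame({"city": ["Oslo", "Lima"], "temp_c": [4, 22]})

# From a list of rows: supply column names explicitly.
df_list = pd.DataFrame([["Oslo", 4], ["Lima", 22]], columns=["city", "temp_c"])

# From a NumPy array: same idea, with homogeneous numeric data.
df_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])

print(df_dict.equals(df_list))  # True: both build the same frame
```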
The post The Absolute Beginner’s Guide to Pandas DataFrames appeared first on Towards Data Science.
Hands-on projects to help beginners understand how machines read, understand, generate, and translate human language.
Building a chatbot (almost) without Javascript, only with Python and HTML.
The post Javascript Fatigue: HTMX Is All You Need to Build ChatGPT — Part 1 appeared first on Towards Data Science.
Headlines surfaced by a simple “job market” search describe it as “a humiliation ritual” or “hell” and “an emerging crisis for entry-level workers.” The unemployment rate in the US for recent graduates is at an “unusually high” 5.8%—even Harvard Business School graduates have been taking months to find work. Inextricable from this conversation is the […]
You've learned about
As OpenAI expands in every direction, the new CEO of Applications is on a mission to make ChatGPT indispensable and lucrative.
New funding from Google will help launch a regional Data Commons for Africa.
AI engineers often chase performance by scaling up LLM parameters and data, but the trend toward smaller, more efficient, and better-focused models has accelerated. The Phi-4 fine-tuning methodology is the cleanest public example of a training approach that smaller enterprise teams can copy. It shows how a carefully chosen dataset and fine-tuning strategy can make a 14B model compete with much larger ones. The Phi-4 model was trained on just 1.4 million carefully chosen prompt-response pairs. Instead of brute force, the Microsoft Phi-4 research team focused on “teachable” examples at the edge of the model’s abilities and rigorous data curation. The Phi-4 reasoning smart data playbook demonstrates how strategic data curation with replicable SFT and RL can elevate a 14B model beyond much larger counterparts.

Why Phi-4 stands apart

Smaller reasoning models, such as OpenAI’s o1-mini and Google’s Gemma, are becoming more common, and models like Alibaba’s Qwen3 (8B and 14B) are seeing wide adoption across use cases. That adoption is important, but it doesn’t displace the value of Phi-4 as an experimental proof: Phi-4 was designed as a testbed for a data-first training methodology, and its documentation reads like a smart data playbook for teams that want to replicate that approach.

The Phi-4 team has shared a repeatable SFT playbook that includes a 1.4-million prompt-response set. It’s built around “teachable” edge examples, questions that are neither too easy nor too difficult, chosen to push the model’s reasoning. Each topic, such as math or code, is tuned separately and then combined with synthetic rewrites that turn complex tasks into forms that can be checked automatically. The paper outlines the data selection and filtering process in enough detail for smaller teams to reproduce it with open-source models and evaluators. For enterprise teams, that level of transparency turns a research result into a practical, copyable training recipe they can implement and measure quickly.

The data-first philosophy: Why less can be more

Traditional approaches to LLM reasoning have often relied on scaling datasets massively to encourage generalization. Phi-4 reasoning takes a different path, showing that carefully curated data can achieve similar or even better results with far less.

The team assembled a dataset covering STEM, coding, and safety. Despite its small size, it outperformed models trained on orders of magnitude more data. In benchmarks, the 14B Phi-4 reasoning model outperformed OpenAI’s o1-mini and DeepSeek’s 70B distilled model across most reasoning tasks, and approached the full DeepSeek-R1 (671B) on challenging math (AIME) questions. With just 14 billion parameters, Phi-4 reasoning delivers the following results when compared to other leading models:

Benchmark (task) | Phi-4 reasoning | Comparison model | Comparison score | Source
AIME 2024 (math olympiad) | 75.3% | o1-mini | 63.6% | Microsoft Phi-4 model card, April 2025 (Hugging Face)
AIME 2025 (math olympiad) | 62.9% | DeepSeek-R1-Distill-70B | 51.5% | Microsoft Phi-4 model card, April 2025 (Hugging Face)
OmniMath | 76.6% | DeepSeek-R1-Distill-70B | 63.4% | Microsoft Phi-4 model card, April 2025 (Hugging Face)
GPQA-Diamond (graduate-level science) | 65.8% | o1-mini | 60.0% | Microsoft Phi-4 model card, April 2025 (Hugging Face)
OmniMath (same benchmark, different comparison) | 76.6% | Claude-3.7-Sonnet | 54.6% | Microsoft Phi-4 model card, April 2025 (Hugging Face)

Table: Phi-4 reasoning performance across benchmarks compared to other models. Source: Microsoft
The key to this is filtering for quality over quantity. Much of the generic data is either too easy (the base model already knows it) or too hard (no learning signal). The Phi-4 team explicitly discards such examples. “Given the strong baseline reasoning capabilities of Phi-4, many initial seed questions are already handled competently,” they note. “To make further learning impactful, we specifically target seeds situated at the edge of Phi-4’s current abilities.” In practice, they rely on LLM-based evaluation. For each candidate question, a strong reference model (like GPT-4) generates an “answer key,” and the answers from weaker models are compared. If the weaker model disagrees enough, it indicates a teachable gap. Those questions are retained, while trivially solved or utterly unsolvable questions are dropped. For example, a simple arithmetic problem might be dropped (too easy), and an extremely obscure theorem proof might be dropped (too hard) as well. But a moderately challenging geometry problem that Phi-4 gets wrong is included.

This “sweet spot” approach ensures every example forces the model to stretch its reasoning. By focusing on multi-step problems rather than rote recall, they pack maximum learning into 1.4M examples. As the authors explain, training on these carefully chosen seeds “leads to broad generalization across both reasoning-specific and general-purpose tasks.” In effect, Phi-4 reasoning demonstrates that intelligent data selection can outperform brute force scaling.

Independent domain optimization

Phi-4 reasoning’s data are grouped by domain (math, coding, puzzles, safety, etc.). Rather than blending everything at once, the team tunes each domain’s mix separately and then merges them. This relies on an “additive property”: Optimizing math data in isolation and code data in isolation yields weights that, when concatenated, still give gains in both areas. In practice, they first tuned the math dataset to saturation on math benchmarks, then did the same for code, and finally simply added the code data into the math recipe. The result was improved performance on both math and coding tasks, without retraining from scratch.

This modular approach offers a clear practical advantage: A small team can first refine just the math dataset, achieve strong math performance, and then later add the coding data without redoing the math tuning.

However, the Phi-4 authors caution that scaling this method to many domains remains an open question. While the approach “worked very well” for their math+code mix, they note, “it is not known whether this method can scale to dozens or hundreds of domains,” a direction they acknowledge as a valuable area for future research. In short, the additive strategy is effective, but expanding into new domains must be approached carefully, as it may introduce unforeseen interactions.

Despite potential pitfalls, the additive strategy proved effective in Phi-4 reasoning. By treating each domain independently, the team avoided complex joint optimization and narrowed the search space for data mixtures. This approach allows incremental scaling of domains. Teams can begin by tuning the math SFT, then incorporate the code dataset, and later expand to additional specialized tasks, all while maintaining prior performance gains. This is a practical advantage for resource-constrained teams.
Instead of requiring a large group of experts to manage a complex, multi-domain dataset, a small team can focus on one data silo at a time.

Synthetic data transformation

Some reasoning problems, such as abstract proofs or creative tasks, are difficult to verify automatically. Yet automated verification (for RL reward shaping) is very valuable. Phi-4 reasoning tackled this by transforming hard prompts into easier-to-check forms. For example, the team rewrote a subset of coding problems as word puzzles or converted some math problems to have concise numeric answers. These “synthetic seed data” preserve the underlying reasoning challenge but make correctness easier to test. Think of it as giving the model a simplified version of the riddle that still teaches the same logic. This engineering hack enables downstream RL to use clear reward signals on tasks that would otherwise be too open-ended. Here’s an example of synthetic data transformation:

Raw web data: On the sides AB and BC of triangle ABC, points M and N are taken, respectively. It turns out that the perimeter of △AMC is equal to the perimeter of △CNA, and the perimeter of △ANB is equal to the perimeter of △CMB. Prove that △ABC is isosceles.

Synthetic data: ABC is a triangle with AB=13 and BC=10. On the sides AB and BC of triangle ABC, points M and N are taken, respectively. It turns out that the perimeter of △AMC is equal to the perimeter of △CNA, and the perimeter of △ANB is equal to the perimeter of △CMB. What is AC?

Table: Rewriting seed data from the web (top) into verifiable synthetic questions for SFT and RL (bottom). Source: Microsoft

Note that by assigning numeric values (AB=13, BC=10) and asking “What is AC?”, the answer becomes a single number, which can be easily checked for correctness.

Other teams have applied similar domain-specific tricks. For example, chemistry LLMs like FutureHouse’s ether0 model generate molecules under strict pKa or structural constraints, using crafted reward functions to ensure valid chemistry. In mathematics, the Kimina-Prover model by Numina translates natural-language theorems into the Lean formal system, so reinforcement learning can verify correct proofs. These examples highlight how synthetic augmentation, when paired with verifiable constraints, can push models to perform well in highly specialized domains.

In practical terms, engineers should embrace synthetic data but keep it grounded. Heuristics like “convert to numeric answers” or “decompose a proof into checkable steps” can make training safer and more efficient. At the same time, maintain a pipeline of real (organic) problems as well, to ensure breadth. The key is balance. Use synthetic transformations to unlock difficult verification problems, but don’t rely on them exclusively. Real-world diversity still matters. Following this approach, the model is guided toward a clearly defined, discrete objective.

Practical implementation for enterprises

AI teams looking to apply Phi-4 reasoning’s insights can follow a series of concrete steps to implement the approach effectively.

Identifying the model’s edge

Detect your model’s “edge” by identifying where the base LLM struggles. One way is to use its confidence or agreement scores. For example, generate several answers per prompt (using a tool like vLLM for fast sampling) and see where consensus breaks. Those prompts at the margin of confidence are your teachable examples.
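Here is a minimal sketch of that consensus filter; the `generate` callable (for example, a vLLM-backed sampler returning k answers per prompt) and the 20/80% thresholds are assumptions for illustration, not values from the Phi-4 paper:

```python
# A minimal sketch of consensus-based "edge" detection. The `generate`
# callable (e.g., a vLLM-backed sampler) and the 20/80% thresholds are
# illustrative assumptions, not values from the Phi-4 paper.
from collections import Counter

def agreement_rate(answers):
    """Share of samples that match the most common answer."""
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)

def is_teachable(prompt, generate, k=8, low=0.2, high=0.8):
    """Keep prompts where consensus breaks: neither trivial nor hopeless."""
    rate = agreement_rate(generate(prompt, k))
    return low <= rate <= high

# teachable = [p for p in candidate_prompts if is_teachable(p, generate)]
```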
By focusing on these low-confidence questions rather than the questions it already gets right, you ensure each new example is worth learning.

Isolating domains for targeted tuning

Tune one domain at a time rather than mixing all data genres upfront. Pick the highest-value domain for your app (math, code, legal, etc.) and craft a small SFT dataset for just that. Iterate on the mix (balancing difficulty, source types, etc.) until performance saturates on domain-specific benchmarks. Then freeze that mix and add the next domain. This modular tuning follows Phi-4 reasoning’s “additive” strategy. It avoids cross-talk, since you preserve gains in domain A even as you improve domain B.

Expanding with synthetic augmentation

Leverage synthetic augmentation when gold-standard answers are scarce or unverifiable. For instance, if you need to teach a proof assistant but can’t autocheck proofs, transform them into arithmetic puzzles or shorter proofs that can be verified. Use your LLM to rewrite or generate these variants (Phi-4 used this to turn complex word problems into numeric ones). Synthetic augmentation also lets you expand data cheaply. Once you have a validated small set, you can “multiply” it by having the LLM generate paraphrases, variations, or intermediate reasoning steps.

Scaling through a two-phase strategy

Use a two-phase training strategy that begins with exploration followed by scaling. In Phase 1 (exploration), run short fine-tuning experiments on a focused dataset (e.g., one domain) with limited compute. Track a few key metrics (benchmarks or held-out tasks) each run. Rapidly iterate hyperparameters and data mixes. The Phi-4 paper demonstrates that this speeds up progress, as small experiments helped the team discover a robust recipe before scaling up. Only once you see consistent gains do you move to Phase 2 (scaling), where you combine your verified recipes across domains and train longer (in Phi-4’s case, ~16 billion tokens). Although this stage is more compute-intensive, the risk is significantly reduced by the prior experimentation.

Monitor for trigger points such as a significant uplift on validation tasks or stable metric trends. When those appear, it’s time to scale. If not, refine the recipe further first. This disciplined two-phase loop saves resources and keeps the team agile.

In practice, many teams at Hugging Face and elsewhere have followed similar advice. For example, while developing the conversational model SmolLM2, the team noticed poor chat performance in Phase 1. They then generated ~500K synthetic multi-turn dialogues and re-trained, which “significantly improved both downstream performance and its overall ‘vibes,’” as one researcher reports. This represents a concrete win, achieved through a targeted synthetic data injection based on an initial feedback loop.

How to do this now

Here’s a simple checklist that you can follow to put these ideas into action.

1. Pick a target domain/task. Choose one area (e.g., math, coding, or a specific application) where you need better performance. This keeps the project focused.
2. Collect a small seed dataset. Gather, say, a few thousand prompt-answer pairs in that domain from existing sources (textbooks, GitHub, etc.).
3. Filter for edge-of-ability examples. Use a strong model (e.g., GPT-4) to create an answer key for each prompt. Run your base model on those prompts. Keep examples that the base model often misses; discard ones it already solves or is hopeless on. This yields “teachable” examples.
4. Fine-tune your model (Phase 1). Run a short SFT job on this curated data. Track performance on a held-out set or benchmark. Iterate: Refine the data mix, remove easy questions, add new teachable ones, until gains taper off.
5. Add synthetic examples if needed. If some concepts lack auto-verifiable answers (like long proofs), create simpler numeric or single-answer variants using your LLM. This gives clear rewards for RL. Keep a balance with real problems.
6. Expand to the next domain. Once one domain is tuned, “freeze” its dataset. Pick a second high-value domain and repeat steps 3 to 5 to tune that data mix. Finally, merge the data for both domains and do a final, longer training run (Phase 2).
7. Monitor benchmarks carefully. Use a consistent evaluation methodology (like majority-voting runs) to avoid misleading results. Only proceed to a full-scale training run if small experiments show clear improvements.

Limits and trade-offs

Despite the effectiveness of the Phi-4 training method, several limitations and practical considerations remain. One key challenge is domain scaling. While Phi-4’s additive method worked well for math and code, it has yet to be proven across many domains. The authors acknowledge that it remains an open question whether this approach can scale smoothly to dozens of topics. Another concern is the use of synthetic data. Relying too heavily on synthetic rewrites can reduce the diversity of the dataset, so it’s crucial to maintain a balance between real and synthetic examples to preserve the model's ability to reason effectively. Lastly, while the repeatable SFT method helps reduce computational costs, it doesn’t eliminate the need for thoughtful curation. Even though the approach is more efficient than brute-force scaling, it still requires careful data selection and iteration.

Lessons from Phi-4

The Phi-4 reasoning story is clear: Bigger isn’t always better for reasoning models. Instead of blindly scaling, the team asked where learning happens and engineered their data to hit that sweet spot. They show that “the benefit of careful data curation for supervised fine-tuning extends to reasoning models.” In other words, with a smart curriculum, you can squeeze surprising capability out of modest models.

For engineers, the takeaway is actionable. You don’t need a billion-dollar cluster or an endless internet crawl to improve reasoning. For resource-strapped teams, this is good news: A careful data strategy lets you punch above your weight.

Phi-4 reasoning proves that methodical data and training design, not sheer parameter count, drives advanced reasoning. By focusing on teachable data and iterative tuning, even a 14B model surpassed much larger rivals. For AI teams today, this offers a practical blueprint: Refine the data, iterate fast, and scale only when the signals are right. These steps can unlock breakthrough reasoning performance without breaking the bank.
Despite new methods emerging, enterprises continue to turn to autonomous coding agents and code generation platforms, and the competition among tech companies to keep developers working on their platforms has heated up.

AWS thinks its offering, Kiro, with new capabilities to ensure behavioral adherence, sets up a large differentiator in the increasingly crowded coding agent space. Kiro, first launched in July in public preview, is now generally available with new features, including property-based testing for behavior and a command-line interface (CLI) capability to tailor custom agents.

Deepak Singh, AWS vice president for databases and AI, told VentureBeat in an interview that Kiro “keeps the fun” of coding while providing it structure.

“The way I like to say it is, what Kiro does is it allows you to talk to your agent and work with your agent to build software just like you would do with any other agent,” Singh said. “But what Kiro does is it brings this structured way of writing that software, which we call spec-driven development, to specs that take your ideas and convert them into things that will endure over time. So the outcome is more robust, maintainable code.”

Kiro is an agentic coding tool built into developer IDEs to help create agents and applications from prototype to production. In addition to the new features, AWS is offering startups in most countries one year of free credits for Kiro Pro+ and expanded access to Teams.

Behavioral adherence and checkpointing built in

One of the new features of Kiro is property-based testing and checkpointing. A problem some enterprises face with AI-generated code is that it can sometimes be difficult to judge accuracy and how closely the agents adhere to their intended purpose. AWS noted in a blog post that “whoever writes the tests (human or AI) is limited by their own biases — they have to think of all the different, specific scenarios to test the code against, and they’ll miss edge cases they didn’t think of. AI models often ‘game’ the solution by modifying tests instead of fixing code.”

“What property-based testing does is it takes a specification, it takes a spec, and from that, it identifies properties your code should have, and it basically creates potentially hundreds of testing scenarios to verify that your code is doing what you intended it to as identified in the spec, and it does it all automatically,” Singh said.

Singh said that organizations can upload their specifications, and the Kiro agent can start identifying what is missing, even before the code review process begins. Property-based testing matches the specified behavior, aka your instructions, to what the code is doing. Kiro can help users write it in their specifications based on the EARS format. For example, if a company is building a car sales app, the specification would read:

“For any user and any car listing, WHEN the user adds the car to favorites, THE System SHALL display that car in their favorites list. PBT then automatically tests this with User A adding Car #1, User B adding Car #500, User C adding multiple cars, users with special characters in usernames, cars with various statuses (new, used, certified), and hundreds more combinations, catching edge cases and verifying that implementation matches your intent.”

This contrasts with a traditional unit test specification, which states: If a user adds car #5 to their favorites, then it will appear on their list.

Kiro will then identify examples of the code violating the specifications and present them to the user.
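To make the contrast concrete, here is a minimal sketch of that favorites property written with Python's hypothesis library; the FavoritesList class is a hypothetical stand-in for the app under test, and this illustrates the general technique, not Kiro's own implementation:

```python
# A minimal property-based test in the spirit of the EARS spec above,
# using Python's hypothesis library. FavoritesList is a hypothetical
# stand-in for the app under test; this illustrates the technique,
# not Kiro's implementation.
from hypothesis import given, strategies as st

class FavoritesList:
    """Toy favorites store: user -> set of car IDs."""
    def __init__(self):
        self._favorites = {}

    def add(self, user, car_id):
        self._favorites.setdefault(user, set()).add(car_id)

    def list(self, user):
        return self._favorites.get(user, set())

# Property: for ANY user and ANY car listing, adding the car to favorites
# means it appears in that user's favorites list. hypothesis generates
# many user/car combinations, including odd edge cases.
@given(user=st.text(min_size=1), car_id=st.integers(min_value=1))
def test_added_car_appears_in_favorites(user, car_id):
    favorites = FavoritesList()
    favorites.add(user, car_id)
    assert car_id in favorites.list(user)
```

A traditional unit test would pin one concrete case (user adds car #5); the property version asserts the rule for every generated combination, which is how edge cases like special-character usernames get caught.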
Kiro also now allows for checkpointing, so developers can go back to a previous change if something goes wrong.
CLI coding
The second major new feature of Kiro is Kiro CLI, which brings the Kiro coding agent directly into a developer’s CLI. AWS said the Kiro CLI utilizes some functionalities from the Q Developer CLI—its in-line coding assistant, launched in October 2024—to enable users to access the agent from the command line. It also allows developers to start building custom agents, such as a backend specialist, a frontend agent, and a DevOps agent, tailored to an organization’s codebase.

Singh said developers have their own unique ways of working, so it’s important for coding agent providers like AWS to meet them where they are. Kiro CLI allows users to:

- Stay in the terminal without the need for context switching
- Structure AI workflows with custom agents
- Keep one setup for two environments, since MCP servers and other tools work in both the Kiro IDE version and the CLI
- Automate routine work, such as formatting code or managing logs, through automated commands
Coding agents competition
Kiro, though, is just one of many coding agent platforms cropping up and competing for enterprise usage. From OpenAI’s Codex, which unifies its coding assistant across IDEs, CLIs, and other workflows, to Google’s Gemini CLI, it's clear that more developers demand easy access to coding agents where they do their work. And enterprises are demanding more from coding agents. For example, Anthropic made its Claude Code platform available on the web and mobile. Some coding platforms also allow users to choose which model to use for their coding. Singh said Kiro doesn’t rely on just one LLM; instead, it routes to the best model for the work, including AWS models. At launch in July, Kiro was based on Claude Sonnet 3.7 and 4.0. Well-known brands like Monday.com have noted the significant benefits of AI-powered coding, demonstrating that enterprises will likely continue to utilize these platforms in the future. “We saw that the mental model changes for developers, but it’s not just about becoming more efficient; it’s also how they organize around the way they work now,” Singh said.
2026 AI timeline, eBay's AI surge, perfect math AI, 3D worlds, AI voice, and more...
When I first wrote “Vector databases: Shiny object syndrome and the case of a missing unicorn” in March 2024, the industry was awash in hype. Vector databases were positioned as the next big thing — a must-have infrastructure layer for the gen AI era. Billions of venture dollars flowed, developers rushed to integrate embeddings into their pipelines and analysts breathlessly tracked funding rounds for Pinecone, Weaviate, Chroma, Milvus and a dozen others.

The promise was intoxicating: Finally, a way to search by meaning rather than by brittle keywords. Just dump your enterprise knowledge into a vector store, connect an LLM and watch magic happen.

Except the magic never fully materialized.

Two years on, the reality check has arrived: 95% of organizations invested in gen AI initiatives are seeing zero measurable returns. And many of the warnings I raised back then — about the limits of vectors, the crowded vendor landscape and the risks of treating vector databases as silver bullets — have played out almost exactly as predicted.

Prediction 1: The missing unicorn

Back then, I questioned whether Pinecone — the poster child of the category — would achieve unicorn status or whether it would become the “missing unicorn” of the database world. Today, that question has been answered in the most telling way possible: Pinecone is reportedly exploring a sale, struggling to break out amid fierce competition and customer churn.

Yes, Pinecone raised big rounds and signed marquee logos. But in practice, differentiation was thin. Open-source players like Milvus, Qdrant and Chroma undercut them on cost. Incumbents like Postgres (with pgVector) and Elasticsearch simply added vector support as a feature. And customers increasingly asked: “Why introduce a whole new database when my existing stack already does vectors well enough?”

The result: Pinecone, once valued near a billion dollars, is now looking for a home. The missing unicorn indeed. In September 2025, Pinecone appointed Ash Ashutosh as CEO, with founder Edo Liberty moving to a chief scientist role. The timing is telling: The leadership change comes amid increasing pressure and questions over its long-term independence.

Prediction 2: Vectors alone won’t cut it

I also argued that vector databases by themselves were not an end solution. If your use case required exactness — like searching for “Error 221” in a manual — a pure vector search would gleefully serve up “Error 222” as “close enough.” Cute in a demo, catastrophic in production.

That tension between similarity and relevance has proven fatal to the myth of vector databases as all-purpose engines. Enterprises discovered the hard way that semantic ≠ correct.

Developers who gleefully swapped out lexical search for vectors quickly reintroduced… lexical search in conjunction with vectors. Teams that expected vectors to “just work” ended up bolting on metadata filtering, rerankers and hand-tuned rules. By 2025, the consensus is clear: Vectors are powerful, but only as part of a hybrid stack.

Prediction 3: A crowded field becomes commoditized

The explosion of vector database startups was never sustainable. Weaviate, Milvus (via Zilliz), Chroma, Vespa, Qdrant — each claimed subtle differentiators, but to most buyers they all did the same thing: store vectors and retrieve nearest neighbors.

Today, very few of these players are breaking out. The market has fragmented, commoditized and in many ways been swallowed by incumbents.
Vector search is now a checkbox feature in cloud data platforms, not a standalone moat. Just as I wrote then: Distinguishing one vector DB from another will pose an increasing challenge. That challenge has only grown harder. Vald, Marqo, LanceDB, PostgreSQL, MySQL HeatWave, Oracle 23c, Azure SQL, Cassandra, Redis, Neo4j, SingleStore, Elasticsearch, OpenSearch, Apache Solr… the list goes on.

The new reality: Hybrid and GraphRAG

But this isn’t just a story of decline — it’s a story of evolution. Out of the ashes of vector hype, new paradigms are emerging that combine the best of multiple approaches.

Hybrid Search: Keyword + vector is now the default for serious applications. Companies learned that you need both precision and fuzziness, exactness and semantics. Tools like Apache Solr, Elasticsearch, pgVector and Pinecone’s own “cascading retrieval” embrace this (a minimal scoring sketch follows below).

GraphRAG: The hottest buzzword of late 2024/2025 is GraphRAG — graph-enhanced retrieval augmented generation. By marrying vectors with knowledge graphs, GraphRAG encodes the relationships between entities that embeddings alone flatten away. The payoff is dramatic.
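As a rough illustration of what “keyword + vector” means in practice, here is a minimal hybrid scorer; the embeddings are assumed to be precomputed, and the toy lexical score and alpha weight are illustrative rather than any vendor’s actual ranking formula:

```python
# A minimal sketch of hybrid retrieval: blend a keyword score with a
# vector-similarity score. Embeddings are assumed precomputed; the toy
# lexical score and the alpha weight are illustrative, not any vendor's
# actual ranking formula.
import numpy as np

def lexical_score(query, doc):
    """Toy keyword score: fraction of query terms that appear in the doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def vector_score(q_vec, d_vec):
    """Cosine similarity between query and document embeddings."""
    return float(np.dot(q_vec, d_vec) /
                 (np.linalg.norm(q_vec) * np.linalg.norm(d_vec)))

def hybrid_rank(query, q_vec, docs, doc_vecs, alpha=0.5):
    """Score every doc with a weighted blend; alpha tunes exactness vs. fuzziness."""
    scored = [
        (doc, alpha * lexical_score(query, doc) +
              (1 - alpha) * vector_score(q_vec, d_vec))
        for doc, d_vec in zip(docs, doc_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With a nonzero alpha, a query for “Error 221” only earns full lexical credit from the document that literally contains “221”, which is exactly the precision that pure vector search gives up.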
Benchmarks and evidence

Amazon’s AI blog cites benchmarks from Lettria, where hybrid GraphRAG boosted answer correctness from ~50% to 80%-plus in test datasets across finance, healthcare, industry and law. The GraphRAG-Bench benchmark (released May 2025) provides a rigorous evaluation of GraphRAG vs. vanilla RAG across reasoning tasks, multi-hop queries and domain challenges. An OpenReview evaluation of RAG vs. GraphRAG found that each approach has strengths depending on the task — but hybrid combinations often perform best. FalkorDB’s blog reports that when schema precision matters (structured domains), GraphRAG can outperform vector retrieval by a factor of ~3.4x on certain benchmarks.

The rise of GraphRAG underscores the larger point: Retrieval is not about any single shiny object. It’s about building retrieval systems — layered, hybrid, context-aware pipelines that give LLMs the right information, with the right precision, at the right time.

What this means going forward

The verdict is in: Vector databases were never the miracle. They were a step — an important one — in the evolution of search and retrieval. But they are not, and never were, the endgame.

The winners in this space won’t be those who sell vectors as a standalone database. They will be the ones who embed vector search into broader ecosystems — integrating graphs, metadata, rules and context engineering into cohesive platforms.

In other words: The unicorn isn’t the vector database. The unicorn is the retrieval stack.

Looking ahead: What’s next

Unified data platforms will subsume vector + graph: Expect major DB and cloud vendors to offer integrated retrieval stacks (vector + graph + full-text) as built-in capabilities.

“Retrieval engineering” will emerge as a distinct discipline: Just as MLOps matured, so too will practices around embedding tuning, hybrid ranking and graph construction.

Meta-models learning to query better: Future LLMs may learn to orchestrate which retrieval method to use per query, dynamically adjusting weighting.

Temporal and multimodal GraphRAG: Already, researchers are extending GraphRAG to be time-aware (T-GRAG) and multimodally unified (e.g., connecting images, text and video).

Open benchmarks and abstraction layers: Tools like BenchmarkQED (for RAG benchmarking) and GraphRAG-Bench will push the community toward fairer, comparably measured systems.

From shiny objects to essential infrastructure

The arc of the vector database story has followed a classic path: a pervasive hype cycle, followed by introspection, correction and maturation. In 2025, vector search is no longer the shiny object everyone pursues blindly — it’s now a critical building block within a more sophisticated, multi-pronged retrieval architecture.

The original warnings were right. Pure vector-based hopes often crash on the shoals of precision, relational complexity and enterprise constraints. Yet the technology was never wasted: It forced the industry to rethink retrieval, blending semantic, lexical and relational strategies.

If I were to write a sequel in 2027, I suspect it would frame vector databases not as unicorns, but as legacy infrastructure — foundational, but eclipsed by smarter orchestration layers, adaptive retrieval controllers and AI systems that dynamically choose which retrieval tool fits the query.

As of now, the real battle is not vector vs. keyword — it’s the integration, blending and discipline in building retrieval pipelines that reliably ground gen AI in facts and domain knowledge. That’s the unicorn we should be chasing now.

Amit Verma is head of engineering and AI Labs at Neuron7.
What I learned about vibe coding, AI tools, and getting started as a solopreneur
The post I Built an IOS App in 3 Days with Literally No Prior Swift Knowledge appeared first on Towards Data Science.
Let's make conscious and deliberate choices when we use AI.
The post Stop Worrying about AGI: The Immediate Danger is Reduced General Intelligence (RGI) appeared first on Towards Data Science.
The AI’s latest trick is following the schedule you set for it.
Electrons can freeze into strange geometric crystals and then melt back into liquid-like motion under the right quantum conditions. Researchers identified how to tune these transitions and even discovered a bizarre “pinball” state where some electrons stay locked in place while others dart around freely. Their simulations help explain how these phases form and how they might be harnessed for advanced quantum technologies.