Agents need vector search more than RAG



What is the role of vector databases in the world of agent AI? That’s an issue organizations have been grappling with in recent months. The narrative had real momentum. As large language models scaled to multi-million-token context windows, a credible argument circulated among enterprise architects: purpose-built vector search was an interim solution, not an infrastructure. Agent memory would absorb the problem of retrieval. Vector databases were an artifact of the RAG era.

The production evidence goes in the opposite direction.

Qdrant, the Berlin-based open source vector search company, announced a $50 million Series B on Thursday, two years after a $28 million Series A. The timing is not coincidental: the company also shipped version 1.17 of its platform. Together, they reflect a specific argument: the retrieval problem did not go away when agents arrived. It grew larger and harder.

"Humans make some queries every few minutes," Andre Zayarni, CEO and co-founder of Qdrant, told VentureBeat. "Agents make hundreds or even thousands of queries per second, simply gathering information to make decisions."

That shift changes infrastructure requirements in a way that RAG-era implementations were never designed to handle.

Why agents need a retrieval layer that memory can’t replace

Agents operate on information they were never trained on: proprietary business data, current information, millions of documents that change continually. Context windows manage session state. They do not provide high-recall search across that data, they do not maintain retrieval quality as the data changes, and they cannot sustain the query volumes that autonomous decision-making generates.

"Most AI memory frameworks out there use some form of vector storage," Zayarni said.

The implication is direct: even tools positioned as memory alternatives depend on underlying retrieval infrastructure.

Three failure modes arise when that retrieval layer is not designed specifically for the load. At document scale, a missing result is not a latency problem: it is a decision-quality problem that compounds with each retrieval step in a single agent turn. Under write load, relevance degrades because newly ingested data lands in unoptimized segments before indexing catches up, making searches over the most recent data slower and less accurate precisely when current information matters most. Across distributed infrastructure, a single slow replica drives up latency on every parallel tool call in an agent turn—a delay a human user absorbs as an inconvenience, but an autonomous agent cannot.

Qdrant version 1.17 addresses each of these directly. A relevance-feedback query improves retrieval by adjusting similarity scores in the next retrieval step using lightweight signals generated by the model, without retraining the embedding model. A delayed-dispatch feature queries a second replica when the first exceeds a configurable latency threshold. A new cluster-wide telemetry API replaces node-by-node troubleshooting with a single view of the entire cluster.
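The delayed-dispatch behavior described here follows the well-known hedged-request pattern. Below is a minimal sketch of that pattern in Python, not Qdrant's implementation: `query_replica`, the replica names, and the latency values are all hypothetical stand-ins.

```python
import asyncio

# Hypothetical stand-in for a network call to a search replica.
async def query_replica(name: str, delay: float, query: str) -> str:
    await asyncio.sleep(delay)  # simulate network + search latency
    return f"{name}:{query}"

async def hedged_search(query: str, hedge_after: float = 0.05) -> str:
    """Send the query to a primary replica; if it has not answered
    within `hedge_after` seconds, dispatch the same query to a backup
    replica and return whichever responds first."""
    primary = asyncio.create_task(query_replica("primary", 0.2, query))
    try:
        # shield() keeps the timeout from cancelling the primary request
        return await asyncio.wait_for(asyncio.shield(primary), hedge_after)
    except asyncio.TimeoutError:
        backup = asyncio.create_task(query_replica("backup", 0.01, query))
        done, pending = await asyncio.wait(
            {primary, backup}, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()  # drop the slower request
        return done.pop().result()

if __name__ == "__main__":
    # The backup wins here because the simulated primary is slow.
    print(asyncio.run(hedged_search("vector db")))
```

The trade-off is extra load: every hedged query may hit two replicas, so the threshold is typically set near a high tail-latency percentile rather than the median.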

Why Qdrant no longer wants to be called a vector database

Almost every major database now supports vectors as a data type, from hyperscalers to traditional relational systems. That shift has changed the competitive question: the data type is now table stakes. What remains specialized is retrieval quality at production scale.

That distinction is why Zayarni no longer wants Qdrant to be called a vector database.

"We are building an information retrieval layer for the AI era," he said. "Databases are used to store user data. If the quality of search results matters, you need a search engine."

His advice for teams starting out: use whatever vector support is already in your stack. Teams migrate to purpose-built retrieval when scale forces the issue.

"We see companies coming to us every day saying they started with Postgres and thought it was good enough, and it’s not," he said.

Qdrant’s architecture, written in Rust, provides memory efficiency and low-level performance control that higher-level languages don’t match at the same cost. The open source foundation compounds that advantage: community feedback and developer adoption are what allow a company of Qdrant’s size to compete with vendors that have far greater engineering resources.

"Without it, we wouldn’t be where we are now at all," Zayarni said.

How two production teams found the limits of general-purpose databases

Companies building AI production systems on Qdrant are making the same argument from different directions: agents need a retrieval layer, and conversational or contextual memory is no substitute for it.

GlassDollar helps companies like Siemens and Mahle evaluate startups. Search is the core product: a user describes a need in natural language and gets a ranked list from a corpus of millions of companies. The architecture performs query expansion on every request: a single message is expanded into multiple parallel queries, each retrieving candidates from a different angle, before the results are combined and re-ranked. This is an agent retrieval pattern, not a RAG pattern, and it requires purpose-built search infrastructure to sustain at volume.
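The expansion, parallel retrieval, and re-ranking flow described above can be sketched generically. Everything below is a hypothetical stand-in, not GlassDollar's code: `expand` would be an LLM rewriter and `search_angle` a real vector index query in practice.

```python
from concurrent.futures import ThreadPoolExecutor

def expand(message: str) -> list[str]:
    """Stand-in for an LLM rewriter: one message becomes several queries."""
    return [message, f"companies building {message}", f"startups in {message}"]

def search_angle(query: str) -> dict[str, float]:
    """Stand-in for a vector search call returning doc -> similarity."""
    fake_index = {"acme": 0.9, "globex": 0.6, "initech": 0.3}
    return fake_index

def search(message: str, top_k: int = 2) -> list[str]:
    queries = expand(message)
    with ThreadPoolExecutor() as pool:  # retrieve all angles in parallel
        results = list(pool.map(search_angle, queries))
    merged: dict[str, float] = {}
    for res in results:  # combine: keep the best score seen for each doc
        for doc, score in res.items():
            merged[doc] = max(merged.get(doc, 0.0), score)
    # re-rank by combined score and return the top-k documents
    return sorted(merged, key=merged.get, reverse=True)[:top_k]

print(search("vector search"))
```

Note the fan-out: one user message becomes several index queries per request, which is exactly why agent-driven workloads multiply query volume relative to one-query-per-user RAG.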

The company migrated from Elasticsearch as it grew toward 10 million indexed documents. After moving to Qdrant, it reduced infrastructure costs by approximately 40%, removed a keyword-based layer it had maintained to compensate for Elasticsearch’s relevance gaps, and saw a 3x increase in user engagement.

"We measure success by recall," Kamen Kanev, head of product at GlassDollar, told VentureBeat. "If the best companies don’t appear in the results, nothing else matters. The user loses trust."

Agent memory and extended context windows are also not sufficient to absorb the workload GlassDollar handles.

"That’s an infrastructure problem, not a conversation state management task," said Kanev. "It is not something that is solved by extending a context window."

Another Qdrant user is &AI, which is building infrastructure for patent litigation. Its AI agent, Andy, performs semantic search across hundreds of millions of documents spanning decades and multiple jurisdictions. Patent attorneys will not act on AI-generated legal text, so every result the agent presents must be grounded in a real document.

"Our entire architecture is designed to minimize hallucination risk by making retrieval the core primitive, not generation," Herbie Turner, founder and CTO of &AI, told VentureBeat.

For &AI, the agent layer and the retrieval layer are distinct by design.

"Andy, our patent agent, is built on Qdrant," Turner said. "The agent is the interface. The vector database is the ground truth."

Three signs it’s time to move beyond your current setup

The practical starting point: use whatever vector capabilities are already in your stack. The evaluation question is not whether to add vector search, but when your current setup is no longer adequate. Three signals mark that point: retrieval quality is directly tied to business outcomes; query patterns involve expansion, multi-stage re-ranking, or parallel tool calls; or data volume reaches tens of millions of documents.

At that point, the evaluation shifts to operational questions: how much visibility your current setup gives you into what’s happening in a distributed cluster, and how much performance headroom you have when agent query volumes increase.

"There is a lot of noise right now about what replaces the retrieval layer," said Kanev. "But anyone building a product where retrieval quality is the product, and where missing a result has real business consequences, needs dedicated search infrastructure."
