Context architecture is replacing RAG as agentic AI pushes enterprise retrieval to its limits



Redis became famous as the caching layer that kept web applications from crashing under load. The problem it faces now has the same structure but is harder to solve: production AI agents fail not because the models are wrong, but because the data they depend on is sparse, outdated, and structured for humans rather than machines. Retrieval pipelines built for one-off queries cannot absorb the volume that agents generate.

The gap Redis is addressing is structural: agents make orders of magnitude more data requests than human users, but most retrieval layers were built for the human-scale problem. Redis Iris, launched Monday, is the company’s answer: a context and memory platform that sits between an agent and the data it needs to act. The platform combines real-time data ingestion, a semantic interface that automatically generates MCP tools from business data models, and an agent memory server built on Redis Flex, a rewritten storage engine that keeps 99% of data on flash at one-tenth the cost of in-memory storage alone.

The announcement comes as enterprise RAG infrastructure is in active transition. VentureBeat’s Q1 2026 VB Pulse RAG Infrastructure Market Tracker found that buyers’ intent to adopt hybrid retrieval tripled from 10.3% to 33.3% between January and March. Retrieval optimization surpassed evaluation as the top enterprise investment priority for the first time. Custom in-house retrieval stacks rose from 24.1% to 35.6% as companies outgrew the options available on the market. Redis isn’t the only infrastructure provider reading those signals: several data platform providers have repositioned themselves around agent context layers in recent weeks.

The scale mismatch is the structural argument behind the release.

"Companies will have orders of magnitude more agents than human beings," Rowan Trollope, CEO of Redis, told VentureBeat. "Orders of magnitude more agents than humans means orders of magnitude more load on back-end systems."

From cache to context

Trollope traces the parallel to the mobile era: When legacy backends built for branch tellers suddenly had to serve a million smartphone users, Redis became the caching layer that absorbed the load without a complete rebuild.

The difference this time is that agents cannot write their own middleware. In the mobile era, a developer would sit down with a database administrator, identify the queries an application needed, and code the caching logic into a middleware layer. Agents can’t do that. They need to find the right data at runtime, through interfaces built for them in advance, or they will get stuck.

"This is like the refrigerator and the supermarket analogy," he said. "If every time you go to make a sandwich you have to run to the supermarket to get food, that is not very efficient. You put a refrigerator in every house and store some food there. And that’s where we still tend to exist in the infrastructure stack."

What Redis Iris includes

Iris includes five components that together cover data ingestion, semantic access, memory, and caching.

Redis Data Integration (RDI). Now in general availability. RDI uses change data capture pipelines to continuously synchronize data from relational databases, warehouses, and document stores into Redis, with connectors for Oracle, Snowflake, Databricks, and Postgres.
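
To make the change data capture idea concrete, here is a minimal sketch of how a stream of change events might be applied to a key-value cache so that it mirrors the source table. It is not RDI’s API: the `ChangeEvent` type, the `apply_change` function, and the `table:key` naming scheme are all hypothetical, and a plain Python dict stands in for Redis.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChangeEvent:
    """One row-level change read from a source database (hypothetical shape)."""
    op: str                                # "insert", "update", or "delete"
    table: str
    key: str                               # primary key of the changed row
    row: Optional[dict[str, Any]] = None   # new row image; None for deletes


def apply_change(cache: dict[str, dict[str, Any]], event: ChangeEvent) -> None:
    """Apply a single change event so the cache mirrors the source table."""
    cache_key = f"{event.table}:{event.key}"   # assumed key naming scheme
    if event.op == "delete":
        cache.pop(cache_key, None)
    elif event.op in ("insert", "update"):
        if event.row is None:
            raise ValueError(f"{event.op} event for {cache_key} has no row image")
        cache[cache_key] = dict(event.row)
    else:
        raise ValueError(f"unknown operation: {event.op}")


# A dict stands in for the cache; events arrive in commit order.
cache: dict[str, dict[str, Any]] = {}
events = [
    ChangeEvent("insert", "orders", "1", {"status": "pending"}),
    ChangeEvent("insert", "orders", "2", {"status": "pending"}),
    ChangeEvent("update", "orders", "1", {"status": "shipped"}),
    ChangeEvent("delete", "orders", "2"),
]
for event in events:
    apply_change(cache, event)
```

Because each event carries the full new row image, replaying the stream in order leaves the cache holding only the latest state of each surviving row.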

Context Retriever. Now in preview. Developers define a semantic model of business data using Pydantic models, and Redis automatically generates MCP tools that agents use to query that data directly, with row-level access controls applied on the server side. Trollope describes the change from classic RAG as a reversal of direction. "It’s just a change to allow the agent to pull the data instead of presupposing it and putting it in the pipeline," he said.
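
The pull model described above can be sketched roughly as follows. This is an illustration only, not the Context Retriever API or the MCP wire format: standard-library dataclasses stand in for Pydantic models so the sketch has no dependencies, and `Order`, `make_query_tool`, and the `owner_field` parameter are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Order:
    """A business entity in the semantic model (hypothetical example)."""
    order_id: str
    customer_id: str
    status: str


def make_query_tool(model, rows, owner_field):
    """Derive a tool description and a handler from a model class.

    The handler only returns rows whose `owner_field` matches the caller,
    which is a simple stand-in for server-side row-level access control.
    """
    spec = {
        "name": f"query_{model.__name__.lower()}",
        "parameters": [f.name for f in fields(model)],
    }

    def handler(caller_id, **filters):
        unknown = set(filters) - set(spec["parameters"])
        if unknown:
            raise ValueError(f"unknown filter fields: {sorted(unknown)}")
        return [
            row for row in rows
            if getattr(row, owner_field) == caller_id
            and all(getattr(row, name) == value for name, value in filters.items())
        ]

    return spec, handler


ORDERS = [
    Order("o1", "c1", "shipped"),
    Order("o2", "c2", "shipped"),
    Order("o3", "c1", "pending"),
]
spec, query_orders = make_query_tool(Order, ORDERS, owner_field="customer_id")

# The agent decides at runtime what to ask for; the access filter is not optional.
shipped_for_c1 = query_orders("c1", status="shipped")
```

The design point is that the access check lives inside the generated handler rather than in the agent’s prompt, so an agent acting for customer `c1` cannot see order `o2` no matter which filters it supplies.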

Agent Memory. Now in preview. Stores short- and long-term state across sessions so agents can carry context without re-deriving it every turn.

Redis Flex. A rewritten storage engine that keeps 99% of data on SSD and 1% in RAM, delivering petabyte-scale retrieval with sub-millisecond latencies.

Redis Search and LangCache. The semantic caching and retrieval backbone under the platform. LangCache reduces redundant model calls by caching prompt responses.
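
As a rough illustration of how semantic caching avoids redundant model calls, the sketch below returns a stored response when a new prompt is similar enough to one seen before. It does not use LangCache’s API: `SemanticCache`, `answer`, and the 0.8 threshold are hypothetical, and a toy bag-of-words vector stands in for a real embedding model.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold   # assumed similarity cutoff
        self.entries = []            # list of (vector, response) pairs

    def lookup(self, prompt):
        """Return the best cached response at or above the threshold, else None."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, prompt, response):
        self.entries.append((embed(prompt), response))


def answer(prompt, cache, call_model):
    """Serve from the cache when possible; otherwise call the model and store."""
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached
    response = call_model(prompt)
    cache.store(prompt, response)
    return response
```

A production cache would use a vector index instead of a linear scan and would also need an expiry policy, since a cached answer can go stale when the underlying data changes.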

What analysts say

The data industry is now broadly heading in the same direction. Every major database vendor is making an argument about the context layer.

Traditional database providers, including Oracle, are integrating context and memory layers to bring relational databases into the era of agentic AI. Providers of purpose-built vector databases, including Pinecone, are doing the same thing, building a new knowledge layer for AI agent context. Standalone context layers such as Hindsight are also part of the emerging landscape.

Trollope frames Redis’s position as structurally different from that competition.

"For us to win, no one else has to lose," he said. Many Redis deployments already run MongoDB or Oracle as the backend system of record. Iris mirrors and caches those systems rather than displacing them. Redis is launching Iris on the Snowflake Marketplace with native connectors.

Stephanie Walter, AI Stack Practice Leader at HyperFRAME Research, laid out the market context plainly. "The market is converging on the same conclusion: agents don’t just need more tokens or better models. They need a governed, current, low-latency context," Walter said.

Her read on Redis’s differentiation focuses on where Redis already sits in the stack: close to runtime, latency-sensitive operational state, and real-time data.

"The pitch isn’t so much ‘better RAG’ as ‘agents need live context, memory, and quick recall while they’re working,’" she said.

Whether it comes from Redis or another vendor, every context layer technology will have to meet a governance challenge to succeed.

"Agentic AI will not scale in the enterprise if each agent becomes a new cost center, a new data access risk, and a new governance exception," she said. "The winning context layers will be those that make agents faster, cheaper, and safer to run."

For real-time clinical AI, getting context wrong is not an option

Mangoes.ai has already had to answer those questions in production, under conditions where the cost of getting context wrong is measured in patient outcomes.

Amit Lamba, founder and CEO of Mangoes.ai, runs a real-time voice AI platform deployed in large healthcare facilities where patients and doctors ask live questions about treatment, scheduling, and case history. Mangoes.ai built its stack natively on Redis from the beginning.

"Retrieval, memory, and session state all run through Redis, so we don’t bundle separate tools together and expect them to talk to each other," Lamba said.

The problem that Iris’s dynamic memory capability addresses is what happens in a complex session.

"Think about a one-hour group therapy session," Lamba said. "You need to know who said what, when, and be able to provide the correct information to the therapist in the moment. That’s not a simple retrieval problem."

The platform runs multiple specialized agents in parallel: one for entity identification, one for relationship reasoning, and one for integrating case history.

"The dynamic memory capability maps almost perfectly to the problem we are solving," Lamba said.

What this means for businesses

For companies that built their AI stack around RAG, the retrieval layer that got them into production is no longer enough to keep them there.

The RAG era is giving way to context architecture. The classic RAG model inserted data into the prompt before calling the model. Production deployments are changing that: agents pull what they need at runtime via tool calls, treating the data layer as a live resource rather than a preloaded payload. Teams still optimizing RAG pipelines are solving last year’s problem.

The semantic layer is now production infrastructure. The model that defines business entities, their relationships, and the access rules between them must be built, versioned, and maintained with the same discipline as a data pipeline. Most organizations do not have the staff or structure for that work. Companies that define their context architecture now are the ones that won’t have to rebuild it when agent workloads increase.

The budget is already moving. VB Pulse Q1 2026 data shows that investment in retrieval optimization rose from 19% to 28.9% during the quarter, surpassing spending on evaluation for the first time. Organizations that spent the previous year measuring the quality of their retrieval are now investing in fixing it. The context layer is an active procurement decision, not a roadmap item.

"The buyer’s first question should not be: ‘Do I need a vector database, a long context, memory or a context engine?’ It should be: ‘What does this agent need to know, how up-to-date does that knowledge need to be, who can access it, and how much does each retrieval cost?’" Walter said.


