Databricks says it solved the decades-old data pipeline problem that has been holding back AI agents



For decades, data professionals have struggled to manage operational and analytical databases under a single, unified approach without introducing latency or degrading performance.

AI agents have made the problem structural. A system that continuously reasons about and acts on live data cannot tolerate a pipeline between itself and the information it needs to act on.

At the Data + AI Summit on Tuesday, Databricks announced two products aimed at collapsing that infrastructure. Lakehouse//RT delivers millisecond query latency directly on governed Delta and Iceberg tables, eliminating the dedicated real-time serving tier that enterprises have maintained alongside their lakehouses. LTAP, short for Lake Transactional/Analytical Processing, stores native Postgres transactional data in Delta and Iceberg formats from the moment it is written, eliminating the ETL pipelines that have connected operational and analytical systems for decades.

Reynold Xin, co-founder of Databricks, described a simpler data stack as "the holy grail for agents." In a briefing with VentureBeat, he argued that as users write more applications, the agents that reason analytically about those applications need the underlying infrastructure out of the way so they can move quickly.

"Agents actually prefer a much simpler stack, because they can move much faster," Xin said.

LTAP unifies the storage layer, where HTAP tried to converge the engines

Over the decades, many vendors have tried different approaches to unifying analytical and transactional data.

In 2014, analyst firm Gartner coined the term HTAP, short for Hybrid Transactional/Analytical Processing, to describe vendors that attempted to unify the two types of databases. MemSQL (now SingleStore), SAP HANA and Oracle MySQL HeatWave are among the many HTAP offerings on the market.

LTAP is Databricks’ answer to HTAP. It uses the Lakebase architecture to unify data at the storage layer rather than at the engine level. Lakebase is Databricks’ serverless, cloud-based PostgreSQL database service, which became generally available in February.

"For us, HTAP is more of an industry failure than a success," Xin said.

LTAP works at the storage layer rather than the query layer. Lakebase previously stored Postgres data in Postgres format in object storage, which required a conversion before the lakehouse’s analytics engines could use it efficiently. With LTAP, transactional data lands directly in Delta or Iceberg format, in the same copy that analytical workloads read. Postgres remains the transactional engine; Spark and the lakehouse remain the analytics engines.

"The point is, hey, you use the best tool for the job at the query engine level; we just make sure the underlying storage is a single copy of the data," Xin said.

The central engineering challenge is latency. Object storage has response times in the seconds range, far too slow for OLTP workloads that require sub-millisecond performance. Lakebase handles this with a caching layer between the Postgres compute instances and object storage. The key design decision is where the columnar conversion happens: spare CPU capacity in that caching layer performs the row-to-column conversion before the data reaches object storage.

"When you convert data from row to column, it’s typically compressed more than 10x, so now you substantially reduce the networking cost between that caching layer and the object stores," Xin said.
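The row-to-column step Xin describes can be illustrated with a toy example. This is a minimal sketch, not Lakebase’s implementation: it assumes JSON encoding plus Python’s standard zlib as stand-ins for the Parquet-style columnar encodings behind Delta and Iceberg tables, and the order table, its column names, and the helpers to_columns, to_rows and compressed_size are hypothetical. It transposes row records into per-column lists and compares payload sizes; the actual ratio depends entirely on the data, so it says nothing about the 10x figure.

```python
import json
import zlib


def to_columns(rows):
    """Transpose a list of row dicts into a dict mapping column name -> list of values."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def to_rows(columns):
    """Inverse of to_columns: rebuild row dicts from equal-length column lists."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*(columns[n] for n in names))]


def compressed_size(obj):
    """Bytes needed for the zlib-compressed JSON encoding of obj (a stand-in codec)."""
    return len(zlib.compress(json.dumps(obj).encode("utf-8"), 9))


# Hypothetical transactional rows, as an OLTP write path might produce them.
rows = [
    {
        "order_id": i,
        "region": ["us-east", "us-west", "eu-central"][i % 3],
        "status": "pending" if i % 10 == 0 else "shipped",
        "amount_cents": 1000 + (i % 50),
    }
    for i in range(1000)
]

# Row-to-column conversion, the step the caching layer is said to perform.
columns = to_columns(rows)
raw_row_bytes = len(json.dumps(rows).encode("utf-8"))

print("raw row-oriented JSON bytes:  ", raw_row_bytes)
print("compressed row-oriented bytes:", compressed_size(rows))
print("compressed columnar bytes:    ", compressed_size(columns))
```

Grouping a column’s values together places repeated values (here, the three region strings and two status values) next to each other, which is what lets general-purpose and columnar codecs shrink them. Per Xin, doing that conversion inside the caching layer means the smaller payload is what crosses the network to object storage.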

Lakehouse//RT delivers millisecond query latency on live lakehouse data without a separate serving tier

Lakehouse//RT is Databricks’ answer to the dedicated real-time serving tier: the separate system enterprises have maintained alongside their lakehouses to handle low-latency queries, at the cost of data copies, split governance, and pipeline complexity that agents cannot tolerate. Key capabilities of Lakehouse//RT include:

Reyden compute engine: Designed specifically for high-concurrency, low-latency serving, Reyden queries Delta and Iceberg tables directly without moving data out of the lakehouse.

Latency and performance: Lakehouse//RT delivers sub-100ms latency at 12,000 queries per second, with response times as low as 10ms on smaller data sets and up to 16x better performance than existing dedicated serving stacks.

Governance and data access: Each query runs within the Unity Catalog governance framework without a separate permissions layer, data copies, or ingestion pipelines.
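One way to read the throughput and latency figures together is Little’s law, which states that the average number of requests in flight equals the arrival rate multiplied by the average time each request spends in the system. As a back-of-the-envelope check, assuming (the announcement does not say) that the sub-100 ms figure bounds the average response time at the 12,000 queries-per-second load:

```latex
L = \lambda W < 12{,}000\ \text{queries/s} \times 0.1\ \text{s} = 1{,}200\ \text{queries in flight}
```

Under that assumption, the serving tier would be holding fewer than about 1,200 concurrent queries on average at that load; if the 100 ms figure is a tail percentile rather than a mean, average concurrency would be lower still.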

Analysts see the agentic framing and open-format approach as the real differentiators.

The problem both products address is well documented among enterprise data teams, but analysts distinguish between the problem itself and the specific claim Databricks is making.

"Enterprises have had HTAP, streaming, cloud warehouses and operational stores for years," Stephanie Walter, AI Stack practice lead at HyperFRAME Research, told VentureBeat. "What’s different is the agentic AI framing."

Walter noted that agents need live operational data, historical context, governance, retrieval and writeback in the same workflow.

"That’s a strong architectural argument, but Lakebase has yet to demonstrate that it can deliver the latency, reliability, and operational maturity that CIOs expect," she said.

Mike Leone, an analyst at Moor Insights & Strategy, said the path to genuine differentiation is narrower than the concept of unification itself. He noted that open analytics on a data lake is already in play and that many vendors offer some version of it.

"The less common move is letting transactional writes land in open formats too, so the operational database isn’t stuck in a proprietary box while only the analytics half is open," Leone told VentureBeat.

He added that the open-format approach, combined with Lakehouse//RT querying live data directly from the lake, is what gives the architecture a credible case for retiring an entire tier of specialized systems.

The technical claim that will face the most scrutiny is also the most central. "The part I would still want engineers to look at is how both engines actually share one copy without a silent conversion step doing the synchronization in between," Leone said.

What this means for businesses

For data engineers evaluating their stack for agent workloads, the question is no longer which tool is best for each job, but whether running separate tools is still defensible.

Companies that built separate operational databases, real-time serving tiers, and analytics lakes could previously treat the gaps between them as a maintenance burden. Agents expose those gaps as operational risk: a system that reasons across governance boundaries will surface inconsistencies faster than any human team.

The market is moving away from layers of specialized services faster than most vendor roadmaps anticipated. According to VB Pulse Q1 2026, a three-wave longitudinal survey of more than 100 enterprise organizations, hybrid retrieval intent tripled from 10.3% to 33.3% during the quarter, while standalone vector database adoption declined across every vendor tracked. The same consolidation logic is now reaching the real-time serving tier.

The traditional approach (best tools for each type of workload, pipelines between them) was built for human-speed analytics consumption. Agent workloads do not tolerate that architecture.

"The pain they point out, all the copying and synchronization between operational and analytical systems, is real and expensive, and anyone running this at scale feels it," Leone said.


