The modern data stack was created for humans to ask questions. Google just rebuilt it for agents to take action.



Enterprise data stacks were built for humans to run scripted queries. As AI agents increasingly act autonomously on behalf of businesses 24 hours a day, that architecture is breaking down and vendors are rushing to rebuild it. Google’s answer, announced at Cloud Next on Wednesday, is Agentic Data Cloud.

The architecture has three pillars:

  • Knowledge Catalog. Automates the curation of semantic metadata, inferring business logic from query logs without manual intervention from data managers.

  • Cross-cloud lakehouse. Lets BigQuery query Iceberg tables in AWS S3 over a private network with no egress fees.

  • Data Agent Kit. MCP tools and extensions for VS Code, Claude Code, and Gemini CLI that let data engineers describe outcomes instead of writing pipelines.

"Data architecture has to change now," Andi Gutmans, vice president and general manager of Data Cloud at Google Cloud, told VentureBeat. "We are moving from the human scale to the agent scale."

From intelligence system to action system

The core premise behind Agentic Data Cloud is that enterprises are moving from human-scale operations to agent-scale operations.

Historically, data platforms have been optimized to generate reports, dashboards, and some forecasts, which Google characterizes as “reactive intelligence.” In that model, humans interpret the data and decide what to do.

Now that AI agents are increasingly expected to take actions directly on behalf of the company, Gutmans argued that data platforms must evolve into action systems.

"We need to ensure that all enterprise data can be enabled with AI, that includes structured and unstructured data." Gutmans said. "We need to ensure that there is the right level of trust, which also means that it is not just about gaining access to the data, but really understanding it."

Knowledge Catalog is Google’s answer to that problem. It’s an evolution of Dataplex, Google’s existing data governance product, with a materially different architecture underneath. While traditional data catalogs required data administrators to manually label tables, define business terms, and create glossaries, Knowledge Catalog automates that process using agents.

The practical implication for data engineering teams is that Knowledge Catalog scales across the entire data estate, not just the subset a small team of data managers can maintain by hand. The catalog natively covers BigQuery, Spanner, AlloyDB, and Cloud SQL, and federates with third-party catalogs including Collibra, Atlan, and DataHub. Zero-copy federation extends semantic context to SaaS applications, including SAP, Salesforce Data360, ServiceNow, and Workday, without moving the data.
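To make the automation idea concrete, here is a deliberately tiny sketch of one piece of the approach: mining query logs for frequently joined table pairs, which are candidate semantic relationships worth cataloging. Everything here is invented for illustration; Google has not published how Knowledge Catalog's agents work internally, and a real system would read from something like BigQuery's job history, not a hardcoded list.

```python
import re
from collections import Counter

# Hypothetical query log; in practice this would come from a
# warehouse's job history or an audit log export.
QUERY_LOG = [
    "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id",
    "SELECT c.region, SUM(o.total) FROM orders o JOIN customers c "
    "ON o.customer_id = c.id GROUP BY 1",
    "SELECT * FROM orders o JOIN shipments s ON o.id = s.order_id",
]

# Naive pattern: "FROM <table> <alias> JOIN <table>". A real parser
# would handle multiple joins, subqueries, and qualified names.
JOIN_PATTERN = re.compile(r"FROM\s+(\w+)\s+\w+\s+JOIN\s+(\w+)", re.IGNORECASE)

def infer_join_edges(queries):
    """Count how often each table pair is joined across the log."""
    edges = Counter()
    for q in queries:
        for left, right in JOIN_PATTERN.findall(q):
            edges[tuple(sorted((left, right)))] += 1
    return edges

edges = infer_join_edges(QUERY_LOG)
print(edges)
```

The point of the sketch is the direction of inference: relationships are derived from observed usage rather than declared up front by an administrator, which is what lets the catalog cover tables nobody has manually documented.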

Google’s lakehouse goes cross-cloud

Google has offered a data lakehouse product, BigLake, since 2022. It was initially limited to Google Cloud data, but in recent years it gained limited federation capabilities that let companies query data stored elsewhere.

Gutmans explained that the previous federation worked through query APIs, which limited the features and optimizations BigQuery could apply to external data. The new approach shares storage through the open Apache Iceberg format. Whether the data sits in Amazon S3 or Google Cloud, he argued, makes no difference.

"This really means that we can bring all the goodness and all the capabilities of AI to those third-party data sets." said.

The practical result is that BigQuery can query Iceberg tables located in Amazon S3 through Google’s Cross-Cloud Interconnect, a dedicated private network layer, with no egress fees and with a price-performance ratio that Google says is comparable to native AWS warehouses. All BigQuery AI functions run on that cross-cloud data without modification. Two-way federation, now in preview, extends to Databricks Unity Catalog on S3, Snowflake Polaris, and the AWS Glue Data Catalog via the open Iceberg REST Catalog standard.
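For readers who want a feel for what registering such a table involves, the sketch below renders a CREATE EXTERNAL TABLE statement in the general shape of BigQuery's BigLake external-table DDL for Iceberg. The connection name, dataset, bucket, and metadata path are all placeholders, and the exact syntax and options may differ from Google's current documentation, so treat this as an assumption-laden illustration rather than copy-paste DDL.

```python
def iceberg_external_table_ddl(dataset, table, connection, metadata_uri):
    """Render a CREATE EXTERNAL TABLE statement for an Iceberg table
    whose data and metadata live in an S3 bucket, reachable through
    a pre-created BigQuery connection resource."""
    return (
        f"CREATE EXTERNAL TABLE `{dataset}.{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        "OPTIONS (\n"
        "  format = 'ICEBERG',\n"
        f"  uris = ['{metadata_uri}']\n"
        ")"
    )

ddl = iceberg_external_table_ddl(
    "analytics", "orders",
    "aws-us-east-1.s3_conn",  # placeholder connection resource
    "s3://my-bucket/warehouse/orders/metadata/v2.metadata.json",
)
print(ddl)
```

The notable detail is the `uris` entry pointing at Iceberg metadata rather than at raw data files: because the table format is open, the same metadata can be read by BigQuery, Spark, or any other Iceberg-aware engine without copying the data.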

From writing pipelines to describing results

Knowledge Catalog and the cross-cloud lakehouse solve the problems of context and data access. The third pillar addresses what happens when a data engineer sits down to build something with them.

The Data Agent Kit ships as a portable set of skills, MCP tools, and IDE extensions for VS Code, Claude Code, Gemini CLI, and Codex. It does not introduce a new interface.

The architectural shift it enables is a move from what Gutmans called a "prescriptive copilot experience" to intent-driven engineering. Instead of writing a Spark pipeline to move data from source A to destination B, a data engineer describes the result (a clean dataset ready for model training, a transformation that applies a governance rule), and the agent chooses whether to run it on BigQuery, Lightning Engine for Apache Spark, or Spanner, then generates production-ready code.
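The describe-a-result-then-route pattern can be sketched as a toy router. The heuristics below are entirely invented for illustration; the real Data Agent Kit presumably grounds an LLM in catalog metadata rather than matching keywords, and the engine names are just the three the article mentions.

```python
# Invented keyword hints per engine, purely for illustration.
ENGINE_HINTS = {
    "bigquery": ["aggregate", "analytics", "report", "join"],
    "spark": ["transform", "clean", "pipeline", "training"],
    "spanner": ["transactional", "lookup", "low-latency"],
}

def pick_engine(intent: str) -> str:
    """Score each engine by how many of its hint words appear in
    the stated intent; fall back to BigQuery on no match."""
    intent = intent.lower()
    scores = {
        engine: sum(word in intent for word in hints)
        for engine, hints in ENGINE_HINTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "bigquery"

print(pick_engine("clean dataset ready for model training"))
```

However crude, the sketch captures the division of labor the article describes: the engineer supplies the outcome, and engine selection becomes a routing decision made by the system rather than a choice baked into hand-written pipeline code.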

"Customers are a little fed up with building their own pipelines," Gutmans said. "They’re really more in review mode than writing code mode."

Where Google and its rivals diverge

The premise that agents require semantic context, not just access to data, is shared across the market.

Databricks has Unity Catalog, which provides governance and a semantic layer across its lakehouse. Snowflake has Cortex, its semantic layer and AI offering. Microsoft Fabric includes a semantic model layer built for business intelligence and, increasingly, for grounding agents.

The dispute is not over whether semantics matter; everyone agrees they do. It is over who builds and maintains them.

"Our goal is simply to obtain all possible semantics," he explained, noting that Google will federate with third-party semantic models rather than requiring customers to start over.

Google is also positioning openness as a differentiator, with two-way federation across Databricks Unity Catalog and Snowflake Polaris through the open Iceberg REST Catalog standard.

What this means for companies

Google’s argument (and one that is repeated throughout the data infrastructure market) is that companies are behind on three fronts:

Semantic context is becoming infrastructure. If your data catalog is still curated manually, it won’t scale to agent workloads, and Gutmans maintains that the gap will only widen as agent query volumes increase.

Cross-cloud egress costs are a hidden tax on agentic AI. Storage-level federation via the open Iceberg standard is emerging as the architectural answer at Google, Databricks, and Snowflake. Companies locked into proprietary federation approaches should test those costs at agent-scale query volumes.

Gutmans argues that the era of hand-written pipelines is coming to an end. Data engineers who move toward outcome-based orchestration now will have a significant advantage.


