A proof of concept forgives a fragile data path. Operational AI does not.

Presented by F5

When companies move AI workloads from pilot to production, data delivery often becomes the factor that determines whether those systems can reliably scale. Peer-to-peer architectures that connect storage directly to compute hold up under demonstration conditions, but often fail under sustained, concurrent production traffic. The result is stalled inference processes, lagging RAG systems, underutilized GPUs, and SLA violations, all of which have direct business consequences.

"Organizations successfully operationalize AI when their infrastructure is designed to handle real-world failures, not just controlled conditions," says Hunter Smit, senior director of product marketing at F5.

Production traffic exposes architectural weaknesses

In a pilot, a stalled transfer is an inconvenience; in production, the same stall is an incident that someone now owns. The underlying architecture is typically identical in both cases. When a client connects directly to storage, the system grows increasingly fragile under sustained, concurrent production traffic, because that direct connection has no way to respond when a node fails or traffic spikes. Retries and timeouts then cascade, and the entire pipeline backs up just when the business depends on the outcome.
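Retries cascade because every client that retries immediately and without limit sends more traffic to a slow node at exactly the moment it can least absorb it. A common client-side mitigation, which the article does not attribute to F5, is capped exponential backoff with full jitter plus a shared retry budget. The minimal Python sketch below illustrates that pattern. The names `backoff_delays`, `RetryBudget`, and `call_with_retries` are hypothetical. The sketch also assumes that `TimeoutError` and `ConnectionError` are the only retryable failures.

```python
import random
import time

def backoff_delays(max_attempts, base=0.1, cap=5.0, rng=None):
    """Sleep before each retry: capped exponential backoff with full jitter (seconds)."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_attempts - 1):
        ceiling = min(cap, base * (2 ** attempt))  # with defaults: 0.1, 0.2, 0.4, ... up to cap
        delays.append(rng.uniform(0, ceiling))     # full jitter spreads clients apart
    return delays


class RetryBudget:
    """Permits retries only while they stay under a fraction of total requests."""

    def __init__(self, ratio=0.1, min_retries=10):
        self.ratio = ratio
        self.min_retries = min_retries
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def can_retry(self):
        allowed = max(self.min_retries, int(self.requests * self.ratio))
        return self.retries < allowed

    def record_retry(self):
        self.retries += 1


def call_with_retries(op, budget, max_attempts=4, sleep=time.sleep, rng=None):
    """Run op(), retrying timeouts and connection errors within the shared budget."""
    budget.record_request()
    delays = backoff_delays(max_attempts, rng=rng)
    for attempt in range(max_attempts):
        try:
            return op()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1 or not budget.can_retry():
                raise  # give up instead of adding load to a struggling node
            budget.record_retry()
            sleep(delays[attempt])
```

The shared budget keeps one degraded node from turning every client's retries into a traffic spike. Once retries exceed the allowed fraction of requests, callers fail fast and surface the error instead of piling on.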

"Peer-to-peer architectures, where the S3 client connects directly to S3 storage, are not resilient," says Paul Pindell, principal solutions architect for technology alliances at F5. "If a single storage node fails, all traffic to that cluster is degraded, and in some cases the cluster may fail entirely."

The problem is that AI workflows, including RAG-based inference and agentic AI, increasingly treat S3 storage as a first-class citizen of the AI cluster. Yet the network path between that storage and the cluster was never designed for the high-performance, uninterrupted data movement needed to keep GPUs fully utilized.

The real cost of stagnant pipelines and underutilized GPUs

"Business leaders tend to frame AI infrastructure around GPU utilization, but what differentiates AI from traditional deterministic workloads is that the infrastructure continually influences outcomes with every interaction," says Tanu Mutreja, senior director of product management at F5. "In AI environments, infrastructure is no longer just a back-end concern. It shapes customer experience, quality, resilience, and cost with every transaction."

The business consequences can be significant. When inference pipelines stall, for example, the stall becomes an SLA and customer experience issue. When RAG systems lag, models lose access to timely, relevant context, which produces inaccurate, outdated, or misleading responses and creates operational, compliance, and reputational risk. At the same time, the infrastructure issues behind those problems raise costs by leaving expensive GPU resources idle or underutilized.

"When GPUs are underutilized, it indicates infrastructure inefficiencies that inflate costs and limit scalability and responsiveness," Mutreja says. "The key question is whether the end-to-end AI infrastructure consistently delivers reliable, secure, high-quality, governed AI experiences with sustainable unit economics."

Create a production-ready data delivery layer

F5 treats data delivery as a first-class infrastructure layer rather than assuming that the network path will simply work. Where application delivery optimizes the flow of requests between users and applications, data delivery optimizes the flow of data across storage, networking, and compute, including AI compute.

Making data delivery a first-class layer means building three properties into it:

Observability provides real-time visibility into latency, throughput, and flow status.

Programmability enables policy-based control over how data moves, through dynamic routing, traffic optimization, rate management, and automated failover.

Failure awareness builds in resilience to degraded networks, storage throttling, and service interruptions.
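A per-endpoint circuit breaker that drives automated failover is one way to illustrate the programmability and failure-awareness properties. The minimal Python sketch below shows that generic pattern. It is not BIG-IP's implementation or configuration. `CircuitBreaker`, `FailoverPool`, the endpoint names, and the threshold and cool-down values are all hypothetical. The `send` callable stands in for whatever actually issues the S3 request.

```python
import time

class CircuitBreaker:
    """Tracks one endpoint: opens after consecutive failures, allows a probe after a cool-down."""

    def __init__(self, failure_threshold=3, cooldown_s=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def available(self):
        if self.opened_at is None:
            return True
        # Half-open: let a request through again once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the cool-down


class FailoverPool:
    """Sends each request to the first endpoint whose breaker is closed or half-open."""

    def __init__(self, endpoints, **breaker_kwargs):
        self.breakers = {ep: CircuitBreaker(**breaker_kwargs) for ep in endpoints}

    def call(self, send):
        errors = []
        for ep, breaker in self.breakers.items():
            if not breaker.available():
                continue  # skip endpoints known to be failing
            try:
                result = send(ep)
            except ConnectionError as exc:
                breaker.record_failure()
                errors.append((ep, exc))
                continue  # fail over to the next endpoint
            breaker.record_success()
            return ep, result
        raise ConnectionError(f"no healthy endpoint; errors={errors}")
```

In this design, a failing node is skipped outright instead of being retried on every request. That skipping is the behavior a direct client-to-storage connection lacks.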

In the architecture F5 has developed for Dell ObjectScale, F5 BIG-IP sits between ObjectScale and the AI compute layer as a programmable control point at the storage edge.

"We have seen cases where a misconfiguration in the AI compute layer effectively caused a DDoS attack on the S3 storage infrastructure," Pindell says. "Not in a malicious way, more of an ‘Oh no, what did I do?’ moment, but it still affected storage for the entire organization."

Placing BIG-IP as an application delivery controller between the storage and compute layers protects storage with QoS, rate limits, and connection limits, keeping it resilient and operational under that kind of load. Testing validated by SecureIQLab confirmed that this protection does not come at the cost of performance, which Pindell says matters architecturally.

"Preserving and even improving performance is essential," he explains. "It's what allows you to add higher-level functionality, resiliency, and enhanced security without sacrificing performance."
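Rate limits and connection limits are standard admission-control techniques. The minimal Python sketch below shows how a control point can turn a runaway client into rejected requests instead of an overloaded storage cluster. It assumes a token-bucket rate limiter plus a cap on concurrent in-flight requests. `TokenBucket`, `StorageGuard`, and the choice of HTTP 429 and 503 responses are illustrative assumptions, not BIG-IP settings or its API.

```python
import threading
import time

class TokenBucket:
    """Admits `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()
        self._lock = threading.Lock()

    def try_acquire(self):
        with self._lock:
            now = self.clock()
            # Refill in proportion to elapsed time, never above capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False


class StorageGuard:
    """Rejects requests that exceed the rate limit or the concurrent-connection cap."""

    def __init__(self, rate, burst, max_connections, clock=time.monotonic):
        self.bucket = TokenBucket(rate, burst, clock)
        self.slots = threading.BoundedSemaphore(max_connections)

    def handle(self, forward):
        if not self.bucket.try_acquire():
            return 429, "rate limited"  # shed excess request rate
        if not self.slots.acquire(blocking=False):
            return 503, "connection limit reached"  # shed excess concurrency
        try:
            return 200, forward()  # forward() stands in for the call to storage
        finally:
            self.slots.release()
```

A misconfigured compute job that floods the guard sees fast rejections once it exhausts its burst allowance. Storage behind the guard only ever sees the admitted rate and concurrency.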

The added complexity of hybrid and multi-cloud AI

AI deployments in hybrid and multi-cloud environments face an even greater data delivery challenge because of their heterogeneity. Data traversing these environments must contend with inconsistent policies, security controls, identity systems, and governance requirements, as well as fragmented visibility and differing failure boundaries.

Programmable traffic management and observability address this complexity together. Observability provides a unified view of application, network, and infrastructure health across otherwise disconnected environments. Programmable traffic management uses that insight to route, balance, and shift traffic intelligently in real time. Together, they form a closed feedback loop that applies consistent policies, improves resilience across fault domains, and ensures reliable, high-performance AI data delivery regardless of where applications, data, or users reside.
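A few lines of code can sketch this closed loop, in which observed health feeds routing decisions. The minimal Python example below makes several simplifying assumptions. Latency is the only health signal, smoothed with an exponentially weighted moving average (EWMA), and traffic is weighted in inverse proportion to that average. The weighting rule, the `LatencyAwareRouter` class, and the backend labels are hypothetical illustrations, not F5's algorithm.

```python
import random

class LatencyAwareRouter:
    """Closed loop: observed latency updates a per-backend EWMA, which sets routing weights."""

    def __init__(self, backends, alpha=0.3, initial_ms=50.0, rng=None):
        self.alpha = alpha
        self.ewma_ms = {b: initial_ms for b in backends}
        self.rng = rng or random.Random()

    def observe(self, backend, latency_ms):
        # Observability side: fold each new measurement into a smoothed estimate.
        prev = self.ewma_ms[backend]
        self.ewma_ms[backend] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def weights(self):
        # Control side: weight inversely to smoothed latency, normalized to sum to 1.
        inv = {b: 1.0 / max(ms, 1e-6) for b, ms in self.ewma_ms.items()}
        total = sum(inv.values())
        return {b: v / total for b, v in inv.items()}

    def pick(self):
        w = self.weights()
        backends = list(w)
        return self.rng.choices(backends, weights=[w[b] for b in backends], k=1)[0]
```

Because each pick depends on the latest observations, a backend that degrades in one fault domain gradually receives less traffic without any manual reconfiguration.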

What separates production AI from perpetual pilots?

Organizations that move beyond perpetual pilots share a specific engineering discipline, Smit says.

"They design for production with failure as a normal state, not an exception," he explains. "They assume that latency, congestion, and partial outages will occur, and they build a data path that is observable and failure-aware enough to absorb them, with explicit mitigation for each degraded condition rather than hope that the network will hold up."

Organizations stuck in perpetual pilots are still optimizing for perfect lab results and discovering the gap in the real world only when a workload goes live. The problem is not the quality of the model or the number of GPUs, but whether the data delivery layer was designed with the same rigor as the computation.

"Teams need to understand that a real-world network behaves very differently from an optimized lab network," Pindell says. "They need a mitigation plan for the failure states and performance bottlenecks they will encounter in production."

Sponsored articles are content produced by a company that pays to publish or has a business relationship with VentureBeat, and are always clearly marked. For more information, contact sales@venturebeat.com.
