
As enterprise AI agents move into production, organizations face a growing reliability problem. Many teams are finding that LLM performance alone does not determine whether agents succeed in production. Long-running AI workflows must survive failures, preserve state, recover cleanly, manage inference costs, and coordinate across APIs, tools, and enterprise systems.
After a first wave focused on rapid deployment, organizations now need to revisit those first-generation deployments and redesign early agent architectures around workflow orchestration, observability, governance and recovery, said Preeti Somal, senior vice president of engineering at Temporal Technologies, during the latest AI Impact Series event in New York.
“We have a lot of customers coming to us to build version 2.0 of the same agent,” Somal said. “They had to act very quickly, but they didn’t take care of the pipes. Things crash and burn, and then they have to rebuild again with a reliable foundation.”
For workflow orchestration company Temporal, whose infrastructure predates the current wave of agentic AI, the shift reflects a broader realization among businesses: Production AI systems require durable execution, state management, visibility into workflows, and mechanisms to recover when downstream models or systems fail.
Agentic AI amplifies known engineering problems
“These patterns are not necessarily new,” Somal said. “AI simply amplifies them.”
Agent systems introduce additional complexity because they often involve long-running, multi-step processes that span multiple services, models, APIs, and tools. A single workflow can call multiple large language models, query retrieval systems, invoke external applications, and manage state for hours or days. Engineering issues, Somal said, often surface only after deployment.
“People write agents but haven’t thought about what happens if the agent fails,” she said. “Will I have to rerun the entire agent flow?”
For companies operating under cost constraints, the answer is important. Restarting workflows after failures can multiply inference overhead, increase latency, and create poor customer experiences.
Somal compared the current moment to an earlier period of enterprise cloud adoption, when organizations rushed to migrate workloads before recognizing that the underlying architectures had to be redesigned for those workloads to last.
“This rush to do AI in a world where applications haven’t even been modernized reminds me a little bit of that upheaval and shift that happened in the cloud,” she said. “Everyone realized that more money is being spent on the cloud and we haven’t gotten value there.”
Why long-lived agents force a new architecture
Enterprise workflows increasingly involve agents that run for long periods, sometimes many hours, while interacting with tools and systems. Reliability challenges compound when workflows persist over time, and they touch both state and memory, two ideas that are often treated interchangeably in conversations about AI.
State refers to the execution of the workflow. It includes where an agent is in a process, what actions have already completed, and where recovery should resume after a failure. Memory, or context, captures the information an agent carries across interactions or tasks.
“The state of the agent depends on what step and what actions have been taken, and if something fails, where it wants to recover from, versus the context and the memory piece,” Somal explained.
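The distinction can be made concrete with a minimal Python sketch. It is an illustration only, not Temporal's data model: the class names `ExecutionState` and `AgentMemory` and the step names are hypothetical, chosen to mirror the state-versus-memory split described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExecutionState:
    """Where the workflow is: which steps finished and where to resume."""
    steps: List[str]
    completed: List[str] = field(default_factory=list)

    def resume_point(self) -> Optional[str]:
        # The first step that has not completed is where recovery restarts.
        for step in self.steps:
            if step not in self.completed:
                return step
        return None  # every step has finished


@dataclass
class AgentMemory:
    """What the agent knows: context carried across interactions or tasks."""
    notes: List[str] = field(default_factory=list)


# Hypothetical step names for a visit-processing flow.
state = ExecutionState(steps=["process_audio", "summarize", "post_visit_summary"])
memory = AgentMemory()

state.completed.append("process_audio")           # progress is recorded in state
memory.notes.append("transcript is ready")        # content is recorded in memory

print(state.resume_point())  # summarize
```

Keeping the two apart means a recovery routine only needs the execution state to decide where to restart, while memory can be stored, trimmed, or summarized on its own schedule.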
That distinction becomes increasingly important as companies move beyond simple chatbot interactions toward longer-lived business processes. Somal pointed to a healthcare example involving customer Abridge, where workflows process doctor visits through multiple stages, including audio processing, summarization, model calls, and post-visit summary generation.
“There is no one part to that flow,” Somal said. “Taking videos and breaking them down, taking summaries, calling the LLMs, generating the post-visit summary, all of that is being orchestrated.”
The implication for businesses is that successful agents increasingly rely on systems that can survive disruptions, coordinate across services, and maintain continuity over time.
The rise of the deterministic backbone
A useful framework for designing enterprise AI is the deterministic backbone, Somal said, describing how she thinks about Temporal’s role.
“It is denoting the path you want to take,” she said. “It is calling the brain, but if the brain does not respond, it will call it again. If the brain responds but the next step is going to fail, it will continue from where that failure occurred.”
In this framework, the language model acts as a probabilistic system that produces variable results, while the orchestration software keeps execution around it reliable. The concept matters because enterprise systems increasingly require consistency even when models remain nondeterministic. A procurement workflow, healthcare summary, customer service escalation, or fulfillment process cannot simply fail silently because a model call timed out or an external dependency failed.
“What matters most to them is making sure they can recover and not pay the token tax if something goes wrong,” Somal said.
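The two behaviors Somal describes, calling the brain again when it does not respond and continuing from the step that failed, can be sketched in plain Python. This is a toy illustration under stated assumptions, not Temporal's API: `run_workflow`, the in-memory `checkpoint` dictionary (a stand-in for durable storage), and the simulated failures are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], str]]


def run_workflow(steps: List[Step], checkpoint: Dict[str, str],
                 max_attempts: int = 3) -> Dict[str, str]:
    """Run steps in order, skipping any whose result is already checkpointed."""
    for name, action in steps:
        if name in checkpoint:
            continue  # finished in an earlier run; do not pay for it again
        last_error = None
        for _ in range(max_attempts):
            try:
                checkpoint[name] = action()  # record the result once it succeeds
                break
            except Exception as exc:
                last_error = exc  # retry the same step
        else:
            # Retries exhausted; completed steps stay in the checkpoint.
            raise RuntimeError(f"step {name!r} failed after {max_attempts} attempts") from last_error
    return checkpoint


calls = {"brain": 0, "notify": 0}


def call_brain() -> str:
    # Simulated model call: times out once, then answers.
    calls["brain"] += 1
    if calls["brain"] == 1:
        raise TimeoutError("model did not respond")
    return "draft summary"


def notify() -> str:
    # Simulated downstream system: unavailable for its first three calls.
    calls["notify"] += 1
    if calls["notify"] <= 3:
        raise ConnectionError("downstream system unavailable")
    return "sent"


steps = [("call_brain", call_brain), ("notify", notify)]
checkpoint: Dict[str, str] = {}  # stand-in for durable storage

try:
    run_workflow(steps, checkpoint)  # brain is retried and succeeds; notify gives up
except RuntimeError:
    pass

run_workflow(steps, checkpoint)  # resumes at notify; the brain is not called again
print(calls)  # {'brain': 2, 'notify': 4}
```

The model was called twice in total, once for the timeout and once for the answer. The second run did not repeat it, which is the "token tax" the quote refers to.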
Reliability, visibility and the economics of token spending
As business leaders evaluate the ROI of AI, cost visibility has become a growing concern. Long-running agents often make multiple model calls in complex workflows, which can create opaque spending patterns. Somal described visibility into where costs accumulate as an operational advantage of orchestration. Because workflows can be observed step by step, teams can see where tokens are consumed in an agent process.
“You have visibility of all that flow in a single pane of glass,” she said. “Now you can see where you are spending the tokens on an agent that has multiple steps and calls multiple different systems.”
Workflow recovery also influences profitability. Without durable orchestration, a late-stage failure can force organizations to rerun an entire process from the beginning, including all previous model calls. Somal said systems designed around recovery can resume execution from the point of interruption.
“It picks up from where the accident occurred,” she said. “We save you the cost of rerunning the agent from step one.”
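A back-of-the-envelope sketch shows why the resume point matters for cost. The step names and token counts below are invented for illustration and do not come from Temporal or any customer. The sketch also assumes, as a simplification, that the failed attempt itself consumed no billable tokens.

```python
from typing import List, Tuple

# Hypothetical per-step token usage for a four-step agent workflow.
step_tokens: List[Tuple[str, int]] = [
    ("transcribe", 1200),
    ("summarize", 3000),
    ("draft", 2500),
    ("review", 800),
]


def tokens_to_finish(steps: List[Tuple[str, int]], failed_index: int, resume: bool) -> int:
    """Tokens needed to complete the workflow after a failure at failed_index."""
    start = failed_index if resume else 0  # resume at the failed step, or start over
    return sum(tokens for _, tokens in steps[start:])


failed_index = 3  # the last step, "review", failed

rerun = tokens_to_finish(step_tokens, failed_index, resume=False)
resumed = tokens_to_finish(step_tokens, failed_index, resume=True)

# Per-step view: where the tokens go.
for name, tokens in step_tokens:
    print(f"{name:<12}{tokens:>6}")

print(rerun)            # 7500
print(resumed)          # 800
print(rerun - resumed)  # 6700
```

With these made-up numbers, a failure in the final step costs 7,500 tokens to recover from when the workflow restarts at step one and 800 when it resumes at the failed step. The later the failure, the larger the gap.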
Companies need to build paved roads and rely on the experience of their partners
Governance is another emerging concern as agentic AI takes hold. Rather than adopting fully managed agent systems wholesale, Somal said, companies increasingly want standardized internal frameworks that provide guardrails while preserving flexibility, with built-in features such as governance controls, model selection policies, identity systems, cost management and observability.
“Companies are studying the possibility of building these paved roads,” she said. “Taking something off the shelf may not work because there are all these other requirements.”
As organizations review first-generation deployments, challenges like these look less like a model problem and more like a systems engineering problem. Temporal is positioned to help companies take this next step, in part because many organizations had already adopted it as part of broader modernization programs before AI became a strategic priority.
“Temporal is already in the company,” Somal said. “Taking that and extending it to AI and agent platforms seems very natural.”





