Enterprise AI Agents Keep Failing Because They Forget What They Learned



RAG architectures are good at one thing: surfacing semantically relevant documents. That’s also where they stop.

A framework called a decision context graph addresses that gap by giving agents structured memory, time-aware reasoning, and explicit decision logic. Rippletide, a startup from the Neo4j ecosystem, has built one. The key capability: non-regressive agents, which can freeze sequences of validated actions and combine them over time.

“The key point you want is non-regressivity: how can you ensure that when the agent generates something new, it can complement previous discoveries?” said Yann Bilien, co-founder and chief scientific officer of Rippletide.

Why RAG doesn’t go far enough

Business context is spread across ERP tools, registries, databases, vector stores, and policy documents. Generative AI tools can retrieve all of it (using keyword searches, SQL queries, or full RAG pipelines), but retrieval has limits.

For one, the retrieved data may not be relevant to the decision at hand, which can lead to hallucinations. And even when agents get the right data, they often lack the guidance to make decisions backed by solid justification.

In other words, RAG retrieves documents, not decision context. “Everyone starts with RAG: pull the relevant documents, put them in the prompt, and let the model figure it out,” said Wyatt Mayham of Northwest AI Consulting.

That works well for chatbots, but “it immediately breaks” for agents that need to make decisions and take actions, he noted. “The biggest thing builders struggle with is the gap between retrieval and applicability.”

A retrieved document doesn’t tell the agent whether it still applies, whether it has been superseded, or whether a conflicting rule takes priority, Mayham said. “Agents need decision context, not just information.”

In production (the real world), that could mean knowing that a pricing exception has expired, that a security policy applies only in certain jurisdictions, or that a standard operating procedure was updated a month earlier. “If you miss any of that, the agent is bound to do the wrong thing,” Mayham said.

Without a structured decision context, agents combine incompatible rules, invent constraints to fill gaps, and rely on what Bilien calls "probabilistic guesses on unlimited data." Bugs are difficult to reproduce because builders cannot trace why the agent made a certain decision.

The compounding error problem is also real, Mayham said: a small per-step error rate becomes “catastrophic” in a multi-step workflow. “That’s the main reason why most enterprise agents never leave the pilot phase.”
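The compounding can be made concrete with a little arithmetic. Suppose each step of a workflow succeeds independently with probability 1 − e. Then an n-step run succeeds end to end with probability (1 − e)^n. Independence is a simplifying assumption that real workflows may not satisfy. The error rates and step counts below are illustrative and are not figures from Mayham.

```python
def workflow_success_rate(per_step_error: float, steps: int) -> float:
    """Probability that all `steps` succeed, assuming each step fails independently."""
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return (1.0 - per_step_error) ** steps


# Illustrative per-step error rates and workflow lengths, not measured figures.
for error in (0.01, 0.05):
    for steps in (1, 10, 20, 50):
        rate = workflow_success_rate(error, steps)
        print(f"per-step error {error:.0%}, {steps:>2} steps -> {rate:.1%} end-to-end")
```

With a 1% per-step error, a 50-step run finishes cleanly only about 60% of the time. At 5%, a 20-step run succeeds only about 36% of the time.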

How Decision Context Graphs Get to the Relevant Answer

A decision context graph solves this by encoding a structured map of what is applicable, what the rules are, and when they apply.

The framework is optimized for one question: "Given this situation, what context applies at this time?" Time is treated as a first-class dimension: each rule, decision and exception carries the period during which it is valid.
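As a rough illustration of "what context applies at this time?", here is a minimal sketch that assumes each rule carries a scope (here, a jurisdiction) and a half-open validity window. The `Rule` class, its fields, `applicable_rules`, and the sample rules are all hypothetical. This is not Rippletide’s data model or API, and a plain Python list stands in for the graph.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule node: what it says, where it applies, and when."""
    rule_id: str
    text: str
    jurisdiction: str                 # assumed scope attribute
    valid_from: date
    valid_to: Optional[date] = None   # None means still in force

    def is_valid_on(self, day: date) -> bool:
        # Half-open window [valid_from, valid_to): the rule stops applying on valid_to.
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


def applicable_rules(rules: list[Rule], jurisdiction: str, day: date) -> list[Rule]:
    """Return only the rules in scope for this situation on this date."""
    return [r for r in rules if r.jurisdiction == jurisdiction and r.is_valid_on(day)]


rules = [
    Rule("price-exc-7", "10% pricing exception", "EU", date(2024, 1, 1), date(2024, 7, 1)),
    Rule("sec-policy-2", "Data must stay in-region", "EU", date(2023, 3, 1)),
    Rule("sec-policy-3", "Data must stay in-region", "US", date(2023, 3, 1)),
]

print([r.rule_id for r in applicable_rules(rules, "EU", date(2024, 3, 15))])
print([r.rule_id for r in applicable_rules(rules, "EU", date(2024, 9, 1))])
```

The first query (March 2024) still includes the pricing exception. The second (September 2024) shows that it has expired. Asking the same question with an earlier date reproduces what was true then, which is the kind of "then versus now" reasoning the framework describes.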

“The goal is to explicitly address missing, inconsistent or contradictory data when building the graph to avoid probabilistic (errors) once the agent is running,” Bilien said.

The system is based on three principles:

  • Applicability: The logic is explicitly coded so that the agent knows what rules to remember and apply in a given situation. Context is returned only when it is relevant to the situation.

  • Time-aware memory: Each rule, decision and exception has a temporal scope. This allows agents to reason about "what was true then versus what is true now," and then reproduce or explain their decisions.

  • Decision paths: The system can explain how it got from A to B and the "because" behind its justification (for example, why one part of the context was included and another was not). Agents are also given "decision path" examples of how similar cases were handled before.
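To show what a decision path might record, here is a minimal sketch. It assumes each candidate rule has a jurisdiction and a validity window, and it logs a reason for every rule it includes or excludes. `ScopedRule`, `DecisionPath`, `select_context`, and the sample data are hypothetical illustrations, not the framework’s actual trace format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ScopedRule:
    """Hypothetical rule node with a jurisdiction scope and a validity window."""
    rule_id: str
    jurisdiction: str
    valid_from: date
    valid_to: Optional[date] = None  # None means still in force


@dataclass
class DecisionPath:
    """Hypothetical trace: which context was included, which was excluded, and why."""
    included: list[str] = field(default_factory=list)
    excluded: dict[str, str] = field(default_factory=dict)


def select_context(rules: list[ScopedRule], jurisdiction: str, day: date) -> DecisionPath:
    path = DecisionPath()
    for rule in rules:
        if rule.jurisdiction != jurisdiction:
            path.excluded[rule.rule_id] = f"scoped to {rule.jurisdiction}, not {jurisdiction}"
        elif day < rule.valid_from:
            path.excluded[rule.rule_id] = f"not in force until {rule.valid_from.isoformat()}"
        elif rule.valid_to is not None and day >= rule.valid_to:
            path.excluded[rule.rule_id] = f"expired on {rule.valid_to.isoformat()}"
        else:
            path.included.append(rule.rule_id)
    return path


rules = [
    ScopedRule("price-exc-7", "EU", date(2024, 1, 1), date(2024, 7, 1)),
    ScopedRule("sec-policy-2", "EU", date(2023, 3, 1)),
    ScopedRule("sec-policy-3", "US", date(2023, 3, 1)),
]
path = select_context(rules, "EU", date(2024, 9, 1))
print(path.included)  # ['sec-policy-2']
print(path.excluded)  # {'price-exc-7': 'expired on 2024-07-01', 'sec-policy-3': 'scoped to US, not EU'}
```

Storing the exclusion reasons alongside the selected context is one way to make the "because" auditable: a builder can see why the expired exception was left out, rather than guessing from the model’s output.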

During setup, unstructured data is ingested and structured into an ontology: what entities exist, what rules apply, what counts as an exception. Neurosymbolic AI combines pattern recognition with machine-readable formal logic. Over time, the system refines its knowledge base as new decisions are made.

“Neurosymbolic consists of two parts: a neural part that gives high autonomy to agents and a symbolic part to reduce the amount of data needed and provide control,” Bilien said.

The agent is tested at build time (pre-production) to validate its behavior or identify improvements. This reduces risks and computation needs during inference, he noted.

Agents learn, rather than regress

When it comes to non-regression, the key is combining intelligence (models) with knowledge (shared between agents), Bilien said. It is important that agents can explore: when they don’t know how to perform a task, they can try different possibilities, usually in a controlled environment or a simulation (such as a support bot that tests multiple response patterns).

Then, “once a solution is evaluated as satisfactory, the graph freezes that sequence of actions,” Bilien said. Future exploration then begins from this “stable foundation of validated behaviors” to prevent newly acquired skills from overwriting previously learned good behaviors.
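One way to picture "freezing" is as a store of validated action sequences that exploration can read from but never overwrite. Validated sequences can then be chained into longer plans. The sketch below assumes a task-keyed library and a caller-supplied `evaluate` function. `SkillLibrary`, `explore`, and the task and action names are hypothetical and do not describe how Rippletide implements this.

```python
from typing import Callable, Iterable, Optional

Action = str
ActionSeq = tuple[Action, ...]


class SkillLibrary:
    """Hypothetical store of frozen, validated action sequences, keyed by task."""

    def __init__(self) -> None:
        self._frozen: dict[str, ActionSeq] = {}

    def freeze(self, task: str, actions: Iterable[Action]) -> None:
        if task in self._frozen:
            # Non-regression: a validated behavior is never silently overwritten.
            raise ValueError(f"task {task!r} already has a frozen sequence")
        self._frozen[task] = tuple(actions)

    def get(self, task: str) -> Optional[ActionSeq]:
        return self._frozen.get(task)

    def compose(self, *tasks: str) -> ActionSeq:
        """Chain frozen sequences, in order, into a longer plan."""
        plan: list[Action] = []
        for task in tasks:
            seq = self._frozen.get(task)
            if seq is None:
                raise KeyError(f"no validated sequence for {task!r}")
            plan.extend(seq)
        return tuple(plan)


def explore(
    task: str,
    candidates: Iterable[ActionSeq],
    evaluate: Callable[[ActionSeq], bool],
    library: SkillLibrary,
) -> Optional[ActionSeq]:
    """Reuse a frozen sequence if one exists; otherwise freeze the first accepted candidate."""
    existing = library.get(task)
    if existing is not None:
        return existing  # build on the stable foundation instead of re-exploring
    for candidate in candidates:
        if evaluate(candidate):
            library.freeze(task, candidate)
            return candidate
    return None  # nothing satisfactory found, so nothing is frozen


lib = SkillLibrary()
explore("verify_identity", [("ask_name",), ("ask_name", "check_id")],
        lambda seq: "check_id" in seq, lib)
explore("issue_refund", [("refund_full",), ("check_policy", "refund_partial")],
        lambda seq: seq[0] == "check_policy", lib)
print(lib.compose("verify_identity", "issue_refund"))
```

Because `explore` returns the frozen sequence whenever one already exists, a later exploration run cannot displace a behavior that was previously judged satisfactory. It can only add sequences for tasks that are not yet covered.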

Before an agent takes an action or affects a customer, the graph checks: Is it violating a rule? Hallucinating? Staying within limits? Can the solution generalize to similar cases?
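A pre-action check can be sketched as a set of predicates that must all pass before an action runs. This minimal version covers three of the questions: forbidden action types, references to entities the graph does not know about (a crude proxy for hallucination), and per-action limits. Generalization is not modeled. `ProposedAction`, `preflight`, and the sample rules and limits are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    kind: str
    customer_id: str
    amount: float


# Hypothetical slice of the graph: known entities, prohibited actions, limits.
KNOWN_CUSTOMERS = {"C-100", "C-200"}
FORBIDDEN_KINDS = {"close_account"}   # actions a rule prohibits outright
AMOUNT_LIMITS = {"refund": 500.0}     # per-action ceilings


def preflight(action: ProposedAction) -> list[str]:
    """Return the reasons to block an action; an empty list means it may proceed."""
    problems = []
    if action.kind in FORBIDDEN_KINDS:
        problems.append(f"rule violation: {action.kind} is not permitted")
    if action.customer_id not in KNOWN_CUSTOMERS:
        problems.append(f"unknown entity {action.customer_id}: possible hallucination")
    limit = AMOUNT_LIMITS.get(action.kind)
    if limit is not None and action.amount > limit:
        problems.append(f"amount {action.amount} exceeds limit {limit}")
    return problems


print(preflight(ProposedAction("refund", "C-100", 120.0)))  # []
print(preflight(ProposedAction("refund", "C-999", 900.0)))
```

Returning every reason, rather than stopping at the first failure, gives the same kind of explainable record as a decision path.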

At a macro level, the system evaluates results: Did the behavior improve long-term performance? Did it generalize to similar contexts? Did the agent retain its previous abilities?
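The "did it retain previous abilities?" question resembles a regression-test gate. Under this gate, an updated agent is accepted only if it still passes every case the current agent passes and adds at least one new one. The sketch assumes a fixed set of prompt and expected-answer cases, with toy lookup-table agents standing in for real ones. `table_policy`, `passed_cases`, `accept_update`, and the cases are hypothetical.

```python
from typing import Callable

Policy = Callable[[str], str]


def table_policy(answers: dict[str, str]) -> Policy:
    """Toy agent: answers from a lookup table and escalates anything it doesn't know."""
    return lambda prompt: answers.get(prompt, "escalate")


def passed_cases(policy: Policy, cases: dict[str, str]) -> set[str]:
    """Prompts the policy answers with the expected response."""
    return {prompt for prompt, expected in cases.items() if policy(prompt) == expected}


def accept_update(current: Policy, candidate: Policy, cases: dict[str, str]) -> bool:
    before = passed_cases(current, cases)
    after = passed_cases(candidate, cases)
    retained = before <= after      # kept every previously passing case?
    gained = bool(after - before)   # learned at least one new case?
    return retained and gained


cases = {
    "reset password": "send reset link",
    "refund status": "check order",
    "change address": "verify identity",
}
current = table_policy({"reset password": "send reset link"})
better = table_policy({"reset password": "send reset link", "refund status": "check order"})
regressed = table_policy({"refund status": "check order", "change address": "verify identity"})

print(accept_update(current, better, cases))     # True: adds a case, loses none
print(accept_update(current, regressed, cases))  # False: forgets "reset password"
```

The regressed candidate passes more cases in total, but it is still rejected because it drops one that used to pass. A raw accuracy score alone would miss this distinction.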

“This determinism is key for agents to execute reliably at scale,” Bilien said. It leads to behavior that is more consistent, predictable and explainable, and it allows for greater control and auditability.

“What you want is for your agents to be able to learn for themselves when they’re faced with something they don’t know,” he said. “You want them to be able to explore and find new solutions.”

Going beyond "episodic" memory

The team initially assumed it would implement reinforcement learning (RL) everywhere, "which was actually very difficult in a business environment," Bilien said. "Data is sparse for some specific use cases and confusing for others."

Traditionally, turning raw data into something reliable enough for predictions has been manual and time-consuming, but “now with agents we enter a new era where it is possible to create ontologies automatically,” Bilien said.

Classic supervised fine-tuning methods can cause oscillations, in which models forget the last skill they learned while learning the next one. In general, learning is not composable, compression is “dramatic,” and models improve “episodically” rather than continuously, so they keep failing on new or unseen tasks.

As Bilien put it: “You will never have a complete self-learning model if you regress every time.”

In enterprise use cases such as banking, where millions of transactions are processed per day, a high level of reliability is essential, he noted. “One question I ask all customers: Is 95% enough? In many use cases, it is not. You need 99.999%. A 1% gap is too much.”
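To see why the gap between 95% and 99.999% matters at banking volumes, multiply the failure rate by the daily transaction count. The sketch assumes an illustrative volume of one million transactions a day; the article says only "millions." The helper name `expected_failures` is hypothetical.

```python
def expected_failures(reliability: float, volume: int) -> float:
    """Expected number of failed transactions at a given success rate."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return (1.0 - reliability) * volume


DAILY_VOLUME = 1_000_000  # illustrative; the article says "millions" per day

for reliability in (0.95, 0.99, 0.99999):
    failures = expected_failures(reliability, DAILY_VOLUME)
    print(f"{reliability:.3%} reliable -> ~{failures:,.0f} failures per day")
```

At that volume, 95% reliability means roughly 50,000 failed transactions a day, and 99% still means about 10,000. At 99.999%, the count falls to about 10.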

Decision context graphs can close that gap, he maintains: When the same customer service question is asked repeatedly, the agent will return a “satisfactory” answer predictably and without regression, all while maintaining autonomy.

Encoding applicability and temporal validity in a structured graph, rather than relying on an LLM to infer them, is a "solid approach" to a real limitation of existing retrieval frameworks, Mayham said. The open question is whether automatic ontology generation holds up against the diverse and messy data that companies actually have. "That’s always the hard part," he said.


