The new agent memory framework uses 118,000 tokens per query. LangMem burns 3.26 million.



Long-horizon reasoning exposes a fundamental weakness of AI agents: context windows fill quickly and retrieval pipelines return noise instead of signal.

To solve this, researchers at the National University of Singapore developed MRAgent, a framework that abandons the static "retrieve-then-reason" approach. Instead, it uses a mechanism that lets an agent dynamically explore its memory based on accumulating evidence.

This multi-step memory reconstruction is integrated into the large language model (LLM) reasoning process. While not the only framework in this space, MRAgent significantly reduces token consumption and runtime costs compared to other agent memory management approaches.

The limits of passive retrieval in long-horizon tasks

In classical retrieval pipelines, documents are retrieved using vector search or graph traversal and passed to an LLM for reasoning. This passive approach fails because it cannot combine reasoning with memory access, which creates three major bottlenecks:

  • These systems cannot revise their retrieval strategy mid-reasoning. If an agent searches for a document and discovers that a crucial clue (a specific date or person) is missing, it has no way to issue a new query based on that finding.

  • Fixed similarity scores and predefined graph expansions return surface-level matches that flood the LLM context window with irrelevant noise and degrade reasoning.

  • Current systems rely heavily on pre-built structures such as top-k results and static relevance functions, limiting the flexibility needed to scale through unpredictable, long-horizon user interactions.

The researchers argue that to overcome these limitations, developers must move toward an “active and associative reconstruction process,” a concept inspired by cognitive neuroscience.

Under this paradigm, memory retrieval proceeds sequentially rather than operating as a passive read from a static database. The system starts with small, specific cues from the user’s message, such as the name of a person, an action, or a place. These initial cues point to connecting concepts or categories rather than to massive blocks of text.

Following these cues, the agent pieces together small fragments of evidence one at a time. It uses each new piece of information to guide the next step until it has reconstructed the complete and accurate story.

How MRAgent implements active memory reconstruction

Instead of viewing memory as a static database, MRAgent (Memory Reasoning Architecture for LLM Agents) treats it as an interactive environment. When processing a complex query, the agent uses the reasoning capabilities of the LLM backbone to explore multiple candidate retrieval paths through a structured memory graph.

At each step, the LLM evaluates the intermediate evidence it has collected and uses it to iteratively refine its search. It infers new search constraints, follows the most informative paths, and prunes irrelevant branches. This allows MRAgent to reconstruct deeply buried information without filling the LLM context with noise.

To make this active exploration computationally efficient and scalable, the framework organizes its database using a “Cue-Tag-Content” mechanism. It works as a multi-layer associative graph with three types of nodes:

  • Cues: Fine-grained keywords, such as entities or contextual attributes, extracted from user interactions.

  • Content: The actual stored memory units. These are divided into multigranular layers, such as episodic memory for concrete events and semantic memory for stable facts and user preferences.

  • Tags: Semantic bridges that summarize the relational associations between Cues and specific Contents.
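To make the three node types concrete, here is a minimal sketch of how such a graph could be laid out in plain Python. It is not MRAgent's actual schema: the class name, method names, and the sample memories are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MemoryGraph:
    """Toy three-layer graph: cue -> tag -> content (all names hypothetical)."""
    # cue -> set of tags reachable from that cue
    cue_to_tags: dict = field(default_factory=lambda: defaultdict(set))
    # (cue, tag) -> ids of the content units behind that pair
    pair_to_content: dict = field(default_factory=lambda: defaultdict(list))
    # content id -> (layer, text), where layer is "episodic" or "semantic"
    contents: dict = field(default_factory=dict)

    def add(self, content_id, layer, text, cue, tag):
        self.contents[content_id] = (layer, text)
        self.cue_to_tags[cue].add(tag)
        self.pair_to_content[(cue, tag)].append(content_id)

    def tags_for(self, cue):
        # Cheap lookup: returns only short tag labels, never the contents.
        return sorted(self.cue_to_tags.get(cue, set()))

    def content_for(self, cue, tag):
        ids = self.pair_to_content.get((cue, tag), [])
        return [self.contents[i] for i in ids]


# Made-up sample memories, for illustration only.
graph = MemoryGraph()
graph.add("e1", "episodic", "Nate won his third tournament.", "Nate", "Tournament victory")
graph.add("e2", "episodic", "Nate signed up for a tournament.", "Nate", "Tournament participation")
graph.add("s1", "semantic", "Nate likes strategy games.", "Nate", "Preferences")
print(graph.tags_for("Nate"))
```

The point of the layout is that `tags_for` exposes only short labels, so an agent can inspect what is connected to a cue without loading any stored memory text.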

This structure allows for a highly efficient two-stage retrieval process. The LLM first navigates from Cues to candidate Tags. Because Tags explicitly expose the semantic relationships and structural associations of the data, the agent evaluates these brief summaries to judge their relevance. The LLM identifies promising traversal paths and discards irrelevant branches before spending computation and prompt tokens to access the fine-grained, memory-heavy Contents.
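The two stages can be sketched as follows. This is an illustrative approximation rather than the authors' implementation: the `is_relevant` callback stands in for the LLM's judgment on each tag, and the function name, dictionaries, and sample strings are assumptions made for the example.

```python
def two_stage_retrieve(cues, cue_to_tags, pair_to_content, is_relevant):
    """Stage 1: judge short tags. Stage 2: read contents only behind kept tags."""
    kept_pairs = [
        (cue, tag)
        for cue in cues
        for tag in cue_to_tags.get(cue, [])
        if is_relevant(tag)  # stands in for an LLM judgment on the tag summary
    ]
    contents = []
    for pair in kept_pairs:
        contents.extend(pair_to_content.get(pair, []))
    return kept_pairs, contents


# Made-up sample data, for illustration only.
cue_to_tags = {"Nate": ["Tournament victory", "Tournament participation"]}
pair_to_content = {
    ("Nate", "Tournament victory"): ["win #1", "win #2", "win #3"],
    ("Nate", "Tournament participation"): ["signup #1", "signup #2"],
}


def judge(tag):
    # Hypothetical stand-in for the LLM: keep only tags about winning.
    return "victory" in tag.lower()


pairs, contents = two_stage_retrieve(["Nate"], cue_to_tags, pair_to_content, judge)
print(pairs, contents)
```

In this toy run the two "signup" memories are never read, which is where the savings in prompt tokens would come from.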

For example, a user could ask an AI agent, "How did Nate use the prize money when he won his third video game tournament?"

  • MRAgent first extracts fine-grained initial cues from the message, such as "Nate," "video game tournament," and "win."

  • The agent maps these initial cues onto the memory graph and inspects the associative Tags connected to them. The agent sees Tags like "Tournament victory" and "Tournament participation." Since the query is only concerned with what Nate did after winning, MRAgent prunes the "Tournament participation" Tag and follows the "Tournament victory" Tag.

  • The agent retrieves the episodic Content linked to the chosen Cue-Tag pair, retrieving three separate memory episodes in which Nate won a tournament.

  • MRAgent analyzes the three memories, decides that one of them in particular is relevant to the query, and discards the other two.

  • With this information, the agent updates its cues and begins another round of discovery and pruning. From the new episodic memory it has retrieved, the agent adds “tournament wins” to its cues and uses them to traverse new Tags and find new memories. It repeats this process until it has gathered enough information to answer the query, which could be something like “Nate saved the money.”
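The walkthrough above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not the published algorithm: the four callbacks (`keep_tag`, `keep_item`, `cues_from`, `enough`) stand in for decisions the LLM would make, and the graph contents are invented to mirror the Nate example.

```python
def reconstruct(query_cues, graph, keep_tag, keep_item, cues_from, enough, max_rounds=5):
    """Alternate discovery and pruning until the evidence is judged sufficient."""
    cues, visited, evidence = list(query_cues), set(), []
    for _ in range(max_rounds):
        frontier = [c for c in cues if c not in visited]
        if not frontier:
            break  # nothing new to explore
        for cue in frontier:
            visited.add(cue)
            for tag, items in graph.get(cue, {}).items():
                if not keep_tag(tag):
                    continue  # prune the branch before reading its contents
                for item in items:
                    if keep_item(item) and item not in evidence:
                        evidence.append(item)
                        cues.extend(cues_from(item))  # new cues for the next round
        if enough(evidence):
            break  # stop as soon as the query can be answered
    return evidence


# Invented toy graph: cue -> tag -> list of memory texts.
graph = {
    "Nate": {
        "Tournament victory": [
            "Nate won his first tournament",
            "Nate won his second tournament",
            "Nate won his third tournament",
        ],
        "Tournament participation": ["Nate entered a tournament"],
    },
    "tournament wins": {"Prize money": ["Nate saved the money"]},
}


# Hypothetical stand-ins for LLM decisions.
def keep_tag(tag):
    return "participation" not in tag.lower()


def keep_item(item):
    return "first" not in item and "second" not in item


def cues_from(item):
    return ["tournament wins"] if "won" in item else []


def enough(evidence):
    return any("saved" in e for e in evidence)


evidence = reconstruct(["Nate", "video game tournament"], graph,
                       keep_tag, keep_item, cues_from, enough)
print(evidence)
```

The `max_rounds` cap is an added safeguard for the sketch; the article only says the agent decides for itself when it has gathered enough.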

MRAgent performance on industry benchmarks

MRAgent operates alongside several other frameworks that address agent memory. Alternatives include A-MEM, a graph-based agent memory framework, and MemoryOS, a hierarchical memory framework. Other persistent memory frameworks include LangMem and Mem0.

The researchers tested MRAgent on the LoCoMo and LongMemEval benchmarks. These benchmarks test an agent's ability to resolve long-horizon queries over conversations spanning dozens of sessions and hundreds of dialogue turns. The backbone models used were Gemini 2.5 Flash and Claude Sonnet 4.5. The system was tested against RAG, A-MEM, MemoryOS, LangMem and Mem0 baselines.

MRAgent consistently outperformed all baselines on both models and across all question types by a significant margin.

However, for enterprise developers, the most critical metric is usually computational cost. In LongMemEval testing, MRAgent reduced token consumption to just 118,000 per sample. In comparison, A-MEM consumed 632,000 tokens and LangMem burned 3.26 million tokens per query. MRAgent also roughly halved the execution time compared to A-MEM, going from 1,122 seconds to 586 seconds.
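For a sense of scale, the reported figures can be turned into ratios with some quick arithmetic. The snippet below uses only the numbers quoted above and assumes the "per sample" and "per query" figures are measured in the same unit.

```python
# Reported LongMemEval token consumption (assumed to share the same unit).
tokens = {"MRAgent": 118_000, "A-MEM": 632_000, "LangMem": 3_260_000}
base = tokens["MRAgent"]

# How many times more tokens each framework uses relative to MRAgent.
ratios = {name: round(count / base, 1) for name, count in tokens.items()}

# Fractional runtime reduction versus A-MEM (1,122 s down to 586 s).
runtime_cut = round(1 - 586 / 1122, 3)

print(ratios)       # {'MRAgent': 1.0, 'A-MEM': 5.4, 'LangMem': 27.6}
print(runtime_cut)  # 0.478
```

In other words, A-MEM uses about 5.4 times as many tokens and LangMem about 27.6 times as many, while the runtime reduction against A-MEM is just under 48%.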

What makes MRAgent efficient in practice is its on-demand behavior. Evaluating Tags and pruning irrelevant paths before retrieving their Contents saves money and context space. Additionally, the system evaluates its accumulated context on its own and decides when to stop searching, which avoids exploring redundant data.

Deployment and developer takeaways

Although MRAgent is very efficient, the Cue-Tag-Content structure must be prepared before the agent can query it. Developers must figure out how to design the underlying memory database so that the LLM can efficiently navigate through associative elements and eliminate irrelevant paths without computational costs skyrocketing.

Fortunately, developers do not have to manually label or structure this data. The authors designed MRAgent with an automated distillation process that uses an LLM to process raw interaction histories and automatically populate the memory graph. For a developer, the job is to implement and orchestrate this automated ingestion pipeline, rather than manually labeling data.

Developers need to configure a background job or streaming pipeline that passes raw user interactions through prompt templates to extract this metadata before storing it in the graph database.
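A bare-bones version of such an ingestion step might look like the sketch below. Everything in it is an assumption made for illustration: the prompt wording, the JSON keys, the nested-dictionary store, and `fake_llm`, which returns canned output in place of a real model call. The actual MRAgent distillation prompts and storage layer may differ.

```python
import json

# Hypothetical prompt template; the real templates are not shown in the article.
PROMPT_TEMPLATE = (
    "Extract cues, a short tag, and the memory layer from this interaction.\n"
    "Return JSON with keys: cues, tag, layer.\n\nInteraction: {text}"
)


def fake_llm(prompt):
    # Stand-in for a real model call; always returns the same canned JSON.
    return json.dumps({
        "cues": ["Nate", "video game tournament"],
        "tag": "Tournament victory",
        "layer": "episodic",
    })


def ingest(interactions, llm, store):
    """Distill each raw interaction into cue -> tag -> content entries."""
    for text in interactions:
        record = json.loads(llm(PROMPT_TEMPLATE.format(text=text)))
        for cue in record["cues"]:
            entries = store.setdefault(cue, {}).setdefault(record["tag"], [])
            entries.append({"layer": record["layer"], "text": text})
    return store


store = ingest(["Nate won his third video game tournament."], fake_llm, {})
print(json.dumps(store, indent=2))
```

In a real deployment the `llm` argument would wrap an actual model client and `store` would be a graph database rather than a dictionary, with the loop running as a background job or stream consumer.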

However, the authors emphasize that this is a lightweight build phase and MRAgent intentionally keeps ingestion simple.

The authors have published the code on GitHub.


