The Memory of LLMs

The integration of memory into AI agents is one of the most promising research avenues for extending their capabilities and maintaining contextual coherence. Recent publications, including those from leading laboratories, describe memory-management mechanisms that, while appealing in theory, raise significant questions about their applicability and robustness in real operational environments.

1. The Semantic Similarity-Based Memory Model

The frequently described memory-management paradigm relies on a simple, iterative process, sketched in code after the list:

  • Raw Storage: Each “experience” (user input, model output, feedback) is immediately encoded and stored in a vector database.
  • Similarity Retrieval: When a new query is submitted, its vector encoding is used to perform a top-k similarity search across all past experiences.
  • Contextualization: The experiences deemed most similar are reinjected into the model’s prompt to enrich the context of its response.
  • Unfiltered Recording: The complete new interaction is then recorded into memory, often with minimal or no qualitative filtering.
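
To make the loop concrete, here is a minimal sketch of the cycle in Python. The hash-based `embed` function is a toy stand-in for a real sentence encoder, and the plain list stands in for an actual vector database (FAISS, Chroma, and the like); all names are illustrative.

```python
# Minimal sketch of the similarity-only memory loop described above.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic embedding: a hash-seeded random projection."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

memory: list[tuple[str, np.ndarray]] = []  # (experience text, vector)

def retrieve(query: str, k: int = 3) -> list[str]:
    """Top-k cosine similarity over ALL past experiences, with no filtering."""
    q = embed(query)
    ranked = sorted(memory, key=lambda m: float(q @ m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def step(user_input: str, model_output: str) -> None:
    """One full cycle: retrieve similar context, then record the raw interaction."""
    context = retrieve(user_input)  # would be injected into the model's prompt
    experience = f"user: {user_input} | model: {model_output}"
    memory.append((experience, embed(experience)))  # unfiltered recording
```

Note that nothing in `step` ever filters, deduplicates, or expires an entry; that omission is precisely what the next section examines.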

This cycle, which prioritizes quantity and redundancy, is supposed to let the system remember dynamically. However, extrapolating it to a professional use case exposes fundamental limitations.

2. The Two Major Problems of Redundancy

The approach based solely on similarity and continuous, unlabelled recording creates two major pitfalls that compromise memory reliability.

2.1 The Amplification of Redundancy (The “Snowball” Effect)

In the absence of qualitative verification, the system continuously records variations around the most frequent themes.

  • Cluster Formation: Over iterations, massive clusters of quasi-identical experiences form in the latent space.
  • Noise vs. Clarity: When the system runs a top-k search for a new query, it is statistically likely to retrieve multiple instances of the same concept, differing only in minor nuances. Retrieval then introduces noise rather than new clarity or a more recent viewpoint. The similarity signal, the main driver of retrieval, amplifies redundancy instead of ensuring useful contextual diversity, and the system can no longer guarantee that it returns the most recent, accurate, or relevant version of information it has already processed in abundance.
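
The toy example below makes the effect concrete: the store holds fifty slight variations of one theme plus a single distinct entry, and a query near the cluster fills every top-k slot with duplicates. The vectors are synthetic, so no real encoder or vector database is assumed.

```python
# Toy illustration of the "snowball" effect described above.
import numpy as np

rng = np.random.default_rng(0)
theme = rng.standard_normal(64)
theme /= np.linalg.norm(theme)

# Fifty slight variations of the same concept, plus one distinct entry.
dupes = [theme + 0.05 * rng.standard_normal(64) for _ in range(50)]
distinct = rng.standard_normal(64)
store = [v / np.linalg.norm(v) for v in dupes + [distinct]]

query = theme + 0.05 * rng.standard_normal(64)  # a query near the cluster
query /= np.linalg.norm(query)

sims = np.array([query @ v for v in store])
top5 = np.argsort(sims)[::-1][:5]
print(top5)  # every index is < 50: duplicates occupy all top-k slots
```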

2.2 The Lack of Semantic Organization

Simple storage in a vector space does not replace an organized knowledge structure.

  • Absence of Ontology: The memory is neither classified nor organized according to a taxonomy or a specific ontology. There is no clear semantic location where each piece of knowledge “resides” based on its category or intrinsic relevance.
  • Ungovernable Granularity: Without this classification, memory granularity becomes increasingly difficult to manage. The model’s memory degenerates into a messy heap rather than a structured and usable knowledge base (contrast the annotated entry sketched below).
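
By way of contrast, the sketch below shows the kind of metadata a flat vector store discards. The schema and taxonomy labels are purely illustrative, not a proposed standard.

```python
# What similarity search cannot see: category, recency, and authority.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MemoryEntry:
    text: str
    category: str         # position in a domain taxonomy, e.g. "billing/refunds"
    created_at: datetime  # enables "most recent" queries similarity cannot express
    authoritative: bool   # separates vetted knowledge from raw chat logs

entry = MemoryEntry(
    text="Refunds over $500 require manager approval.",
    category="billing/refunds",
    created_at=datetime(2024, 5, 2),
    authoritative=True,
)
```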

3. The Mathematical Impossibility of Equilibrium

In an environment where information must be both precise and manageable, it is mathematically impossible to satisfy all of the following requirements at once:

  • Maintaining fine distinction at the instance level.
  • Limiting the density and extent of the encoding space.
  • Using similarity as the sole retrieval mechanism.
  • Ensuring the return of a specific element (the most recent, the most precise, etc.).

A single semantic space cannot meaningfully preserve instance-level granularity while absorbing a continuous stream of highly similar data, as the simulation below illustrates.
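
A small Monte-Carlo sketch makes the tension visible. As near-duplicates of a concept accumulate, the probability that plain top-k retrieval surfaces one designated element (say, the most recent version) collapses toward k/n. The vectors, noise level, and counts are synthetic assumptions, not measurements.

```python
# Recall of a single designated item degrades as duplicates accumulate.
import numpy as np

rng = np.random.default_rng(1)
dim, k, trials = 64, 5, 200

for n in (5, 50, 500):
    hits = 0
    for _ in range(trials):
        theme = rng.standard_normal(dim)
        store = theme + 0.05 * rng.standard_normal((n, dim))
        store /= np.linalg.norm(store, axis=1, keepdims=True)
        newest = n - 1                                  # the element we want back
        query = theme + 0.05 * rng.standard_normal(dim)
        query /= np.linalg.norm(query)
        topk = np.argsort(store @ query)[::-1][:k]
        hits += newest in topk
    print(f"n={n}: recall of the newest item in top-{k} ≈ {hits / trials:.2f}")
```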

4. The Necessity of Quality Control

In a corporate context, simple “similarity” cannot be the primary criterion for information recording or retrieval.

It is imperative to integrate, at a minimum, a discriminator or “judge” that evaluates the quality and relevance of new information independently of what is already stored. Only after this qualitative control should memory be queried to assess whether a data point is novel or a duplicate.
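
A minimal sketch of such a two-stage gate follows, with a trivial length heuristic standing in for a real discriminator (a rubric-driven LLM call, a trained classifier, or human review); the thresholds and function names are illustrative, not an established API.

```python
# Quality gate first, then (and only then) a novelty check against memory.
from difflib import SequenceMatcher

def judge_quality(experience: str) -> float:
    """Placeholder scorer: penalizes trivially short interactions."""
    return min(len(experience.split()) / 20.0, 1.0)

def is_novel(experience: str, memory: list[str], threshold: float = 0.9) -> bool:
    """Gate 2, applied only after quality passes: reject near-duplicates."""
    return all(SequenceMatcher(None, experience, m).ratio() < threshold
               for m in memory)

def record(experience: str, memory: list[str]) -> None:
    if judge_quality(experience) < 0.5:    # gate 1: intrinsic quality,
        return                             # judged independently of storage
    if not is_novel(experience, memory):   # gate 2: novelty vs. duplication
        return
    memory.append(experience)
```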

A potential avenue for improvement lies in adopting a more structured system, such as a Knowledge Graph, which allows not only semantic indexing but also explicit classification and modeling of relationships between concepts.
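
As a minimal sketch of that direction, the snippet below stores explicit typed relations as plain triples; the entity and relation names are invented for illustration, and a production system would use RDF or a property-graph database instead.

```python
# Explicit relations make "the policy that supersedes v2" answerable,
# something a flat similarity search can only approximate.
from collections import defaultdict

graph: dict[str, list[tuple[str, str]]] = defaultdict(list)

def add_fact(subject: str, relation: str, obj: str) -> None:
    graph[subject].append((relation, obj))

add_fact("RefundPolicy-v3", "supersedes", "RefundPolicy-v2")
add_fact("RefundPolicy-v3", "belongs_to", "billing/refunds")
add_fact("RefundPolicy-v3", "requires", "manager_approval_over_500")

print(graph["RefundPolicy-v3"])  # retrieval follows relations, not similarity
```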

By relying solely on unfiltered logging and similarity retrieval, we are building a memory system that, far from being reliable, is intrinsically doomed to a progressive degradation of its relevance.
