According to Constellate Beating monitoring, researchers from Shanghai Jiao Tong University, Tsinghua University, MemTensor, and other institutions have jointly published a paper that, for the first time, systematically evaluates 12 mainstream LLM agent memory systems, including Mem0, Letta (formerly MemGPT), and Zep, from a data management perspective. The team proposes a four-module analytical framework covering memory representation and storage, retrieval, routing, and maintenance, and quantifies performance and cost across 11 datasets.
The evaluation found that no single memory architecture currently suits every workload. Hybrid systems performed best on conversational question answering, while structured-topology systems (such as graph- or tree-based memory architectures) were the most reliable for single-step fact recall but struggled with temporal reasoning. Many append-only memories degraded catastrophically over long runs. On time-sensitive queries, retrieval over the raw long context even outperformed memory-augmented approaches, because standard semantic consolidation often destroyed crucial temporal cues and produced "past hallucinations."
The experiments also included component-level ablations. Conventional similarity-based retrieval lost accuracy sharply as the time span lengthened. LLM-based fine-grained fact extraction slightly improved retrieval accuracy but harmed multi-step reasoning because information was progressively lost. Highly structured graph systems incurred significantly higher index construction and query latency without a proportional gain in accuracy. The study concluded that local maintenance is more cost-effective than global reconstruction, and that conservative memory consolidation should be the default maintenance strategy.
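The cost gap between local maintenance and global reconstruction can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper or from any of the evaluated systems: the `ToyMemory` class, its keyword index, and the `index_writes` counter, which serves only as a crude stand-in for maintenance cost.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Hypothetical append-only store with a keyword index (illustrative only)."""
    records: list[str] = field(default_factory=list)
    index: dict[str, set[int]] = field(default_factory=dict)
    index_writes: int = 0  # number of index updates, a rough cost proxy

    def _index_record(self, rec_id: int) -> None:
        # Add every token of one record to the inverted index.
        for token in self.records[rec_id].lower().split():
            self.index.setdefault(token, set()).add(rec_id)
            self.index_writes += 1

    def add_local(self, text: str) -> None:
        """Local maintenance: index only the newly added record."""
        self.records.append(text)
        self._index_record(len(self.records) - 1)

    def add_global(self, text: str) -> None:
        """Global reconstruction: rebuild the whole index on every write."""
        self.records.append(text)
        self.index.clear()
        for rec_id in range(len(self.records)):
            self._index_record(rec_id)

    def retrieve(self, query: str) -> list[str]:
        """Return records sharing at least one token with the query, oldest first."""
        hits: set[int] = set()
        for token in query.lower().split():
            hits |= self.index.get(token, set())
        return [self.records[i] for i in sorted(hits)]


facts = ["Alice moved to Paris", "Bob adopted a cat", "Alice joined a lab"]
local, rebuilt = ToyMemory(), ToyMemory()
for fact in facts:
    local.add_local(fact)
    rebuilt.add_global(fact)

print(local.index_writes, rebuilt.index_writes)
print(local.retrieve("Alice") == rebuilt.retrieve("Alice"))
```

In this toy setting both stores end up with the same index and return the same results, but the rebuilding store performs more index writes, and the gap widens as more records accumulate. Real systems differ in many ways (embeddings, graph construction, LLM calls), so the sketch conveys only the shape of the trade-off, not its magnitude.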
