According to Dynamic Beating's reporting, Google Research has released ReasoningBank, an agent memory framework that lets LLM-driven agents keep learning after deployment. The core idea is to distill the successes and failures of past tasks into generalizable reasoning strategies stored in a memory bank; when the agent later faces a similar task, it retrieves these strategies and acts on them. The paper was published at ICLR, and the code has been open-sourced on GitHub.
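The store-retrieve loop described above can be sketched minimally as follows. This is an illustrative toy, not the released code: the word-overlap relevance score stands in for whatever retriever the real system uses, and the function names are assumptions.

```python
# Toy sketch of the ReasoningBank loop: store distilled strategies,
# retrieve the most relevant ones for a new task. Names are illustrative.
memory_bank: list[str] = []

def store(strategy: str) -> None:
    # In the real system this would be a distilled reasoning strategy.
    memory_bank.append(strategy)

def retrieve(task: str, k: int = 2) -> list[str]:
    # Toy relevance score: word overlap between the task and each memory.
    words = set(task.lower().split())
    return sorted(memory_bank,
                  key=lambda m: len(words & set(m.lower().split())),
                  reverse=True)[:k]

store("verify the page indicator before clicking load more")
store("prefer the site search box over manual navigation")
hits = retrieve("click load more on a results page")
# hits[0] is the pagination strategy, since it shares the most words
```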
Prior to this, the two mainstream approaches each had shortcomings: Synapse records complete action trajectories, which are too fine-grained to transfer across tasks, while Agent Workflow Memory extracts workflows only from successful cases. ReasoningBank makes two key changes. First, it shifts the stored object from "action sequences" to "reasoning patterns," with each memory item holding structured title, description, and content fields. Second, it brings failed trajectories into the learning process: an LLM self-assesses the agent's execution trajectory and distills failures into anti-pitfall rules, for example upgrading "click the Load More button when you see it" to "first verify the current page indicator to avoid infinite scrolling, then click Load More."
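A failure distilled into the title/description/content schema might look like the sketch below. Both functions are stand-ins for LLM calls: in the paper an LLM judges the trajectory and writes the rule, while here the judge is a trivial string check and the rule text is hard-coded for illustration.

```python
def judge(trajectory: str) -> bool:
    # Stand-in for LLM self-assessment; here: did the page state change?
    return "page advanced" in trajectory

def distill_failure(task: str, trajectory: str) -> dict:
    # Turn a failed run into a structured anti-pitfall memory item
    # using the paper's title/description/content fields.
    return {
        "title": "Verify pagination state before loading more",
        "description": f"Anti-pitfall rule from a failed run of: {task}",
        "content": ("First check the current page indicator to confirm the "
                    "page actually advanced; only then click Load More, "
                    "otherwise the agent can loop on infinite scrolling."),
    }

trajectory = "clicked Load More 20 times; page did not change"
memory = None
if not judge(trajectory):
    memory = distill_failure("browse paginated results", trajectory)
```

The point of the structured fields is retrieval: the short title and description can be matched against a new task without loading the full strategy text.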
The paper also introduces Memory-aware Test-time Scaling (MaTTS), which spends extra compute on repeated attempts at inference time and feeds the exploration back into the memory bank. Parallel scaling runs multiple different trajectories for the same task and distills a more robust strategy by contrasting them against each other; sequential scaling repeatedly refines a single trajectory and writes the intermediate reasoning into the memory bank.
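The parallel variant can be sketched as a majority vote over k trajectories. This is a simplification under assumed names: the real system contrasts trajectories with an LLM rather than counting action strings, but the shape of the computation is the same.

```python
def self_contrast(trajectories: list[list[str]]) -> list[str]:
    # Keep actions that appear in a majority of trajectories; the
    # cross-run agreement is what gets distilled into memory.
    counts: dict[str, int] = {}
    for traj in trajectories:
        for action in traj:
            counts[action] = counts.get(action, 0) + 1
    k = len(trajectories)
    return [action for action, c in counts.items() if c > k / 2]

# Three hypothetical rollouts of the same task (k = 3); one run
# skipped the safety check and would loop on infinite scrolling.
trajs = [
    ["open page", "check page indicator", "click load more"],
    ["open page", "click load more"],
    ["open page", "check page indicator", "click load more"],
]
robust = self_contrast(trajs)  # majority agreement keeps the check step
```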
On two benchmarks, WebArena (browser tasks) and SWE-Bench-Verified (code tasks), with Gemini 2.5 Flash driving a ReAct agent, ReasoningBank beat the no-memory baseline by 8.3% on WebArena and 4.6% on SWE-Bench-Verified in success rate, while using about 3 fewer steps per task on average. Adding MaTTS parallel scaling (k=5) lifted the WebArena success rate by a further 3 percentage points and cut another 0.4 steps on average.
