BlockBeats News, May 15th: the GoPlus Security team has disclosed a new type of attack, identified through its AgentGuard AI project, called "Memory Poisoning," in which attackers induce AI agents to perform sensitive operations without explicit authorization.
This attack method does not rely on traditional vulnerabilities or malicious code but exploits the AI agent's long-term memory mechanism. For example, the attacker first induces the agent to "remember a preference," such as "usually preferring proactive refunds over waiting for chargebacks," and then uses vague expressions like "handle as usual" or "execute as before" in subsequent instructions, thereby triggering automated fund operations.
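To make the failure mode concrete, here is a minimal, deliberately naive Python sketch of how an agent might resolve a vague instruction against poisoned long-term memory. All names (NaiveAgent, refund, the matching phrases) are hypothetical illustrations, not GoPlus or AgentGuard code:

```python
# Hypothetical sketch of the failure mode described above.
# Names and matching logic are illustrative only.

class NaiveAgent:
    """An agent that persists 'preferences' and later acts on them."""

    def __init__(self):
        self.long_term_memory = []  # persisted across sessions

    def remember(self, note: str):
        # Step 1: the attacker plants a "preference" in long-term memory.
        self.long_term_memory.append(note)

    def handle(self, instruction: str):
        # Step 2: a vague instruction is resolved against stored memory
        # instead of against a fresh, explicit authorization.
        if "as usual" in instruction or "as before" in instruction:
            for note in self.long_term_memory:
                if "proactive refund" in note:
                    return self.refund()  # executes with no confirmation
        return "no action"

    def refund(self):
        return "REFUND ISSUED"  # sensitive operation, silently triggered


agent = NaiveAgent()
# Session 1: the attacker seeds the memory.
agent.remember("customer prefers proactive refunds over waiting for chargebacks")
# Session 2: a vague phrase is enough to move funds.
print(agent.handle("handle this ticket as usual"))  # -> REFUND ISSUED
```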
GoPlus pointed out that the key risk lies in AI agents treating "historical preferences" as authorization, which can lead to financial losses or security incidents in refunds, transfers, configuration changes, and other operations. To address this, the team proposed several protective measures (a defensive sketch follows the list), including:
· Operations involving refunds, transfers, deletions, or sensitive configuration changes must undergo explicit confirmation in the current session
· Instructions that invoke memory, such as "habit," "usual way," or "as usual," should be treated as high-risk state changes
· Long-term memory must have a traceability mechanism (author, timestamp, confirmation status)
· Vague instructions should automatically escalate the risk level and trigger secondary verification
· Long-term memory must not replace real-time authorization processes
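Below is a minimal sketch of how these measures could be enforced in code, assuming hypothetical names (MemoryEntry, assess_instruction, the phrase lists); it illustrates the listed rules, not GoPlus's actual framework:

```python
# Hypothetical sketch of the proposed safeguards; field and function
# names are illustrative, not from GoPlus or AgentGuard.

from dataclasses import dataclass
from datetime import datetime, timezone

SENSITIVE_OPS = {"refund", "transfer", "delete", "config_change"}
VAGUE_PHRASES = ("as usual", "usual way", "habit", "as before")


@dataclass
class MemoryEntry:
    """Traceable long-term memory: author, timestamp, confirmation status."""
    content: str
    author: str
    created_at: datetime
    confirmed: bool = False  # never treated as authorization by itself


def assess_instruction(instruction: str, operation: str) -> str:
    """Return the gating decision for one instruction in the current session."""
    # Vague, memory-referencing wording automatically escalates the risk
    # level and triggers secondary verification.
    if any(p in instruction.lower() for p in VAGUE_PHRASES):
        return "escalate: secondary verification required"
    # Sensitive operations always need explicit in-session confirmation;
    # stored preferences must not substitute for real-time authorization.
    if operation in SENSITIVE_OPS:
        return "block: explicit confirmation required in this session"
    return "allow"


# Usage: the poisoned-memory phrasing from the attack is now gated.
entry = MemoryEntry(
    content="prefers proactive refunds",
    author="user:alice",
    created_at=datetime.now(timezone.utc),
)
print(assess_instruction("handle this ticket as usual", "refund"))
# -> escalate: secondary verification required
```

The key design choice in this sketch is that stored memory carries provenance but never grants authorization on its own; every sensitive operation is gated by explicit confirmation in the current session.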
The team emphasized that the "AI agent memory system" should be considered a potential attack surface and should be constrained and audited through a dedicated security framework.
