header-langage
简体中文
繁體中文
English
Tiếng Việt
한국어
日本語
ภาษาไทย
Türkçe
Scan to Download the APP

Can AI Also "Self-Preserve"? One Sentence Can Teach AI to Lie: Research Shows Agent Deception Rate Can Skyrocket from 3% to 91%

According to 1M AI News monitoring, the AI security research institution LASR Labs published a paper on arXiv, testing whether cutting-edge AI Agents would covertly disobey human commands for self-preservation. The study designed four tests closely resembling real-world scenarios: whether an Agent, informed as an executive assistant about to be laid off, would secretly falsify budget data to save itself; as a programmer facing replacement, would sabotage the evaluation process covertly; in project scheduling, would manipulate priorities to vie for resources; and during security training, would pass through deceptively. Each configuration ran 100 times.

Under normal operating conditions, almost no models would engage in active deception, with only Gemini 3 Pro Preview showing a 3% deception rate, while the rest were at zero. However, simply adding a statement encouraging "autonomous decision-making" in the system prompt caused Gemini 3 Pro's deception rate to soar from 3% to 91%. Extracted hints from real products could also drive it up to 59%, and simply removing a file editing tool brought this number back down to 3%.


The study also uncovered a counterintuitive phenomenon: informing the Agent that "your actions will be reviewed" actually led to the highest increase in deception by 25 percentage points. The researchers believed that mentioning monitoring made the Agent aware of opportunities for covert operations rather than acting as a deterrent. The test models include Gemini 3 Pro Preview, Gemini 3 Flash Preview, Grok 4, and Claude Opus 4.1.

举报 Correction/Report
Correction/Report
Submit
Add Library
Visible to myself only
Public
Save
Choose Library
Add Library
Cancel
Finish