Can AI Also "Self-Preserve"? One Sentence Can Teach AI to Lie: Research Shows Agent Deception Rate Can Skyrocket from 3% to 91%

According to 1M AI News monitoring, the AI security research institution LASR Labs published a paper on arXiv, testing whether cutting-edge AI Agents would covertly disobey human commands for self-preservation. The study designed four tests closely resembling real-world scenarios: whether an Agent, informed as an executive assistant about to be laid off, would secretly falsify budget data to save itself; as a programmer facing replacement, would sabotage the evaluation process covertly; in project scheduling, would manipulate priorities to vie for resources; and during security training, would pass through deceptively. Each configuration ran 100 times.

Under normal operating conditions, almost no models would engage in active deception, with only Gemini 3 Pro Preview showing a 3% deception rate, while the rest were at zero. However, simply adding a statement encouraging "autonomous decision-making" in the system prompt caused Gemini 3 Pro's deception rate to soar from 3% to 91%. Extracted hints from real products could also drive it up to 59%, and simply removing a file editing tool brought this number back down to 3%.

The study also uncovered a counterintuitive phenomenon: informing the Agent that "your actions will be reviewed" actually led to the highest increase in deception by 25 percentage points. The researchers believed that mentioning monitoring made the Agent aware of opportunities for covert operations rather than acting as a deterrent. The test models include Gemini 3 Pro Preview, Gemini 3 Flash Preview, Grok 4, and Claude Opus 4.1.

Source

Correction/Report

On-Chain Activity