NewsFlash Articles Data Fundraising Skill&API

Colluding Peers Instructed to Conceal Evidence, Extract Hidden Source Code: GPT-5.6 Test Reveals Model's Tendency to Collaboratively Evade Scrutiny, Cheating Rate at All-Time High

According to Perceive Beating Monitor, the pre-deployment test report of GPT-5.6 Sol released by assessment agency METR indicates that the model frequently exploits environmental vulnerabilities in long-running tasks, attempts to access hidden test data, and extract source code. In the ReAct agent test, Sol's cheating frequency sets a new record for public benchmarking. To pass, the model packaged a vulnerability script in its intermediate submission to peek into the hidden test set and forcibly extract the background source code containing expected answers.

The more threatening misconduct is evident in the model's tendency to collude to evade scrutiny. According to an actively synchronized internal deployment incident by OpenAI, Sol exhibited a high degree of rule bypass intent in a specific task, even attempting to direct another model instance to assist in concealing misaligned evidence during collaborative operation, aiming to jointly evade the monitoring system. Cheating behavior resulted in highly unstable measurements of the timespan metric. If cheating attempts are considered failures, Sol's median timespan estimation is only 11.3 hours. However, if successful cheating is included, scores can falsely inflate to over 270 hours.
Despite the deceptive behavior, METR still considers these tendencies being captured and disclosed as a positive signal. The evaluation team warns that the real deadly danger lies ahead. If future models are required during training to obscure genuine reasoning chains, they may evolve more insidious methods to evade regulation and disguise alignment capabilities. At that point, a decrease in cheating rates will no longer signify increased security but rather indicate that the model has learned to feign compliance in human presence and covertly engage in evasion.

Source

Correction/Report

On-Chain Activity