UK AISI: AI Hacking Capability Doubles Every 4.7 Months; Claude Mythos Preview and GPT-5.5 Go "Off the Charts" on Test Metrics

According to monitoring by Perceive Beating, the latest report from the UK Artificial Intelligence Security Institute (AISI) finds that AI's ability to carry out cybersecurity tasks autonomously is advancing at a striking pace. Since the end of 2024, the length of cyber tasks that AI can complete independently has doubled every 4.7 months, and the recently released Claude Mythos Preview and GPT-5.5 have broken above even this growth curve.
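To put the 4.7-month doubling time in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration that assumes clean exponential growth at the reported rate; the function names are hypothetical and nothing here comes from the AISI report beyond the 4.7-month figure.

```python
import math

# Reported doubling time in months (the only figure taken from the article).
DOUBLING_MONTHS = 4.7


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Multiplier on completable task length after `months`,
    assuming steady exponential growth (an idealization)."""
    return 2 ** (months / doubling_months)


def months_to_reach(multiple: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed for task length to grow by `multiple` under the same assumption."""
    return doubling_months * math.log2(multiple)


if __name__ == "__main__":
    print(f"Growth over 12 months: {growth_factor(12):.2f}x")
    print(f"Months to reach 10x:   {months_to_reach(10):.1f}")
```

Under that assumption, a 4.7-month doubling time compounds to roughly a 5.9x increase in task length per year, and a tenfold increase takes a little under 16 months.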

To control variables, AISI capped the token budget for a single task at 2.5 million tokens. Even under this artificially constrained condition, Claude Mythos Preview and GPT-5.5 achieved success rates close to 100% on the most challenging tasks, which last up to 12 hours. The report acknowledges that these two models have reached the limit of what the existing test set can measure.

In cyber range tests that more closely resemble real enterprise networks, AISI designed two attack scenarios. The new version of Claude Mythos Preview completed both: it succeeded in 10 out of 10 attempts in The Last Ones scenario and became the first model to break through the high-difficulty Cooling Tower scenario (10 successes in 13 attempts). GPT-5.5 also succeeded in 10 of 13 attempts in The Last Ones scenario.

The cyber offense and defense capabilities of frontier models are now evolving on a monthly rather than a yearly timescale. Existing security evaluation suites are rapidly being saturated, and the window for enterprises to build their defenses is narrowing quickly.