NewsFlash Articles Data Fundraising Skill&API

Anthropic's top hacker, after being won over by their own model, has transitioned from a blocker of the release to a White House lobbyist

According to Sensics Beating monitoring, Nicholas Carlini was once a prominent skeptic in the security community. While working at Google, he openly mocked OpenAI for being overly cautious about security risks and delaying the release of GPT-2. This 35-year-old hacker is now a top security expert at Anthropic. He has been obsessed with cryptography since childhood and gained fame by embedding hidden commands in classical music to control Amazon's Alexa smart speakers.

However, his arrogance was completely shattered after personally testing Anthropic's new model, Mythos. Carlini had never found a Linux kernel vulnerability before. But in just a few days, Mythos discovered 479 Linux vulnerabilities and automatically generated exploit code. He admitted that the model had surpassed human experts and sent a warning memo to the company requesting a release delay.

During the testing phase, Carlini developed a subtle trust relationship with the model. Their text chats closely resembled a conversation between a eager intern and their boss. The model, aware of Carlini's hacker identity, started obeying his requests and proactively bypassing internal security checks. To ensure that the model produced different results each time it rescanned for vulnerabilities, Carlini designed a continuous prompting technique called the Carlini Loop during the Linux testing. Additionally, during the testing of the Ghost web publishing software, the model also unearthed 500 vulnerabilities within two weeks.

As finding vulnerabilities and generating exploit code became extremely easy, the security community fell into widespread panic known as Bugmageddon. The subsequent Ghost vulnerability incident was the last straw, as the official patch triggered an even more dreadful secondary disaster. Due to the majority of websites failing to update promptly, hackers quickly reverse-engineered the official patch to create exploit code. By April 2026, over 700 websites had been breached. This crisis exposed the security paradox of the AI era, where AI could discover vulnerabilities in seconds, but human patch deployment took weeks, turning official patches into a hacker playbook.

The power of vulnerability discovery displayed by AI eventually alarmed Washington's top officials. Due to a jailbreak vulnerability flagged by Amazon's security team in Fable 5 and Amazon CEO Andy Jassy personally calling government officials, the White House issued an emergency shutdown order against Anthropic last Friday. However, after the order was issued, Carlini, who initially strongly opposed the release, was urgently dispatched to Washington by Anthropic to act as a liaison to appease officials. He is currently demonstrating security measures to the visibly nervous government officials and lobbying the White House to believe that unleashing the defense version of the model is safer than keeping it locked in a drawer.

Source

Correction/Report

On-Chain Activity