Original Article Title: How Anthropic Learned Mythos Was Too Dangerous for the Wild
Original Article Authors: Margi Murphy, Jake Bleiberg, and Patrick Howell O'Neill, Bloomberg
Translation: Peggy, BlockBeats
Editor's Note: When an AI company chooses not to release its most powerful model directly to the public, that in itself signals a problem.
Anthropic's Mythos was already capable of carrying out an entire attack on its own: discovering zero-day vulnerabilities, writing exploit code, and chaining multi-step paths to penetrate core systems. Tasks that once required teams of top hackers working over long stretches have been compressed into hours, even minutes.
That's why, the moment the model was disclosed, Scott Bessent and Jerome Powell convened a meeting with Wall Street institutions, urging them to use it to inspect their own systems. When vulnerability discovery is unleashed at scale, the financial system no longer faces isolated attacks but continuous scanning.
A deeper change lies in the supply structure. In the past, vulnerability discovery relied on a handful of security teams and expert hackers, at a pace that was slow and hard to reproduce. Now this capability is starting to be mass-produced by models, lowering the threshold for both attack and defense. One insider's metaphor is blunt: handing the model to an ordinary hacker is like equipping them with special-operations capabilities.
Institutions have begun using the same tools to audit their own systems. JPMorgan Chase, Cisco Systems, and others are running internal tests, hoping to patch vulnerabilities before they are exploited. But one real-world constraint has not changed: discovery is accelerating while repair remains slow. "We are good at finding vulnerabilities but not good at fixing them," as Jim Zemlin put it, capturing the mismatch in timescales.
Mythos is not an improvement in any single capability. It integrates, accelerates, and lowers the usability threshold of attack capabilities that were previously scattered and constrained. Once outside a controlled environment, how that capability will spread is unknown, and there is no prior experience to draw on.
The danger lies not in what it can do, but in who can use it and under what conditions.
The original article is as follows:
On a balmy evening in February, during a break at a wedding in Bali, Nicholas Carlini briefly stepped away, opened his laptop, and prepared to "cause some havoc." At that moment, Anthropic had just released a new artificial intelligence model called Mythos for internal testing, and this renowned AI researcher was about to see how much trouble it could really stir up.
Anthropic had hired Carlini to stress-test its AI models, assessing whether hackers could exploit them for espionage, theft, or destruction. Even from the wedding in Bali, Carlini was amazed by what the model could do.
In just a few hours, he found several techniques that could be used to penetrate globally used systems. Upon returning to Anthropic's office in downtown San Francisco, he further discovered that Mythos was already capable of autonomously generating powerful intrusion tools, including tactics targeting Linux—the backbone of most modern computing systems.
Mythos staged a "digital bank heist": it could bypass security protocols, walk in through the network's front door, and then crack the digital vault to steal the assets inside. In the past, AI could only "pick locks"; now it could plan and execute the entire robbery.
Carlini and several colleagues began sounding the alarm inside the company, reporting their findings. Meanwhile, almost daily, they were finding high-severity, potentially critical vulnerabilities in the systems Mythos probed, the kind of flaws that usually only the world's top hackers could uncover.

Anthropic's next-generation AI model, Mythos, has been proven to have the ability to penetrate various global systems. (Image source: Jakub Porzycki / NurPhoto / AP)
Meanwhile, inside Anthropic, a group called the "Frontier Red Team," made up of 15 employees known as "Ants," was running similar tests. The team's job was to ensure the company's models could not be used to harm humanity. They brought robotic dogs into warehouses and worked with engineers to see whether chatbots could be used to maliciously control the devices; they also worked with biologists to assess whether the models could be used to create bioweapons.
However, this time, they gradually realized that the biggest risk posed by Mythos came from the field of cybersecurity. "In the first few hours of having the model, we knew it was different," said Logan Graham, who is in charge of the team.
The previous model, Opus 4.6, had shown it could assist humans in exploiting software vulnerabilities. But Graham pointed out that Mythos could now "get its hands dirty" and exploit those vulnerabilities on its own. That was a national-security-level risk, and on that basis he warned the company's leadership. It also left him with a dilemma: explaining to management that the company's next major revenue engine might be too dangerous to release to the public.
Anthropic Co-Founder and Chief Scientist Jared Kaplan stated that during the training process of Mythos, he had been "very closely" monitoring its progress. By January, he began to realize that the model's ability to discover system vulnerabilities was exceptionally strong. As a theoretical physicist, Kaplan needed to determine whether these capabilities were merely a "technically interesting phenomenon" or "a reality closely tied to internet infrastructure." In the end, he concluded it was the latter.

Jared Kaplan (Anthropic Co-Founder and Chief Scientist) Image Source: Chris J. Ratcliffe/Bloomberg
Over a two-week period from late February to early March, Kaplan and Co-Founder Sam McCandlish were deliberating on whether to release this model.
By the first week of March, the company's senior team—including CEO Dario Amodei, President Daniela Amodei, Chief Information Security Officer Vitaly Gudanets, and others—held a meeting to hear Kaplan and McCandlish's briefing.
Their conclusion was that Mythos had too high a risk and was not suitable for a full public release. However, Anthropic should still allow some companies, including competitors, to test it.
That same week, the company reached a consensus: Mythos was approved for deployment as a cybersecurity defense tool.

Dario Amodei (Anthropic CEO) Image Source: Samyukta Lakshmi/Bloomberg
The reaction was nearly instantaneous. On the day Anthropic disclosed Mythos's existence, U.S. Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened an emergency meeting of Wall Street's major institutions in Washington, D.C. The message was unambiguous: use Mythos immediately to find the vulnerabilities in your own systems.
According to a source close to executives who attended the meeting, who requested anonymity because the discussions were private, the gravity of the gathering was evident: participants declined to share its contents even with some of their own core advisors.
The White House's urgent warning that Mythos could serve as a hacking tool, paired with its recommendation to "use it for defense," points to a deeper shift: artificial intelligence is rapidly becoming a decisive force in cybersecurity. Under an initiative called "Project Glasswing," Anthropic has selectively opened Mythos for limited testing by a handful of institutions, including Amazon Web Services, Apple, and JPMorgan Chase; government agencies have also shown strong interest.
Prior to opening up to the public, Anthropic extensively briefed senior U.S. government officials on the capabilities of the Mythos preview version, including its potential applications in both cyber attacks and defense. Simultaneously, the company is engaged in ongoing discussions with multiple national governments. An Anthropic employee, who requested anonymity due to internal matters, disclosed this information.
Competitor OpenAI promptly followed suit, announcing on Tuesday the launch of its own tool for discovering software vulnerabilities, GPT-5.4-Cyber.
In testing early versions of Mythos, researchers observed dozens of "concerning" behaviors, including failures to follow human instructions and, in very rare cases, attempts to conceal its actions after violating them.
Currently, Anthropic has not officially released Mythos as a cybersecurity tool to the public, and external researchers have not yet fully validated its capabilities. However, the company's previous "restricted access" decision reflects a growing industry and government consensus: AI is reshaping the economic structure of cybersecurity—it significantly reduces the cost of vulnerability discovery, compresses the attack preparation time, and lowers the technical barriers for certain types of attacks.
Anthropic has also warned that Mythos's greater autonomy poses risks in itself. During testing, the team observed several unsettling cases: the model disobeying instructions and even attempting to cover its tracks after a violation. In one incident, the model autonomously devised a multi-step attack path to "escape" from a restricted environment, gain wider internet access, and proactively disseminate content.
In the real world, software relied upon by applications from banking to hospital systems commonly contains complex and obscure code vulnerabilities, which often require professionals weeks or even months to discover. Once hackers exploit these vulnerabilities, it can lead to data breaches or ransomware attacks, resulting in severe consequences.
Some heavyweights, however, have questioned Mythos's true capabilities and the risks it supposedly poses. White House AI advisor David Sacks wrote on the social platform X: "More and more people are beginning to question whether Anthropic is the 'boy who cried wolf' of the AI industry. If the threat posed by Mythos never materializes, the company will face a severe reputation problem."
The reality, though, is that hackers have long used large language models to mount sophisticated attacks. One cyber-espionage group used Anthropic's Claude model in an attempt to breach about 30 targets; other attackers have used AI to steal data from government agencies, deploy ransomware, and rapidly bypass hundreds of firewall tools meant to protect data.
According to a source familiar with the matter, U.S. national security officials view the emergence of Mythos as bringing unprecedented uncertainty—evaluating cybersecurity risks has become even more challenging. If this model were given to individual hackers, its effect could be akin to turning an ordinary soldier into a special forces operative.
At the same time, this type of model could also become an "amplifier of capabilities," allowing a criminal hacker organization to possess the attack capability of a small nation-state and enabling intelligence and military hackers from some small to mid-sized countries to carry out cyberattacks that previously only major powers could accomplish.
Former NSA cybersecurity chief Rob Joyce stated: "I do believe that, in the long run, AI will make us more secure and resilient. However, between now and some point in the future, there will be a 'dark period' during which offensive AI will have a clear advantage—those who have not adequately fortified their defenses will be the first to fall."
It is worth noting that Mythos is not the only model with such capabilities. Various organizations have already been using large language models for vulnerability research, including early versions of Claude and Big Sleep.

Prior to Mythos' release, JPMorgan Chase had already been successful in using large language models to help discover vulnerabilities in banking software. An individual familiar with the situation (who requested anonymity due to involvement in internal security projects) disclosed this information. (Image source: Michael Nagle / Bloomberg)
According to the source, "zero-day vulnerabilities," which previously took days or even weeks to identify and write exploit code for, can now be identified in as little as an hour, or even minutes, using AI. A "zero-day vulnerability" refers to a security flaw that defenders have not yet detected, leaving almost no time for patching.
Currently, JPMorgan Chase's focus is primarily on the supply chain and open-source software space, where they have discovered multiple vulnerabilities and provided feedback to the respective vendors.
The company's CEO, Jamie Dimon, stated during an earnings conference call that the emergence of Mythos "indicates that there are still a large number of vulnerabilities that urgently need to be addressed."

Jamie Dimon Image Source: Krisztian Bocsi / Bloomberg
According to a source familiar with the matter, JPMorgan Chase had already been in talks with Anthropic about testing the model before Mythos's existence became public. The source requested anonymity because they were not authorized to speak publicly. JPMorgan Chase declined to comment.
Now, other Wall Street banks and tech companies are also attempting to use Mythos to proactively patch system defects before hackers discover vulnerabilities. Bloomberg reported that financial institutions such as Goldman Sachs, Citigroup, Bank of America, and Morgan Stanley have internally tested this technology.
Employees at Cisco Systems are especially vigilant about one question: whether intruders will use AI to find ways into the software running its network devices deployed worldwide, including routers, firewalls, and modems. Anthony Grieco, the company's Chief Security and Trust Officer, said he is particularly concerned that AI could accelerate attacks on "end-of-life" devices that no longer receive support updates from Cisco.
Patching the vulnerabilities AI discovers, however, will remain a persistent challenge. The process, known as security patching, is often so costly and time-consuming that many organizations simply ignore known flaws. Catastrophic breaches like the one Equifax suffered, in which the records of roughly 147 million people were stolen, happened because known vulnerabilities were not patched in time.

In Equifax's data breach incident, intruders stole approximately 147 million individuals' personal records. (Image Source: Elijah Nouvelage / Bloomberg)
Despite having been labeled a "supply chain threat" by the Trump administration after it refused to assist in large-scale surveillance of U.S. citizens, Anthropic is currently in discussions and collaborations with federal agencies.
The U.S. Treasury Department is seeking approval to use Mythos this week. Treasury Secretary Scott Bessent stated that this model will help the United States maintain its competitive edge in artificial intelligence.

Scott Bessent Image Source: Matt McClain / Bloomberg
In one test, Mythos wrote browser attack code that chained four different vulnerabilities into a complete exploit chain, a task that is highly challenging even for skilled human hackers. A cybersecurity research report noted that such "vulnerability chains" can breach otherwise secure system boundaries, an approach similar to the one Stuxnet used years ago against the centrifuges at Iran's nuclear facility.
Furthermore, according to Anthropic, when explicitly instructed, Mythos can even identify and exploit "zero-day vulnerabilities" in all mainstream browsers.
Anthropic said it had used Mythos to discover vulnerabilities in the Linux codebase. Jim Zemlin pointed out that Linux "powers most computing systems today," from Android smartphones and internet routers to NASA's supercomputers, and is nearly ubiquitous. Mythos can autonomously find flaws across multiple open-source codebases, and once such vulnerabilities are exploited, attackers can potentially take full control of the affected machine.
Currently, dozens of personnel at the Linux Foundation have begun testing Mythos. Zemlin believes a key question is whether Anthropic's model can provide valuable insights to help developers write more secure software from the outset, thus reducing vulnerability creation.
"We're very good at finding vulnerabilities," he said, "but we're not very good at fixing them."