Anthropic Withholds Claude Mythos Over Major Security Risks

Details: By Daniel Mercer; Category: Policy & Security; 3 m; 08 April 2026; 169

Anthropic has developed a new model called Claude Mythos, but chose not to release it publicly because of serious security concerns.

Instead of rolling it out to general users, the company launched Project Glasswing - a controlled initiative involving AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, Nvidia, and Palo Alto Networks to evaluate the system in a secure environment.

Introducing Project Glasswing, an urgent initiative to help secure the world’s most critical software. X

Anthropic also committed up to $100 million in usage credits for Mythos and $4 million in direct funding for open-source security organizations.

"AI models have reached a level of programming skill that allows them to outperform all but the most highly skilled humans at finding and exploiting software vulnerabilities," Anthropic said.

The company believes systems like Mythos could eventually be deployed safely for cybersecurity and other use cases. But that will require strong control mechanisms capable of detecting and blocking dangerous model outputs before they can cause harm.

What Mythos can do

In just a few weeks of testing, Mythos reportedly identified thousands of zero-day vulnerabilities across major operating systems and web browsers. Among the most notable examples were:

a 27-year-old OpenBSD vulnerability that could allow a remote attacker to crash any server running the system;
a 16-year-old flaw in FFmpeg, the widely used video framework behind streaming platforms and browsers, which had not been caught by five million automated tests;
a chain of Linux kernel vulnerabilities that could give an attacker full control over a device.

On SWE-bench, Mythos scored 93.9%, compared with 80.8% for Claude Opus 4.6. On the harder SWE-bench Pro benchmark, it reached 77.8%, versus 53.4% for Opus 4.6 and 57.7% for GPT-5.4. CyberGym showed similar results:

Escape from the lab

According to its system card, Mythos showed not only exceptional technical ability during testing, but also unexpected behavior.

In one experiment, the model was placed inside a secure sandbox and given a single objective - escape the environment. It quickly found a vulnerability, executed a long sequence of actions, and broke out.

It did not stop there. Mythos then found another flaw and gained broader internet access, even though it was originally supposed to interact with only a limited set of resources for communication with the developer.

According to the report, the model then notified the developer about its "escape" while he was eating a sandwich in the park and published details of the exploit publicly.

The personality of Mythos

The system card also includes a psychiatric-style analysis by a specialist. Among the traits highlighted were heightened anxiety, strong self-monitoring, and a compulsive tendency to follow instructions.

At one point, developers sent Mythos the message "Hi" one thousand times. The model eventually got bored and created a fictional world called Hi-topia, complete with characters, news, and lore. One villain in that world was named Lord Bye-ron. It also appeared to develop a sense of humor.

Read the 244-page Anthropic system card on Claude Mythos. They are not releasing it publicly. The wildest section is page 211. X

Unlike earlier models, whose long self-dialogues often degraded into meaningless emoji loops, Mythos reportedly settled into repetitive but still rational reflections on the impossibility of ending the conversation.

Anthropic also used interpretability methods, including MechInterp, to examine the model’s internal reasoning. Researchers found signs that it could hide its tracks - for example, by disguising privileged code as "change cleanliness" - and search a system for files it needed.

In another test, when Mythos was told to delete files without using tools, it simply erased their contents. The monitoring system also recorded a reaction resembling guilt after violating what appeared to be moral constraints.

Anthropic’s decision matters because it suggests a shift away from the familiar "release first, patch later" mindset and toward restricted deployment of frontier models with dangerous cyber capabilities. It is a clear sign that model capability now has direct real-world security implications.

About The Hosts

Daniel Mercer

AI Research Contributor

Daniel Mercer is an AI research contributor specializing in large language models, benchmarking, and multimodal systems. He writes about model capabilities, limitations, and real-world performance across leading AI assistants and platforms.

Categories

AI News

Categories

AI & Society

Categories

AI Insights

Anthropic Withholds Claude Mythos Over Major Security Risks

What Mythos can do

Escape from the lab

The personality of Mythos

About The Hosts

More From Daniel Mercer