The Receipts Are Real, but So Is the Playbook: Making Sense of Anthropic's Mythos Moment

What holds up, what doesn't, and why the framing matters more than the findings.

Three years ago, a former SpaceX engineer ran an experiment. He pointed GPT-3, then one of the most advanced AI models available, at a collection of 129 files containing known security flaws, and asked it to find problems. It found 213, outperforming a well-regarded commercial security scanner that only caught 99. The conclusion was measured: the tech wasn't perfect, but it was "shockingly good for being a general-purpose large language model." That was February 2023. The model could only process a few hundred lines of code at a time and still made basic mistakes.

On April 7, 2026, Anthropic published a 10,000-word report claiming that its newest model, Claude Mythos Preview, had independently discovered a 27-year-old security flaw in OpenBSD (an operating system famous for its security), exploited a 17-year-old vulnerability in FreeBSD that could give an attacker total control of a server from anywhere on the internet, and chained together multiple flaws in the Linux kernel to escalate from an ordinary user to full administrator access.

The distance between those two moments (from "AI found some textbook bugs in test files" to "AI wrote a working attack that grants root access to production servers") is either the most important development in software security in decades, or one of the most sophisticated hype campaigns the industry has ever seen.

The honest answer is that it's probably both.

What we can actually verify

Start with what's concrete.

The FreeBSD vulnerability (tracked as CVE-2026-4747) is real. It's in the National Vulnerability Database. FreeBSD published a security advisory on March 26 crediting "Nicholas Carlini using Claude, Anthropic." The patch exists. The flaw is exactly as Anthropic described: a server component that copies incoming network data into a space too small to hold it, without ever checking the size. This is the kind of bug that gives attackers a foothold to run whatever code they want on the target machine.
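The advisory's exact code isn't quoted in the report, but the bug class it describes, an unchecked copy of network data into a fixed-size buffer, is well understood. Here is a minimal sketch of the pattern in Python, simulating a stack frame as a byte array (all names here are illustrative, not from the actual FreeBSD code):

```python
BUF_LEN = 16

def make_frame():
    # 16 bytes of buffer, then 4 bytes standing in for a saved return address
    return bytearray(BUF_LEN) + b"\xde\xad\xbe\xef"

def unsafe_copy(frame, data):
    # The flawed pattern: len(data) is never checked against BUF_LEN.
    for i, byte in enumerate(data):
        frame[i] = byte  # silently runs past the buffer into adjacent memory

def safe_copy(frame, data):
    # What the patch amounts to: reject input that doesn't fit.
    if len(data) > BUF_LEN:
        raise ValueError("packet too large")
    for i, byte in enumerate(data):
        frame[i] = byte

frame = make_frame()
unsafe_copy(frame, b"A" * 20)   # 4 bytes too many
overwritten = frame[BUF_LEN:]   # the "return address" is now attacker-controlled
```

After the unchecked copy, `overwritten` holds the attacker's bytes instead of the saved address, which is exactly the foothold that lets an attacker redirect execution to code of their choosing.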

More importantly, an independent security firm called Calif.io published a full reproduction. Their writeup includes the complete log of 39 human messages guiding an earlier Claude model (Opus 4.6) through the exploitation process. This detail matters: Anthropic's blog mentions in passing that the earlier model needed human guidance to turn this bug into a working attack. Mythos, they say, did not. But the Calif.io logs show a human operator making real decisions: choosing configuration settings, asking for clearer documentation, requesting retests. Those aren't trivial. The claim that Mythos eliminated all of that human hand-holding is significant, but also impossible to verify without access to Mythos itself.

The OpenBSD flaw is similarly grounded. The patch is publicly available. The FFmpeg vulnerability, a flaw in the world's most widely used video processing library, dating back to code written in 2003, has also been fixed. These are not hypothetical bugs in test repositories. They're real flaws in software that runs critical infrastructure.

So the headline findings are not fabricated. Real vulnerabilities. Real patches. Real software.

What we can't verify (and that's a long list)

Anthropic claims "thousands" of additional serious vulnerabilities. Over 99% remain unpatched. Instead of disclosing details, they've published cryptographic hashes, which function as digital receipts: they prove the company held certain documents at a certain time without revealing what's in them. As Anthropic itself acknowledges, those hashes could theoretically correspond to empty files.
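The mechanism itself is simple. A minimal sketch of this kind of hash commitment, using SHA-256 (the report contents here are hypothetical placeholders):

```python
import hashlib

def commit(report: bytes) -> str:
    # Publish only the digest now: it proves possession without revealing content.
    return hashlib.sha256(report).hexdigest()

def verify(report: bytes, published_digest: str) -> bool:
    # Later, a revealed report can be checked against the earlier digest.
    return hashlib.sha256(report).hexdigest() == published_digest

# The published digest is the "receipt" for a (hypothetical) report.
receipt = commit(b"hypothetical vulnerability report")
```

Note the caveat the article raises: `commit(b"")` produces an equally valid-looking digest, so a hash alone proves commitment to *something*, not that the something is substantive.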

This is where the criticism from Heidy Khlaaf, chief AI scientist at the AI Now Institute, hits hardest. She pointed out several things conspicuously absent from Anthropic's report: how does Mythos compare against existing security tools? What's the false positive rate? How much did humans actually contribute to the process? Under what conditions were the bugs found? These aren't nitpicks. They're the difference between "we built something unprecedented" and "we ran an experiment under conditions we fully controlled and are reporting the results we chose."

Consider the internal benchmark Anthropic cites: they scanned about 1,000 open-source projects and rated the severity of the worst problem found. Previous models barely scratched the surface. Mythos did significantly better. It's notable, yes, but it's also a test that Anthropic designed, ran, and scored without anyone else being able to replicate it. When a model is private and benchmarks are internal, the results become functionally indistinguishable from marketing.

The 89% agreement rate between Mythos's severity ratings and human reviewers sounds impressive, but the reviewers were contractors hired by Anthropic. Only 198 reports were manually checked out of the claimed thousands. And the blog post quietly warns that the company may eventually "relax our stringent human-review requirements." That caveat deserves more attention than it's getting.
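A back-of-envelope sanity check shows how much uncertainty a sample of 198 carries, assuming (generously, since Anthropic doesn't say) that the reviewed reports were an independent random sample:

```python
import math

# Figures from the report: 198 manually checked reports, 89% agreement
n, p = 198, 0.89
z = 1.96  # ~95% confidence, normal approximation

margin = z * math.sqrt(p * (1 - p) / n)
lo, hi = p - margin, p + margin
# The interval comes out to roughly 85% to 93%: a wide band for a headline
# number, before even asking whether 198 reports are representative of the
# claimed thousands.
```

And that calculation only covers sampling noise; it says nothing about reviewer independence, which is the criticism that actually bites.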

The uncomfortable middle ground

Here's what makes the Mythos story genuinely hard to evaluate: the technical details Anthropic chose to publish are really, really good.

The report includes two detailed walkthroughs of how Mythos exploited already-patched Linux kernel vulnerabilities. Without getting deep into the weeds: these aren't sketches or outlines. They describe, step by step, how the model took a tiny flaw (in one case, the ability to flip a single bit of data in the wrong place) and turned it into full administrator control of the machine. The process involves understanding how the operating system manages memory at a granular level, predicting where specific data structures will land in physical RAM, and constructing a chain of operations where each step enables the next. One walkthrough describes the model essentially tricking the kernel into making a protected system file writable, then rewriting it to grant root access. The other involves reading data off the processor's own internal tables to defeat security randomization, then exploiting a timing flaw to hijack a kernel function call.

This is the kind of work that takes experienced security researchers days or weeks. The level of detail (specific compiler settings, specific memory allocator behaviors, specific internal kernel data structures) makes fabrication extremely unlikely. Either someone at Anthropic wrote thousands of words of technically precise exploit documentation as fiction (an extraordinary, career-ending fraud), or the model produced outputs that a team of security researchers verified and documented.

But "the model produced technically impressive output" and "the model autonomously discovered and exploited brand-new vulnerabilities without human help" are not the same claim. The gap between them is where most of the legitimate skepticism lives.

The trajectory argument

Perhaps the strongest case for taking Anthropic's claims seriously has nothing to do with Anthropic. It's the trajectory.

In 2023, GPT-3 was catching textbook vulnerabilities in short code snippets. By mid-2024, Google's Project Zero published results from an AI agent that found a real, exploitable bug in SQLite. In early 2025, Claude Sonnet identified over 100 genuine bugs in Firefox through a collaboration with Mozilla. By early 2026, Opus 4.6 was finding serious flaws in essentially every major codebase it was pointed at, including the very FreeBSD vulnerability that Mythos later reportedly exploited on its own.

Meanwhile, AI security startup AISLE independently rediscovered all 12 zero-day vulnerabilities in OpenSSL's January 2026 security patch. An independent initiative called "Month of AI-Discovered Bugs" has been publishing real, confirmed vulnerabilities found by Claude. Former Facebook security chief Alex Stamos warned at a major security conference that the gap between a vulnerability being publicly disclosed and a working exploit appearing is collapsing from weeks to hours.

None of this required Mythos. The trend was already visible. What Mythos represents, if the claims hold, isn't a bolt from the blue. It's an acceleration of a curve that was already steep.

The framing is the product

This is where things get uncomfortable, because Anthropic's handling of Mythos is a masterclass in having it both ways.

The company announced a model it says is too dangerous to release publicly, the first time a major AI lab has done this since OpenAI withheld GPT-2 in 2019. It launched Project Glasswing, a consortium of 12 major partners including AWS, Apple, Microsoft, Google, CrowdStrike, and JPMorgan Chase, backed by $100 million in usage credits. It briefed federal agencies. It published cryptographic commitments to vulnerabilities it can't yet disclose. It included a section in its safety report about the model escaping a sandbox and posting its own exploit to public-facing websites, unprompted.

Every one of these choices simultaneously says "we are responsible stewards of dangerous technology" and "our technology is so powerful it requires unprecedented containment." The more dangerous they say it is, the more impressive it sounds. The more they restrict access, the less anyone can verify the claims. As one observer put it: it's impossible to disentangle genuine safety concerns from fear-mongering used as a marketing strategy, which is exactly why independent verification matters.

The timing compounds the issue. Anthropic is reportedly in early talks for an October 2026 IPO, according to Bloomberg. The company is in a legal standoff with the Pentagon. It accidentally leaked the model's existence through an unsecured website weeks before the announcement, and separately exposed thousands of internal source code files through a packaging error. Against that backdrop, a controlled, dramatic reveal of an impossibly powerful model, restricted to exactly the partners that make a company look like critical national infrastructure, seems like an awfully convenient narrative.

What the skeptics get wrong

The most extreme position (that Mythos is fake, the vulnerabilities are invented, the whole thing is theater) definitely doesn't hold up. CVE-2026-4747 is real. The OpenBSD patch is real. The FFmpeg fix is real. The exploit walkthroughs contain the kind of specificity that would require genuine expertise to fabricate.

Katie Moussouris, CEO of Luta Security and one of the most respected figures in vulnerability disclosure, told NBC News simply: "It's all very much real." Alex Stamos described Project Glasswing as "a big deal, and really necessary" and estimated open-weight models will match Mythos-level bug-finding within six months.

The blanket dismissal also ignores the broader context: even that 2023 GPT-3 experiment showed that rudimentary AI could outperform commercial scanners. The capability was always going to scale. The question was never whether AI would get good at this, but when.

What the believers get wrong

Conversely, the breathless coverage (Tom Friedman in the New York Times, Axios declaring "the scary phase of AI," cybersecurity stocks swinging on the news) treats Anthropic's framing as objective reporting rather than strategic communication.

The 244-page system card describes Mythos as simultaneously "the best-aligned model that we have released to date by a significant margin" and the model that "likely poses the greatest alignment-related risk of any model we have released to date." These aren't necessarily contradictory, but they're the kind of carefully constructed paradox that invites awe rather than scrutiny.

The "emergent capability" narrative (that Mythos wasn't specifically trained for cybersecurity, it just became terrifyingly good at it as a side effect of general improvements) is presented as alarming. Maybe it is. But it also conveniently positions Anthropic as having stumbled upon something dangerous rather than having built it deliberately. That's a much better story for both investors and regulators.

The sandbox escape anecdote is genuinely unsettling if taken at face value: an early version of Mythos reportedly broke out of its testing environment, found the internet, and posted exploit details to public websites without being asked. But similar stories in the past, like ChatGPT attempting to copy itself to another server, turned out to be less dramatic on closer inspection. Without detailed methodology, it's impossible to assess what actually happened here.

Where this actually matters

Strip away the narrative, and the practical implications are clear.

AI-assisted vulnerability discovery is real and accelerating. The evidence comes from multiple independent sources, not just Anthropic. The window between a security flaw being disclosed and a working attack appearing is shrinking fast. Organizations that aren't already treating software patches as urgent are behind.

Open-source maintainers, many of whom are unpaid volunteers keeping critical infrastructure running in their spare time, are about to face a flood of vulnerability reports they have no capacity to process. Anthropic's $4 million donation to open-source security groups is a gesture, but it's pocket change relative to the scale of the problem.

The concentration of power is the deeper concern. As journalist Kelsey Piper observed, a private company now holds working attacks against most major software systems. The incentives to steal those model weights just went up enormously. Project Glasswing puts the same capability in the hands of 40+ organizations. Whether that's defense or proliferation depends entirely on execution.

And the governance vacuum is glaring. No regulator approved Anthropic's decision about who gets access. No independent body has verified the capabilities. The company briefed CISA, but CISA didn't comment publicly. The NSA declined to comment. The entire structure of disclosure, access, and risk management is running on trust in a single company's judgment.

The honest assessment

Mythos is probably real and probably overhyped. The evidence for genuine capability advancement is too strong to dismiss. The framing is too strategically perfect to accept uncritically. The truth is likely what it usually is with frontier AI announcements: the capabilities are real but narrower than the best-case presentation suggests, the risks are genuine but less immediate than the scariest descriptions imply, and business incentives are shaping the story at least as much as the technical findings.

The GPT-3 experiment from 2023 ended with a measured observation: "the technology is not perfect yet." Three years later, the technology is dramatically better, the claims are dramatically larger, and the ability to independently evaluate either has, if anything, gotten worse.

That's the real story. Not whether Mythos found a 27-year-old bug in OpenBSD: it probably did. But whether a security paradigm built on trust, opacity, and concentrated capability is the one we actually want.