The Firefox security harness that helped fix 271 bugs no one had found for years

On May 7, 2026, Mozilla published a detailed technical account of how it used Anthropic's Claude Mythos Preview model to identify and fix 271 security vulnerabilities in Firefox 150, including bugs that had survived as long as two decades of conventional security testing. The post-mortem, written by three senior Mozilla engineers, describes the agentic harness that made it possible - and explains why the same approach was beyond reach just months earlier.

From noise to signal: how AI-generated bug reports became credible

AI-generated security reports have a poor reputation in open source communities, and for understandable reasons. According to Mozilla, "dealing with reports that look plausibly correct but are wrong imposes an asymmetric cost on project maintainers: it's cheap and easy to prompt an LLM to find a 'problem' in code, but slow and expensive to respond to it."

That reputation shifted quickly. Mozilla engineers Brian Grinstead, Christian Holler, and Frederik Braun wrote in the May 7 post that it was "difficult to overstate how much this dynamic changed for us over a few short months." Two factors drove the change. Models became substantially more capable. And Mozilla significantly improved its techniques for operating those models at scale - what the team describes as steering, scaling, and stacking them to filter signal from noise.

The shift did not happen in isolation. It grew out of a collaboration that began months earlier, when Anthropic's Frontier Red Team used Claude Opus 4.6 to find 22 vulnerabilities in Firefox 148 over a two-week period in February 2026. Fourteen of those were rated high-severity - a count equal to nearly a fifth of all high-severity Firefox bugs remediated in the whole of 2025. That earlier partnership established the working model for what came next.

Building the pipeline: from harness to production at scale

Mozilla did not simply hand Claude Mythos a codebase and wait for output. The technical architecture required to make the system useful at scale involved multiple layers of infrastructure, tooling, and human review.

According to Mozilla, the team started with small-scale experiments prompting the harness to look for sandbox escapes using Claude Opus 4.6. Even at that stage, it found a significant number of previously unknown vulnerabilities that required complex reasoning across multiprocess browser engine code. Engineers initially supervised the terminal output directly, watching the process run in real time and adjusting prompts and logic accordingly. Once that iteration loop stabilised, Mozilla parallelised the work across multiple ephemeral virtual machines. Each VM was assigned a specific target file. Results were written back to a shared bucket.
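
The fan-out described above can be sketched roughly as follows. This is a minimal illustration and not Mozilla's code: a thread pool stands in for the ephemeral VMs, a local directory stands in for the shared bucket, and `analyse_file`, `run_worker`, `fan_out`, and the file paths are all hypothetical names introduced here.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def analyse_file(target: str) -> dict:
    """Hypothetical stand-in for one harness run against one target file."""
    # A real run would drive a model and execute test cases; this returns a stub.
    return {"target": target, "findings": []}


def run_worker(target: str, bucket: Path) -> Path:
    """Analyse one target and write its result into the shared 'bucket'."""
    result = analyse_file(target)
    out = bucket / (target.replace("/", "_") + ".json")
    out.write_text(json.dumps(result))
    return out


def fan_out(targets: list[str], bucket: Path, workers: int = 4) -> list[Path]:
    """Give each target file its own worker, mirroring one file per VM."""
    bucket.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: run_worker(t, bucket), targets))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = fan_out(["dom/a.cpp", "js/b.cpp"], Path(tmp) / "bucket")
        print(sorted(p.name for p in paths))
```

Keeping each worker's output in its own file means workers never contend for shared state, which is one plausible reason to pair disposable machines with a write-only results store.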

Discovery is necessary but not sufficient. Mozilla had to integrate the harness into its full security bug lifecycle. That meant determining what to look for and where, deduplicating against known issues, tracking bugs through its internal systems, triaging findings, and managing the release process for every fix shipped. According to the engineers, "this pipeline is inherently project-specific, reflecting each codebase's semantics, tooling, and processes." Building it required significant iteration alongside Firefox engineers who were fielding incoming bugs in parallel.
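
As a rough illustration of the deduplication step, the sketch below drops findings whose signature matches an already known issue or an earlier finding in the same batch. The signature used here (crash type plus innermost stack frame) is an assumption made for the example; Mozilla's post does not describe how its pipeline compares bugs, and the function names in the sample data are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    crash_type: str  # e.g. "heap-use-after-free"
    top_frame: str   # hypothetical: innermost function in the crash stack
    summary: str

    @property
    def signature(self) -> tuple[str, str]:
        """Key used to decide whether two findings are the same bug."""
        return (self.crash_type, self.top_frame)


def deduplicate(findings, known_signatures):
    """Keep the first finding per signature that is not already known."""
    seen = set(known_signatures)
    fresh = []
    for finding in findings:
        if finding.signature not in seen:
            seen.add(finding.signature)
            fresh.append(finding)
    return fresh


if __name__ == "__main__":
    known = {("heap-use-after-free", "Example::Release")}
    batch = [
        Finding("heap-use-after-free", "Example::Release", "already filed"),
        Finding("heap-buffer-overflow", "Example::Parse", "new"),
        Finding("heap-buffer-overflow", "Example::Parse", "same bug, second report"),
    ]
    print([f.summary for f in deduplicate(batch, known)])
```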

The architecture also required decisions about model access. The harness could create and run reproducible test cases to dynamically test hypotheses about bugs in code. That distinguishes it from earlier static analysis approaches. Mozilla noted that early LLM audits using models such as GPT-4 and Claude Sonnet 3.5 showed some promise but produced false positive rates too high to scale. The introduction of agentic harnesses capable of actually running code changed that. A harness that can reproduce a bug programmatically can also dismiss speculation that does not reproduce - filtering out noise at the source rather than pushing it to human reviewers.
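
The reproduce-or-dismiss idea can be sketched as a simple filter: run each candidate test case and keep only the hypotheses whose run produces a sanitizer error. The code below is an illustration under stated assumptions, not the harness itself. It assumes the test case is a command line, that a confirmed bug shows up as an AddressSanitizer error on stderr, and that a timeout counts as "did not reproduce"; `fake_test_case` is a hypothetical helper that only prints text so the example runs without a browser build.

```python
import subprocess
import sys


def reproduces(command: list[str], timeout: float = 30.0) -> bool:
    """Run a candidate test case; True if stderr carries an ASan error report."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False  # assumption: a hang is not treated as a confirmed bug
    return "ERROR: AddressSanitizer" in proc.stderr


def filter_hypotheses(hypotheses: dict[str, list[str]]) -> list[str]:
    """Return only the hypothesis names whose test case reproduces."""
    return [name for name, cmd in hypotheses.items() if reproduces(cmd)]


def fake_test_case(stderr_text: str) -> list[str]:
    """Hypothetical stand-in for a real test run: just writes text to stderr."""
    return [sys.executable, "-c", f"import sys; sys.stderr.write({stderr_text!r})"]


if __name__ == "__main__":
    candidates = {
        "uaf-in-parent": fake_test_case(
            "==1==ERROR: AddressSanitizer: heap-use-after-free\n"
        ),
        "speculative-overflow": fake_test_case(""),
    }
    print(filter_hypotheses(candidates))
```

The point of the design is that speculation which cannot be turned into a failing run never reaches a human reviewer.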

What Claude Mythos found - and how old some of it was

The scope of what Claude Mythos Preview identified in Firefox 150 is documented in detail. Mozilla fixed 271 bugs attributed to the model in that single release. Of those, 180 were rated sec-high, 80 were rated sec-moderate, and 11 were rated sec-low. Mozilla groups internally reported bugs into rollup CVEs on its advisories page. In Firefox 150, three internal rollups covered the AI-assisted findings: CVE-2026-6784 contained 154 bugs, CVE-2026-6785 covered 55 bugs, and CVE-2026-6786 covered 107 bugs. The total across those rollups is 316, which is higher than 271. The difference reflects bugs found by other methods - human inspection, fuzzing, and other models - that were fixed in the same release.
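
The arithmetic behind those figures can be checked in a few lines. The counts come from the article; the final subtraction assumes that all 271 Mythos-attributed bugs sit inside these three rollups, which the text implies but does not state outright.

```python
# Bug counts per rollup CVE, as reported for Firefox 150.
rollups = {"CVE-2026-6784": 154, "CVE-2026-6785": 55, "CVE-2026-6786": 107}

# Severity breakdown of the bugs attributed to Claude Mythos Preview.
severity = {"sec-high": 180, "sec-moderate": 80, "sec-low": 11}

rollup_total = sum(rollups.values())   # all bugs in the three rollups
mythos_total = sum(severity.values())  # bugs attributed to the model

# Assumption: every Mythos-attributed bug is counted in one of the rollups.
other_methods = rollup_total - mythos_total

if __name__ == "__main__":
    print(rollup_total, mythos_total, other_methods)
```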

The sample of reports Mozilla chose to make public spans twelve browser subsystems and includes bugs of striking age and complexity.

Bug 2024918 involves a JIT compiler error in WebAssembly. An incorrect equality check causes the JIT to optimise away the initialisation of a live WebAssembly GC struct, creating a fake-object primitive with potential arbitrary read and write access. The affected code had undergone extensive fuzzing by both internal and external researchers. The bug survived anyway.

Bug 2024437 is a 15-year-old flaw in the HTML <legend> element. It was triggered by carefully orchestrated edge cases across distant parts of the browser, involving recursion stack depth limits, expando properties, and cycle collection.

Bug 2021894 exploits a race condition over IPC - inter-process communication. A compromised content process manipulates IndexedDB reference counts in the parent process to trigger a use-after-free and a potential sandbox escape.

Bug 2022034 involves a raw NaN value crossing an IPC boundary. It masquerades as a tagged JavaScript object pointer, turning double deserialisation into a parent-process fake-object primitive for sandbox escape.
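
To see why a raw NaN is dangerous in an engine that packs tagged values into 64-bit doubles, consider the toy model below. The tag layout and the fake payload are invented for illustration and are not SpiderMonkey's actual encoding. The canonicalisation step is a common defence in NaN-boxing designs and is assumed here; the article does not describe the fix Mozilla shipped.

```python
import math
import struct

EXPONENT_MASK = 0x7FF0_0000_0000_0000
MANTISSA_MASK = 0x000F_FFFF_FFFF_FFFF
CANONICAL_NAN = 0x7FF8_0000_0000_0000


def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def is_nan_bits(bits: int) -> bool:
    """A double is NaN when the exponent is all ones and the mantissa is non-zero."""
    return (bits & EXPONENT_MASK) == EXPONENT_MASK and (bits & MANTISSA_MASK) != 0


def canonicalise(bits: int) -> int:
    """Replace any NaN with one fixed pattern so no payload bits survive."""
    return CANONICAL_NAN if is_nan_bits(bits) else bits


# Hypothetical attacker-chosen pattern: made-up tag bits plus a fake "pointer".
attacker_bits = 0xFFFE_0000_1234_5678

if __name__ == "__main__":
    print(math.isnan(bits_to_double(attacker_bits)))  # still just "a NaN" as a double
    print(hex(canonicalise(attacker_bits)))           # payload removed
```

As a floating-point number the attacker's value is an ordinary NaN, but its low bits are free to carry anything, which is what lets it pass for a tagged pointer if the receiving side reinterprets the raw bits.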

Bug 2025977 is a 20-year-old bug in XSLT processing. Reentrant key() calls cause a hash table rehash that frees its backing store while a raw entry pointer is still in use.

Bug 2022733 is triggered by flooding WebTransport with thousands of certificate hashes, which stretches a race condition in a reference-count-heavy copy loop. The race is then exploited over IPC from a compromised content process.

The test case for Bug 2023958 simulates a malicious DNS server by intercepting glibc DNS function calls. It reproduces a UDP-to-TCP fallback edge case and triggers a buffer over-read and a parent-process stack memory leak during HTTPS RR and ECH parsing.

Bug 2026305 is a particularly compact find: an extremely small test case exploits the special rowspan=0 semantics in HTML tables, appending more than 65,535 rows to bypass clamping and overflow a 16-bit layout bitfield. The bug went undetected by fuzzers for years.
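
The overflow mechanics are easy to show in isolation. The sketch below is a toy model of a 16-bit unsigned field, not Firefox layout code: it shows how a value above 65,535 wraps when stored without clamping, and how a clamp applied on every path would prevent the wrap. The function names are hypothetical.

```python
BITFIELD_BITS = 16
BITFIELD_MAX = (1 << BITFIELD_BITS) - 1  # 65,535


def store_in_bitfield(value: int) -> int:
    """Keep only the low 16 bits, as an unsigned 16-bit field would."""
    return value & BITFIELD_MAX


def clamp_then_store(value: int) -> int:
    """Clamp to the field's maximum before storing, so no wrap can occur."""
    return store_in_bitfield(min(value, BITFIELD_MAX))


if __name__ == "__main__":
    for rows in (65_535, 65_536, 65_537):
        print(rows, store_in_bitfield(rows), clamp_then_store(rows))
```

A bug of this shape needs only one code path that reaches the store without passing through the clamp, which fits the article's description of special-case semantics bypassing clamping.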

Bug 2029813 escapes Mozilla's in-process sandboxing for third-party libraries - the RLBox system - by leveraging a gap in the verification logic used to copy values from the untrusted to the trusted side of the sandbox boundary.

Mozilla notes that several of these are sandbox escapes. These are not standalone compromises. They require a compromised content process as a starting point, and need to be chained with additional exploits to achieve a full-chain Firefox compromise. The model was permitted to patch Firefox source code when crafting sandbox escape scenarios, subject to restrictions limiting modified code to the sandboxed process only. According to Mozilla, such bugs are "notoriously difficult to find with fuzzing" and AI analysis provides substantially more comprehensive coverage of that attack surface.

What the model did not find - and what that reveals about Firefox's defences

Mozilla drew specific attention to what the model tried and failed to do. In recent years, security researchers submitted several reports exploiting prototype pollution in the privileged parent process to escape the sandbox. Mozilla responded with an architectural change: freezing those prototypes by default rather than patching individual instances. Reviewing harness logs, Mozilla engineers found "many attempts to pursue this line of escape that were thwarted by this design." Observing the model failing to break through a previously hardened surface provided its own form of validation.

The upgrade to Claude Mythos Preview

Building the pipeline with publicly available models first gave Mozilla a working system before Claude Mythos Preview became available. According to Mozilla, "building this pipeline early helped us find a number of serious bugs using publicly-available models, and it also helped us hit the ground running when we had the opportunity to evaluate Claude Mythos Preview."

Claude Mythos Preview is Anthropic's most advanced model and is not publicly available. It was announced by Anthropic on April 7, 2026, as part of Project Glasswing - an initiative giving a select group of technology companies and open-source maintainers access to the model for defensive security work. Participants included Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, and Nvidia, along with approximately 40 additional organisations. Anthropic committed up to $100 million in usage credits and $4 million in donations to open-source security groups as part of the programme.

Mozilla's experience shows how a model upgrade propagates through every stage of the pipeline. According to the May 7 post, "model upgrades increase the effectiveness of the entire pipeline: the system gets simultaneously better at finding potential bugs, creating proof-of-concept test cases to demonstrate them, and articulating their pathology and impact." Swapping models into an existing harness is described as trivial once the end-to-end pipeline is in place.

In addition to the 271 bugs attributed to Claude Mythos Preview in Firefox 150, Mozilla shipped further fixes in versions 149.0.2, 150.0.1, and 150.0.2. In total, Mozilla fixed 423 security bugs across releases in April. Beyond the 271 Mythos-attributed bugs, that figure included 41 externally reported bugs. The remaining 111 were internally discovered and split roughly in thirds: bugs found via Claude Mythos Preview fixed in releases other than Firefox 150, bugs found using other models in the pipeline, and bugs found through conventional methods such as fuzzing. Three CVEs - CVE-2026-6746, CVE-2026-6757, and CVE-2026-6758 - were credited directly to Anthropic's Frontier Red Team for bugs sent to Mozilla separately from the main pipeline effort.
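
The April totals reconcile as follows. The inputs are the article's figures; the per-group estimate simply divides the remainder by three to reflect "roughly in thirds" and is not a number Mozilla reported.

```python
total_fixed = 423         # security bugs fixed across April releases
mythos_firefox_150 = 271  # attributed to Claude Mythos Preview in Firefox 150
external = 41             # externally reported

# Remaining internally discovered bugs.
internal_other = total_fixed - mythos_firefox_150 - external

# Rough size of each of the three internal groups (derived, not reported).
approx_per_group = internal_other / 3

if __name__ == "__main__":
    print(internal_other, approx_per_group)
```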

According to Mozilla, over 100 people contributed code to the effort, covering patch writing, review, triage, fix testing, and release management across multiple versions.

The security severity framework and exploit reality

Mozilla applies a four-tier severity system. Sec-critical and sec-high are assigned to vulnerabilities that can be triggered through ordinary user behaviour, such as browsing to a web page. Sec-critical is reserved for issues that are publicly disclosed or known to be exploited in the wild. Sec-moderate applies to vulnerabilities that would otherwise be rated sec-high but require unusual steps from the victim. Sec-low covers issues that are far from causing user harm.
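
Read as a decision procedure, the four tiers look roughly like the sketch below. It is a simplification of the article's summary, not Mozilla's actual rating policy; the three boolean inputs and the order in which they are checked are assumptions made for the example.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "sec-critical"
    HIGH = "sec-high"
    MODERATE = "sec-moderate"
    LOW = "sec-low"


def classify(
    far_from_harm: bool,
    needs_unusual_victim_steps: bool,
    public_or_exploited: bool,
) -> Severity:
    """Map three simplified yes/no questions onto the four tiers."""
    if far_from_harm:
        return Severity.LOW
    if needs_unusual_victim_steps:
        return Severity.MODERATE  # would otherwise have been sec-high
    if public_or_exploited:
        return Severity.CRITICAL  # disclosed publicly or exploited in the wild
    return Severity.HIGH          # triggerable through ordinary browsing


if __name__ == "__main__":
    print(classify(False, False, False).value)
    print(classify(False, True, False).value)
```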

Of the 271 bugs: 180 were sec-high, 80 were sec-moderate, and 11 were sec-low.

Mozilla is explicit that sec-high does not mean a practical exploit exists. Firefox has a defence-in-depth architecture. A JIT bug, for example, achieves remote code execution only within a sandboxed, site-specific process. Real attackers generally need to chain multiple exploits across sandbox layers and OS-level mitigations such as ASLR to achieve meaningful control. Mozilla classifies sec-high based on crash symptoms - use-after-free, out-of-bounds memory access - reported by AddressSanitizer, and treats any of them as potentially exploitable given sufficient attacker effort. The organisation does not typically build working exploits to verify exploitability, which allows it to focus resources on finding and fixing additional vulnerabilities.
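
Symptom-based classification can be illustrated by pulling the error type out of an AddressSanitizer report and checking it against a list of memory-safety symptoms. The set below is an illustrative assumption, not Mozilla's rule set, and the sample log line is abbreviated and invented.

```python
import re

# ASan error reports contain a line of the form
# "==PID==ERROR: AddressSanitizer: <error-type> ...".
ASAN_ERROR = re.compile(r"ERROR: AddressSanitizer: ([a-z-]+)")

# Assumed, illustrative list of symptoms treated as potentially exploitable.
MEMORY_SAFETY_SYMPTOMS = {
    "heap-use-after-free",
    "heap-buffer-overflow",
    "stack-buffer-overflow",
    "global-buffer-overflow",
}


def crash_symptom(log: str):
    """Return the ASan error type found in the log, or None."""
    match = ASAN_ERROR.search(log)
    return match.group(1) if match else None


def potentially_exploitable(log: str) -> bool:
    """True when the log shows one of the listed memory-safety symptoms."""
    return crash_symptom(log) in MEMORY_SAFETY_SYMPTOMS


if __name__ == "__main__":
    sample = "==4242==ERROR: AddressSanitizer: heap-use-after-free on address ..."
    print(crash_symptom(sample), potentially_exploitable(sample))
```

Rating by symptom rather than by proven exploit is what lets a team skip exploit development, at the cost of sometimes over-rating bugs that would be hard to weaponise.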

What comes next for Mozilla's pipeline

Mozilla describes the current scanning posture as largely focused on specific files and functions, selected through a combination of human judgement and automated signals. The next planned step is integrating the analysis into the continuous integration system to scan patches as they land in the codebase. According to Mozilla, "models are quite flexible with the form of context provided, and we expect patch-based scanning to work as well or even better than file-based scanning."
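
A first step toward patch-based scanning is working out which files a landing patch touches, so that the changed code can be handed to the analysis as context. The sketch below parses a git-style unified diff for that purpose. It is an assumption about how such a step might look, not a description of Mozilla's CI integration, and the sample diff is invented.

```python
def changed_files(diff_text: str) -> list[str]:
    """List files modified or added by a git-style unified diff, in order."""
    files: list[str] = []
    previous = ""
    for line in diff_text.splitlines():
        # A file header is a "--- old" line immediately followed by "+++ new".
        if line.startswith("+++ ") and previous.startswith("--- "):
            path = line[4:].split("\t")[0].strip()
            if path != "/dev/null":  # "/dev/null" marks a deleted file
                if path.startswith("b/"):
                    path = path[2:]
                if path not in files:
                    files.append(path)
        previous = line
    return files


SAMPLE_DIFF = """\
diff --git a/dom/example.cpp b/dom/example.cpp
--- a/dom/example.cpp
+++ b/dom/example.cpp
@@ -1,2 +1,3 @@
 int f();
+int g();
diff --git a/old.txt b/old.txt
--- a/old.txt
+++ /dev/null
@@ -1 +0,0 @@
-gone
"""

if __name__ == "__main__":
    print(changed_files(SAMPLE_DIFF))
```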

Mozilla is candid that the team has not exhausted the latent bug supply in Firefox. Managing 271 bugs, on top of the broader 423-fix release cycle, took long days and significant effort across engineering, triage, testing, and release management.

Mozilla and Firefox have been changing rapidly across multiple dimensions in 2026, with a new CEO at Mozilla appointed in December 2025, centralized AI controls introduced in Firefox 148, and the launch of Firefox 149 with a built-in VPN. The security pipeline described by Mozilla represents a parallel track: hardening the browser's underlying code while its surface features are also changing quickly. Mozilla's advertising business has also been expanding, adding programmatic partners and building out a privacy-first ad stack - all of which depends on Firefox remaining a credible and secure platform.

For the broader software industry, agentic AI systems are moving rapidly from experimental to operational. The Mozilla case is one of the most detailed public accounts yet of what a production-grade agentic security pipeline actually involves - not just the model, but the parallelisation infrastructure, the deduplication logic, the bug lifecycle tooling, and the human review layer required to make it function at scale. Mozilla's decision to publish a detailed sample of the bug reports, despite its normal practice of keeping such reports private for months after shipping fixes, was described as a "calculated decision" driven by the "extraordinary level of interest in this topic and the urgency of action needed throughout the software ecosystem."

Timeline

Late 2025 - Anthropic observes Claude Opus 4.5 coming close to fully solving CyberGym, a benchmark testing LLM ability to reproduce known security vulnerabilities, prompting work toward a harder evaluation
February 2026 - Anthropic's Frontier Red Team uses Claude Opus 4.6 to find 22 vulnerabilities in Firefox 148 over two weeks; 14 rated high-severity, a count equal to nearly a fifth of all high-severity Firefox bugs fixed in 2025
February 2026 - Mozilla introduces centralized AI controls in Firefox 148, allowing users to block generative AI features or manage them individually
March 6, 2026 - Anthropic publicly announces the Firefox 148 vulnerability collaboration with Mozilla
March 17, 2026 - Mozilla publishes the Firefox 149 roadmap announcing a free built-in VPN, split-screen browsing, and the Smart Window AI assistant
April 7, 2026 - Anthropic announces Claude Mythos Preview and Project Glasswing, giving a group of named partners plus approximately 40 additional organisations access to the model for defensive security; Anthropic commits up to $100 million in usage credits
April 21, 2026 - Firefox 150 ships with 271 bugs attributed to Claude Mythos Preview patched; the release's three rollup CVEs, CVE-2026-6784 (154 bugs), CVE-2026-6785 (55 bugs), and CVE-2026-6786 (107 bugs), total 316 bugs because they also include bugs found by other methods; total April release cycle fixes 423 security bugs
May 7, 2026 - Mozilla publishes the full technical post-mortem on the Claude Mythos Preview engagement at Mozilla Hacks, including a sample of 12 previously private bug reports across different browser subsystems

Summary

Who: Mozilla engineers Brian Grinstead, Christian Holler, and Frederik Braun, working with Anthropic's Claude Mythos Preview model and Frontier Red Team, with code contributions from over 100 people covering patching, review, triage, testing, and release management.

What: Mozilla built and deployed an agentic AI security harness that used Claude Mythos Preview to identify 271 previously unknown security vulnerabilities in Firefox 150, including bugs 15 and 20 years old that had survived extensive conventional testing. The wider April 2026 release cycle fixed 423 security bugs in total, including 41 from external reporters and 111 other internally discovered bugs.

When: The engagement with Claude Mythos Preview preceded the Firefox 150 release on April 21, 2026. Mozilla published the detailed technical account of the process on May 7, 2026.

Where: The pipeline ran across Mozilla's existing fuzzing infrastructure, parallelised across multiple ephemeral virtual machines. Findings were triaged and patched within Mozilla's standard security bug lifecycle. The bugs were fixed in Firefox 150 and subsequent point releases.

Why: Mozilla pursued the project because agentic AI models had reached a capability level where they could find real, reproducible bugs in complex codebases that fuzzing and manual inspection had missed for years - including sandbox escapes, use-after-free vulnerabilities, and race conditions across multiprocess browser architecture. The urgency of publishing details, despite Mozilla's standard practice of keeping bug reports private for months, was driven by what the engineers described as the need for action across the software ecosystem.