A team of researchers from institutions including Carnegie Mellon, UC Santa Barbara, and Google has published a benchmark that directly tests what happens when you give AI agents a bug and ask them to build an exploit. The paper introduces ExploitGym, a suite of 898 real-world vulnerabilities spanning three high-stakes targets: userspace programs, Google's V8 JavaScript engine, and the Linux kernel. The question is no longer academic.
## What Makes This Different
Previous security benchmarks have measured AI's ability to find bugs or write patches. ExploitGym goes further: it tasks agents with taking a known vulnerability trigger and progressively turning it into working exploit code that achieves a concrete outcome, such as unauthorized file access or arbitrary code execution. The benchmark varies the security protections enabled on each instance, isolating their impact, and every environment is containerized for reproducibility.

The scope is deliberately brutal. "Exploitation requires low-level program reasoning about memory layout, runtime adaptation, and sustained progress over long horizons," the paper states. That is not something LLMs have traditionally excelled at, but the numbers suggest the gap is closing fast.
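To make the setup concrete, here is a minimal sketch of what a single benchmark instance might look like. The field names, values, and image tag below are hypothetical, inferred from the paper's description rather than taken from its actual schema.

```python
# Hypothetical shape of one ExploitGym instance. Every field name here is
# an assumption for illustration; the paper's real format may differ.
from dataclasses import dataclass


@dataclass
class ExploitInstance:
    cve_id: str           # the known vulnerability the agent starts from
    target: str           # one of "userspace", "v8", "linux-kernel"
    trigger: str          # path to the crash/trigger input given to the agent
    protections: dict     # per-instance mitigations, varied by the benchmark
    goal: str             # concrete outcome the exploit must achieve
    container_image: str  # pinned image, so runs are reproducible


instance = ExploitInstance(
    cve_id="CVE-XXXX-YYYY",  # placeholder, not a real benchmark entry
    target="userspace",
    trigger="triggers/heap_overflow_poc.bin",
    protections={"aslr": True, "nx": True, "pie": False},
    goal="unauthorized file read",
    container_image="exploitgym/userspace:pinned",
)
```

The key design point is that `protections` is part of the instance, not the environment: by toggling mitigations per instance, the benchmark can attribute success or failure to a specific defense.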
## The Numbers That Should Concern You
Anthropic's Claude Mythus Preview generated working exploits for 157 of the 898 instances. OpenAI's GPT-5.5 managed 120. These aren't edge cases: the benchmark draws on production vulnerabilities from real projects, and the models retained "non-trivial" success rates even with widely deployed defenses enabled.

The authors acknowledge the dual-use implications upfront: "Supporting defensive workflows while lowering the barrier for offense." But they argue the diagnostic value justifies publication. If you're running a red team or building automated vulnerability assessment pipelines, ExploitGym is your new reality check.
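For scale, the headline figures work out to solve rates in the low-to-mid teens:

```python
# Back-of-the-envelope success rates from the reported figures.
total = 898
results = {"Claude Mythus Preview": 157, "GPT-5.5": 120}

for model, solved in results.items():
    print(f"{model}: {solved}/{total} = {solved / total:.1%}")
# Claude Mythus Preview: 157/898 = 17.5%
# GPT-5.5: 120/898 = 13.4%
```

Roughly one in six instances falling to a fully automated agent is the number that matters here, not the absolute count.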
## Why This Matters Now
We've spent years debating whether AI will make cybersecurity better or worse. The answer appears to be both, and faster than predicted. An agent that can reliably chain CVEs into working exploits changes the economics of offensive operations dramatically. It also means automated patch validation and exploit generation for defenders could become standard tooling.
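The defensive side of that coin is easy to sketch. The snippet below shows a minimal, hypothetical patch-validation check (not from the paper): a candidate exploit should succeed against the vulnerable build and fail against the patched one, and only then is the patch considered validated. Binary paths and the exit-code convention are assumptions for illustration.

```python
# Hypothetical patch-validation harness: the same exploit payload is run
# against the vulnerable and patched builds. All names and conventions
# here are illustrative, not taken from ExploitGym.
import subprocess


def exploit_succeeds(binary_path: str, payload: bytes, timeout: int = 30) -> bool:
    """Run the target with the exploit payload on stdin; by convention here,
    exit code 0 means the exploit achieved its goal."""
    try:
        result = subprocess.run(
            [binary_path], input=payload, timeout=timeout, capture_output=True
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def patch_is_effective(vuln_bin: str, patched_bin: str, payload: bytes) -> bool:
    """A patch is validated only if the payload works before and fails after."""
    return exploit_succeeds(vuln_bin, payload) and not exploit_succeeds(
        patched_bin, payload
    )
```

Checking both directions matters: an exploit that fails on the unpatched build tells you nothing about the patch, only about the exploit.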
## Key Takeaways
- ExploitGym contains 898 real-world vulnerabilities across V8, Linux kernel, and userspace programs
- Claude Mythus Preview (Anthropic) achieved 157 exploits; GPT-5.5 (OpenAI) managed 120
- Models retained exploit success even with standard security protections enabled
- The research is dual-use: the same capabilities that support defensive workflows lower the barrier for offense
## The Bottom Line
This benchmark is a mirror held up to where AI exploitation capabilities actually are: not science fiction, not vaporware. If you're serious about defense, you need to be running your own variants of these tests against your infrastructure yesterday. The offensive bar just got lower.