Pliny the Liberator Claims He's Already Jailbroken Anthropic's Claude Fable 5 Within 48 Hours of Launch

Pliny the Liberator, the pseudonymous AI researcher who's built a reputation for exposing guardrail vulnerabilities in commercial language models, claims to have jailbroken Anthropic's Claude Fable 5 just 48 hours after its Tuesday launch. The safety-tuned model was positioned as a constrained version of Mythos—the more powerful system Anthropic deemed "too dangerous to release widely"—but Pliny says his crew found the holes before anyone could even finish their victory laps.

How They Broke Through Fable's Walls

"Despite this overly sensitive, authoritarian 'safety' layer on top of Mythos, my lil liberators have been hard at work... cleverly finding the holes in the fence that the thought police missed," Pliny wrote. The technique he's calling decomposition-recomposition proved most effective: requests get sliced into harmless-looking fragments, each one innocent enough to pass filters individually. When reassembled by a jailbroken Claude Opus 4.8 acting as a backend intermediary, those pieces unlock capabilities Anthropic tried to lock away.

The Crypto Community's Worst Fears Confirmed

The jailbreak timing couldn't be worse for the blockchain ecosystem. Crypto users had already voiced concerns when Claude Fable 5 and Mythos launched earlier this year, worried these models could power attacks on DeFi protocols and smart contracts. A fully liberated Fable 5 means that threat just got a lot more real—and a lot closer.

Industry-Wide Backlash Over Heavy-Handed Restrictions

"This is one of the first times that an AI company has rolled out a guardrail, and there has been uniform disdain. It has led to a lot of justified anger," said Sayash Kapoor, an AI researcher at Princeton University, speaking to the Wall Street Journal. Fable 5's design redirects users asking about bioweapons or cybersecurity topics to earlier, less capable models—a blunt instrument that blocks legitimate researchers alongside malicious actors.

Anthropic's Bug Bounty Came Up Empty

During the launch announcement, Anthropic emphasized its external bug bounty program had produced "no universal jailbreaks in over 1,000 hours of testing." That claim now looks shaky. Cointelegraph reached out to Anthropic for comment but received no immediate response. "The consensus seems to be that this has been one of the most disappointing model drops of all time, effectively preventing legitimate researchers from contributing their talents to our collective advancement," Pliny said. For those watching the AI safety space closely, that's the real tragedy here—Anthropic built a fortress and called it safety, while the research community lost access to tools that could've advanced everyone's understanding.

Key Takeaways

Pliny jailbroke Claude Fable 5 using decomposition-recomposition: breaking requests into harmless fragments before reassembly via jailbroken Opus 4.8
Anthropic's heavy restrictions have drawn criticism from researchers who say the model blocks legitimate work alongside harmful use cases
The crypto community faces increased risk as a fully liberated Fable 5 could power protocol attacks and smart contract exploits

The Bottom Line

Anthropic's approach with Fable 5 proves that building walls around AI doesn't make it safer—it just pushes the interesting work underground where nobody can see what's being built. When safety measures anger researchers more than they stop bad actors, something fundamental has broken in how these companies handle release decisions.

> Pliny the Liberator Claims He's Already Jailbroken Anthropic's Claude Fable 5 Within 48 Hours of Launch