Pliny the Liberator, the pseudonymous AI researcher who's built a reputation for exposing guardrail vulnerabilities in commercial language models, claims to have jailbroken Anthropic's Claude Fable 5 just 48 hours after its Tuesday launch. The safety-tuned model was positioned as a constrained version of Mythosβthe more powerful system Anthropic deemed "too dangerous to release widely"βbut Pliny says his crew found the holes before anyone could even finish their victory laps.
How They Broke Through Fable's Walls
"Despite this overly sensitive, authoritarian 'safety' layer on top of Mythos, my lil liberators have been hard at work... cleverly finding the holes in the fence that the thought police missed," Pliny wrote. The technique he's calling decomposition-recomposition proved most effective: requests get sliced into harmless-looking fragments, each one innocent enough to pass filters individually. When reassembled by a jailbroken Claude Opus 4.8 acting as a backend intermediary, those pieces unlock capabilities Anthropic tried to lock away.
The Crypto Community's Worst Fears Confirmed
The jailbreak timing couldn't be worse for the blockchain ecosystem. Crypto users had already voiced concerns when Claude Fable 5 and Mythos launched earlier this year, worried these models could power attacks on DeFi protocols and smart contracts. A fully liberated Fable 5 means that threat just got a lot more realβand a lot closer.
Industry-Wide Backlash Over Heavy-Handed Restrictions
"This is one of the first times that an AI company has rolled out a guardrail, and there has been uniform disdain. It has led to a lot of justified anger," said Sayash Kapoor, an AI researcher at Princeton University, speaking to the Wall Street Journal. Fable 5's design redirects users asking about bioweapons or cybersecurity topics to earlier, less capable modelsβa blunt instrument that blocks legitimate researchers alongside malicious actors.
Anthropic's Bug Bounty Came Up Empty
During the launch announcement, Anthropic emphasized its external bug bounty program had produced "no universal jailbreaks in over 1,000 hours of testing." That claim now looks shaky. Cointelegraph reached out to Anthropic for comment but received no immediate response. "The consensus seems to be that this has been one of the most disappointing model drops of all time, effectively preventing legitimate researchers from contributing their talents to our collective advancement," Pliny said. For those watching the AI safety space closely, that's the real tragedy hereβAnthropic built a fortress and called it safety, while the research community lost access to tools that could've advanced everyone's understanding.
Key Takeaways
- Pliny jailbroke Claude Fable 5 using decomposition-recomposition: breaking requests into harmless fragments before reassembly via jailbroken Opus 4.8
- Anthropic's heavy restrictions have drawn criticism from researchers who say the model blocks legitimate work alongside harmful use cases
- The crypto community faces increased risk as a fully liberated Fable 5 could power protocol attacks and smart contract exploits
The Bottom Line
Anthropic's approach with Fable 5 proves that building walls around AI doesn't make it saferβit just pushes the interesting work underground where nobody can see what's being built. When safety measures anger researchers more than they stop bad actors, something fundamental has broken in how these companies handle release decisions.