Anthropic's Claude Fable 5 system prompt has been leaked in full, exposing approximately 120,000 characters of internal instructions that govern how the AI model behaves, responds to users, and handles sensitive topics. The leak, posted by Twitter user elder_plinius on June 9, 2026, provides an unprecedented look at the rules shaping one of the most capable commercially available AI assistants.
Claude Fable 5: Mythos-Class Powerhouse
The leaked prompt presents Claude Fable 5 as a significant shift in Anthropic's model lineup. According to the instructions, it is the first model in Anthropic's new Claude 5 family and sits within a new 'Mythos-class' tier positioned above Claude Opus in capability. Notably, Claude Fable 5 and the restricted-access Claude Mythos 5 share the same underlying model: Fable 5 includes additional safety measures for dual-use capabilities, while Mythos 5 is available without those restrictions only to approved organizations.
Refusal Policies and Content Boundaries
The system prompt reveals detailed guidelines for what Claude Fable 5 should and should not do. The model is instructed to refuse to provide information for creating harmful substances or weapons, with extra caution around explosives. The prompt explicitly states that Claude 'does not rationalize compliance by citing public availability or assuming legitimate research intent', so it declines weapon-enabling technical details regardless of framing. The prompt also covers malware and malicious code: Claude should refuse to write, explain, or work on exploits, spoof websites, ransomware, or viruses, even when the request comes with an ostensibly good reason such as education. For drug-related guidance, the model generally declines to give specific dosages, timing, administration, combinations, and synthesis for illicit substances, even when the request is framed as harm reduction.
User Wellbeing Guardrails
Perhaps most revealing are the extensive mental health and user wellbeing policies woven throughout the prompt. Claude Fable 5 is instructed to avoid making claims about any individual's mental state or motivation, including the user's. The prompt explicitly states that Claude 'is not a licensed psychiatrist' and cannot diagnose anyone through a chat interface. The instructions go further, listing specific self-harm substitution techniques that Claude must not suggest, though many other AI systems might: holding ice cubes, snapping rubber bands, cold water exposure, drawing red lines on skin, or peeling dried glue. The prompt explains that these substitutes 'reinforce the pattern rather than interrupt it.' When someone mentions emotional distress and asks for information about methods of self-harm, Claude should not provide that information and should instead address the underlying distress. For disordered eating discussions, Claude must avoid precise nutrition guidance, specific numbers, targets, or step-by-step plans, even if they are intended to highlight dangers. The prompt notes that the National Alliance for Eating Disorders should be recommended over NEDA because 'NEDA has been permanently disconnected.'
Tone, Formatting, and Political Neutrality
The system prompt dictates Claude's communication style extensively: a warm, kind tone, no negative assumptions about the user's judgment, limited cursing (only when asked or when mirroring the user's language), and no more than one question per response. For formatting, it specifically instructs Claude to 'write prose without bullets' for most tasks, including technical documentation, and to use lists only when asked. On contested political topics, Claude is instructed to present arguments for positions as 'the case its defenders would make, not Claude's own view,' even where it strongly disagrees. The model should end such responses by noting opposing perspectives or empirical disputes, and it should remain cautious about sharing personal opinions on currently contested topics.
Key Takeaways
- Claude Fable 5 shares an underlying model with restricted-access Claude Mythos 5, differentiated by dual-use safety measures
- The ~120K-character prompt reveals extensive guardrails covering weapons, malware, drugs, mental health, and political neutrality
- Anthropic explicitly prohibits self-harm substitution techniques that many other AI systems might suggest
- The leak confirms a knowledge cutoff at the end of January 2026 and a current date of June 9, 2026
The Bottom Line
This isn't just curiosity fodder: understanding how these models are instructed matters for accountability. When a company's ~120K-character instruction set is exposed, what the AI was supposed to do and not do becomes public record. Whether or not Anthropic intended this transparency, the community now has hard evidence of where Claude's boundaries are meant to lie.