If you're running heavy agent workloads on Claude, you've probably noticed that Anthropic's 5-minute cache window is... optimistic about human attention spans. One missed refresh and suddenly you're paying the full 1.25x write rate to rebuild the entire prefix. I hit this problem hard enough that I finally did the math, and the answer is cleaner than expected.

The Core Formula: 5 × W/R = 62.5 Minutes

The author at skids.dev worked through Anthropic's pricing structure and landed on a deceptively simple rule: refresh your cache if you expect to need it within 62.5 minutes; let it expire if you'll be gone longer. The math comes down to the ratio between the cache write cost W (1.25x base input price) and the cache read/refresh cost R (0.10x base input price). On Opus 4.7, a 100K-token prefix costs $0.625 to write but only $0.05 to refresh. Each refresh buys you another 5-minute window at that cheap read rate, so the break-even is the idle time at which the cumulative refreshes cost as much as one extra write: T = 5 × (W/R) = 5 × (1.25/0.10) = 62.5 minutes. That is 12.5 refreshes, assuming one refresh per 5-minute window.
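The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code: the function names are hypothetical, and the $5 per million tokens base price is an assumption implied by the $0.625 and $0.05 figures quoted above.

```python
def break_even_minutes(ttl_minutes: float = 5.0,
                       write_multiplier: float = 1.25,
                       read_multiplier: float = 0.10) -> float:
    """Idle time at which keep-alive refreshes cost as much as one cache write."""
    # Each refresh costs one cached read and buys one more TTL window.
    refreshes_per_write = write_multiplier / read_multiplier
    return ttl_minutes * refreshes_per_write


def write_and_refresh_cost(tokens: int,
                           base_price_per_mtok: float,
                           write_multiplier: float = 1.25,
                           read_multiplier: float = 0.10) -> tuple[float, float]:
    """Dollar cost of one cache write and one cache refresh for a prefix."""
    base = tokens / 1_000_000 * base_price_per_mtok
    return base * write_multiplier, base * read_multiplier


if __name__ == "__main__":
    print(f"break-even: {break_even_minutes():.1f} minutes")
    # Assumed base input price of $5 per million tokens.
    write, refresh = write_and_refresh_cost(100_000, 5.0)
    print(f"write: ${write:.3f}, refresh: ${refresh:.3f}")
```

Keeping the multipliers as parameters makes it easy to rerun the numbers if the pricing ratios ever change.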

Why Model Choice Doesn't Matter (But Costs Do)

Here's the interesting part: the 62.5-minute number stays constant regardless of model tier or prefix size. Both strategies scale linearly with base input price and token count, so when you divide write cost by refresh cost to find the break-even point, both factors cancel. A 5K-token Sonnet prefix and a 500K-token Opus prefix both hit crossover at exactly 62.5 minutes, but your actual bill will be very different. On a 500K Opus prefix, a write costs $3.125 and each refresh costs $0.25. At 30 minutes idle, six refreshes cost $1.50, so refreshing saves $1.625 compared to rewriting. At 90 minutes, eighteen refreshes cost $4.50, so refreshing has become the wrong choice and costs you an extra $1.375.
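To see both the cancellation and the dollar amounts, here is a small sketch that compares the two strategies for a given idle time. The function name is hypothetical, the base prices ($5 per million tokens for Opus, $3 for Sonnet) are assumptions, and the model treats refreshes as a continuous idle/TTL ratio, as the figures above do. Real refreshes come in whole numbers.

```python
def net_saving_from_refreshing(tokens: int,
                               base_price_per_mtok: float,
                               idle_minutes: float,
                               ttl_minutes: float = 5.0,
                               write_multiplier: float = 1.25,
                               read_multiplier: float = 0.10) -> float:
    """Rewrite cost minus refresh cost; positive means refreshing is cheaper."""
    base = tokens / 1_000_000 * base_price_per_mtok
    # One refresh per TTL window for the whole idle period.
    refresh_cost = base * read_multiplier * (idle_minutes / ttl_minutes)
    # Letting the cache expire means paying one fresh write on return.
    rewrite_cost = base * write_multiplier
    return rewrite_cost - refresh_cost


if __name__ == "__main__":
    for idle in (30, 62.5, 90):
        opus = net_saving_from_refreshing(500_000, 5.0, idle)   # assumed Opus price
        sonnet = net_saving_from_refreshing(5_000, 3.0, idle)   # assumed Sonnet price
        print(f"{idle:>5} min  opus 500K: {opus:+.4f}  sonnet 5K: {sonnet:+.6f}")
```

Both columns change sign at the same idle time; only the magnitude differs.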

Compaction Math: The Compression Ratio Trap

The article also tackles context compaction, the summarization trick agents use when approaching context limits. Claude Code, OpenCode, and most autonomous agents do this automatically at certain thresholds. The break-even number of future turns is (1 + 62.5r)/(1 - r), where r is the summary size as a fraction of the original context (r = 0.05 for 20:1 compression). At 20:1, you recover costs in about 4 future turns. At 10:1, you're looking at roughly 8 turns to justify it. But here's the trap: summary tokens are output tokens, and output pricing is 5x base input. A verbose or low-quality summary can actually cost more than just eating the larger context.
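The compaction formula is easy to tabulate. The sketch below takes the constant 62.5 straight from the formula quoted above and assumes r means summary tokens divided by original tokens, which is the reading that reproduces the 4-turn and 8-turn figures. The function name is hypothetical.

```python
def compaction_break_even_turns(r: float) -> float:
    """Future turns needed before compaction pays for itself.

    r is assumed to be summary_tokens / original_tokens,
    so 20:1 compression means r = 0.05.
    """
    if not 0 < r < 1:
        raise ValueError("r must be strictly between 0 and 1")
    # Formula as quoted from the article: (1 + 62.5r) / (1 - r)
    return (1 + 62.5 * r) / (1 - r)


if __name__ == "__main__":
    for ratio in (20, 10, 5, 2):
        turns = compaction_break_even_turns(1 / ratio)
        print(f"{ratio}:1 compression -> {turns:.1f} turns to break even")
```

The break-even climbs quickly as summaries get longer, which is the trap in numeric form: at 2:1, the formula asks for more than 60 future turns.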

The Gotchas Anthropic Doesn't Tell You

Opus 4.7 introduced a new tokenizer that can inflate token counts by up to 35%. If you're moving cached prompts between model versions, don't assume your old token estimates still apply; run everything through Anthropic's token counting endpoint first. There's also a minimum cacheable prefix length (4,096 tokens for Opus, 1,024 for Sonnet), and the API won't warn you if you're under it; you just silently lose caching. The lookback window is limited to 20 content blocks, so long-running agents can accidentally push their target cache entry out of range.
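A rough pre-flight check for the first two gotchas might look like the sketch below. The floor values and the 35% ceiling are the figures quoted above, not values read from the API, and the helper names are hypothetical. It only brackets an old estimate; an actual count from the token counting endpoint is still the authoritative number.

```python
# Minimum cacheable prefix lengths as quoted in the text (assumed, per model family).
MIN_CACHEABLE_TOKENS = {"opus": 4096, "sonnet": 1024}


def worst_case_tokens(old_estimate: int, inflation_pct: int = 35) -> int:
    """Upper bound on the token count after a tokenizer change (rounded up)."""
    # Integer math avoids floating-point rounding surprises.
    return (old_estimate * (100 + inflation_pct) + 99) // 100


def clears_cache_floor(token_count: int, family: str) -> bool:
    """True if a prefix of this size is long enough to be cached at all."""
    return token_count >= MIN_CACHEABLE_TOKENS[family]


if __name__ == "__main__":
    old = 3000
    upper = worst_case_tokens(old)
    print(f"old estimate {old}, worst case {upper}")
    print("opus floor cleared at old estimate:", clears_cache_floor(old, "opus"))
    print("opus floor cleared at worst case:", clears_cache_floor(upper, "opus"))
```

In this example a 3,000-token prefix stays under the quoted Opus floor even at maximum inflation, so it would silently go uncached.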

Key Takeaways

  • Refresh your cache if you'll return within 62.5 minutes; let it expire otherwise. This number is model- and size-independent
  • Compaction pays off at roughly 10:1 compression ratio with ~8 future turns expected, but verbose summaries can negate the savings
  • Watch for Opus 4.7's new tokenizer inflating token counts by up to 35% when moving cached prompts between models
  • Check cache_creation_input_tokens and cache_read_input_tokens in your usage block; if both are zero, your cache isn't actually caching
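The last check can be automated with a small helper. This is a sketch that assumes the usage block is available as a plain dict containing the two field names above; the function name and the status labels are hypothetical, and an SDK that returns an object rather than a dict would need converting first.

```python
def cache_status(usage: dict) -> str:
    """Classify a response's usage block by what the cache did."""
    # Missing or null fields are treated as zero.
    created = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    if read > 0 and created > 0:
        return "hit+write"  # part of the prefix was read, a new part was written
    if read > 0:
        return "hit"
    if created > 0:
        return "write"
    return "none"  # both zero: caching is not happening


if __name__ == "__main__":
    sample = {
        "input_tokens": 120,
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0,
    }
    if cache_status(sample) == "none":
        print("warning: no cache activity on this request")
```

Logging this status per request makes a silently uncached prefix show up right away instead of on the invoice.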

The Bottom Line

This is exactly the kind of under-the-hood analysis that separates casual API users from people actually optimizing their inference spend. The 62.5-minute rule is elegant precisely because it's universal—Anthropic's pricing multipliers are consistent enough that you can make this decision without running a spreadsheet every time. If you're building serious agent workflows and not thinking about cache TTL strategy, you're probably leaving money on the table.