Set cacheRetention to "long" in your Opus model params, delegate all implementation work to Sonnet/Haiku sub-agents, and trim your workspace files. These three changes cut our per-task costs from $0.83 to $0.17.
We run a fleet of OpenClaw agents coordinated by a single Opus 4.6 instance. It handles Discord conversations, delegates work to cheaper sub-agents, and manages cron jobs across six machines. Last week it burned through $200 in four days. Here's how we diagnosed the problem and fixed it.
Where the Money Goes
Anthropic's pricing has four rates that matter for OpenClaw operators:
- Input tokens: $5/MTok (your system prompt, conversation history, tool results)
- Output tokens: $25/MTok (what the model generates)
- Cache write: $6.25/MTok (first time a prompt block enters the cache)
- Cache read: $0.625/MTok (subsequent hits on cached content)
The critical insight: cache writes cost 10x more than cache reads. Every time the prompt cache expires and rebuilds, you pay the write rate again for content the model already saw.
OpenClaw injects workspace files (system prompt and memory) on every API call. Ours totaled about 35,000 tokens. When the cache is warm, that's $0.02 per turn. When it rebuilds, that's $0.22. In one session, we counted seven cache rebuilds totaling $1.41, so 68% of the session cost was just re-caching the system prompt.
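The per-turn figures above follow directly from the token count and the two cache rates. Here is a minimal sketch of that arithmetic; the function name is ours, and the numbers are the ones quoted in this section.

```python
# Rates in USD per million tokens, taken from the pricing list above.
CACHE_WRITE_PER_MTOK = 6.25
CACHE_READ_PER_MTOK = 0.625


def prompt_cost(tokens: int, rate_per_mtok: float) -> float:
    """Cost in USD of sending `tokens` prompt tokens at the given rate."""
    return tokens * rate_per_mtok / 1_000_000


workspace_tokens = 35_000  # approximate size of our injected workspace files

warm = prompt_cost(workspace_tokens, CACHE_READ_PER_MTOK)
rebuild = prompt_cost(workspace_tokens, CACHE_WRITE_PER_MTOK)

print(f"warm cache turn: ${warm:.2f}")            # $0.02
print(f"cache rebuild:   ${rebuild:.2f}")         # $0.22
print(f"rebuild / warm:  {rebuild / warm:.0f}x")  # 10x
```

Swap in your own token count to see what a single rebuild costs you.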
The Seven Fixes
Fix 1: Disable Heartbeat
OpenClaw's heartbeat pings the agent at regular intervals to keep it alive. Ours was set to every 15 minutes during the day, 30 minutes at night. Each ping is an Opus API call, even if there's nothing to do.
"heartbeat": { "every": "0" }
We also switched all cron jobs from wakeMode: "next-heartbeat" to "now" so they fire on their own schedule instead of waiting for a heartbeat that no longer exists.
Fix 2: Lock Down Discord
Our agent was responding to every Discord DM. Random messages, bot pings, and test messages all triggered Opus. We locked it to one user ID:
"dm": {
  "enabled": true,
  "policy": "allowlist",
  "allowFrom": ["YOUR_DISCORD_USER_ID"]
}
Fix 3: Aggressive Session Reset
Stale sessions accumulate conversation history, making each turn more expensive. We cut the idle timeout from 60 to 30 minutes:
"session": {
  "reset": {
    "mode": "idle",
    "idleMinutes": 30
  }
}
Fix 4: Enable Long Cache Retention
This was the single biggest win. Anthropic's default prompt cache TTL is 5 minutes. In a real conversation, users often take 4-7 minutes between messages, which is frequently long enough for the cache to expire and force a full rebuild at $0.22 per turn. OpenClaw supports a cacheRetention parameter that extends this to 1 hour:
"anthropic/claude-opus-4-6": {
  "alias": "opus",
  "params": {
    "cacheRetention": "long"
  }
}
You also need to align the context pruning TTL so the gateway doesn't prune content it thinks expired:
"contextPruning": {
  "mode": "cache-ttl",
  "ttl": "60m",
  "keepLastAssistants": 2,
  "minPrunableToolChars": 5000
}
Before this change, we saw $0.22 cache rebuild spikes every time a user paused for a few minutes. After: zero rebuilds across an entire session.
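The effect of the TTL is easy to see with a toy model. The sketch below counts how many turns in a conversation arrive after the cache has expired. It assumes each turn refreshes the cache timer, so only the gap since the previous message matters; the list of pauses is made up for illustration.

```python
def count_rebuilds(gaps_min: list[float], ttl_min: float) -> int:
    """Count turns that arrive after the cache has expired.

    Assumes every turn refreshes the TTL, so only the pause since the
    previous turn matters. The very first cache write is not counted.
    """
    return sum(1 for gap in gaps_min if gap > ttl_min)


# Hypothetical pauses (in minutes) between consecutive user messages.
gaps = [2, 6, 4, 7, 1, 5.5, 3, 12]

print(count_rebuilds(gaps, ttl_min=5))   # 4 rebuilds with the default TTL
print(count_rebuilds(gaps, ttl_min=60))  # 0 rebuilds with a 1-hour TTL
```

With the default TTL, half of these turns would pay the rebuild cost; with the 1-hour TTL, none do.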
Fix 5: Trim Your Workspace Files
Workspace files get injected into every API call. Our two files, the system prompt and memory, were bloated at 12KB and 29KB. We trimmed them to 4.4KB and 5.6KB by moving reference data to an external database and removing sections the agent didn't need in its system prompt. Every kilobyte you trim saves tokens on every single turn. At Opus pricing, a 20KB reduction saves roughly $0.10 on each API call billed at the input or cache-write rate (a warm-cache read saves about a tenth of that). Over 100 such calls a day, that's $10/day or $300/month.
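Here is a rough way to check the $0.10 figure against your own files. The tokens-per-KB ratio is an assumption: we derive it by treating the roughly 35,000-token workspace as the 12KB + 29KB of files, which may not hold for your content. Measure real token counts if you need precision.

```python
# USD per million tokens, from the pricing list earlier in this post.
RATES_PER_MTOK = {"input": 5.00, "cache_write": 6.25, "cache_read": 0.625}

# Assumption: ~35,000 tokens correspond to 12KB + 29KB of workspace files.
TOKENS_PER_KB = 35_000 / (12 + 29)


def savings_per_call(kb_trimmed: float) -> dict[str, float]:
    """USD saved per API call at each billing rate for a given trim size."""
    tokens = kb_trimmed * TOKENS_PER_KB
    return {name: tokens * rate / 1_000_000
            for name, rate in RATES_PER_MTOK.items()}


for name, usd in savings_per_call(20).items():
    monthly = usd * 100 * 30  # 100 calls a day for 30 days
    print(f"{name:12s} ${usd:.3f} per call, ${monthly:.0f} per month")
```

Under these assumptions the saving per call is about ten times larger on a cache miss than on a cache hit, which is why trimming and long cache retention compound.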
Fix 6: Delegate to Cheaper Models
Our Opus agent was doing everything: reading files, writing code, running tests. Now it follows a strict coordinator pattern:
1. Opus talks to the user and makes decisions (2-3 turns)
2. Sonnet/Haiku sub-agents do the actual work (6-10 turns at 5-25x lower cost)
3. Sub-agent results come back as a single tool response
The key system prompt instruction that enforces this:
> *Every tool call you make costs money on Opus 4.6. Do NOT read files to explore before delegating. Tell the sub-agent WHERE the files are and let THEM read it on a cheaper model.*

We also downgraded all cron jobs from Opus to Fleet Sonnet or Fleet Haiku. A cron job that checks for new issues doesn't need Opus-level reasoning.
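A back-of-the-envelope model shows why the coordinator pattern pays off. This is a deliberate simplification, not a measurement: it assumes every turn costs the same on a given model and that a worker turn costs a fixed fraction of an Opus turn. The turn counts and price ratios are the ranges quoted above.

```python
def delegated_fraction(coordinator_turns: int, worker_turns: int,
                       price_ratio: float) -> float:
    """Cost of the delegated setup as a fraction of running it all on Opus.

    Simplifying assumption: a worker turn costs 1/price_ratio of an Opus turn.
    """
    all_opus = coordinator_turns + worker_turns
    delegated = coordinator_turns + worker_turns / price_ratio
    return delegated / all_opus


for ratio in (5, 25):
    frac = delegated_fraction(coordinator_turns=3, worker_turns=8,
                              price_ratio=ratio)
    print(f"{ratio}x cheaper workers: {frac:.0%} of the all-Opus cost")
```

In this model the remaining cost is dominated by the coordinator's own turns once workers are cheap, which is why the system prompt forbids exploratory file reads on Opus.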
Fix 7: Silence Cron Delivery
Cron job results were being "announced" back into the main agent's session, adding tokens to its context and sometimes triggering follow-up Opus responses. We changed delivery to write results to a shared filesystem instead:
"delivery": { "mode": "none" }
Results now go to ~/.openclaw/workspace/shared/results/ where the coordinator checks them only when asked.
The Results
We compared three deployment tasks run before optimization against one run after:

| Task | Before | After | Notes |
|------|--------|-------|-------|
| Tool integration A | $0.57 | n/a | 2 cache rebuilds |
| Tool integration B | $0.60 | n/a | 2 cache rebuilds |
| Tool build + deploy (post-fix) | n/a | $0.17 | 0 cache rebuilds |
| Baseline task | $0.83 | n/a | 8 exploration turns |

The first task after all seven fixes cost $0.17 with zero cache rebuilds. The actual delegation (spawning a Sonnet sub-agent, waiting for results, reporting back) cost about $0.14. The sub-agent did all the work for $0.09 on Fleet Sonnet.
Total session cost dropped from $2.08 (three tasks with cache rebuilds) to $0.17 (one task, clean). Extrapolating to similar workloads: roughly 80% cost reduction.
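The 80% figure corresponds to the per-task drop from the $0.83 baseline to $0.17. A quick check of that arithmetic (the helper name is ours):

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


print(f"{pct_reduction(0.83, 0.17):.1f}%")  # 79.5%
```

Note that this compares single tasks. The $2.08 session covered three tasks, so it is not a like-for-like baseline for the $0.17 run.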
The Takeaway
If you're running OpenClaw with Opus, check these three things first. They account for 90% of our savings:
1. Is cacheRetention set to "long"? If not, you're paying 10x for system prompt caching every time there's a 5-minute gap in conversation.
2. Is your agent doing work, or coordinating work? Every file read and every code write on Opus costs 5-25x what it would on Sonnet or Haiku.
3. How big are your workspace files? Trim your system prompt and memory files to the minimum. Move reference data to a structured database or external storage.
The rest (heartbeat, Discord lockdown, cron delivery, session timeout) are smaller wins that add up. But cache retention alone took the cost of resending the system prompt after a pause from $0.22 (a cache rebuild) to $0.02 (a cache read). That's the one change every OpenClaw operator should make today.