Ask any developer who's run a long coding session with an AI agent what happens at turn 80 versus turn 10. The same model, the same prompt, the same task—somehow it's regressed. It re-suggests fixes you already rejected. It "forgets" files it edited twenty turns ago. It contradicts decisions you made together at the start. According to Maneshwar's deep-dive on DEV.to, this isn't a model failure. It's the desk filling up.
The Context Window Is RAM, Not a Hard Drive
In Part 1 of his context engineering series (which this article builds upon), Maneshwar makes the case that AI context windows are working memory with a hard edge—not infinite storage you can just keep piling into. Everything a model reasons about has to fit in active attention. The long conversation degradation happens when you keep adding paper to the desk and never clear any of it. Eventually, the critical constraint ("never touch the auth module") is buried under four hundred lines of tool output, and your agent is confidently doing exactly what you told it not to do.
Compaction Is Deliberate Forgetting
The solution isn't just summarizing the conversation. Maneshwar identifies compaction as lossy compression for meaning—replacing a long history with a shorter representation that preserves what matters and discards what doesn't. The "lossy" part is the whole game. If it were lossless, you'd just have the same tokens in a different font. Two things compaction isn't: clearing (amnesia, wiping history entirely) and trimming (mechanical removal of oldest messages by hard rule). Compaction sits between them—smarter than trimming, less destructive than clearing.
Why Naive Summarization Fails
Drew Breunig's taxonomy of long context decay applies here. Sloppy compaction introduces poisoning (a hallucination makes it into the summary and becomes load-bearing), distraction (the summary recreates original bloat), confusion (superfluous detail pulls toward irrelevant work), and clash (summary and live messages disagree). But the scariest failure mode is cumulative erosion—each lossy compaction compounds. Compact a compression five times across a marathon session, and "never touch the auth module" becomes three generations of paraphrase away from the original constraint. The agent on the other side of that auto-compact is working from worse notes. It's not just reduced context—it's behavioral discontinuity.
What Actually Survives the Cut
The heuristic: keep decisions and state, discard process. A good compaction keeps "the user's table is documents_v2." It discards the 400 lines of JSON waded through to find that fact. The specific directive to preserve includes current file states, explicit user constraints (quoted exactly—these are the most expensive things to lose), in-progress work, critical references like IDs and paths, and next steps. What gets dropped: deliberation trails, raw tool output already digested, completed-and-verified subtasks, pleasantries, retries, intermediate states overwritten by later changes.
How Production Agents Handle It
Claude Code leans on automation—auto-compaction fires at roughly 95% context usage with manual /compact available for steering. Community consensus suggests this is too late; degradation happens before that threshold. OpenAI's Codex CLI takes a "handoff" approach, framing compaction as a checkpoint producing a summary for another LLM to resume the task. It keeps recent user messages verbatim alongside the summary and implements retry-with-backoff when the summarization call itself fails (because it is an LLM call, and those fail).
If You're Building Your Own
Maneshwar's synthesis: trigger earlier than you think—85-90% rather than waiting for 95%. Prune stale tool output mechanically before calling expensive lossy summarization. Keep recent turns verbatim while compacting the distant past. Pin constraints as exact text across every compaction and tell the user it happened. A silent compaction that changes behavior feels like random regression; one acknowledgment line ("compacted history to free up context") makes it a tradeoff instead of a mystery.
Key Takeaways
- Context windows are working memory with hard limits, not infinite storage
- Compaction is deliberate lossy compression—not just summarization
- Each compaction round erodes meaning; trigger earlier than 85% if possible
- User constraints must survive every pass unchanged—quote them exactly
- The best tokens to send are the ones you didn't have to send
The Bottom Line
Maneshwar's framing cuts clean: external memory and retrieval let you dodge hard choices about what matters, but inside a single long-running task on finite hardware, something has to go. Compaction is choosing what survives on purpose instead of letting your context window quietly shove the one constraint that actually mattered off the edge of attention. Build your agents smart enough to code—and disciplined enough to forget.