ACP Protocol Promises 99% Token Savings by Serving Pre-computed Content Envelopes to AI Agents

Every time an AI agent answers a question about your content, it's probably burning 20,000 tokens parsing nav bars, footers, cookie banners, and sticky newsletter modals to find three sentences of actual answer. That's not a minor inefficiency—it's the whole architecture working against you. The web was built for eyeballs, not agents, and the token tax gets paid on every single query. Now a new open specification called ACP (Atomic Content Protocol) is attempting to fix that shape mismatch at scale, with benchmarks showing up to 99% token reduction on real queries.

The Shape Problem

The core issue isn't that content optimization doesn't help. It does. But no amount of minification or semantic HTML fixes the fundamental problem: agents read everything because there's no channel for "give me just the relevant part." A human skims past twelve language variants in a footer without thinking. An agent pays token-by-token to confirm it's not important. ACP's solution is deceptively simple: pre-compute a compact, enriched envelope that sits in front of your content body and is served first. The body stays available for the cases where an agent actually needs it, which most of the time it doesn't.

How ACP Works

The protocol breaks content into discrete "atoms" with stable IDs and rich metadata envelopes containing summaries, classifications, language tags, key entities, and confidence scores. When an agent queries a page, it receives the envelope (~350 tokens for Wikipedia's AI article) instead of scraping the full HTML (~25,000 tokens). The enrichment pipeline runs asynchronously via queue workers: a dirty flag triggers processing out-of-band, so there's no real-time LLM call in the request path. By the time an agent shows up, the envelope is already persisted and waiting. ACP is built on top of MCP (Model Context Protocol), is MIT licensed, and ships as npm packages, including acp-enricher v0.4.2.
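To make the flow concrete, here is a minimal Python sketch of the pattern described above: an atom with a stable ID, an envelope, a dirty flag, and a queue drained outside the request path. The class names, field names, and the enrich() stand-in are all hypothetical and chosen for illustration. They are not taken from the ACP spec or from acp-enricher, and the mapping of the aco, full, and both modes to return values is an assumption.

```python
from dataclasses import dataclass
from queue import Queue
from typing import Optional


@dataclass
class Envelope:
    # Hypothetical field names, mirroring the metadata listed in the prose.
    summary: str
    classification: str
    language: str
    key_entities: list[str]
    confidence: float


@dataclass
class Atom:
    atom_id: str                          # stable ID
    body: str                             # full content, still available on request
    envelope: Optional[Envelope] = None   # filled in by the worker
    dirty: bool = True                    # set whenever the body changes


def enrich(body: str) -> Envelope:
    """Stand-in for the real enrichment step (an LLM call in practice)."""
    first_sentence = body.split(".")[0].strip() + "."
    return Envelope(first_sentence, "article", "en", [], 0.5)


def mark_dirty(atom: Atom, jobs: Queue) -> None:
    """Write path: flag the atom and enqueue it for out-of-band processing."""
    atom.dirty = True
    jobs.put(atom)


def drain(jobs: Queue) -> None:
    """Worker side: enrich every queued atom and store its envelope."""
    while not jobs.empty():
        atom = jobs.get()
        if atom.dirty:
            atom.envelope = enrich(atom.body)
            atom.dirty = False


def read(atom: Atom, mode: str = "aco") -> dict:
    """Request path: a lookup only, no enrichment happens here."""
    result: dict = {"id": atom.atom_id}
    if mode in ("aco", "both"):
        result["envelope"] = atom.envelope
    if mode in ("full", "both"):
        result["body"] = atom.body
    return result
```

Calling mark_dirty() on a write and drain() from a background worker means read() never waits on enrichment. The trade-off is that an agent arriving before the worker has run sees either no envelope or a stale one.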

The Numbers

On a single Wikipedia query about artificial intelligence, a full body read consumed ~25,000 tokens while the ACO envelope required only ~350, a 99% reduction in token consumption for equivalent information retrieval. Testing across a broader 13-item content set showed the same pattern: full bodies totaled ~65,000 tokens versus ~2,800 for envelopes alone, with a reported 84-93% reduction depending on document complexity. The three query modes (aco at 619 tokens, full at 3,043 tokens, and both) make the cost difference immediately visible in any application dashboard.
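For readers who want to check the percentages, the reduction is simply one minus the ratio of envelope tokens to full-body tokens. The short Python sketch below applies that formula to the approximate figures quoted above. The function name is illustrative, and because the inputs are rounded, the results are approximate too. In particular, the rounded 13-item totals do not reproduce the quoted 84-93% range exactly.

```python
def reduction_pct(full_tokens: int, envelope_tokens: int) -> float:
    """Percentage of tokens saved by reading the envelope instead of the full body."""
    return 100.0 * (1 - envelope_tokens / full_tokens)


# Approximate token counts as quoted in the article: (full body, envelope).
figures = {
    "Wikipedia AI article": (25_000, 350),
    "13-item set (totals)": (65_000, 2_800),
    "query modes (full vs aco)": (3_043, 619),
}

for label, (full, envelope) in figures.items():
    print(f"{label}: {reduction_pct(full, envelope):.1f}% fewer tokens")
```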

What Remains Unsolved

The authors are refreshingly honest about what ACP doesn't solve: trust. When agents start reading envelopes instead of bodies—which is exactly the efficiency gain they're optimizing for—nothing stops someone from publishing an envelope that says one thing while the underlying content says another. The provenance layer (tool, version, timestamp) makes this manipulation more visible, but visibility isn't the same as prevention. Signed envelopes, verifiable enrichment chains, and reputation layers are all mentioned as partial solutions, each shifting the problem rather than eliminating it.
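As an illustration of what a signed envelope could look like, the Python sketch below binds an envelope to a hash of its body and authenticates both with an HMAC. This is an assumption-laden sketch rather than ACP's mechanism: the field names body_sha256 and signature are hypothetical, and a shared-secret HMAC stands in for the public-key signatures a real deployment would more likely use.

```python
import hashlib
import hmac
import json


def sign_envelope(envelope: dict, body: str, key: bytes) -> dict:
    """Return a copy of the envelope with a body hash and an HMAC attached."""
    signed = dict(envelope)
    signed["body_sha256"] = hashlib.sha256(body.encode("utf-8")).hexdigest()
    # Canonical JSON (sorted keys) so signer and verifier hash the same bytes.
    payload = json.dumps(signed, sort_keys=True).encode("utf-8")
    signed["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return signed


def verify_envelope(signed: dict, body: str, key: bytes) -> bool:
    """True only if the envelope is unmodified and still matches this body."""
    unsigned = {k: v for k, v in signed.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    body_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    signature_ok = hmac.compare_digest(expected, signed.get("signature", ""))
    body_ok = hmac.compare_digest(body_hash, unsigned.get("body_sha256", ""))
    return signature_ok and body_ok
```

A check like this catches an envelope edited after signing, or a body swapped out underneath it. It does nothing about a publisher who signs a misleading summary in the first place, which is the sense in which signing shifts the trust problem rather than eliminating it.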

Building in Public

The team didn't just publish a spec; they rebuilt Stacklist's entire MCP server (Stacky) around ACP envelopes in production. This wasn't an academic exercise: it revealed implementation details no spec can capture, like whether an envelope should be one column or its own table (it's the latter). The migration decisions you make when building shape-first versus spec-first are fundamentally different, and that operational knowledge is what separates a protocol from a product.
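The "own table" choice can be pictured with a small schema sketch. The example below uses Python's built-in sqlite3 module with an in-memory database. The table and column names are hypothetical and are not Stacky's actual schema; only the idea of keeping envelopes in a table separate from bodies comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE atoms (
    atom_id TEXT PRIMARY KEY,
    body    TEXT NOT NULL,
    dirty   INTEGER NOT NULL DEFAULT 1   -- 1 = needs (re)enrichment
);
CREATE TABLE envelopes (
    atom_id     TEXT PRIMARY KEY REFERENCES atoms(atom_id),
    summary     TEXT NOT NULL,
    language    TEXT,
    confidence  REAL,
    tool        TEXT,                    -- provenance: which enricher ran
    version     TEXT,                    -- provenance: enricher version
    enriched_at TEXT                     -- provenance: ISO 8601 timestamp
);
""")

# Write path: store the body; the row starts out dirty.
conn.execute("INSERT INTO atoms (atom_id, body) VALUES (?, ?)",
             ("a1", "Full body text."))

# Worker: persist the envelope, then clear the dirty flag.
conn.execute(
    "INSERT INTO envelopes VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("a1", "A short summary.", "en", 0.9,
     "example-enricher", "0.0.0", "2025-01-01T00:00:00Z"),
)
conn.execute("UPDATE atoms SET dirty = 0 WHERE atom_id = ?", ("a1",))

# Request path: an envelope read never touches the body column.
row = conn.execute(
    "SELECT summary, language FROM envelopes WHERE atom_id = ?", ("a1",)
).fetchone()
```

One plausible reason to prefer this layout is that envelope reads, re-enrichment, and provenance updates can happen without loading or rewriting the large body rows.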

Key Takeaways

ACP delivers 84-99% token reduction by serving pre-computed content envelopes instead of raw HTML to AI agents
Built on MCP with MIT licensing and npm packages available today for immediate integration
The efficiency gains are real and measurable, but trust/provenance remains an open architectural question
Pre-computing envelopes asynchronously avoids LLM calls in the request path—no latency penalty for enriched responses

The Bottom Line

ACP makes a compelling case that the agent web needs a different shape than what we built for browsers. The numbers aren't marginal—they're the kind of difference where "why isn't everything like this already" becomes the obvious question. But shipping efficiency without solving adversarial enrichment is shipping a surface area for abuse, and the authors know it. Worth watching closely—just don't ignore what's still broken while you're excited about what works.
