Bito's AI Architect Slashes Claude Code Token Costs by Nearly Half

Bito just dropped something interesting for anyone running coding agents at scale. Their AI Architect feature cuts Claude Code's token cost by 47% on SWE-Bench Pro, and that's not some cherry-picked scenario: it's the aggregate across substantial multi-file tasks in production open-source codebases.

Why Coding Agents Burn Cash

Here's the dirty secret nobody talks about: exploration is what makes coding agents expensive. When an agent picks up a task in an unfamiliar codebase, it has to build a mental model first. Without a map, that means brute force: listing directories, grepping for symbols, opening files, following imports. Each step does two costly things. First, another round trip to the model generates more reasoning and output tokens. Second, more content gets dumped into the conversation. The agent re-reads the entire transcript on every later turn, so a 50 KB file opened on step 8 is still being re-processed on step 80. The compounding effect is brutal: the cost of a long agent run grows faster than linearly because the context keeps ballooning. On harder tasks there's also the search spiral, where agents grind through 40 to 90 steps of dead-end searches, re-read the same large files multiple times, and produce busywork like summary docs and verification scripts. That's pure token burn that never lands the fix.
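To make the compounding concrete, here is a rough back-of-the-envelope model in Python. It is a sketch, not a measurement: it assumes every step re-processes the full transcript, that each step adds a flat 1,000 tokens, and that a 50 KB file is roughly 12,500 tokens (about 4 bytes per token). All three figures are illustrative assumptions, and the model counts tokens only, ignoring pricing details such as caching discounts.

```python
def cumulative_input_tokens(step_additions):
    """Total input tokens processed over a run in which every step
    re-reads the whole transcript accumulated so far."""
    total = 0
    context = 0
    for added in step_additions:
        context += added   # transcript grows by this step's new content
        total += context   # the model re-processes the full transcript
    return total


# Hypothetical numbers, chosen only to illustrate the shape of the curve.
TOKENS_PER_STEP = 1_000   # assumed new content per step
FILE_TOKENS = 12_500      # assumed size of a 50 KB file at ~4 bytes/token

short_run = cumulative_input_tokens([TOKENS_PER_STEP] * 40)
long_run = cumulative_input_tokens([TOKENS_PER_STEP] * 80)

# Same 80-step run, but a large file is opened on step 8 (index 7).
additions_with_file = [TOKENS_PER_STEP] * 80
additions_with_file[7] += FILE_TOKENS
with_file = cumulative_input_tokens(additions_with_file)

print(f"40 steps: {short_run:,} input tokens")
print(f"80 steps: {long_run:,} input tokens")
print(f"extra cost of the file opened on step 8: {with_file - long_run:,}")
```

Under these assumptions, doubling the run from 40 to 80 steps roughly quadruples the input tokens processed, and the single file opened on step 8 is paid for 73 times (once on each of steps 8 through 80).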

How AI Architect Fixes This

AI Architect continuously indexes every repository and exposes that index over MCP (Model Context Protocol). When an agent starts a task, it consults the index and receives a compact structured briefing covering architecture and major frameworks, component and module breakdown with locations, file and directory layout, and dependency relationships between modules. During the run, the agent calls back for targeted queries — symbol references across the codebase, exact code at specific locations, conventions new files must follow. Armed with that briefing upfront, the agent skips the discovery phase entirely. It knows where to look, opens only the files that actually matter, and starts working. The search spiral never starts. That compact map costs a small fixed amount of context up front — a fraction of the repeated ad hoc exploration it replaces.
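As a purely illustrative sketch of why such a briefing shortens discovery, the Python below models a briefing as a small data structure and looks up which locations an agent would open for a given component. The class names, field names, and paths are hypothetical and invented for this example; they are not Bito's schema or any actual MCP tool definition.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One entry in a hypothetical briefing (not Bito's actual schema)."""
    name: str
    location: str
    depends_on: list[str] = field(default_factory=list)


@dataclass
class Briefing:
    """Hypothetical compact map of a repository."""
    frameworks: list[str]
    components: dict[str, Component]


def files_to_open(briefing, target):
    """Return the location of the target component, followed by the
    locations of its direct dependencies that the briefing knows about."""
    comp = briefing.components[target]
    names = [target] + comp.depends_on
    return [briefing.components[n].location
            for n in names if n in briefing.components]


# Made-up repository layout, for illustration only.
briefing = Briefing(
    frameworks=["example-framework"],
    components={
        "telemetry": Component("telemetry", "src/telemetry/", ["config"]),
        "config": Component("config", "src/config/"),
        "server": Component("server", "src/server/", ["config", "telemetry"]),
    },
)

print(files_to_open(briefing, "telemetry"))
```

With a map like this in hand, a single lookup replaces the directory listing, grepping, and import-following that would otherwise be needed to find the same two locations.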

Evaluation Results

Bito ran this the right way: same coding agent (Claude Code), same engineering tasks from real open-source projects (Flipt, Teleport, and Tutanota web clients), with identical settings in both arms. The only variable was whether AI Architect's index was available over MCP. Token usage dropped across every category that grows with exploration — context re-reads fell 48% from 58.2M to 30.2M tokens, new context written dropped 42%, and agent-generated output tokens fell 48%. The breakdown of where savings come from tells the story: two-thirds (66%) is the compounding effect of shorter runs carrying leaner transcripts that get re-processed far fewer times. The rest splits between less new content written into context (22%) and fewer output tokens generated (11%). Reasoning steps dropped from roughly 75 to 30 — a 60% reduction. Tool calls fell 49%, with file reads dropping 62% from 28.3 to 10.7 per task, shell commands falling 52% from 36.2 to 17.5, and code edits dropping 48%. Perhaps most striking: task success on SWE-Bench Pro climbed from 51.9% to 70.1% with Claude Opus 4.6. More tasks resolved, at lower cost.
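The headline percentages can be sanity-checked from the before and after figures quoted above. The short Python sketch below does only that arithmetic; the helper name pct_drop is made up for this example, and the inputs are the numbers reported in this article, not independent measurements.

```python
def pct_drop(before, after):
    """Percentage reduction from before to after, rounded to a whole percent."""
    return round(100 * (before - after) / before)


# (before, after, reduction reported in the article)
reported = {
    "context re-reads (M tokens)": (58.2, 30.2, 48),
    "reasoning steps": (75, 30, 60),
    "file reads per task": (28.3, 10.7, 62),
    "shell commands per task": (36.2, 17.5, 52),
}

for name, (before, after, claimed) in reported.items():
    computed = pct_drop(before, after)
    status = "matches" if computed == claimed else "differs"
    print(f"{name}: computed {computed}%, reported {claimed}% ({status})")

# Task success is a gain in percentage points, not a percentage reduction.
success_gain_points = round(70.1 - 51.9, 1)
print(f"task success gain: {success_gain_points} percentage points")
```

Each computed reduction agrees with the reported figure after rounding, and the success-rate change works out to 18.2 percentage points.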

Standout Example: Flipt Audit Configuration Task

One specific Flipt task — adding audit configuration reporting to a service's anonymous telemetry — shows exactly what this looks like in practice. Without AI Architect, the agent spawned exploration sub-agents and started globbing and grepping across 79 steps, running roughly 25 file reads and 40 shell commands. It re-read the same files multiple times and hunted for a struct that the task itself was meant to create. First code edit landed on step 80. With AI Architect? Three targeted searches. Three files read. First code edit on step 14. Same tests passing. The only difference: one configuration had a map, and the other had to draw it from scratch while burning 60-plus steps doing it.

Key Takeaways

Token cost drops 47% in aggregate, peaking at 68% on individual tasks — savings scale with deployment size
Task success climbs from 52% to 70% on SWE-Bench Pro without changing the model or prompts
Exploration overhead is the dominant cost on substantial engineering tasks; AI Architect removes most of it
Trivial adoption: runs as an MCP server, no model changes or workflow surgery required

The Bottom Line

This is exactly the kind of infrastructure-level optimization that compounds hard when you're running coding agents across a team. Forty-seven percent lower spend while resolving more tasks isn't a marginal improvement — it's the difference between agents being expensive prototypes and economically viable workhorses. If you're not thinking about context architecture for your agent deployments, you are leaving real money on the table.
