Bad MCP Design Can Cost Your Agent 5× More Tokens in Real-World Tests

A developer posted detailed benchmark results on Hacker News this week showing that poorly designed MCP (Model Context Protocol) servers can inflate token consumption by nearly 5× compared to well-optimized alternatives running the same workloads. The experiment compared two MCP implementations serving a to-do list application: one built by the developer (MCP-A) and an official server released later by the app's own team (MCP-B). Both hit identical backend APIs, with accounts reset between test runs.

The Test Setup

Forty prompts simulating typical use cases were executed against both servers using the same model, system prompt, and Agent framework. Both servers achieved the same 90% pass rate (36 of 40 prompts completed successfully), yet MCP-B consumed 3,174,329 total input tokens versus 637,244 for MCP-A, roughly 4.98× as many. Output token usage showed a smaller but still significant 1.34× gap, and MCP-B required 35 additional ReAct loop iterations to complete the same tasks.
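The headline ratio can be checked directly from the reported totals. The snippet below is only a sanity check of the arithmetic; the variable names are illustrative and not taken from the benchmark code.

```python
# Figures as reported in the benchmark write-up.
input_tokens_b = 3_174_329  # MCP-B total input tokens
input_tokens_a = 637_244    # MCP-A total input tokens
passed, total = 36, 40      # identical for both servers

input_ratio = input_tokens_b / input_tokens_a
pass_rate = passed / total

print(f"input token ratio: {input_ratio:.2f}x")  # prints "input token ratio: 4.98x"
print(f"pass rate: {pass_rate:.0%}")             # prints "pass rate: 90%"
```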

Root Cause: Tool Design That Forces Extra Round-Trips

Execution logs showed that the bottleneck stemmed from poor query tool design rather than backend performance. In MCP-B, the search_tool returns task ID, title, and URL, but omits the project_id needed for subsequent CRUD operations. This forces the Agent to make a second call to get_task_by_id before proceeding. MCP-A's query_tasks, by contrast, returns all the context required for the next action in a single response: task ID, title, project ID, start date, priority, and status.
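The round-trip difference can be illustrated with a small simulation. This is a minimal sketch, not either server's actual code: the in-memory TASKS store, the CallLog helper, complete_task, and the function names search_thin and query_rich are all hypothetical, and only the returned field sets follow the description above.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory backend standing in for the to-do app's API.
TASKS = {
    "t1": {"id": "t1", "title": "Buy milk", "project_id": "p1",
           "start_date": "2025-01-01", "priority": 1, "status": "open",
           "url": "https://example.com/t1"},
}


@dataclass
class CallLog:
    """Records which tools the simulated Agent invoked."""
    calls: list = field(default_factory=list)


def search_thin(log, text):
    """Thin search result: ID, title, and URL only (no project_id)."""
    log.calls.append("search_tool")
    return [{"id": t["id"], "title": t["title"], "url": t["url"]}
            for t in TASKS.values() if text.lower() in t["title"].lower()]


def get_task_by_id(log, task_id):
    log.calls.append("get_task_by_id")
    return dict(TASKS[task_id])


def query_rich(log, text):
    """Rich query result: everything the next action needs."""
    log.calls.append("query_tasks")
    keep = ("id", "title", "project_id", "start_date", "priority", "status")
    return [{k: t[k] for k in keep}
            for t in TASKS.values() if text.lower() in t["title"].lower()]


def complete_task(log, task_id, project_id):
    """Hypothetical write operation that requires both IDs."""
    log.calls.append("complete_task")
    return {"id": task_id, "project_id": project_id, "status": "done"}


def run_thin():
    log = CallLog()
    hit = search_thin(log, "milk")[0]
    full = get_task_by_id(log, hit["id"])  # extra round-trip just to learn project_id
    complete_task(log, full["id"], full["project_id"])
    return log.calls


def run_rich():
    log = CallLog()
    hit = query_rich(log, "milk")[0]
    complete_task(log, hit["id"], hit["project_id"])
    return log.calls


print(run_thin())  # three tool calls
print(run_rich())  # two tool calls
```

In a ReAct-style loop each extra tool call is also an extra model call, so the cost of the missing field is paid in both latency and input tokens.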

Raw API Dumps Bloat Context Windows

Beyond round-trip issues, MCP-B dumps raw API responses directly into the Agent's context window. The create_task tool example shows over 600 characters of unfiltered JSON, including createdTime, modifiedTime, focusSummaries, and other fields irrelevant to the task at hand. This extraneous data accumulates rapidly across multi-step workflows, since a tool result typically stays in the context and is sent again on every later model call. MCP-A filters and formats results before returning them, preserving only what matters for downstream actions.
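Filtering of this kind can be as simple as an allow-list applied before the result is returned to the Agent. The sketch below is an assumption about the general approach, not MCP-A's actual code: the field names createdTime, modifiedTime, and focusSummaries come from the example above, while every other field, every value, and the slim helper are invented for illustration.

```python
import json

# Hypothetical raw API response; all values are invented.
raw = {
    "id": "t1",
    "title": "Buy milk",
    "projectId": "p1",
    "priority": 1,
    "status": "open",
    "createdTime": "2025-01-01T09:00:00.000+0000",
    "modifiedTime": "2025-01-01T09:00:00.000+0000",
    "focusSummaries": [],
    "etag": "abc123",
    "sortOrder": -1099511627776,
}

# Allow-list of fields a follow-up action is assumed to need.
KEEP = ("id", "title", "projectId", "priority", "status")


def slim(response, keep=KEEP):
    """Return only the allow-listed fields that are present in the response."""
    return {k: response[k] for k in keep if k in response}


raw_text = json.dumps(raw)
slim_text = json.dumps(slim(raw))
print(len(raw_text), len(slim_text))  # character counts before and after filtering
```

An allow-list is used rather than a deny-list so that new backend fields do not leak into the context window by default.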

Tool Count Matters

The analysis also highlighted how tool proliferation affects decision complexity. MCP-A consolidated 47 tools down to just 14 while maintaining full functionality, reducing the candidate set the model must evaluate during action selection. More tools mean a larger search space and more decision overhead for the Agent at every step.
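One common way to shrink a tool list is to merge several single-purpose tools into one tool with an action parameter. The write-up does not say how MCP-A consolidated its tools, so the sketch below is only an assumed pattern: manage_task, the ACTIONS table, and the in-memory STORE are all hypothetical.

```python
# Hypothetical in-memory store standing in for the backend.
STORE: dict[str, dict] = {}


def _create(task_id: str, **fields) -> dict:
    STORE[task_id] = {"id": task_id, **fields}
    return {"ok": True, "task": STORE[task_id]}


def _update(task_id: str, **fields) -> dict:
    if task_id not in STORE:
        return {"ok": False, "error": f"task {task_id!r} not found"}
    STORE[task_id].update(fields)
    return {"ok": True, "task": STORE[task_id]}


def _delete(task_id: str, **fields) -> dict:
    if STORE.pop(task_id, None) is None:
        return {"ok": False, "error": f"task {task_id!r} not found"}
    return {"ok": True}


# One dispatcher replaces three separately registered tools.
ACTIONS = {"create": _create, "update": _update, "delete": _delete}


def manage_task(action: str, task_id: str, **fields) -> dict:
    """Single tool entry point; `action` selects the operation."""
    handler = ACTIONS.get(action)
    if handler is None:
        # Short, actionable error so the Agent can correct itself in one step.
        return {"ok": False,
                "error": f"unknown action {action!r}; expected one of {sorted(ACTIONS)}"}
    return handler(task_id, **fields)


print(manage_task("create", "t1", title="Buy milk"))
print(manage_task("update", "t1", status="done"))
print(manage_task("delete", "t1"))
```

The trade-off is that the merged tool's parameter schema grows more complex, so this only helps when the merged operations share most of their arguments.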

Key Takeaways

Design tools with downstream requirements in mind: anticipate what context the next step needs rather than answering only the immediate query.
Minimize tool counts to reduce decision burden; merge overlapping functionality into unified interfaces.
Format returned data for LLM readability by filtering unnecessary API fields and restructuring JSON before it hits the context window.

The Bottom Line

If you're shipping MCP servers without thinking about token efficiency, you're not just creating a worse developer experience; you're potentially multiplying your inference costs fivefold at scale. Token-aware tool design is part of what separates hobbyist integrations from production-grade agentic systems. Bookmark MCP-Eval (github.com/Code-MonkeyZhang/mcp-eval) and test before you deploy.
