MCP's Context Hunger and Reliability Woes Have Developers Questioning the Protocol's Future

The Model Context Protocol was supposed to be the great unifier: a standardized way for AI agents to talk to the tools and services they need to get work done. Introduced by Anthropic in late 2024 and quickly crowned "the USB-C of the AI ecosystem," MCP earned buy-in from OpenAI and a growing roster of enterprise tool providers such as GitHub, Notion, and Linear. But a critical new analysis from Quandri Engineering, amplified by a heated Hacker News thread (195 points, 174 comments), has exposed cracks in MCP's foundation. The verdict emerging from the community? MCP devours context windows, suffers from low operational reliability, and duplicates functionality that existing CLI and API tools handle far more efficiently.

Problem 1: Context Window Carnage

The most damning finding from Quandri's analysis is the sheer volume of tokens consumed by MCP tool definitions. In their real-world stack — Linear, Notion, Slack, and Postgres MCP servers — tool definitions alone burned through over 21,000 tokens. That's 10.5% of Claude's 200K context window, and a brutal 16.5% of GPT-4o's smaller 128K context. The architectural problem is stark: every tool definition ships with its full JSON schema — parameters, descriptions, return types — loaded into context whether the model will ever use it or not. Linear alone delivers 42 tool definitions (~12,807 tokens), even if you only ever call get_issue and save_issue. One analogy from Quandri's analysis cuts deep: "You sit down and 10 menus are spread across the table. There's no room left for actual food." To be fair, Claude Code has since rolled out Tool Search with Deferred Loading, which loads MCP tool schemas on-demand and reduces context usage by 85%+. But the underlying architectural concerns persist.
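The percentages above follow directly from the token counts. A minimal sketch of the arithmetic, using the ~21,077-token total reported for the four-server stack and the nominal 200K and 128K window sizes the article cites (the function and constant names are illustrative, not from Quandri's analysis):

```python
# Token total and window sizes are the figures quoted in the article.
TOOL_DEFINITION_TOKENS = 21_077

CONTEXT_WINDOWS = {
    "Claude (200K)": 200_000,
    "GPT-4o (128K)": 128_000,
}


def context_share(definition_tokens: int, window_tokens: int) -> float:
    """Return the percentage of a context window used by tool definitions."""
    return 100.0 * definition_tokens / window_tokens


for model, window in CONTEXT_WINDOWS.items():
    share = context_share(TOOL_DEFINITION_TOKENS, window)
    print(f"{model}: {share:.1f}% of context used by tool definitions")
```

The same function can be pointed at any other stack: sum the token counts of the tool definitions a session loads and divide by the model's window.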

Problem 2: Operational Reliability Nightmares

MCP's reliability issues are harder to dismiss. The Quandri team documented several failure modes that stem directly from the protocol's architecture: initialization failures that require repeated re-authentication, slower AI responses caused by an external server round-trip on every tool call, MCP server processes that die mid-conversation, and opaque permission systems that leave developers unsure what each tool can actually access. The performance benchmarks are brutal. Comparing Jira's MCP server against direct calls to its REST API, MCP was 3× slower per individual call and a staggering 9.4× slower on the first call once initialization overhead was included. This isn't a Jira-specific problem; it's architectural. Every MCP server adds a process layer between the LLM and the underlying API, creating failure points that don't exist with direct integration.
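To see how the first-call penalty plays out over a session, the two multipliers can be combined into an average slowdown. This is a rough sketch, not part of the original benchmark: it assumes every direct REST call costs the same (normalized to 1), so the first MCP call costs 9.4 units and each later one costs 3.

```python
# Multipliers quoted in the article for Jira MCP versus the REST API.
FIRST_CALL_SLOWDOWN = 9.4   # first call, including initialization overhead
STEADY_CALL_SLOWDOWN = 3.0  # every later call


def average_slowdown(calls: int) -> float:
    """Mean MCP slowdown over a session of `calls` tool calls.

    Assumption (not from the benchmark): each direct REST call has the
    same cost, normalized to 1 unit.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    mcp_cost = FIRST_CALL_SLOWDOWN + STEADY_CALL_SLOWDOWN * (calls - 1)
    return mcp_cost / calls


for n in (1, 5, 20, 100):
    print(f"{n:>3} calls: {average_slowdown(n):.2f}x slower on average")
```

Under that assumption the initialization cost amortizes as sessions get longer, but the average never drops below the 3× per-call penalty.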

Problem 3: Reinventing What Already Works

Perhaps the most fundamental critique cuts to the protocol's core purpose: MCP duplicates functionality that already exists and works better. CLI tools offer human-machine parity (humans and LLMs use the same commands), composability through pipes, jq, and grep, immediate reproducibility in any terminal, and abundant training data from man pages, StackOverflow, and GitHub. MCP offers none of these advantages, only tool definitions locked inside LLM conversations, plus separate server setup, authentication management, and process lifecycle overhead. The token comparison for the same Linear issue lookup is brutal: a CLI approach consumes roughly 200 tokens (50 for the curl command, 150 for the response), while an MCP call burns through approximately 12,957 tokens in total, including the 12,807 tokens of perpetually loaded tool definitions. That's roughly 65× more tokens for identical functionality.
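The 65× figure can be reproduced from the numbers in the comparison. In this sketch the 150-token cost of the MCP call itself is inferred as the difference between the quoted total (12,957) and the loaded definitions (12,807); the variable names are illustrative.

```python
# CLI path: figures quoted in the article.
CLI_COMMAND_TOKENS = 50
CLI_RESPONSE_TOKENS = 150

# MCP path: Linear's tool definitions are always in context.
MCP_DEFINITION_TOKENS = 12_807
MCP_CALL_TOKENS = 150  # inferred: 12,957 total minus 12,807 definitions

cli_total = CLI_COMMAND_TOKENS + CLI_RESPONSE_TOKENS
mcp_total = MCP_DEFINITION_TOKENS + MCP_CALL_TOKENS
ratio = mcp_total / cli_total

print(f"CLI: {cli_total} tokens, MCP: {mcp_total} tokens, ratio: {ratio:.1f}x")
```

Almost all of the gap comes from the standing cost of the definitions rather than from the call and its response.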

The CLI-First Alternative

The alternative gaining traction is elegantly simple: provide existing CLI tools to LLMs directly. These models already learned from millions of man pages, StackOverflow answers, and GitHub gists. They already know how to construct curl commands, pipe through jq, and grep through results — no new protocol needed. The same Linear issue lookup works with a straightforward authenticated GraphQL call that any Claude Code or Codex session can write without MCP infrastructure. As one HN commenter put it: "MCP feels like solving a problem that doesn't exist — we already had working interfaces between software and software. The breakthrough is that LLMs can now use those same interfaces."
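For illustration, here is roughly what such a direct lookup might look like in Python using only the standard library. Treat the details as assumptions to check against Linear's API documentation: the endpoint URL, the use of a personal API key in the Authorization header, and the issue fields requested. The issue ID "ENG-123" and the LINEAR_API_KEY environment variable are hypothetical placeholders.

```python
import json
import os
import urllib.request

# Assumed endpoint for Linear's GraphQL API; verify against the official docs.
LINEAR_GRAPHQL_URL = "https://api.linear.app/graphql"

# Assumed query shape: fetch one issue and a few of its fields.
ISSUE_QUERY = """
query Issue($id: String!) {
  issue(id: $id) {
    identifier
    title
    state { name }
  }
}
"""


def build_issue_request(issue_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GraphQL POST request."""
    payload = json.dumps({"query": ISSUE_QUERY, "variables": {"id": issue_id}})
    return urllib.request.Request(
        LINEAR_GRAPHQL_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


def fetch_issue(issue_id: str, api_key: str) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_issue_request(issue_id, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    key = os.environ.get("LINEAR_API_KEY")  # hypothetical variable name
    if key:
        print(fetch_issue("ENG-123", key))  # hypothetical issue ID
```

Nothing here needs a long-running server process or preloaded tool schemas; the only standing cost is the few lines of query text.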

Key Takeaways

MCP tool definitions consume massive context: 77 tools across four servers totaled ~21,077 tokens in production
Performance penalties are architectural: every server adds process overhead and new failure modes
Direct CLI/API access delivers the same results with roughly 65× fewer tokens for typical operations
Claude Code's deferred loading shows mitigations exist, but the underlying design philosophy remains questionable

The Bottom Line

MCP isn't going away anytime soon: the ecosystem investment from Anthropic (which acquired Stainless to accelerate MCP tooling) and major platforms like Linear and Notion is too significant. But for developers building production AI agent workflows today, the choice between MCP and direct CLI or API access has measurable consequences in tokens, latency, and reliability. The pragmatic path forward is neither MCP orthodoxy nor CLI-first purism. It is knowing when each approach makes sense and building architectures that can gracefully fall back to direct API calls for high-frequency operations where overhead compounds.
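One way to picture the fallback idea is a small wrapper that tries one path and switches to the other when it fails. This is a minimal sketch under stated assumptions, not a production pattern: the two lookup functions are hypothetical stand-ins, and the MCP failure is simulated with a ConnectionError.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    handled: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Run `primary`; if it raises one of `handled`, run `fallback` instead."""
    try:
        return primary()
    except handled:
        return fallback()


def flaky_mcp_lookup() -> str:
    # Hypothetical stand-in for an MCP tool call whose server has died.
    raise ConnectionError("MCP server process exited")


def direct_api_lookup() -> str:
    # Hypothetical stand-in for a direct REST or GraphQL call.
    return "issue fetched via direct API"


result = call_with_fallback(flaky_mcp_lookup, direct_api_lookup)
print(result)
```

For high-frequency operations the roles could simply be reversed, with the direct call as the primary path and MCP reserved for tools that have no convenient API.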
