Less than a year ago, the Model Context Protocol (MCP) was everyone's favorite buzzword—every vendor and platform scrambled to integrate it into their workflows. Then, almost overnight, Agent Skills stole the spotlight. "MCP is dead" started trending across social media, echoing the same hyperbolic cycle we've seen with SaaS announcements before.

The Real Problem: Context Bloat

The criticism isn't unfounded. MCP's context bloat became a serious pain point for developers working with complex schemas. One ZenStack user hit a wall trying to expose even a single tool from their database: the generated JSON schema weighed in at around 410,000 tokens, while the available limit sat at roughly 130,000. The culprit? ZenStack exposes ORM query APIs that allow nested relation traversals across your entire database in a single call. That power means one function can pull data from dozens of interconnected models, but it also means the tool's input schema must describe every one of those models, bloating it beyond what LLMs can reasonably process. This isn't unique to ZenStack, either. The official Playwright GitHub documentation explicitly notes that modern coding agents favor CLI-based workflows exposed as Skills over MCP because "CLI invocations are more token-efficient: they avoid loading large tool schemas and verbose accessibility trees into the model context."
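To make the bloat concrete, here's a minimal sketch of the kind of nested Prisma-style query such an API permits. The db client and the Post/User/Profile/Comment models are invented for illustration, not ZenStack's actual test schema. A JSON schema describing every legal shape of this one call has to enumerate each reachable model's fields, filters, and relations recursively, and that recursion is what overflows the context window.

```ts
// Invented models for illustration; "db" stands in for a
// ZenStack/Prisma-style ORM client.
declare const db: {
  post: { findMany(args: unknown): Promise<unknown[]> };
};

// One findMany call that traverses four related models. The tool's
// input schema must describe all of these shapes, plus every other
// relation reachable from Post, which is where the token count explodes.
const posts = await db.post.findMany({
  where: { published: true },
  include: {
    author: {
      include: { profile: true },   // Post -> User -> Profile
    },
    comments: {
      where: { flagged: false },
      include: { author: true },    // Comment -> User again
    },
  },
});
```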

Cloudflare's Code Mode Solution

Cloudflare faced a similar challenge with their API, which spans over 2,500 endpoints; exposing each one as an MCP tool would consume over 2 million tokens. Their solution? Instead of describing every operation as a separate tool, let the LLM write code directly, hence "Code Mode." They collapsed everything into just two tools: search and execute. The result: a 99.9% reduction in input tokens, down to roughly 1,000.
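Here's a minimal sketch of what that two-tool surface could look like. Cloudflare hasn't published this exact code; the shape is inferred from their description, and lookupEndpointDocs and runInSandbox are invented placeholders.

```ts
// Sketch only: the two-tool Code Mode surface. lookupEndpointDocs and
// runInSandbox are hypothetical helpers for endpoint search and for
// sandboxed execution of model-written TypeScript.
type CodeModeTool = {
  name: string;
  description: string;
  run(input: Record<string, string>): Promise<string>;
};

declare function lookupEndpointDocs(query: string): Promise<string>;
declare function runInSandbox(code: string): Promise<string>;

const tools: CodeModeTool[] = [
  {
    name: "search",
    description: "Find API endpoints and their typed signatures by keyword",
    run: ({ query }) => lookupEndpointDocs(query),
  },
  {
    name: "execute",
    description: "Run model-written TypeScript against the API client in a sandbox",
    run: ({ code }) => runInSandbox(code),
  },
];
```

Everything the model needs beyond these two entry points lives in code it writes itself, which is how the fixed tool footprint can stay near a thousand tokens.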

ZenStack's Three-Tool Approach

ZenStack took this concept further with three specialized tools tailored for database queries (a sketch of the wiring follows the list):

- schema: Instead of a search tool, it sends the entire schema to the LLM, giving it a complete picture of your application. While the footprint isn't fixed like Cloudflare's approach, most modern LLMs can handle schema files comfortably.
- check: Validates query function calls before execution by performing TypeScript type checking. This catches invalid field names or wrong argument shapes early, keeping error messages clean and actionable for downstream processing.
- execute: Runs validated queries against the database using fixed function calls like findMany, createMany, and updateMany. The key insight is that LLMs are already familiar with Prisma-style query syntax from their training data, so they can write correct queries without extensive prompting.

The combination of check plus execute adheres to software engineering's "high cohesion, low coupling" principle: execute only receives valid function calls, meaning any errors it returns are genuine runtime issues the user needs to address.
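Here's a hedged sketch of how those three tools might be wired up with the official MCP TypeScript SDK. The typeCheck and runQuery helpers are stub placeholders, and the schema.zmodel path, tool descriptions, and server name are invented rather than taken from ZenStack's actual implementation.

```ts
import { readFileSync } from "node:fs";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Stubs for the sketch: a real server would invoke the TypeScript
// compiler and the ORM here. Both names are hypothetical.
function typeCheck(code: string): { ok: boolean; errors: string[] } {
  return { ok: true, errors: [] };
}
async function runQuery(code: string): Promise<unknown> {
  return [];
}

const server = new McpServer({ name: "db-code-mode", version: "0.1.0" });

// schema: ship the whole data model to the LLM up front.
server.tool("schema", "Return the full application schema", async () => ({
  content: [{ type: "text", text: readFileSync("schema.zmodel", "utf8") }],
}));

// check: type-check a candidate query before anything touches the database.
server.tool("check", "Type-check a query", { code: z.string() }, async ({ code }) => {
  const { ok, errors } = typeCheck(code);
  return { content: [{ type: "text", text: ok ? "OK" : errors.join("\n") }] };
});

// execute: run code that already passed check, so any error surfaced
// here is a genuine runtime issue.
server.tool("execute", "Run a validated query", { code: z.string() }, async ({ code }) => ({
  content: [{ type: "text", text: JSON.stringify(await runQuery(code)) }],
}));

await server.connect(new StdioServerTransport());
```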

Testing With Claude

ZenStack tested this setup on a simulated gym application with over 50 models and AI-generated mock data, using Claude Desktop with Sonnet 4.6. The results were impressive: Claude generated complex nested queries spanning more than 10 models, returned accurate data, and knew when to chain multiple operations instead of forcing everything into one call. The check tool caught only a single incorrect function call across the entire test run, and even that was recoverable: Claude simply took an alternative route.
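For flavor, here's a hedged reconstruction of that chaining behavior on invented gym models (Member, Booking, Notification; the actual 50-model test schema isn't shown in the write-up). Rather than forcing a multi-model filter and a bulk write into one call, the model first narrows the target set, then issues a second operation on the result.

```ts
// Invented gym models for illustration; "db" is an assumed
// Prisma-style client, not the actual test schema.
declare const db: {
  member: { findMany(args: unknown): Promise<{ id: string }[]> };
  notification: { createMany(args: unknown): Promise<{ count: number }> };
};

// Step 1: find members with no bookings this year (a relation filter
// across Member -> Booking).
const lapsed = await db.member.findMany({
  where: { bookings: { none: { createdAt: { gte: new Date("2025-01-01") } } } },
  select: { id: true },
});

// Step 2: chain a bulk write on the first result instead of trying to
// express both steps as one mega-query.
await db.notification.createMany({
  data: lapsed.map((m) => ({ memberId: m.id, kind: "WIN_BACK" })),
});
```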

The Bottom Line

Code Mode isn't about replacing MCP—it's about using it smarter. By letting LLMs work with familiar query syntax rather than wrestling with bloated schemas, you get the security benefits of MCP without the context overhead. If you're building AI-powered database tools, this pattern is worth exploring. ZenStack's sample implementation on GitHub gives you a working starting point to experiment with.