New MCP Server Indexes Linux Kernel In 3 Minutes, Drops Token Usage By 99%

DeusData has released a serious piece of infrastructure for anyone running AI coding agents in production. The codebase-memory-mcp server, posted to Hacker News on June 19, claims to be the fastest code intelligence engine available, and the published benchmark numbers suggest this isn't marketing fluff. According to the project, it fully indexes an average repository in milliseconds and handles the Linux kernel's 28 million lines of code across 75,000 files in three minutes.

Raw Performance Numbers That Actually Matter

The indexing pipeline is RAM-first with LZ4 compression, in-memory SQLite, and fused Aho-Corasick pattern matching. Memory is released back to the OS once the index has been written out in a single dump. For structural queries like trace_path or search_graph, you're looking at sub-10ms response times. The Cypher-like graph traversal? Under 1 millisecond. On an Apple M3 Pro, Django indexes in roughly six seconds and produces nearly 50,000 nodes with 196,000 edges. Dead code detection scans the full graph in about 150 milliseconds. This is the kind of speed that makes MCP tooling usable in real agent workflows rather than a novelty you demo once and never touch again.
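
To make the idea of graph queries over in-memory SQLite concrete, here is a minimal sketch in Python. It is not the project's implementation or schema: the `edges` table and the function names `build_graph`, `reachable_from`, and `dead_symbols` are assumptions made up for illustration. It treats a call graph as rows of (caller, callee), traces reachability with a recursive query, and flags anything unreachable from the entry points as dead code.

```python
import sqlite3


def build_graph(edges):
    """Load (caller, callee) pairs into an in-memory SQLite database.

    The single-table schema is hypothetical, chosen only for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_edges_src ON edges (src)")
    conn.executemany("INSERT INTO edges (src, dst) VALUES (?, ?)", edges)
    return conn


def reachable_from(conn, start):
    """Return every symbol reachable from `start`, including `start` itself."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(name) AS (
            SELECT ?
            UNION  -- UNION deduplicates, so cycles in the graph terminate
            SELECT e.dst FROM edges AS e JOIN reach AS r ON e.src = r.name
        )
        SELECT name FROM reach
        """,
        (start,),
    ).fetchall()
    return {name for (name,) in rows}


def dead_symbols(conn, entry_points):
    """Return symbols that no entry point can reach through call edges."""
    all_symbols = {
        name
        for (name,) in conn.execute(
            "SELECT src FROM edges UNION SELECT dst FROM edges"
        )
    }
    live = set()
    for entry in entry_points:
        live |= reachable_from(conn, entry)
    return all_symbols - live


if __name__ == "__main__":
    conn = build_graph(
        [("main", "parse"), ("parse", "lex"), ("lex", "parse"), ("orphan", "helper")]
    )
    print(sorted(reachable_from(conn, "main")))
    print(sorted(dead_symbols(conn, ["main"])))
```

A real engine would track far more than call edges (imports, inheritance, type references), but the shape of the query is the same: one indexed table and a traversal that never touches source files.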

Language Coverage And Semantic Resolution

The project bundles 158 tree-sitter grammars directly into the binary, so there is nothing to install and nothing that breaks when your dependency tree shifts. But raw parsing is just table stakes. The interesting part is the hybrid LSP semantic type resolution for eleven languages: Python, TypeScript/JavaScript/JSX/TSX, PHP, C#, Go, C, C++, Java, Kotlin, and Rust. It's a native C implementation that handles parameter binding, return-type inference, generic substitution, JSX component dispatch, JSDoc inference, namespace resolution, trait methods, and UFCS. In other words, it covers much of the type-resolution work you'd otherwise get from tsserver, pyright, gopls, Roslyn, JDT, or rust-analyzer, but without spinning up those servers or maintaining their dependencies.

Eleven Agents, One Install Command

The install script auto-detects Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, and Kiro—configuring MCP entries, instruction files, skills, and pre-tool hooks for each. For Claude Code specifically, a PreToolUse hook intercepts Grep/Glob calls (never Read—gating reads breaks the read-before-edit invariant) and injects structured context from search_graph when search tokens match indexed symbols. The result: your agent gets graph-augmented results alongside its normal search output without any workflow changes on your end.
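
The gating rule described above can be sketched in a few lines of Python. This is an illustration of the decision logic only, not the project's hook code or Claude Code's hook API: the names `AUGMENTED_TOOLS`, `search_tokens`, and `symbols_to_inject`, and the identifier-style tokenization, are all assumptions.

```python
import re

# Only search-style tools are augmented. Read is deliberately left out,
# mirroring the article's point that gating reads breaks read-before-edit.
AUGMENTED_TOOLS = {"Grep", "Glob"}


def search_tokens(pattern):
    """Extract identifier-like tokens from a search pattern (hypothetical rule)."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", pattern))


def symbols_to_inject(tool_name, pattern, indexed_symbols):
    """Return the indexed symbols worth attaching as extra graph context.

    An empty list means the tool call passes through unchanged.
    """
    if tool_name not in AUGMENTED_TOOLS:
        return []
    return sorted(search_tokens(pattern) & set(indexed_symbols))


if __name__ == "__main__":
    index = {"parse_config", "load_settings"}
    print(symbols_to_inject("Grep", r"def parse_config\(", index))
    print(symbols_to_inject("Read", "parse_config", index))
```

The useful property is that the hook only ever adds context: when nothing matches, or the tool is not a search tool, the agent's call proceeds exactly as it would have without the hook.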

Token Efficiency And Team Artifacts

Here's where this tool pays off in production: in the project's reported comparison, five structural queries consumed roughly 3,400 tokens via codebase-memory-mcp versus approximately 412,000 tokens when exploring the same codebase file-by-file. That's a 99.2% reduction in context consumption. The team-shared graph artifact feature lets you commit a zstd-compressed SQLite snapshot (.codebase-memory/graph.db.zst) to your repo so teammates skip the initial full index. On first clone the compressed artifact is imported, then incremental indexing fills in local diffs. A .gitattributes line with merge=ours keeps concurrent updates to the binary artifact from producing merge conflicts.
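
The 99.2% figure follows directly from the two token counts quoted above. A quick check in Python, where `token_reduction` is just a helper name chosen for this example:

```python
def token_reduction(baseline_tokens, graph_tokens):
    """Fraction of tokens saved relative to the file-by-file baseline."""
    return 1 - graph_tokens / baseline_tokens


if __name__ == "__main__":
    # Figures as quoted in the article: ~412,000 vs ~3,400 tokens.
    reduction = token_reduction(412_000, 3_400)
    print(f"{reduction:.1%}")  # prints 99.2%
```

Put another way, the file-by-file approach used roughly 121 times as many tokens for the same five queries.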

Security Posture Worth Noting

The tool reads your codebase and writes to agent configuration files by design. If that makes you twitchy, full source is available for audit—every release binary gets signed, checksummed, and scanned by 70+ antivirus engines before publication. All processing happens locally; no outbound traffic, no API calls, code never leaves your machine. Found a vulnerability? They want to know about it (see SECURITY.md). For an infrastructure piece with this level of access, the security-first framing is refreshingly direct rather than buried in fine print.

Key Takeaways

Full Linux kernel index: 3 minutes for 28M LOC across 75K files
Query response times under 1ms for graph traversal; dead code detection at ~150ms full scan
158 bundled tree-sitter grammars, zero external dependencies or API keys required
Supports Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, and Kiro out of the box
Token consumption drops from ~412K to ~3.4K for equivalent structural queries, a 99.2% reduction

The Bottom Line

If you're running any AI coding agent in a serious codebase, you need graph-level code understanding or you're just burning tokens on file-by-file exploration. Codebase-memory-mcp delivers that intelligence layer without adding infrastructure overhead—no Docker containers, no language servers to babysit, no API key management. At zero cost and sub-minute indexing for most projects, this is the kind of tooling that makes you wonder why you've been tolerating the alternative.
