Claude Code is powerful, but it has a blind spot: internal APIs that never made it into documentation. That's the core lesson from developer flaky.build's experience building duckdb-web-archive, a DuckDB extension for querying the Wayback Machine and Common Crawl directly via SQL.

The Predicate Pushdown Problem

To make web archive queries efficient, flaky needed to implement predicate pushdown: translating SQL WHERE clauses and LIMIT clauses into upstream API parameters. Without it, every query would fetch thousands of records and filter them locally. With it, filtering happens at the source, dramatically reducing bandwidth and latency. The catch? The DuckDB internal C++ APIs needed to implement this aren't documented anywhere. We're talking about TableFunctionSet, FunctionData, TableFilterSet, and the entire bind/init/scan lifecycle with filter propagation. All of it lives in the source code of first-party extensions like httpfs and postgres_scanner, and nowhere else.
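As a rough illustration of the idea, here is a minimal Python sketch (not the extension's actual C++ code) that turns simple WHERE predicates and a LIMIT into a single Wayback Machine CDX request. The endpoint and its query parameters (url, from, to, filter, limit, output) are assumed from the CDX server's public interface; the SQL column names, the Predicate type, the PUSHABLE mapping, and push_down are all hypothetical. Anything the sketch cannot translate is returned as a residual list for the engine to filter locally.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Assumed CDX endpoint; parameter names below are assumptions, not verified API docs.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

# Hypothetical mapping: (SQL column, operator) -> CDX query parameter.
PUSHABLE = {
    ("url", "="): "url",
    ("timestamp", ">="): "from",
    ("timestamp", "<="): "to",
}


@dataclass(frozen=True)
class Predicate:
    column: str  # SQL column the WHERE clause tests
    op: str      # comparison operator, e.g. "=", ">=", "<="
    value: str


def push_down(predicates: List[Predicate],
              limit: Optional[int] = None) -> Tuple[str, List[Predicate]]:
    """Translate supported predicates into upstream query parameters.

    Returns the request URL plus the residual predicates that the
    engine must still evaluate locally after fetching.
    """
    params = [("output", "json")]
    used = set()   # each parameter is pushed at most once
    residual = []
    for p in predicates:
        key = PUSHABLE.get((p.column, p.op))
        if key is not None and key not in used:
            used.add(key)
            params.append((key, p.value))
        elif p.column == "statuscode" and p.op == "=" and "filter" not in used:
            used.add("filter")
            params.append(("filter", f"statuscode:{p.value}"))
        else:
            residual.append(p)  # unsupported or duplicate: filter locally
    # LIMIT is only safe to push when no local filtering remains: otherwise the
    # server stops after N rows, the local filter may drop some of them, and
    # the query returns fewer rows than it asked for.
    if limit is not None and not residual:
        params.append(("limit", str(limit)))
    return f"{CDX_ENDPOINT}?{urlencode(params)}", residual


if __name__ == "__main__":
    url, residual = push_down(
        [Predicate("url", "=", "example.com"),
         Predicate("timestamp", ">=", "2020"),
         Predicate("statuscode", "=", "200")],
        limit=10,
    )
    print(url)
    # https://web.archive.org/cdx/search/cdx?output=json&url=example.com&from=2020&filter=statuscode%3A200&limit=10
    print(residual)  # []
```

The conservative rule here is deliberate: a predicate that is not pushed stays in the residual list, so pushdown can only reduce how much data is fetched, never change which rows the query returns.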

Context7 Hits a Wall

Standard documentation search tools fell short fast. Context7 fetches up-to-date docs for popular libraries, but DuckDB's extension APIs have no official public documentation to fetch. Claude Code kept confidently telling flaky that the desired functionality wasn't possible, even though DuckDB's own extensions were clearly doing exactly this with remote file handling. The AI wasn't being careless: the information literally didn't exist in any docs it could reference. It needed access to actual implementations, not descriptions.

Enter GitHits

A colleague mentioned GitHits, a code search tool that indexes millions of GitHub repositories and exposes results through an MCP server. Flaky joined the waitlist, got approved quickly, and enabled the GitHits MCP server in Claude Code. The difference was immediate. Instead of generic documentation summaries, Claude could now see real implementations from DuckDB's own extensions: exact function signatures, working patterns, and battle-tested code that actually compiled. It went from "this isn't possible" to showing concrete examples of implementing TableFunction::pushdown_complex_filter, extracting pushed filters from a TableFilterSet, and wiring up FunctionData for stateful scanning.
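To make the bind/init/scan lifecycle with filter propagation a little more concrete, here is a deliberately simplified Python model of how pushed filters can travel from a pushdown hook into per-query state and then into the scan. It is not DuckDB's C++ API: BindData only loosely plays the role of FunctionData, pushdown_filters stands in for a hook like pushdown_complex_filter, and every name here, including the fake HTTP backend, is hypothetical. The phase order (bind, then the pushdown hook, then init, then repeated scan calls until an empty chunk) is an assumption of the model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Filter = Tuple[str, str, str]  # (column, operator, value)


@dataclass
class BindData:
    """Per-query state, loosely analogous to a FunctionData subclass."""
    url: str
    upstream_params: Dict[str, str] = field(default_factory=dict)


@dataclass
class ScanState:
    """Per-scan state created at init time: a simple paging cursor."""
    page: int = 0
    done: bool = False


def bind(url: str) -> Tuple[BindData, List[str]]:
    # Bind: validate arguments and fix the output schema.
    return BindData(url=url), ["url", "timestamp", "statuscode"]


def pushdown_filters(bind_data: BindData, filters: List[Filter]) -> List[Filter]:
    # Hypothetical pushdown hook: record supported predicates in bind_data
    # and return only the filters the engine must still apply itself.
    params = bind_data.upstream_params
    remaining = []
    for column, op, value in filters:
        if column == "statuscode" and op == "=" and "filter" not in params:
            params["filter"] = f"statuscode:{value}"
        elif column == "timestamp" and op == ">=" and "from" not in params:
            params["from"] = value
        else:
            remaining.append((column, op, value))
    return remaining


def init(bind_data: BindData) -> ScanState:
    # Init: create fresh scan state; the pushed filters already live in bind_data.
    return ScanState()


def scan(bind_data: BindData, state: ScanState,
         fetch_page: Callable[[str, Dict[str, str], int], list]) -> list:
    # Scan: fetch one page per call using the pushed-down parameters;
    # an empty chunk signals the end of the table.
    if state.done:
        return []
    rows = fetch_page(bind_data.url, bind_data.upstream_params, state.page)
    state.page += 1
    if not rows:
        state.done = True
    return rows


def make_fake_backend(pages: List[list]):
    # Stand-in for an HTTP client: serves canned pages and logs every request.
    calls = []

    def fetch_page(url, params, page):
        calls.append((url, dict(params), page))
        return pages[page] if page < len(pages) else []

    return fetch_page, calls


if __name__ == "__main__":
    bind_data, schema = bind("example.com")
    remaining = pushdown_filters(
        bind_data, [("statuscode", "=", "200"), ("mimetype", "=", "text/html")])
    fetch_page, calls = make_fake_backend([[{"n": 1}, {"n": 2}], [{"n": 3}]])
    state = init(bind_data)
    while True:
        chunk = scan(bind_data, state, fetch_page)
        if not chunk:
            break
        print(chunk)
    print("left for the engine:", remaining, "requests:", len(calls))
```

The key design point in this model is that the hook hands back whatever it could not translate, so the engine keeps applying those filters to each chunk; the pushed parameters ride along in the per-query state to every page request.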

Results That Speak for Themselves

The extension now supports full predicate pushdown for both the Wayback Machine's CDX API and Common Crawl's Index Server. A query that previously fetched thousands of records now makes just six HTTP requests, with all filtering pushed to the source. Flaky was impressed enough to submit a PR to duckdb/extension-template adding documentation on building DuckDB extensions with AI assistants. One commenter on the PR noted the guidance is valuable even outside LLM contexts: "The documentation here is super helpful, even without LLM agent considerations."

Key Takeaways

  • Claude Code's knowledge has hard limits when working with undocumented internal APIs
  • Context7-style doc searchers can't help when the info never made it into docs
  • GitHits MCP gives AI assistants searchable access to millions of open-source GitHub repositories
  • Real implementations from extensions like httpfs and postgres_scanner contain patterns that documentation never captured

The Bottom Line

If you're building DuckDB extensions, or working with any library that has undocumented internals, don't fight Claude Code when it says something's impossible. It probably just can't see the code. GitHits closes that gap, turning scattered source code into a queryable knowledge base. This is exactly how AI-assisted development should work: not replacing developer intuition, but amplifying it by making invisible patterns visible.