If you've ever tried handing a large codebase to an AI assistant, you know the pain: token limits, repeated context loading, and bills that climb faster than your code complexity. Developer KrishivPiduri just released repo-brain, an open-source tool designed to solve exactly this problem by compressing entire repositories into lean markdown files that work with any LLM provider.
How Repo-Brain Works
The tool takes a multi-layered approach to repository analysis. First, it performs static analysis using Tree-sitter AST parsing for Python, JavaScript, TypeScript, Go, and Rust, languages where syntax trees give the most accurate results. For everything else (Java, Ruby, C#, C/C++, Swift, Kotlin, Shell), the tool falls back to regex pattern matching that still gets the job done without requiring a full parser.
Where repo-brain gets clever is in its architecture analysis phase. Instead of just dumping code structure, it makes a single LLM call to identify layers, components, entry points, and data flow patterns within your codebase. This means you're not just seeing what files exist; you're getting a map of how everything connects.
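To make the parser split concrete, here is a minimal Python sketch of how a tool might route files to AST parsing or to a regex fallback. It is not repo-brain's actual code: the function names, the extension table, and the three fallback patterns are assumptions made up for illustration, and the Tree-sitter step itself is left out.

```python
import re
from pathlib import Path

# Extensions for the languages the article lists as Tree-sitter parsed.
# (Assumed mapping; variants like .tsx are omitted for brevity.)
AST_EXTENSIONS = {".py", ".js", ".ts", ".go", ".rs"}

# Hypothetical fallback patterns, not repo-brain's real rules.
# Each one captures a function name from a definition line.
FALLBACK_PATTERNS = {
    ".rb": re.compile(r"^\s*def\s+([A-Za-z_]\w*[?!]?)", re.MULTILINE),
    ".kt": re.compile(r"^\s*(?:\w+\s+)*fun\s+([A-Za-z_]\w*)", re.MULTILINE),
    ".sh": re.compile(r"^\s*(?:function\s+)?([A-Za-z_]\w*)\s*\(\)", re.MULTILINE),
}


def pick_strategy(path: str) -> str:
    """Return 'ast', 'regex', or 'skip' based on the file extension."""
    ext = Path(path).suffix.lower()
    if ext in AST_EXTENSIONS:
        return "ast"
    if ext in FALLBACK_PATTERNS:
        return "regex"
    return "skip"


def regex_symbols(path: str, source: str) -> list[str]:
    """Extract function names with the fallback pattern for this extension."""
    pattern = FALLBACK_PATTERNS[Path(path).suffix.lower()]
    return pattern.findall(source)
```

The trade-off is the one the article describes: a regex pass is cheap and needs no grammar, but it only sees what its pattern anticipates, so nested or unusual definitions can be missed.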
Semantic Relationships: The Secret Sauce
Beyond static analysis, repo-brain discovers semantic relationships between code elements. It uses the LLM to identify producer/consumer links between functions and modules, shared data structures that multiple components depend on, parallel implementations doing similar work in different places, and polyglot bridges where your codebase crosses language boundaries.
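One way to picture the four relationship types is as typed edges between code elements that get rendered into the compressed context file. The sketch below is an assumption about how such data could be modelled, not repo-brain's schema: `RelationKind`, `Relation`, `to_markdown`, and the output layout are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class RelationKind(Enum):
    # The four relationship types named in the article.
    PRODUCER_CONSUMER = "producer/consumer"
    SHARED_DATA = "shared data structure"
    PARALLEL_IMPL = "parallel implementation"
    POLYGLOT_BRIDGE = "polyglot bridge"


@dataclass(frozen=True)
class Relation:
    kind: RelationKind
    source: str   # element on one side of the link
    target: str   # element on the other side
    note: str = ""  # optional free-text detail


def to_markdown(relations: list[Relation]) -> str:
    """Group relations by kind and render them as a plain bullet list."""
    lines: list[str] = []
    for kind in RelationKind:  # iterates in definition order
        group = [r for r in relations if r.kind is kind]
        if not group:
            continue
        lines.append(f"{kind.value}:")
        for r in group:
            suffix = f" ({r.note})" if r.note else ""
            lines.append(f"- {r.source} -> {r.target}{suffix}")
    return "\n".join(lines)
```

Grouping by kind keeps the rendered section short, which matters when the whole point is to spend as few tokens as possible on context.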
Multi-Provider Support
The tool ships with built-in support for a wide range of LLM providers: OpenAI, Claude, Deepseek, Gemini, Groq, Ollama, Mistral, xAI, Perplexity, and OpenRouter. This flexibility means you can generate your context file once and experiment with different models to see which one best understands your codebase's architecture.
Installation Is Refreshingly Simple
Repo-brain avoids the usual dependency headaches with one-liner installers that require zero manual venv setup or configuration. On Mac or Linux, a single curl command handles everything. Windows users get equivalent PowerShell support through the Invoke-RestMethod cmdlet. The release package includes main.py, llm.py, ingest.py, analyze.py, relationships.py, generate_prompt.py, mcp_server.py, config.example.py, and requirements.txt—all open source and ready to tinker with.
Key Takeaways
- Repo-brain achieved 96% compression on a 262-file repository (154,229 tokens down to 6,487)
- Tree-sitter parsing for Python/JS/TS/Go/Rust; regex fallback for other languages
- Architecture analysis via single LLM call identifies layers and data flow
- Works with OpenAI, Claude, Deepseek, Gemini, Groq, Ollama, Mistral, xAI, Perplexity, and OpenRouter
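The 96% figure in the first takeaway follows directly from the two token counts quoted there. A quick check, with `compression_ratio` being a helper name chosen here for illustration:

```python
def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of tokens removed, e.g. 0.9 means 90% smaller."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 1 - compressed_tokens / original_tokens


# Token counts reported for the 262-file repository.
ratio = compression_ratio(154_229, 6_487)
print(f"{ratio:.1%}")  # 95.8%, which rounds to the quoted 96%
```

Put another way, the compressed file is roughly one twenty-fourth the size of the original input.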
The Bottom Line
This is exactly the kind of tool the developer community needs right now—something that makes AI-assisted development more accessible without requiring a PhD in prompt engineering. If you're working with large codebases and finding yourself constantly fighting token limits, repo-brain is worth bookmarking.