Engram Brings Shared Memory to AI Coding Tools with Offline MCP Server

A new open-source project called Engram is positioning itself as the shared memory layer your AI coding tools have been missing. Dropped on GitHub by developer dgr8akki, Engram runs entirely offline as an MCP (Model Context Protocol) server, giving Claude Code, Cursor, Antigravity CLI, and Windsurf a common semantic knowledge base to read from and write to during development sessions.

What Makes This Different From Just Taking Notes

The pitch isn't just "save your thoughts"—it's composability across AI agents. Instead of each tool maintaining its own context window, Engram lets you build up a shared corpus of institutional knowledge: why you chose a particular pattern, which libraries caused problems, team conventions that never got documented. When you switch from Cursor to Claude Code mid-project, you're not starting cold.

How the Multi-Tool Setup Works

Running ./engram install auto-detects and configures all supported tools in one shot. For Claude Code, it registers an MCP server in ~/.claude/settings.json, installs a skill definition that teaches the model how to query Engram, and hooks into three session lifecycle events. SessionStart automatically injects the last 15 memories into context, UserPromptSubmit triggers a save when you say "remember:", and Stop archives the session when it ends. The same pattern applies to Cursor (~/.cursor/mcp.json), Antigravity CLI (~/.gemini/config/mcp_config.json), and Windsurf (~/.codeium/windsurf/mcp.json), each with its own hook event names.
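To make the registration step concrete, here is a minimal sketch of what merging one server entry into a tool's JSON config could look like. It is not Engram's installer: the "mcpServers" key layout, the binary path /path/to/engram, and the "serve" argument are all assumptions made for illustration, and the demo writes to a temporary directory rather than a real config file.

```python
import json
import tempfile
from pathlib import Path


def register_mcp_server(config_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Add or update one MCP server entry in a JSON config, keeping all other keys."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8").strip()
        if text:
            config = json.loads(text)

    # Assumed layout: {"mcpServers": {"<name>": {"command": ..., "args": [...]}}}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Demo against a throwaway directory so no real tool config is modified.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / ".cursor" / "mcp.json"
        # "/path/to/engram" and "serve" are hypothetical placeholders.
        register_mcp_server(path, "engram", "/path/to/engram", ["serve"])
        print(path.read_text(encoding="utf-8"))
```

Merging into the existing file instead of overwriting it matters here, because these config files typically hold other servers and settings that an installer should leave alone.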

Under the Hood: Embeddings Without Calling Home

Engram stores everything in SQLite, using sqlite-vec for vector search. By default, it generates embeddings with sentence-transformers (specifically all-MiniLM-L6-v2, which produces 384-dimensional vectors) running on CPU, fully offline, with no API keys required. If you've already got Ollama deployed, you can switch backends to use nomic-embed-text at 768 dimensions with a single config change: set backend: "ollama" and point it at http://localhost:11434. With Ollama already running, Engram has no embedding model of its own to download.
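The storage-plus-search idea can be sketched with nothing but the Python standard library. This is an illustration, not Engram's code: toy_embed is a deterministic bag-of-words stand-in for a real embedding model, the brute-force dot-product scan stands in for sqlite-vec, and the thoughts table schema is an assumption.

```python
import math
import sqlite3
import struct

DIM = 16  # toy size; the article's default model produces 384-dimensional vectors


def toy_embed(text: str) -> list[float]:
    """Stand-in embedding (NOT a real model): hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[sum(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def pack(vec: list[float]) -> bytes:
    """Serialise a vector as float32 bytes for storage in a BLOB column."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob: bytes) -> list[float]:
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # Hypothetical schema, chosen for this sketch only.
    db.execute(
        "CREATE TABLE IF NOT EXISTS thoughts ("
        "id INTEGER PRIMARY KEY, content TEXT NOT NULL, embedding BLOB NOT NULL)"
    )
    return db


def save_thought(db: sqlite3.Connection, content: str) -> int:
    cur = db.execute(
        "INSERT INTO thoughts (content, embedding) VALUES (?, ?)",
        (content, pack(toy_embed(content))),
    )
    db.commit()
    return cur.lastrowid


def search(db: sqlite3.Connection, query: str, k: int = 3) -> list[tuple[str, float]]:
    """Brute-force cosine search; vectors are unit length, so a dot product suffices."""
    q = toy_embed(query)
    scored = []
    for content, blob in db.execute("SELECT content, embedding FROM thoughts"):
        score = sum(a * b for a, b in zip(q, unpack(blob)))
        scored.append((content, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    db = open_store()
    save_thought(db, "prefer sqlite for storage")
    save_thought(db, "tabs over spaces")
    print(search(db, "sqlite storage", k=1))
```

A full scan like this is fine for a few thousand notes; a vector extension such as sqlite-vec exists to keep the nearest-neighbour query inside the database instead of in application code.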

HTTP REST Fallback for Non-MCP Tools

Not every tool speaks MCP yet. For those, Engram ships an HTTP server on port 7823 with standard CRUD endpoints: GET /thoughts?limit=20 to list recent entries, POST /thoughts with a JSON body to save new memories, GET /thoughts/search?q=... for semantic queries, and DELETE /thoughts/{id} to remove stale notes. The autostart scripts handle LaunchAgent registration on macOS, so the server is always available without manual invocation.
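A small client for these endpoints might look like the sketch below. The paths and port come from the description above; the localhost host, the "content" field name in the POST body, and the assumption that responses are JSON are guesses for illustration, so check the project's documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:7823"  # port from the article; host is an assumption


def list_request(limit: int = 20) -> urllib.request.Request:
    query = urllib.parse.urlencode({"limit": limit})
    return urllib.request.Request(f"{BASE_URL}/thoughts?{query}", method="GET")


def save_request(content: str) -> urllib.request.Request:
    # The "content" field name is hypothetical; the real body schema may differ.
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/thoughts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_request(query_text: str) -> urllib.request.Request:
    query = urllib.parse.urlencode({"q": query_text})
    return urllib.request.Request(f"{BASE_URL}/thoughts/search?{query}", method="GET")


def delete_request(thought_id) -> urllib.request.Request:
    safe_id = urllib.parse.quote(str(thought_id), safe="")
    return urllib.request.Request(f"{BASE_URL}/thoughts/{safe_id}", method="DELETE")


def send(request: urllib.request.Request, timeout: float = 5.0):
    """Send a prepared request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
    return json.loads(raw) if raw else None
```

With the server running, send(search_request("why did we pick sqlite")) would issue the semantic query. Building the request separately from sending it keeps the URL and body construction easy to inspect without a live server.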

Key Takeaways

Local-first: all embeddings and storage stay on your machine—no cloud sync, no API egress costs
Multi-tool support out of the box, with auto-detection for Claude Code, Cursor, Antigravity CLI, and Windsurf
Dual embedding backends (sentence-transformers/Ollama) configurable via config.yaml
HTTP REST server provides a fallback interface for tools lacking MCP client implementations

The Bottom Line

This is exactly the kind of infrastructure the local-first AI tooling movement needs—Engram doesn't just store notes, it creates a shared semantic layer that makes your entire AI-assisted workflow smarter over time. If you've been burned by cloud-dependent memory solutions or just want your coding context to actually persist across sessions without manual prompting every time, this is worth a weekend afternoon of setup.
