Commonplace Brings Privacy-Tiered, Self-Hosted Memory to AI Agents

A developer going by itsmeduncan has published Commonplace, an open-source project that adds structured, privacy-tiered long-term memory to AI agents running on your own hardware. The system is built around Graphiti's knowledge graph architecture and exposes itself as a pair of MCP (Model Context Protocol) servers — one for personal data, one for client-confidential material — that Claude Code, Pi, or any other MCP-capable agent can read from and write to over a private Tailscale tailnet.

Two Tiers, Zero Data Leaving the Box

The architecture splits memory into two isolated graphs inside a single FalkorDB instance: commonplace_personal and commonplace_client. By default, entity and relationship extraction runs locally on your GPU via Ollama serving mistral:7b-instruct-q4_0, a 4-bit quantized build. No API keys are required, and nothing reaches an external model unless you explicitly opt in. The client tier is permanently local by design; it is meant for NDA-covered material that must never leave the machine. The personal tier defaults to local extraction but can be pointed at Claude Haiku 4.5 on Anthropic's hosted API for higher-quality graphs on non-confidential data, a one-line .env change. Both tiers share a single Ollama instance that serves nomic-embed-text (768-dim) for embeddings and the Mistral model for extraction. Retrieval is fully offline: it combines embedding similarity, BM25, and graph traversal, with no LLM in the query path. The GPU-bound extraction runs only as a slow, asynchronous background job, so the speed of the local extraction model does not affect query latency.
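The article does not say how the three retrieval signals are merged. As a minimal sketch, assuming reciprocal rank fusion (a common LLM-free way to combine ranked lists, used here for illustration and not confirmed as Commonplace's method), the combination could look like this. The fact ids and the three result lists are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into a single ranking.

    Each list contributes 1 / (k + rank) for every id it contains,
    so ids that rank well in several lists rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical result lists from the three retrievers (assumed data).
by_embedding = ["fact_a", "fact_b", "fact_c"]
by_bm25 = ["fact_b", "fact_a", "fact_d"]
by_graph = ["fact_b", "fact_c"]

fused = reciprocal_rank_fusion([by_embedding, by_bm25, by_graph])
```

Nothing in this step calls a generative model: the inputs are ranked ids and the output is a reordered list, which is consistent with the article's claim that queries stay fast regardless of extraction speed.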

Networking Over Tailscale Only

The gateway (Caddy) binds two ports on the host: :8000 for personal, :8001 for client-confidential. Both are served exclusively over the tailnet via MagicDNS at your-server.your-tailnet.ts.net and are not exposed to the public internet. Every request must carry an Authorization: Bearer header with a per-tier token, so a client holding only the client token cannot reach the personal tier. FalkorDB (:6379) and the Prometheus metrics endpoint (:9180) bind to 127.0.0.1 only, so they are unreachable even from the tailnet.
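The per-tier token rule can be sketched in a few lines. This is an illustration of the logic only, not Caddy's configuration or the project's code; the token values and the is_authorized helper are hypothetical.

```python
import hmac

# Hypothetical per-tier tokens; real values would come from the deployment's secrets.
TIER_TOKENS = {
    8000: "personal-token-example",  # personal tier
    8001: "client-token-example",    # client-confidential tier
}


def is_authorized(port, authorization_header):
    """Return True only if the header carries the token for this port's tier."""
    expected = TIER_TOKENS.get(port)
    if expected is None or not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme != "Bearer":
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(token.encode(), expected.encode())
```

Because each port checks against its own token, a valid client-tier token presented on :8000 is rejected just like a missing one.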

Setup Requires Docker, Ollama, Tailscale, and One GPU

The host needs Docker Compose v2, a consumer NVIDIA GPU with roughly 8 GB VRAM (CPU-only works, but extraction is slow), and Tailscale to serve the endpoints over the tailnet. There is no GPU passthrough into containers: Ollama runs on the host, and the MCP containers reach it at http://host.docker.internal:11434/v1. Each client laptop needs Tailscale plus an MCP-capable agent such as Claude Code or Pi. The README includes a detailed gotchas section covering current (2026) Graphiti MCP server quirks that contradict older docs and will bite you if you skip them. Highlights: use provider: "openai", not "openai_generic", to reach Ollama; the Anthropic tier requires an explicit numeric llm.temperature or it queues forever; and the :standalone image ships without the anthropic SDK, so you must pip install it in your Dockerfile before provider: anthropic will start.
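Two of those gotchas are easy to catch before starting the containers. The sketch below is a hypothetical pre-flight check, not part of Commonplace or Graphiti; it assumes the tier config has already been loaded into a dict shaped like {"llm": {"provider": ..., "temperature": ...}}, which is a guess based on the key names quoted above.

```python
from numbers import Real


def check_tier_config(config):
    """Return a list of warnings for one tier's config dict (assumed shape)."""
    problems = []
    llm = config.get("llm", {})
    provider = llm.get("provider")

    if provider == "openai_generic":
        problems.append('use provider "openai", not "openai_generic", to reach Ollama')

    if provider == "anthropic":
        temperature = llm.get("temperature")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(temperature, bool) or not isinstance(temperature, Real):
            problems.append("anthropic tier needs an explicit numeric llm.temperature")

    return problems
```

For example, check_tier_config({"llm": {"provider": "anthropic"}}) returns one warning about the missing temperature, while adding "temperature": 0.0 clears it. The missing anthropic SDK in the :standalone image is a build-time issue and is not something this check can detect.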

Adding a New Client Takes Two Commands

Any device on the tailnet can point its MCP client at the same two endpoints — there is nothing per-client on the server. For Claude Code: run two claude mcp add commands with the bearer header for each tier. The graphs and auth are shared across all clients; reads and writes land in the same two graphs regardless of which device they come from.
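As a rough sketch, the helper below builds those two commands as strings so they can be reviewed before being run. The server names, the "/mcp" path, and the plain-http scheme are assumptions not taken from the article; the --transport and --header flags follow Claude Code's documented claude mcp add syntax, but the project's README should be treated as the authority on the exact invocation.

```python
import shlex

HOST = "your-server.your-tailnet.ts.net"  # placeholder hostname from the article
# Hypothetical MCP server names mapped to the two gateway ports.
TIERS = {"commonplace-personal": 8000, "commonplace-client": 8001}


def mcp_add_command(name, port, token, path="/mcp"):
    """Build one `claude mcp add` command line for a tier.

    The "/mcp" path and http scheme are assumptions; adjust to the README.
    """
    url = f"http://{HOST}:{port}{path}"
    return shlex.join([
        "claude", "mcp", "add", "--transport", "http", name, url,
        "--header", f"Authorization: Bearer {token}",
    ])


# "PERSONAL_TOKEN" and "CLIENT_TOKEN" stand in for the real per-tier tokens.
commands = [
    mcp_add_command("commonplace-personal", TIERS["commonplace-personal"], "PERSONAL_TOKEN"),
    mcp_add_command("commonplace-client", TIERS["commonplace-client"], "CLIENT_TOKEN"),
]
```

Using shlex.join keeps the header argument correctly quoted, since it contains spaces.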

Key Takeaways

Two isolated FalkorDB graphs (personal / client-confidential) share one deployment but stay firewalled from each other via separate bearer tokens
Local extraction by default means neither tier needs an API key; the personal tier can opt into Claude Haiku 4.5, and the client tier never can
Query path is LLM-free: embeddings + BM25 + graph traversal run entirely on-premises without calling an LLM at query time
The :standalone Graphiti image bundles its own FalkorDB by default; Commonplace points it at an external database so both MCP servers share one FalkorDB instance instead of running two

The Bottom Line

Commonplace is the kind of project that makes you wonder why vendors are charging for agent memory when this runs on a single consumer GPU with roughly 8 GB of VRAM. The dual-tier design is elegant, not because it's novel, but because it enforces a hard boundary between your notes and your clients' secrets without requiring any behavioral discipline from the agents themselves. If you're running Claude Code in professional contexts and you care even slightly about data residency, this is a sensible place to start.
