Developer Drops Sub-100-Line Three-Tier Memory Recall System for AI Agents

Knowledge-and-Memory-Management (KMM) v0.0.2 dropped on DEV.to this week, and it's doing something refreshingly practical in a space drowning in over-engineered retrieval-augmented generation (RAG) frameworks. The project, built by developer mage0535, positions itself not as another RAG toolkit but as the foundation layer that makes AI agents remember things worth remembering.

Why Three Tiers Beat One Vector Store

The core insight here is architectural: most AI agent memory systems are single-layer vector stores pretending to handle everything. KMM argues, and I tend to agree, that short-term context and long-term knowledge have fundamentally different retrieval patterns. Hot memory (the current session) needs speed. Warm memory (semantic vectors across 10K nodes) needs intent understanding through synonym matching. Cold memory (a knowledge graph with 11K pages of structured relationships) needs entity cross-linking that pure similarity search can't deliver.

The system chains these three engines into a single call pipeline, numbered L1 through L3 from hottest to coldest. When you query the recall function, it hits FTS5 full-text search (L1) first, which returns results in under 3 milliseconds. Only if L1 doesn't return enough results to fill your limit does it fall back to Hindsight's semantic vectors (L2), and only then to gbrain's knowledge graph traversal (L3). According to the project, 85% of queries resolve entirely at layer one. Vectors aren't dead, but they're definitely being oversold as the universal answer.
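To make the fallback behavior concrete, here is a minimal, generic sketch of a tiered cascade. It is not KMM's code: the cascade helper, the (query, n) tier signature, and the toy hot, warm, and cold functions are hypothetical stand-ins for the real engines. It also records which tiers actually ran, which is one way you could measure how often layer one is enough on its own.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Hypothetical tier signature: (query, max_results) -> list of hits.
Tier = Callable[[str, int], list]


def cascade(query: str, tiers: Sequence[tuple[str, Tier]], limit: int = 10):
    """Query tiers cheapest-first and stop as soon as `limit` hits are collected.

    Returns (hits, consulted), where `consulted` names every tier that ran.
    """
    results: list = []
    consulted: list[str] = []
    for name, search in tiers:
        consulted.append(name)
        # Ask each tier only for the hits still missing.
        results += search(query, limit - len(results))
        if len(results) >= limit:
            break
    return results[:limit], consulted


# Toy stand-ins for the Hot / Warm / Cold engines (not KMM's real ones).
def hot(query: str, n: int) -> list:
    docs = ["sqlite fts5 notes", "fts5 ranking tips", "graph schema"]
    return [d for d in docs if query in d][:n]


def warm(query: str, n: int) -> list:
    return [f"semantic match for {query}"][:n]


def cold(query: str, n: int) -> list:
    return [f"graph neighbour of {query}"][:n]


TIERS = [("hot", hot), ("warm", warm), ("cold", cold)]

print(cascade("fts5", TIERS, limit=2)[1])  # ['hot']
print(cascade("fts5", TIERS, limit=4)[1])  # ['hot', 'warm', 'cold']
```

Passing the tiers as an ordered list keeps the cheapest-first policy in one place, so adding or reordering a layer doesn't mean copying another if-block.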

Ninety-Five Lines That Don't Need Your Dependencies

Here's where it gets hacker-friendly. The entire three-tier recall logic fits in lightweight_recall.py at under a hundred lines, with no external dependencies beyond what's already in your stack. The core of it reads:

```python
def recall(query: str, limit: int = 10) -> list:
    # fts5_search, hindsight_search and gbrain_search are the per-layer
    # search helpers; their implementations aren't shown in the article.
    results = []

    # L1: FTS5 full-text search (fastest, sub-ms)
    results += fts5_search(query, limit)
    if len(results) >= limit:
        return results[:limit]

    # L2: Hindsight semantic vectors (intent-aware)
    results += hindsight_search(query, limit - len(results))
    if len(results) >= limit:
        return results[:limit]

    # L3: gbrain knowledge graph (entity linking)
    results += gbrain_search(query, limit - len(results))
    return results[:limit]  # trim in case L3 returns more than requested
```

That's it. No LangChain abstractions, no vendor lock-in with a vector DB startup, just SQLite's FTS5 doing what it's always done best: returning matches fast when you know what you're looking for.
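The article doesn't show fts5_search itself. As a rough idea of what an FTS5 layer can look like, here is a sketch using Python's built-in sqlite3 module, assuming your SQLite build was compiled with FTS5 enabled. The notes table, its single body column, build_index, and the explicit conn argument are all hypothetical; the project's own function takes only query and limit. ORDER BY rank sorts by FTS5's built-in BM25 relevance score.

```python
from __future__ import annotations

import sqlite3


def build_index(docs: list[str]) -> sqlite3.Connection:
    """Load documents into an in-memory FTS5 table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE notes USING fts5(body)")
    conn.executemany("INSERT INTO notes(body) VALUES (?)", [(d,) for d in docs])
    conn.commit()
    return conn


def fts5_search(conn: sqlite3.Connection, query: str, limit: int) -> list[str]:
    """Return up to `limit` matching notes, best BM25 score first."""
    # Wrap every term in double quotes (doubling any embedded quotes) so user
    # input is matched as plain text, never parsed as FTS5 operators.
    terms = ['"' + t.replace('"', '""') + '"' for t in query.split()]
    if not terms or limit <= 0:
        return []
    rows = conn.execute(
        # Space-separated terms are an implicit AND; rank is FTS5's bm25 score.
        "SELECT body FROM notes WHERE notes MATCH ? ORDER BY rank LIMIT ?",
        (" ".join(terms), limit),
    )
    return [body for (body,) in rows]


conn = build_index([
    "SQLite FTS5 gives fast full-text search",
    "Knowledge graphs link entities",
    "FTS5 ranks matches with bm25",
])
print(fts5_search(conn, "fts5", 10))  # both notes that mention FTS5
```

Quoting each term matters in practice: without it, a query that happens to end in an FTS5 keyword such as AND, or contains an unbalanced quote, raises a syntax error instead of returning results.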

What v0.0.2 Actually Ships

Beyond the core architecture, version 0.0.2 brings collection firepower that covers real workflows: 40+ ingestion tools spanning nine web engines, twelve video processing options, nine document/OCR pipelines, and ten content sources. It also integrates with twelve cloud drive providers, including OneDrive, Google Drive, Aliyun, and Baidu Netdisk, all routed through rclone's unified interface, so you're not rewriting connectors when your team switches storage backends.

The book refinement pipeline is particularly clever: PDF/EPUB files are cached automatically and trigger book_to_skill, which chunks technical books into structured Skills plus annotations. The library already indexes more than 710 books, with on-demand caching instead of mandatory full downloads. The SenseNova document engine handles both text-based and scanned PDFs, extracts PowerPoint slides completely, and parses Word documents including tables and highlights, all through a single command: python3 sensenova_dispatcher.py pdf report.pdf.

A Sunday automation job scans OneDrive for new local notes and ingests them into gbrain automatically, and a maintenance script handles orphaned pages and index compression during the overnight window. This is the kind of boring-but-critical automation that makes or breaks whether developers actually use a tool day-to-day.
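The article doesn't describe how gbrain stores pages, so the following is only a sketch of what an overnight maintenance pass could look like on a SQLite store. The pages/links schema, the pages_fts table, setup_demo, nightly_maintenance, and the policy of deleting any page with no inbound or outbound links are all assumptions. For the index-compression step it uses FTS5's real 'optimize' command, which merges the full-text index into a single b-tree; whether KMM does anything similar is not stated.

```python
from __future__ import annotations

import sqlite3


def setup_demo() -> sqlite3.Connection:
    """Build a tiny hypothetical store: pages, links between them, and an FTS5 index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pages(id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE links(src INTEGER, dst INTEGER);
        CREATE VIRTUAL TABLE pages_fts USING fts5(title);
        """
    )
    pages = [(1, "FTS5 basics"), (2, "SQLite tips"), (3, "Stale draft")]
    conn.executemany("INSERT INTO pages(id, title) VALUES (?, ?)", pages)
    # Keep the FTS rowid equal to the page id so deletes line up.
    conn.executemany("INSERT INTO pages_fts(rowid, title) VALUES (?, ?)", pages)
    conn.execute("INSERT INTO links(src, dst) VALUES (1, 2)")  # page 3 is unlinked
    conn.commit()
    return conn


def nightly_maintenance(conn: sqlite3.Connection) -> list[int]:
    """Remove orphaned pages (no inbound or outbound links), then compact the FTS index."""
    orphans = [
        row[0]
        for row in conn.execute(
            """
            SELECT p.id FROM pages AS p
            WHERE NOT EXISTS (
                SELECT 1 FROM links AS l WHERE l.src = p.id OR l.dst = p.id
            )
            ORDER BY p.id
            """
        )
    ]
    with conn:  # both deletes commit together, or not at all
        conn.executemany("DELETE FROM pages WHERE id = ?", [(i,) for i in orphans])
        conn.executemany("DELETE FROM pages_fts WHERE rowid = ?", [(i,) for i in orphans])
    # FTS5 'optimize' merges the index's b-trees into one (the compression step here).
    with conn:
        conn.execute("INSERT INTO pages_fts(pages_fts) VALUES ('optimize')")
    return orphans


conn = setup_demo()
print(nightly_maintenance(conn))  # [3]
```

Running the pass a second time returns an empty list, which makes it safe to schedule unattended.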

Key Takeaways

Three-tier Hot-Warm-Cold memory architecture replaces single-vector-store thinking with purpose-built retrieval layers
95-line lightweight_recall.py achieves sub-3ms L1 response times without external dependencies beyond SQLite FTS5
40+ collection tools cover web, video, documents, OCR, and content sources across multiple platforms
Rclone integration unifies 12+ cloud drive backends under one interface
Book-to-Skill pipeline automates technical documentation ingestion with structured outputs
Weekly maintenance automation keeps the knowledge base fresh without manual intervention

The Bottom Line

This is what happens when someone actually uses an AI agent in production instead of just benchmarking vector search on synthetic datasets. KMM makes a strong case that you don't need a constellation of microservices to give agents decent memory: you need layered retrieval that respects different access patterns and a collection pipeline that doesn't require a PhD to configure. Bookmark the GitHub repo: github.com/mage0535/Knowledge-and-Memory-Management
