A new Rust-based project called Letheo is challenging the assumption that AI agents need traditional databases for memory management. Rather than storing and querying data, Letheo operates as what its creator calls a Cognitive Runtime—an organism that 'perceives, dreams, evokes, and fades.' The project tackles one of the thorniest problems in multi-agent systems: how to manage unbounded history without blowing through token budgets or resorting to expensive re-summarization at every step.

Strategic Forgetting as a Feature

The core insight driving Letheo is that naive memory approaches break down as an agent's history grows. Cramming the entire past into prompts makes cost grow without bound, LLM-based summarization costs O(N) in history length per operation, and traditional RAG retrieves point facts while staying blind to how things changed over time. Letheo takes a radically different path: it distills behavior into fixed-size structures that can be read at constant cost, whether the history contains 4,000 or 1,000,000 events.

Forgetting is modeled as physics: memory weight decays exponentially over time, a process the project frames as temporal entropy. The formula governing this is weight(t) = salience · e^(−λ · Δt) · (1 + reinforcement), where λ = ln2 / halflife and Δt is the time since the memory was last evoked or reinforced. Recalling something resets its decay clock, granting earned permanence, but reinforcement has diminishing returns and every memory has a floor, so nothing becomes immortal no matter how often it is revisited.
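To make the decay formula concrete, here is a minimal Rust sketch of a single memory's weight, assuming time is measured in seconds. The Memory struct, its field names, and the evoke rule (adding 1/(1 + r) to the current reinforcement r, one simple way to get diminishing returns) are hypothetical and are not Letheo's actual API. The per-memory floor is not modeled, because its exact form is not described.

```rust
use std::f64::consts::LN_2;

/// Hypothetical memory record; field names are illustrative, not Letheo's API.
struct Memory {
    salience: f64,
    halflife_secs: f64,
    reinforcement: f64,
    last_touched_secs: f64, // time of the last evocation or reinforcement
}

impl Memory {
    /// weight(t) = salience * e^(-lambda * dt) * (1 + reinforcement),
    /// with lambda = ln2 / halflife and dt = time since last touch.
    fn weight(&self, now_secs: f64) -> f64 {
        let lambda = LN_2 / self.halflife_secs;
        let dt = (now_secs - self.last_touched_secs).max(0.0);
        self.salience * (-lambda * dt).exp() * (1.0 + self.reinforcement)
    }

    /// Evoking resets the decay clock and adds a shrinking increment.
    /// ASSUMPTION: the 1 / (1 + r) increment is illustrative, not Letheo's rule.
    fn evoke(&mut self, now_secs: f64) {
        self.last_touched_secs = now_secs;
        self.reinforcement += 1.0 / (1.0 + self.reinforcement);
    }
}

fn main() {
    let mut m = Memory {
        salience: 1.0,
        halflife_secs: 3600.0,
        reinforcement: 0.0,
        last_touched_secs: 0.0,
    };
    // After exactly one half-life the weight has halved.
    println!("{:.3}", m.weight(3600.0)); // 0.500
    // Evoking resets dt to zero and raises reinforcement from 0 to 1.
    m.evoke(3600.0);
    println!("{:.3}", m.weight(3600.0)); // 2.000
}
```

Because reinforcement only multiplies the exponential term, a heavily evoked memory still decays once left alone; repeated evocation just buys it a higher starting point each time.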

Two-Layer Memory Architecture

Letheo implements what it calls Complementary Learning Systems across two distinct layers. Layer 2 (semantic/archetype) captures the subject's identity and trajectory, decomposed into distinct behavioral modes rather than a single blind average. Each mode has its own forgetting physics and tracks its own drift, meaning how far that behavior has shifted since the mode was created. This layer compresses at O(1) cost regardless of history length. Layer 1 (episodic/factstore) holds verbatim facts with embeddings, semantic deduplication, and selective forgetting; it answers the exact nominal questions that Layer 2 never stores. The unified EVOKE operation returns both character-level gist and nominal facts in one call, splitting a single token budget across both layers.
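The O(1) claim for Layer 2 can be illustrated with a hedged sketch: if each behavioral mode is kept as a running centroid, folding in a new event costs O(K · dim) for K modes no matter how many events came before, and drift can be measured as the distance between a mode's current centroid and its centroid at creation. Everything here (Archetype, Mode, the spawn_dist threshold for opening a new mode, Euclidean distance) is an assumption for illustration; Letheo's actual mode decomposition and per-mode forgetting are not shown.

```rust
/// Hypothetical behavioral mode: a running centroid plus its starting point.
struct Mode {
    origin: Vec<f64>,   // centroid at creation, used to measure drift
    centroid: Vec<f64>, // current running mean of absorbed events
    count: u64,
}

/// Hypothetical Layer-2 archetype holding at most `max_modes` modes.
struct Archetype {
    modes: Vec<Mode>,
    max_modes: usize,
    spawn_dist: f64, // ASSUMPTION: distance beyond which a new mode is opened
}

fn dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

impl Archetype {
    fn new(max_modes: usize, spawn_dist: f64) -> Self {
        Archetype { modes: Vec::new(), max_modes, spawn_dist }
    }

    /// Fold one event vector into the nearest mode, or open a new mode if it
    /// is far from all of them and capacity remains. Cost is O(K * dim),
    /// independent of how many events were absorbed before.
    fn absorb(&mut self, x: &[f64]) {
        let nearest = self
            .modes
            .iter()
            .enumerate()
            .map(|(i, m)| (i, dist(&m.centroid, x)))
            .min_by(|a, b| a.1.total_cmp(&b.1));
        match nearest {
            Some((i, d)) if d <= self.spawn_dist || self.modes.len() >= self.max_modes => {
                let m = &mut self.modes[i];
                m.count += 1;
                let n = m.count as f64;
                for (c, v) in m.centroid.iter_mut().zip(x) {
                    *c += (v - *c) / n; // incremental mean update
                }
            }
            _ => self.modes.push(Mode { origin: x.to_vec(), centroid: x.to_vec(), count: 1 }),
        }
    }

    /// How far mode `i` has moved since it was created.
    fn drift(&self, i: usize) -> f64 {
        dist(&self.modes[i].origin, &self.modes[i].centroid)
    }
}

fn main() {
    let mut a = Archetype::new(4, 1.0);
    for x in [[0.0, 0.0], [0.5, 0.0], [5.0, 5.0]] {
        a.absorb(&x);
    }
    // modes = 2, drift of mode 0 = 0.25
    println!("modes = {}, drift of mode 0 = {:.2}", a.modes.len(), a.drift(0));
}
```

Keeping separate centroids is what distinguishes this from a blind average: the far-away event at (5, 5) opens its own mode instead of dragging the first one toward it.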

Mnemonic Query Language

Rather than SELECT/INSERT/UPDATE/DELETE, Letheo exposes biological verbs through MQL:

  • PERCEIVE ingests raw stimuli into volatile short-term memory, where they begin decaying immediately
  • DISTILL collapses perceptions into an Intention Vector using multi-modal compression, the step the project calls 'dreaming'
  • EVOKE recalls by semantic resonance within a token budget; a RESONATING WITH clause focuses retrieval on specific traits
  • FADE forgets strategically while preserving each memory's contribution to the archetype
  • IMPRINT anchors an archetype against decay
  • RECALL performs directed retrieval of exact facts from Layer 1
  • REINFORCE applies spaced repetition, resetting a fact's decay clock
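EVOKE's token budget can be pictured as a packing problem. The sketch below, under stated assumptions, sorts already-scored candidates from both layers, lets Layer 2 archetype entries use up to a hypothetical gist_share of the budget, and spends the rest (including anything the gist left unused) on Layer 1 facts. The Candidate and Layer types, the greedy strategy, and the split rule are illustrative guesses, not Letheo's executor; the inputs in main are toy values.

```rust
/// Which memory layer a candidate came from (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum Layer {
    Archetype, // Layer 2: gist
    Fact,      // Layer 1: verbatim fact
}

/// A scored recall candidate. ASSUMPTION: `score` is computed upstream,
/// e.g. resonance with the query times the memory's decayed weight.
#[derive(Debug, Clone)]
struct Candidate {
    layer: Layer,
    text: String,
    score: f64,
    tokens: usize,
}

fn cand(layer: Layer, text: &str, score: f64, tokens: usize) -> Candidate {
    Candidate { layer, text: text.to_string(), score, tokens }
}

/// Greedy budget split: Layer 2 may use up to `gist_share` of the budget;
/// Layer 1 facts get whatever remains, including unused gist tokens.
fn evoke(mut cands: Vec<Candidate>, budget: usize, gist_share: f64) -> Vec<Candidate> {
    cands.sort_by(|a, b| b.score.total_cmp(&a.score)); // highest score first
    let gist_budget = (budget as f64 * gist_share).floor() as usize;
    let mut out = Vec::new();
    let mut used = 0;
    for c in cands.iter().filter(|c| c.layer == Layer::Archetype) {
        if used + c.tokens <= gist_budget {
            used += c.tokens;
            out.push(c.clone());
        }
    }
    for c in cands.iter().filter(|c| c.layer == Layer::Fact) {
        if used + c.tokens <= budget {
            used += c.tokens;
            out.push(c.clone());
        }
    }
    out
}

fn main() {
    let cands = vec![
        cand(Layer::Archetype, "mostly terse, drifting toward formal", 0.9, 40),
        cand(Layer::Archetype, "occasional late-night bursts", 0.4, 40),
        cand(Layer::Fact, "prefers Postgres 15", 0.8, 20),
        cand(Layer::Fact, "lives in Lisbon", 0.7, 30),
    ];
    // 100-token budget, at most half of it for gist.
    for p in evoke(cands, 100, 0.5) {
        println!("{:?}: {}", p.layer, p.text);
    }
}
```

Reserving a gist share first keeps a flood of high-scoring facts from crowding out the character-level summary, while letting facts reclaim any gist tokens that go unused.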

Architecture and Status

The engine is built as a modular Rust workspace with separate crates:

  • letheo-core: the core decay physics
  • letheo-inference: inference providers via Candle
  • letheo-mql and letheo-exec: the MQL lexer and executor
  • letheo-index: ANN indexing with HNSW life-filtering

Further crates cover an async runtime via Tokio, persistence using JSON plus an embedded redb store, threshold calibration, and a CLI REPL. Python bindings use PyO3, with a high-level SDK that exposes Session, prose support, and tiktoken integration. CandleProvider loads all-MiniLM-L6-v2 from local disk, so the setup is local-first with no runtime downloads. The Rust workspace runs completely offline without the model; only the Python binding requires it. The project reports 144 tests passing with zero failures across the workspace.
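The "life-filtering" idea can be sketched as nearest-neighbor search that skips entries whose decayed weight has fallen below a threshold. In this minimal version a brute-force cosine scan stands in for HNSW, and Entry, search, and life_threshold are hypothetical names; the source does not say whether letheo-index applies the filter during graph traversal or afterward.

```rust
/// Hypothetical indexed item: an embedding plus its current decayed weight.
struct Entry {
    id: u32,
    embedding: Vec<f32>,
    weight: f32,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Return up to `k` ids most similar to `query`, skipping entries whose
/// weight has decayed below `life_threshold`. A linear scan stands in for
/// an HNSW graph here.
fn search(entries: &[Entry], query: &[f32], k: usize, life_threshold: f32) -> Vec<u32> {
    let mut hits: Vec<(u32, f32)> = entries
        .iter()
        .filter(|e| e.weight >= life_threshold) // the "life filter"
        .map(|e| (e.id, cosine(&e.embedding, query)))
        .collect();
    hits.sort_by(|a, b| b.1.total_cmp(&a.1)); // most similar first
    hits.truncate(k);
    hits.into_iter().map(|(id, _)| id).collect()
}

fn main() {
    let entries = vec![
        Entry { id: 1, embedding: vec![1.0, 0.0], weight: 0.9 },
        Entry { id: 2, embedding: vec![0.9, 0.1], weight: 0.01 }, // nearly forgotten
        Entry { id: 3, embedding: vec![0.0, 1.0], weight: 0.5 },
    ];
    println!("{:?}", search(&entries, &[1.0, 0.0], 2, 0.05)); // [1, 3]
}
```

Entry 2 is the second-closest match to the query, but because its weight has faded below the threshold it never competes for a result slot.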

Key Takeaways

  • Letheo treats agent memory as a living organism, not a database: strategic forgetting is a deliberate design choice
  • Fixed-size distilled structures give constant-cost reads regardless of history size (4K or 1M events), while entropy-based decay governs what fades
  • Two-layer architecture separates episodic facts from semantic identity/trajectory
  • Biological MQL verbs replace SQL operations—no SELECT/INSERT/UPDATE/DELETE
  • Fully offline Rust engine with Python SDK via PyO3; runs locally without cloud dependencies

The Bottom Line

Letheo is a serious architectural bet that the future of agent memory isn't about storing more; it's about forgetting smarter. If you're building multi-agent systems and currently duct-taping vector databases together, this approach deserves your attention. Its physics-based decay model, paired with fixed-size distillation, offers an elegant answer to the token-budget problems that plague production agent deployments.