A Four-Pillar Approach to Production AI Agents

Zistica this week released Lumin, an open-source operational platform for large language model agents in production, licensed under Apache 2.0. The project covers four functional areas: Observe (universal tracing across 16 integrations), Govern (policy engine with OWASP starter packs), Defend (runtime guardrails including tenant isolation), and Operate (webhooks, backups, Prometheus metrics). Everything ships as a single self-hosted Docker container using DuckDB for storage, with no external services required and no telemetry sent back to any vendor.
Universal Tracing Across Every Major Framework

The Observe pillar captures full span trees from agent code written in Python or TypeScript. First-class integrations include LangChain, LangGraph, LlamaIndex, CrewAI, AutoGen, LiteLLM, the OpenAI Agents SDK, Pydantic AI, Anthropic (including extended-thinking blocks), Mastra, VoltAgent, and OpenClaw via an OTLP receiver. Teams can instrument an agent in two lines of code using the @lumin.trace decorator or the lumin.span() context manager, or skip the SDK entirely by pointing any OpenAI-compatible client at Lumin's proxy on port 8000. Cost attribution breaks down per call for the OpenAI, Anthropic, and Ollama providers.
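A minimal sketch of both instrumentation paths, assuming a top-level lumin Python module that exposes the decorator and context manager named above (the exact import path, span naming, and the proxy's /v1 path suffix are assumptions; the OpenAI client usage is standard):

```python
# SDK path: decorator and context manager named in the announcement.
# The top-level `lumin` import is an assumption, not confirmed API.
import lumin
from openai import OpenAI

@lumin.trace
def answer(question: str) -> str:
    # Nested span around the model call; the span name is illustrative.
    with lumin.span("llm-call"):
        client = OpenAI()
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": question}],
        )
    return resp.choices[0].message.content

# Proxy path: skip the SDK and point any OpenAI-compatible client at Lumin's
# proxy on port 8000. The /v1 path suffix is an assumption.
proxied = OpenAI(base_url="http://localhost:8000/v1")
```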
OWASP LLM Top 10 Guardrails at Runtime

The Defend pillar runs eight detection methods against every span: Microsoft Presidio NER for PII scrubbing, Prompt Guard 2 (a 22M-parameter injection classifier), Llama Guard 4 (14 MLCommons hazard categories), LLM-judge scoring, embedding-similarity checks, indirect-prompt-injection detection, locally trainable classifiers with per-corpus retraining, and regex packs. The platform ships twelve pre-built policy starter packs covering OWASP LLM Top 10, OWASP Agentic 2025, GDPR, HIPAA, PCI-DSS, cost guards, cross-session isolation, customer-support workflows, code-assistant patterns, dev-environment safety, and framework-specific rules for LangGraph and Mastra.
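As one concrete example of the detection methods listed, here is a standalone Presidio pass of the kind the Defend pillar runs for PII scrubbing. This is not Lumin's internal code, just the upstream library applied to a span's text:

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()      # NER-backed PII recognizers
anonymizer = AnonymizerEngine()  # replaces detected entities with placeholders

span_text = "Contact Jane Doe at jane.doe@example.com or +1-202-555-0175."
findings = analyzer.analyze(text=span_text, language="en")
scrubbed = anonymizer.anonymize(text=span_text, analyzer_results=findings).text
print(scrubbed)  # e.g. "Contact <PERSON> at <EMAIL_ADDRESS> or <PHONE_NUMBER>."
```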
The Five-Layer Tenant-Isolation Firewall

Lumin's tenant firewall addresses LLM08 (Excessive Agency) and LLM10 (Model Theft / Cross-Tenant Exfiltration) together, a gap most guardrail classifiers leave wide open. In multi-tenant Slack, Telegram, or Discord bots, one customer's data leaking into another's conversation is a real risk when agent sessions are shared. The firewall stacks five structural layers:

- L1 (storage sandbox) rewrites every filesystem tool call (read, write, edit, grep) into a per-sender directory under ${workspace}/_lumin/by-sender/.
- L1.5 denies shells and egress tools such as exec, bash, python, web_fetch, and curl by default.
- L2 resets the conversation history when the sender switches, clearing foreign-tenant messages from context entirely.
- L3 applies Presidio NER plus structural ID regexes and foreign-vault excerpt scanning before every LLM call.
- L4 records every block, redaction, and sandbox rewrite to policy_violations, with webhook fanout to PagerDuty or Slack.

A verified Telegram-to-Slack leak-attempt demo shows 3 exec calls denied, 2 read calls sandboxed to empty per-sender directories, 97 prior-tenant messages cleared from history at the sender switch, and foreign-vault entries redacted from the prompt, with zero foreign data in the bot's reply.
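The L1 rewrite is straightforward to picture. The sketch below assumes only the directory layout named in the announcement; the function name, the sender-ID argument, and the path-normalization details are illustrative, not Lumin's actual implementation:

```python
from pathlib import Path

def sandbox_path(workspace: str, sender_id: str, requested: str) -> Path:
    """Rewrite a tool's filesystem target into the per-sender sandbox.

    The ${workspace}/_lumin/by-sender/ layout comes from the announcement;
    everything else here is an illustrative guess at how such a rewrite works.
    """
    root = Path(workspace) / "_lumin" / "by-sender" / sender_id
    # Drop "..", leading slashes, and similar components so absolute or
    # traversal paths cannot escape the sandbox root.
    parts = [p for p in Path(requested).parts if p not in ("..", "/", "\\")]
    target = root.joinpath(*parts).resolve()
    if not target.is_relative_to(root.resolve()):
        raise PermissionError(f"path escapes sandbox: {requested}")
    return target
```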
Policy Lifecycle: Shadow Mode to Enforcement

The Govern pillar includes a DB-backed policy engine with a typed DSL supporting before_proxy_call and after_proxy_call lifecycle hooks. Every rule starts in shadow mode, recording what it would have done without blocking traffic, until operators review dashboard timelines and promote rules individually. The platform versions every policy edit, supports one-click rollback to any prior version, and includes a policy suggester that mines patterns from real traces to propose new rules. A replay feature tests draft policies against historical spans before promotion, showing would-block versus would-allow counts.
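The announcement does not show the DSL's concrete syntax. The sketch below is a hypothetical rule expressed as plain Python data, using only the hook name and shadow/enforce lifecycle described above; every field name and value is illustrative:

```python
# Hypothetical shape only -- Lumin's real DSL is not shown in the announcement.
# The hook name and "shadow" mode come from the article; the rest is made up
# to illustrate a cost-guard rule awaiting promotion.
cost_guard_rule = {
    "name": "cost-guard-daily-cap",
    "hook": "before_proxy_call",        # evaluated before the LLM request leaves
    "mode": "shadow",                   # record would-block decisions, never block
    "condition": {
        "metric": "tenant_daily_cost_usd",  # illustrative field name
        "operator": ">",
        "threshold": 50.0,
    },
    "action": "block",                  # what enforcement would do after promotion
}
```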
Self-Hosted, Local-First, No Vendor Lock-In

Lumin's architecture runs FastAPI on port 8000 for the ingest/query API and a Next.js standalone server on port 3000 for the dashboard, with WebSocket fanout pushing new traces in real time. The platform prioritizes resilience: if Lumin is unreachable or returns 5xx errors, agent code continues running; spans drop silently rather than blocking workflow execution. A panic-disable kill switch stops all firewall enforcement instantly, with a banner and an audit trail. The project compares itself against Langfuse (observability only), Lakera (SaaS guardrails), and NeMo Guardrails; Lumin claims to be the only option combining full-trace observability, per-user file sandboxing, conversation-history isolation, a policy engine with an approval queue, and self-hosted single-Docker deployment.
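The fail-open behavior is easy to picture as a thin wrapper around span export. The sketch below is illustrative only (the endpoint path, function name, and requests-based transport are assumptions); what it shows is the "drop silently rather than block" rule the article describes:

```python
import logging
import requests

log = logging.getLogger("lumin-export")

def export_span(span: dict, endpoint: str = "http://localhost:8000/api/spans") -> None:
    """Best-effort span export: observability failures never stop the agent.

    The endpoint path and payload shape are assumptions; the behavior
    (drop the span on connection errors or 5xx responses) is from the article.
    """
    try:
        resp = requests.post(endpoint, json=span, timeout=2)
        if resp.status_code >= 500:
            log.debug("Lumin returned %s; dropping span", resp.status_code)
    except requests.RequestException as exc:
        # Lumin unreachable: swallow the error so agent code keeps running.
        log.debug("span export failed: %s", exc)
```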
Key Takeaways
- Lumin instruments 16 agent frameworks via SDK or OpenAI-compatible proxy; existing projects can adopt the proxy by repointing their client, with no SDK integration required
- Five-layer tenant firewall blocks cross-session data leaks structurally rather than relying on classifier accuracy alone
- Policy engine starts in shadow mode, versions every edit, and supports rollback—operators control promotion cadence
- Single Docker container with DuckDB storage runs locally without external services or telemetry
- Apache 2.0 license permits commercial use, internal deployment, and hosted service offerings for customers