Antti Innanen, a law firm founder, just dropped Lavern on Hacker News: a multi-agent legal AI system built over six months and released under Apache 2.0. The project isn't being sold as a product; it's positioned as "a collection of ideas in one repo" and a polished demo of how legal workflows could be automated differently. At its core are 67 agent prompts (59 specialists, 7 workflow orchestrators, and 1 base orchestrator) that coordinate through a formal debate protocol, with three independent layers of verification before any finding reaches a human gate.
The Multi-Agent Architecture
Every agent is a specialized system prompt running on the same underlying frontier LLM: Claude via Anthropic (US), Mistral AI (EU), or a local model via Ollama. They're not separate models; they're separate roles, each with its own MCP tool permissions and its own slot in the debate protocol. The real work isn't the prompts themselves but the four structural layers wrapped around them: a citation-first debate system in which agents must quote specific text from parsed documents to make findings stick; three-layer verification (evaluator gate → adversarial red team/blue team debates → 10-pass pipeline); human approval gates that hold critical decisions until someone signs off; and a Precedent Board that persists memory across engagements, promoting tentative patterns to confirmed once they recur with consistent verdicts. The system reads PDF, DOCX, MD, and TXT documents, sanitizes them, then routes findings through the verification stack.
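The Precedent Board's promotion rule can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the type names, the verdict labels, the threshold of three recurrences, and the choice to demote a pattern when verdicts start to disagree. None of it is taken from the Lavern repo.

```typescript
// Hypothetical sketch of "tentative -> confirmed" promotion.
// All names and the threshold below are assumptions, not Lavern's code.
type Verdict = "flag" | "clear";
type Status = "tentative" | "confirmed";

interface Precedent {
  pattern: string;
  verdicts: Verdict[];
  status: Status;
}

// Assumed number of consistent recurrences needed for promotion.
const PROMOTION_THRESHOLD = 3;

function recordVerdict(
  board: Map<string, Precedent>,
  pattern: string,
  verdict: Verdict
): Precedent {
  const entry: Precedent =
    board.get(pattern) ?? { pattern, verdicts: [], status: "tentative" };
  entry.verdicts.push(verdict);

  // A pattern is consistent when every recorded verdict agrees with the first.
  const consistent = entry.verdicts.every((v) => v === entry.verdicts[0]);

  // Promote only on enough consistent recurrences; any disagreement
  // keeps (or returns) the pattern at tentative.
  entry.status =
    consistent && entry.verdicts.length >= PROMOTION_THRESHOLD
      ? "confirmed"
      : "tentative";

  board.set(pattern, entry);
  return entry;
}
```

In this sketch a pattern that is flagged in three engagements in a row becomes confirmed, and a single contradicting verdict drops it back to tentative. The real board may well weigh recency or source differently.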
Verification in Depth
The 10-pass verification pipeline (src/workflows/templates/verification.ts) checks context, UX, clarity, structure, accuracy, completeness, risk, formatting, legal_design, and delivery. Each layer fails closed independently—if any gate rejects a finding, it doesn't move forward. Separate from this pipeline, the grounding verifier in src/mcp/tools/grounding-verifier.ts cross-checks every cited quote against the parsed document via string matching. The adversarial debate mode puts agents in red team and blue team roles that must cite competing evidence. Innanen is upfront about the imperfections: "Agents sometimes don't listen to each other. Sometimes one dominates. Sometimes they swing to the opposite extreme when challenged, not because the challenge was stronger but because it was newer." That's hacker honesty—documenting failure modes instead of hiding them.
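To make "fails closed" and "string matching" concrete, here is a rough sketch under stated assumptions. The ten pass names come from the description above; the `Finding` shape, the whitespace normalization, and the function names are hypothetical and do not reflect what verification.ts or grounding-verifier.ts actually contain.

```typescript
// Pass names as listed in the article; everything else is a hypothetical sketch.
const PASSES = [
  "context", "UX", "clarity", "structure", "accuracy",
  "completeness", "risk", "formatting", "legal_design", "delivery",
] as const;
type PassName = (typeof PASSES)[number];

interface Finding {
  claim: string;
  quote: string; // text the agent says it is quoting from the document
}
type PassCheck = (finding: Finding) => boolean;

// Assumption: only whitespace is normalized before matching, so line
// breaks introduced by document parsing do not cause false rejections.
const normalize = (s: string): string => s.replace(/\s+/g, " ").trim();

// Grounding by plain substring match: the quote must appear in the document.
function isGrounded(quote: string, documentText: string): boolean {
  const q = normalize(quote);
  return q.length > 0 && normalize(documentText).indexOf(q) !== -1;
}

// Fail closed: a missing check, a thrown error, or any non-true result
// stops the finding at that pass.
function runPipeline(
  finding: Finding,
  checks: Partial<Record<PassName, PassCheck>>
): { passed: boolean; failedAt?: PassName } {
  for (const name of PASSES) {
    const check = checks[name];
    let ok = false;
    try {
      ok = check !== undefined && check(finding) === true;
    } catch {
      ok = false;
    }
    if (!ok) return { passed: false, failedAt: name };
  }
  return { passed: true };
}
```

The design point is that the default answer is rejection: a finding moves forward only when every pass explicitly returns true, so a crashed or unconfigured check cannot wave anything through.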
EU Compliance and Provider Flexibility
Lavern ships with an EU mode that uses Mistral AI routed through Paris. Set LAVERN_PROVIDER=mistral and most of the pipeline routes to Mistral instead of Anthropic. There's one documented gap: the Lavern Challenge route (src/api/routes/challenge.ts) still instantiates Anthropic directly even when Mistral is configured, so anyone who needs a strict EU data boundary should avoid that feature for now. The claw start --ethical flag enforces Mistral-only with a conservative risk posture across the rest of the pipeline. Five legal datasets come bundled: CUAD, MAUD, ACORD, UNFAIR-ToS, and LEDGAR, each under its own license documented in NOTICE. ContractNLI was removed because its CC BY-NC-SA 4.0 license conflicted with Apache 2.0.
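One way to picture the provider switch is the sketch below. Only LAVERN_PROVIDER, the mistral value, the --ethical behavior, and the Challenge route gap come from the description above. The other accepted values ("anthropic", "ollama"), the function names, the route label, and the choice to throw on an unset variable are assumptions; the real project presumably falls back to its local demo mode instead.

```typescript
// Hypothetical provider resolution; not Lavern's actual configuration code.
type Provider = "anthropic" | "mistral" | "ollama";

interface ProviderOptions {
  ethical?: boolean; // mirrors the --ethical flag: force Mistral-only
}

function resolveProvider(
  env: Record<string, string | undefined>,
  opts: ProviderOptions = {}
): Provider {
  if (opts.ethical) return "mistral";
  const raw = (env["LAVERN_PROVIDER"] ?? "").trim().toLowerCase();
  if (raw === "anthropic" || raw === "mistral" || raw === "ollama") {
    return raw;
  }
  // Assumption: refuse to guess rather than silently pick a provider.
  throw new Error(`Unsupported or missing LAVERN_PROVIDER: "${raw}"`);
}

// Routes documented as ignoring the configured provider. The label
// "challenge" is an assumed name for src/api/routes/challenge.ts.
const PROVIDER_BYPASS_ROUTES: readonly string[] = ["challenge"];

// A strict-EU deployment could refuse any route that bypasses the setting.
function routeHonorsProvider(route: string): boolean {
  return PROVIDER_BYPASS_ROUTES.indexOf(route) === -1;
}
```

A caller would pass `process.env` as the first argument in Node; taking the environment as a parameter keeps the sketch easy to test.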
Clawern Autonomous Mode
Beyond interactive sessions where users brief the system via the dashboard, Lavern includes Clawern, an autonomous daemon that processes documents on a 30-minute heartbeat. It accumulates a precedent board across reviews and pushes findings to Telegram, email, or macOS notifications; it also offers weekly digests, multi-client isolation, audit trails, cost forecasting, and hybrid local-plus-frontier processing. Commands like claw init, claw validate, and claw start/pause/resume manage the daemon lifecycle. Innanen calls it "28 modules of watching, planning, processing, delivering, precedent tracking, auditing": a substantial autonomous workflow engine sitting in the same repo.
What's NOT Stress-Tested
This is where Innanen's transparency stands out. The engineering works: 1,677 tests pass across 105 files, tsc --noEmit is clean on backend and frontend, the pipeline runs end-to-end. What hasn't been independently validated is whether all this machinery produces materially better outputs than a well-prompted single LLM on real legal work. No public benchmark exists—internal evaluation only. Other known gaps: no vector retrieval (just BM25 full-text via SQLite FTS5), no durable task queue (in-process event bus means server restarts kill mid-engagement work), Counsel deliveries take 5–10 minutes due to non-streaming assembly calls, and "67 agents is probably more than needed—we started with about ten, just kept adding."
Key Takeaways
- Lavern v0.15.0 runs locally without API keys in demo mode (LOCAL MODE is the default)
- Three inference providers: Anthropic Claude (US), Mistral AI (EU), or Ollama for full local deployment
- 21 MCP tools handle debate, scoring, verification, grounding, memory, knowledge base, and quality checks
- Human approval gates pause critical findings before delivery—the system doesn't auto-escalate without oversight
- The project explicitly frames itself as inspiration, not a production-ready product—take the legal-quality claims as hypotheses
The Bottom Line
Lavern is ambitious architecture wrapped in honest caveats. Innanen built something that works at the engineering level and then admitted it hasn't proven its core thesis: that multi-agent debate with three verification layers actually beats a well-prompted single LLM on real legal tasks. That's exactly the kind of transparency this space needs more of. Fork it, stress-test the quality claims, build on whichever pieces interest you—just don't mistake the demo for the destination.