Backboard Development Studio put its R-CLI into open beta today, and the pitch is straightforward: build memory-first, not model-first. The headline result: an open-source model running through their harness posted 70% on Terminal Bench 2.1, a score Backboard says goes toe-to-toe with Claude Code at up to 90% lower cost. That is not a roadmap promise. Those are internal test-run numbers published in the announcement. The announcement also includes a two-line install command and a promo code for inference credit.

The Memory-First Thesis

Everyone is racing to wrap the next model. Backboard did the opposite. They built the memory layer first, then routing, tool-calling, and now a recursive engine, treating the model itself as a swappable component. Their memory algorithms currently rank #1 on the LoCoMo and LongMemEval benchmarks, and they power the R-CLI through Backboard's unified API, which handles memory, routing across 17,000+ models, RAG, and stateful threads behind one key. The hypothesis: give a system real persistence, real routing, and real recall, and a smaller model will outwork a smarter one that forgets everything between turns.

By The Numbers

The internal test runs this week show 92% on Terminal Bench 2.1 running Codex 5.5. Running GLM 5.1, an open-source model, through the same harness hits 70%. Read that second number again: an open model, inside their coding harness, posting numbers competitive with closed frontier systems. They are also claiming up to 30% fewer tokens and up to 90% lower cost compared to closed harnesses. And critically, zero percent of user code is used to train anyone's model. The team explicitly calls this out and suggests reading the fine print on competing tools.
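The two "up to" figures are worth a quick sanity check, because fewer tokens alone cannot explain a 90% cost drop. The sketch below assumes cost is simply tokens times a flat per-token price (a simplification; real pricing often separates input and output tokens) and works out what per-token price the claims imply. The function name is made up for illustration.

```python
def implied_price_ratio(token_reduction: float, cost_reduction: float) -> float:
    """Per-token price ratio (new / baseline) needed for both reductions to hold.

    Assumes cost = tokens * price_per_token, so
    (1 - cost_reduction) = (1 - token_reduction) * price_ratio.
    """
    if not 0 <= token_reduction < 1:
        raise ValueError("token_reduction must be a fraction in [0, 1)")
    if not 0 <= cost_reduction <= 1:
        raise ValueError("cost_reduction must be a fraction in [0, 1]")
    return (1 - cost_reduction) / (1 - token_reduction)


# Best-case figures quoted in the announcement (both are "up to" values).
ratio = implied_price_ratio(token_reduction=0.30, cost_reduction=0.90)
print(f"implied per-token price: {ratio:.1%} of baseline")  # 14.3% of baseline
```

Under that assumption, most of the claimed saving would have to come from running a cheaper model, with token efficiency contributing the smaller share.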

/expert Mode: Two Brains, One Task

The feature that stands out is /expert mode. Developers are not locked into a single model choice for an entire task. They can plan with Opus 4.7 and execute with DeepSeek V4 in one workflow: the expensive model architects, the fast cheap one ships, and the harness orchestrates the handoff. Frontier reasoning where it counts, frontier-beating cost where it does not. One command. This is only possible because Backboard built routing into the foundation instead of bolting it on as an afterthought.
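The plan-then-execute handoff can be sketched in a few lines. This is not Backboard's implementation or API, which the announcement does not show; it is a minimal illustration of the pattern, with every name (run_expert, ExpertRun, the fake models) invented for the example. Models are stand-in functions so the sketch runs without network access.

```python
from dataclasses import dataclass
from typing import Callable, List

# A "model" here is just a function from prompt to text. Real model calls
# would go through an API client; these stand-ins keep the sketch runnable.
Model = Callable[[str], str]


@dataclass
class ExpertRun:
    plan: List[str]
    results: List[str]


def run_expert(task: str, planner: Model, executor: Model) -> ExpertRun:
    """Ask the planner for one step per line, then run each step on the executor."""
    raw_plan = planner(f"Break this task into short steps, one per line:\n{task}")
    steps = [line.strip() for line in raw_plan.splitlines() if line.strip()]
    results = [executor(f"Task: {task}\nStep: {step}") for step in steps]
    return ExpertRun(plan=steps, results=results)


# Hypothetical stand-ins for an expensive planner and a cheap executor.
def fake_planner(prompt: str) -> str:
    return "write failing test\nimplement fix\nrun test suite"


def fake_executor(prompt: str) -> str:
    step = prompt.rsplit("Step: ", 1)[1]
    return f"done: {step}"


run = run_expert("fix the date parser", fake_planner, fake_executor)
for step, result in zip(run.plan, run.results):
    print(f"{step} -> {result}")
```

The design point is that the expensive model is called once per task while the cheap model is called once per step, so the cost of a long task is dominated by the cheaper rate.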

Stateful By Default

The R-CLI achieves persistence natively—not as a layer you bolt on. Session-priming files, weekly cron jobs auditing agent drift, pre-commit hooks keeping things on rails? All default behavior in the terminal. When a developer showed up in the comments with a well-tooled local repo, custom RAG, skills, memory, and knowledge graph he had invested months building, his initial verdict was "not super helpful for a setup like mine." Then he learned about stateful-by-default. The conversation flipped from rejection to booking a demo call with their CLI lead within the same thread.
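For readers who have hand-built this kind of persistence, the core idea of a session-priming file is small. The sketch below is an assumption-laden illustration, not how the R-CLI stores state: the SessionMemory class, the JSON layout, and the file name are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class SessionMemory:
    """Append-only notes persisted to a JSON file between sessions."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load notes left by earlier sessions, or start empty.
        self.notes = json.loads(path.read_text()) if path.exists() else []

    def remember(self, note: str) -> None:
        self.notes.append(note)
        self.path.write_text(json.dumps(self.notes, indent=2))

    def priming_prompt(self) -> str:
        """Text to prepend to the first prompt of a new session."""
        if not self.notes:
            return ""
        return "Known project context:\n" + "\n".join(f"- {n}" for n in self.notes)


# Simulate two separate sessions sharing one file.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "session_memory.json"

    first = SessionMemory(store)
    first.remember("tests run with pytest -q")
    first.remember("never edit files under vendor/")

    second = SessionMemory(store)  # a fresh process would do the same load
    primed = second.priming_prompt()
    print(primed)
```

The second session starts with the first session's notes already in its prompt, which is the behavior the article describes as default rather than something each developer wires up by hand.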

Key Takeaways

  • R-CLI achieves 70% on Terminal Bench 2.1 using GLM 5.1, an open-source model—competitive with Claude Code at a fraction of the cost
  • Memory algorithms rank #1 on LoCoMo and LongMemEval benchmarks; persistence is native behavior, not bolted-on
  • /expert mode lets developers use two models in one task: plan with frontier reasoning, ship with cost efficiency
  • Unified API spans 17,000+ models behind one key; user code never trains anyone else's model

The Bottom Line

This is the kind of bet that either ages like a prophecy or gets forgotten in a month. But here is what matters for builders: Backboard identified a real pain point, developers hand-maintaining what should be default behavior, and built around it. The CLI lands today; the IDE comes next. Memory-first or model-first, come test it yourself.