A new open-source tool called Orbit landed on Hacker News this week with a straightforward pitch: stop trusting your AI coding agent's word for anything. The project, developed by the team at human-again, positions itself as "mission control" for AI agents—adding structured loops, validation gates, rubric-based scoring, and checkpoint resumability to workflows that currently rely on vibes and crossed fingers.
The Agent Reliability Problem
If you've shipped code with an LLM agent, you've seen it. You ask the agent to fix an auth bug. It responds "Done!" You ask whether the tests passed. "All passing!" Then you run them yourself: red across the board. Orbit's creators call this "drift": the gap between what the agent claims and what's actually true. The tool's core philosophy is simple: if the agent can't prove its work, the orbit doesn't close.
Orbit wraps every agent task in a bounded loop that enforces evidence-based completion. Each cycle selects a task from a priority-sorted backlog (respecting dependency order), hands it to your chosen agent adapter, runs real validation commands such as pytest or lint checks, and scores the results against a rubric that evaluates task focus and change signal. The loop then either retries the task or advances it to review before writing telemetry artifacts.
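To make the loop concrete, here is a minimal Python sketch of the idea, not Orbit's actual code. Every name in it (Task, next_task, validate, run_orbit, demo) is hypothetical, the "lower number means more urgent" priority rule is an assumption, and rubric scoring is left out for brevity. The fake agent always claims success but only does the work on its second attempt, so the loop closes only once the validation command exits cleanly.

```python
import os
import subprocess
import sys
import tempfile
from dataclasses import dataclass, field


@dataclass
class Task:
    id: str
    priority: int                                  # assumption: lower = more urgent
    depends_on: list = field(default_factory=list)
    done: bool = False


def next_task(backlog):
    """Pick the most urgent task whose dependencies are all done."""
    done_ids = {t.id for t in backlog if t.done}
    ready = [t for t in backlog
             if not t.done and all(d in done_ids for d in t.depends_on)]
    return min(ready, key=lambda t: t.priority) if ready else None


def validate(command):
    """Run a real command; its exit code decides, not the agent's claim."""
    return subprocess.run(command, capture_output=True).returncode == 0


def run_orbit(task, agent, validation_cmd, max_attempts=3):
    """Bounded loop: retry until validation passes or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        agent(task, attempt)                       # the agent's claim is ignored
        if validate(validation_cmd):
            task.done = True
            return {"task": task.id, "attempts": attempt, "closed": True}
    return {"task": task.id, "attempts": max_attempts, "closed": False}


def demo():
    backlog = [Task("write-tests", priority=1, depends_on=["fix-auth"]),
               Task("fix-auth", priority=2)]
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "state.txt")

        def fake_agent(task, attempt):
            # Says "Done!" every time, but only writes the fix on attempt 2.
            if attempt >= 2:
                with open(marker, "w") as fh:
                    fh.write("fixed")
            return "Done!"

        # Stand-in for pytest: exits 0 only if the marker file says "fixed".
        check = [sys.executable, "-c",
                 "import sys; sys.exit(0 if open(sys.argv[1]).read() == 'fixed' else 1)",
                 marker]
        return run_orbit(next_task(backlog), fake_agent, check)
```

Calling demo() picks fix-auth first, even though write-tests has the more urgent priority, because write-tests is blocked on it. In a real setup the check command would be something like a pytest or lint invocation.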
Adapter-Based Architecture
Orbit ships adapters for Claude Code, OpenAI Codex CLI, Cursor, and any agent that reads stdin and writes JSON. A MockAgentAdapter enables deterministic local testing without API keys. The adapter contract requires just two methods: prepare_prompt() and run_agent(), returning an AgentResult with status, changed files, notes, metadata, and raw output. This means swapping between agents takes minutes, not hours—critical for teams evaluating different models or running multi-agent handoffs where one agent implements and another reviews.
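A rough sketch of what such a contract might look like in Python follows. Only the method names prepare_prompt() and run_agent(), the AgentResult and MockAgentAdapter names, and the list of result fields come from the project description. The exact field names, type hints, method signatures, task shape, the AgentAdapter protocol, and the execute helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class AgentResult:
    # Field names are guesses based on the fields the article lists.
    status: str
    changed_files: list = field(default_factory=list)
    notes: str = ""
    metadata: dict = field(default_factory=dict)
    raw_output: str = ""


class AgentAdapter(Protocol):
    """Hypothetical protocol: the two methods an adapter must provide."""

    def prepare_prompt(self, task: dict) -> str: ...

    def run_agent(self, prompt: str) -> AgentResult: ...


class MockAgentAdapter:
    """Deterministic stand-in that makes no network calls (sketch only)."""

    def prepare_prompt(self, task: dict) -> str:
        return f"Task {task['id']}: {task['title']}"

    def run_agent(self, prompt: str) -> AgentResult:
        return AgentResult(
            status="success",
            changed_files=["src/auth.py"],         # illustrative path
            notes="mock run",
            metadata={"adapter": "mock"},
            raw_output=prompt,
        )


def execute(adapter: AgentAdapter, task: dict) -> AgentResult:
    """The caller depends only on the two-method contract."""
    return adapter.run_agent(adapter.prepare_prompt(task))
```

Because execute only touches the two contract methods, swapping agents in this sketch means passing a different adapter object and nothing else.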
Real-World Use Cases
The project targets six primary workflows: nightly bug triage against failing CI jobs, incremental feature delivery from a structured backlog.json file, automated code health checks scheduled overnight with cost caps via max_estimated_cost_usd, self-healing repositories where Orbit auto-fixes known flaky tests or broken migrations, dependency upgrades run in order with validation after each bump, and multi-agent pipelines using different adapters for implementation, review, and deployment stages.
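For the overnight runs, a cost cap could be enforced along the lines of the sketch below. Only max_estimated_cost_usd is a name taken from the project. max_runs, max_failures, the Budget class, and run_session are hypothetical, and how Orbit actually estimates cost is not described here, so the per-task costs are plain inputs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    max_estimated_cost_usd: float    # name taken from the article
    max_runs: int                    # hypothetical name
    max_failures: int                # hypothetical name
    spent_usd: float = 0.0
    runs: int = 0
    failures: int = 0

    def record(self, cost_usd: float, succeeded: bool) -> None:
        """Account for one finished agent run."""
        self.runs += 1
        self.spent_usd += cost_usd
        if not succeeded:
            self.failures += 1

    def exhausted(self) -> Optional[str]:
        """Return the reason the session must stop, or None to continue."""
        if self.spent_usd >= self.max_estimated_cost_usd:
            return "cost"
        if self.runs >= self.max_runs:
            return "runs"
        if self.failures >= self.max_failures:
            return "failures"
        return None


def run_session(budget, planned):
    """Process (cost_usd, succeeded) pairs until work or budget runs out."""
    for cost_usd, succeeded in planned:
        reason = budget.exhausted()
        if reason:
            return budget.runs, reason
        budget.record(cost_usd, succeeded)
    return budget.runs, budget.exhausted()
```

The check happens before each run rather than after, so in this sketch a session can overshoot the cap by at most one task's cost.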
Demos and Getting Started
Two replay demos are included: 'auth-rescue' (Orbit catches a real auth bug introduced into the codebase) and 'issue-search' (a scoped search feature validated against tests). Both run with MOCK=1 for deterministic execution without API keys. Setup requires dropping three files in your project—mission.md defining goals, agent-rules.md for coding standards, and backlog.json specifying tasks with acceptance criteria—then launching via orchestrator.py.
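The article does not show the backlog.json schema, so the example below is an assumed shape rather than the real format. The keys (tasks, id, priority, depends_on, acceptance_criteria), the criteria text, and the load_backlog helper are all illustrative. It shows the kind of check a harness might run before starting: every task has acceptance criteria, and every dependency points at a known task.

```python
import json

# Hypothetical backlog.json contents; the real schema may differ.
BACKLOG_JSON = """
{
  "tasks": [
    {"id": "issue-search", "priority": 2, "depends_on": ["fix-auth"],
     "acceptance_criteria": ["search results are scoped to one project"]},
    {"id": "fix-auth", "priority": 1, "depends_on": [],
     "acceptance_criteria": ["pytest tests/test_auth.py passes"]}
  ]
}
"""


def load_backlog(text):
    """Parse and sanity-check a backlog, returning tasks sorted by priority."""
    tasks = json.loads(text)["tasks"]
    known_ids = {task["id"] for task in tasks}
    for task in tasks:
        if not task.get("acceptance_criteria"):
            raise ValueError(f"{task['id']}: no acceptance criteria")
        unknown = [d for d in task.get("depends_on", []) if d not in known_ids]
        if unknown:
            raise ValueError(f"{task['id']}: unknown dependencies {unknown}")
    # Assumption: a lower priority number means the task runs earlier.
    return sorted(tasks, key=lambda task: task["priority"])
```

Rejecting tasks without acceptance criteria up front fits the tool's premise: a task with no stated criteria gives the validation gate nothing to check.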
Key Takeaways
- Orbit forces agents to prove work through real validation gates before closing loops
- Adapter-based design supports Claude Code, Codex CLI, Cursor, and any JSON-capable agent
- Checkpoint resumability prevents lost progress on long-running missions
- Built-in budget controls cap runs, failures, and estimated cost per session
- Full telemetry artifacts (agent-result.json, evaluation.json, review.json) enable post-mortem analysis
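Checkpoint resumability, as listed above, can be pictured with a small sketch like this one. It is not how Orbit stores its state. The checkpoint file layout, the function names, and the stop_after parameter (which simulates an interrupted run) are all hypothetical.

```python
import json
import os


def load_checkpoint(path):
    """Return saved progress, or an empty state on the first run."""
    if not os.path.exists(path):
        return {"completed": []}
    with open(path) as fh:
        return json.load(fh)


def save_checkpoint(path, state):
    """Write to a temp file, then swap it in, so a crash cannot truncate it."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp_path, path)


def run_mission(task_ids, path, stop_after=None):
    """Run tasks in order, skipping any that a previous run completed."""
    state = load_checkpoint(path)
    executed = []
    for task_id in task_ids:
        if task_id in state["completed"]:
            continue                               # finished in an earlier run
        if stop_after is not None and len(executed) >= stop_after:
            break                                  # simulated interruption
        executed.append(task_id)                   # real agent work would go here
        state["completed"].append(task_id)
        save_checkpoint(path, state)               # persist after every task
    return executed
```

Saving after every task rather than at the end is the design choice that matters: a crash in this sketch costs at most the task that was in flight.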
The Bottom Line
Orbit is tackling a real pain point that anyone who's debugged an agent's confident lies can appreciate. Whether the structured loop overhead pays off depends on your workflow scale—but for teams running agents against substantial backlogs, the audit trail and validation gates alone justify the harness. Watch this one.