Developer Builds Approval Layer to Stop AI Agents From Going Rogue

Every AI agent demo on your timeline right now is one feature flag away from the next disaster. That's not a prediction—it's an architectural observation, and developer k08200 has the war stories to prove it.

The Problem With Autonomous Agents

Last month, an agent the author was dogfooding cancelled a calendar event they actually cared about. Two weeks before that, a different one auto-replied to an investor with what read like "a hostage note from a Slack bot." The companies behind both agents have raised more money than most developers will see in five years. The pattern is always identical: the agent does a thing, emails the user that it did the thing, the thing was wrong, and the company ships a fix the following Tuesday. Rinse, repeat, trust erodes.

Klorn: An Approval Layer for AI Agents

The solution? Stop letting agents act autonomously on external systems like Gmail and Calendar. Klorn sits in the gap between an agent's decision and its execution. The agent does all the thinking: it reads emails, checks calendars, drafts replies, and creates event proposals. Then it stops. Nothing fires until you click approve. "Sounds boring," the author admits. "The constraint is what makes it real."
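The hold-until-approved flow can be sketched in a few lines of Python. This is a minimal illustration, not Klorn's code: `PendingAction` is a name the article uses, but its fields, the `ApprovalQueue` class, and the `propose`/`approve`/`reject` methods are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PendingAction:
    """A side effect the agent has prepared but not executed (fields are assumed)."""
    kind: str
    payload: dict
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


class ApprovalQueue:
    """Hypothetical gate: the agent proposes, only a human approval executes."""

    def __init__(self, executors: Dict[str, Callable[[dict], None]]):
        self._executors = executors
        self._pending: Dict[str, PendingAction] = {}

    def propose(self, kind: str, payload: dict) -> PendingAction:
        # The agent's work ends here: the action is stored, not run.
        action = PendingAction(kind, payload)
        self._pending[action.id] = action
        return action

    def approve(self, action_id: str) -> None:
        # Called from the UI when the user clicks approve.
        action = self._pending.pop(action_id)
        self._executors[action.kind](action.payload)

    def reject(self, action_id: str) -> None:
        # Dropping the action means it can never fire.
        self._pending.pop(action_id)


sent = []  # stands in for a real mail client
queue = ApprovalQueue({"send_email": sent.append})
draft = queue.propose("send_email", {"to": "a@example.com", "body": "Tuesday works."})
assert sent == []        # drafting alone sends nothing
queue.approve(draft.id)  # the one click that fires the action
```

The point of the shape is that the agent only ever holds a reference to `propose`; the executors are reachable solely through `approve`.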

How It Works: Payload Hashes and Invariant Tests

Every meaningful action in Klorn is signed with a payload hash before execution: send_email cannot fire without an ActionReceipt matching what was shown to you. Here's the clever part: there's an invariant test that fails the build if anyone tries to bypass it, including future contributors or even AI agents working on the codebase. Remove the approval check, the test fails, the build fails, the deploy fails. By design, a silent-send version of Klorn cannot make it through the build.
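A rough Python sketch of that mechanism follows. Only the names `send_email` and `ActionReceipt` come from the article. The receipt's fields, `hash_payload`, `approve`, `ApprovalMismatch`, and the use of SHA-256 over canonical JSON are assumptions made for illustration. A plain hash only detects changes to the payload, so a real system would presumably also authenticate the receipt itself, for example with an HMAC.

```python
import hashlib
import json
from dataclasses import dataclass


def hash_payload(payload: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal payloads hash equally.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ActionReceipt:
    """Proof that a specific payload was shown and approved (fields are assumed)."""
    action: str
    payload_hash: str


class ApprovalMismatch(Exception):
    """Raised when the payload being executed is not the one that was approved."""


def approve(action: str, payload: dict) -> ActionReceipt:
    # Issued only when the user clicks approve on exactly this payload.
    return ActionReceipt(action, hash_payload(payload))


def send_email(payload: dict, receipt: ActionReceipt, outbox: list) -> None:
    if receipt.action != "send_email" or receipt.payload_hash != hash_payload(payload):
        raise ApprovalMismatch("payload differs from what was approved")
    outbox.append(payload)  # stands in for the real send


def test_send_email_requires_matching_receipt():
    # Invariant test: if the check above is removed, this fails and so does the build.
    outbox = []
    shown = {"to": "investor@example.com", "body": "Thursday at 3pm works."}
    receipt = approve("send_email", shown)
    tampered = dict(shown, body="Please wire the funds today.")
    try:
        send_email(tampered, receipt, outbox)
    except ApprovalMismatch:
        pass
    else:
        raise AssertionError("send_email fired without a matching receipt")
    assert outbox == []
    send_email(shown, receipt, outbox)  # the approved payload still goes through
    assert outbox == [shown]
```

Run under a test runner such as pytest in CI, a test like this is what turns "please don't remove the check" into a failed build.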

What Shipped This Week

The agent loop now runs end-to-end: meeting requests hit the inbox and get tier-classified (PUSH / QUEUE / SILENT / AUTO), then Klorn reads the email, checks calendars for conflicts, and drafts replies and event proposals, all held as PendingActions until one click fires everything. The author also shipped a critical production fix when OpenRouter retired a :free model SKU mid-week, killing every autonomous cycle with "404 No endpoints found." Existing failover only covered 402/403/429, not "the model is gone," so the author built a multi-model fallback chain on the same provider.
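A fallback chain of this kind might look like the following Python sketch. It does not call OpenRouter; `call_model` is an injected stand-in for whatever client the project uses, and `ProviderError`, `complete_with_fallback`, and the model names are hypothetical. The only detail taken from the article is the set of status codes, with 404 added to the original 402/403/429.

```python
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error type carrying the provider's HTTP status code."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(f"{status} {message}".strip())
        self.status = status


# 402/403/429 were already handled; 404 covers "the model is gone".
FAILOVER_STATUSES = frozenset({402, 403, 404, 429})


def complete_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> str:
    """Try each model in order, moving on only for statuses that warrant failover."""
    last_error: Optional[ProviderError] = None
    for model in models:
        try:
            return call_model(model, prompt)
        except ProviderError as err:
            if err.status not in FAILOVER_STATUSES:
                raise  # unexpected errors should surface, not be masked
            last_error = err
    raise RuntimeError("all models in the fallback chain failed") from last_error


def fake_call(model: str, prompt: str) -> str:
    # Simulates a retired SKU; the model names are made up.
    if model == "retired-model:free":
        raise ProviderError(404, "No endpoints found")
    return f"{model}: ok"


result = complete_with_fallback(
    "Draft a reply", ["retired-model:free", "backup-model"], fake_call
)
```

Keeping the failover statuses in one explicit set makes the original gap visible: a retired model is a different failure from a rate limit, and it has to be listed to be handled.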

Key Takeaways

- Can users prove what was approved matches what was sent? If not, you have a problem.
- Can future contributors bypass your approval check? Invariant tests are non-negotiable.
- What's your failover strategy when upstream dependencies disappear mid-week?

If your answers are "no," "yes," and "we don't have one," you're building the next apology email. Stop and add the gate first.

The Bottom Line

Autonomous AI agents without approval layers aren't intelligent systems—they're liability machines waiting to embarrass you in front of investors, clients, or your actual calendar. Klorn proves there's a better architecture: let the agent think, make humans approve, sign everything so you can audit it later. Boring? Absolutely. Trustworthy? That's the whole point.
