"100% of our code is written by AI" has become the new flex in startup pitch decks and engineering all-hands meetings alike. It sounds like the future arrived early, ship date moved up, humans replaced. But developer Tommy Jepsen argues this framing obscures a critical distinction that every CTO, security lead, and anyone trusting their data to these systems should be asking about: was there actually a human in the loop?

The Supervised Approach: AI Types, Humans Judge

In the supervised model, an AI drafts the code but a human engineer still owns the diff. They read the changes, run the required tests, question the parts that look suspiciously confident, and take accountability for what gets merged to main. The AI has moved the work from typing to judging, but it has not removed the judge. This doesn't mean engineers read every line. It means they understand the impact of changes in files where a mistake would actually hurt, with enough context to grasp what the diff does and to maintain a mental model of the codebase. Automation can shrink the review surface here: separate models can critique diffs first, test suites can catch obvious problems before human eyes ever look, and linting and static analysis can run automatically. But Jepsen emphasizes that this only narrows what humans need to review; it doesn't replace the understanding itself. Someone still has to grasp the impact and be accountable for what merges into production systems handling real user data.
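One way to picture "automation narrows the review surface, but a human still owns the sensitive parts" is as a small routing rule in a merge pipeline. The sketch below is illustrative only and is not taken from Jepsen's post: the path patterns, the `Diff` fields, and the three outcome labels are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

# Hypothetical path patterns for code where a mistake would actually hurt.
SENSITIVE_PATTERNS = ("auth/*", "payments/*", "*/pii/*", "migrations/*")


@dataclass
class Diff:
    changed_files: list[str]
    tests_passed: bool
    lint_passed: bool
    model_critique_flags: int  # issues raised by a separate critique model


def review_decision(diff: Diff) -> str:
    """Return 'block', 'human_review', or 'light_review' for a diff."""
    # Automated checks run first and can reject a diff before a human looks.
    if not (diff.tests_passed and diff.lint_passed):
        return "block"

    touches_sensitive = any(
        fnmatch(path, pattern)
        for path in diff.changed_files
        for pattern in SENSITIVE_PATTERNS
    )
    # Passing checks narrow the review surface; they never waive review
    # for sensitive paths or for diffs the critique model flagged.
    if touches_sensitive or diff.model_critique_flags > 0:
        return "human_review"
    return "light_review"
```

In this sketch, a green test run on a diff touching `auth/session.py` still routes to `human_review`; only diffs outside the sensitive patterns, with no critique flags, get the lighter path.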

The Unsupervised Reality: Prompt to Product

Unsupervised AI coding, colloquially called vibecoding, is the other extreme. You prompt, the model ships, and you judge the result by whether it looks right when you click around and test manually. For certain use cases, this instinct is completely sound. Landing pages, simple UI changes, and throwaway prototypes, where the blast radius is small and the worst-case scenario is embarrassment rather than data exposure, are fair game for shipping without deep human review of every generated line. The problem emerges when unsupervised code starts handling authentication flows, payment processing, automated business workflows, or anything touching personally identifiable information. "It works when I click around" doesn't cut it for auth systems. Even when LLM-powered code reviews and passing automated test suites back up the unsupervised output, there's a fundamental difference between coverage and comprehension: between tests that pass and an engineer who actually understands what the system does under failure conditions.
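To make the gap between coverage and comprehension concrete, here is a deliberately flawed, hypothetical auth check. It is not from the original post; the function names and the `SessionStoreDown` exception are invented for illustration. The happy-path checks pass, yet the function fails open when its session store is unreachable, which is the kind of failure-condition behavior that clicking around would never reveal.

```python
class SessionStoreDown(Exception):
    """Raised by the hypothetical session store when it is unreachable."""


def is_authenticated(token: str, lookup) -> bool:
    """Hypothetical auth check with a fail-open bug."""
    try:
        return lookup(token) is not None
    except SessionStoreDown:
        # Bug: an outage is treated as a successful login.
        return True


def healthy_lookup(token: str):
    # Stand-in for a working session store with one valid token.
    return {"user": "alice"} if token == "valid-token" else None


def failing_lookup(token: str):
    # Stand-in for the same store during an outage.
    raise SessionStoreDown("session store unreachable")


# Happy-path checks: both pass, so a click-through or a thin suite looks green.
assert is_authenticated("valid-token", healthy_lookup) is True
assert is_authenticated("bogus", healthy_lookup) is False

# Failure condition nobody exercised: any token is accepted during an outage.
print(is_authenticated("bogus", failing_lookup))  # prints True
```

A reviewer who understands the auth flow would likely ask what happens when the store is down; a suite that only covers the healthy path never does.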

Are Models Good Enough to Remove Humans?

Jepsen admits he doesn't know. Model capabilities improve fast enough that a confident "no" ages badly, and today's limitations might be tomorrow's footnotes in a release blog post. But even granting a hypothetical future model that generates production-grade code without errors, the accountability question remains. Would you feel comfortable handing your data to, or basing your entire business on, a product whose codebase nobody reviewed and nobody signed off on?

Key Takeaways

  • Supervised AI coding keeps humans accountable for diffs that matter—this isn't about distrusting models but about deliberate review gates before production
  • Unsupervised vibecoding works fine for low-stakes work like landing pages and prototypes where blast radius is limited
  • For auth, payments, PII handling, and business-critical automation: supervised approaches with human ownership of what merges are non-negotiable
  • Automated tooling (test suites, critique models) narrows review scope but doesn't replace human understanding and accountability

The Bottom Line

The next time a company tells you their code is "100% AI-generated," your follow-up question should be simple: was it reviewed? In an era where AI can generate functional code faster than any human team, the differentiator isn't who's writing the code anymore—it's who owns responsibility for what gets shipped into production systems handling real user data. The humans in the loop aren't legacy overhead to eliminate; they're your last line of defense when models inevitably produce confidently wrong output. Stay sharp out there.