Study: 70% of AI-Generated Bugs Fall Into Just 21 Categories

The AI coding boom has a problem that gets little attention. A new analysis from Detail, an automated code review platform, examined 1,000 bugs across 99 customer codebases and found that roughly 70% of them clustered into just 21 distinct failure patterns. That's not random noise in the system; that's a signal.

The Methodology Behind the Numbers

Detail pulled bugs that its system flagged and that customers subsequently fixed over the past two and a half months, capping each company's contribution at 5% so that no single organization's codebase skewed the results. Each bug was rewritten as a language-agnostic description of its failure mechanism with company-specific details removed, then embedded with both OpenAI's text-embedding-3-large and Google's gemini-embedding-001 models. Dimensionality reduction via UMAP, followed by HDBSCAN clustering, revealed the underlying structure: 21 mechanisms that describe most of the failures.
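
The per-company cap is the easiest step to picture in code. What follows is a minimal sketch, not Detail's actual pipeline: it assumes the 5% cap is measured against the target corpus size, and the function name `cap_per_company`, the `company` field, and the random sampling are all illustrative assumptions.

```python
import random
from collections import defaultdict


def cap_per_company(bugs, max_share=0.05, target_size=1000, seed=0):
    """Sample up to target_size bugs with no company above max_share of the target.

    Hypothetical helper: `bugs` is a list of dicts that each carry a "company" key.
    """
    rng = random.Random(seed)
    # With the defaults this allows at most 50 bugs per company.
    per_company_cap = int(max_share * target_size)

    # Group the bugs by the company they came from.
    by_company = defaultdict(list)
    for bug in bugs:
        by_company[bug["company"]].append(bug)

    # Keep a random subset of each company's bugs, truncated at the cap.
    capped = []
    for company_bugs in by_company.values():
        rng.shuffle(company_bugs)
        capped.extend(company_bugs[:per_company_cap])

    # Shuffle again so the final truncation does not favor any company.
    rng.shuffle(capped)
    return capped[:target_size]
```

Note that the cap here is relative to the target size, so if fewer than `target_size` bugs survive, a company's share of the actual sample can exceed 5%.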

Broken Authorization Dominates

The largest cluster wasn't a clever race condition or some exotic memory corruption—it was broken authorization and access control. Missing auth checks, improper scope enforcement, and credential mismatches topped the list of recurring mistakes across diverse products. These aren't subtle implementation details; they're bona fide security vulnerabilities that ship silently to production because they don't crash anything. A missing authorization check returns a clean HTTP 200 while handing User A someone else's data.
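
To make the "clean HTTP 200" failure concrete, here is a small sketch of a handler with and without an ownership check. Everything in it is hypothetical (the `INVOICES` store, the `Response` type, and both handler names); it illustrates the pattern and is not taken from Detail's corpus.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    body: dict


# Hypothetical in-memory store: invoice id -> record with an owner.
INVOICES = {
    1: {"owner": "alice", "total": 120},
    2: {"owner": "bob", "total": 75},
}


def get_invoice_unchecked(current_user: str, invoice_id: int) -> Response:
    """Buggy version: never compares the invoice owner with the caller."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        return Response(404, {"error": "not found"})
    # Nothing crashes here; any logged-in user gets any invoice back.
    return Response(200, invoice)


def get_invoice_checked(current_user: str, invoice_id: int) -> Response:
    """Fixed version: refuses the request when the caller is not the owner."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        return Response(404, {"error": "not found"})
    if invoice["owner"] != current_user:
        return Response(403, {"error": "forbidden"})
    return Response(200, invoice)
```

A test that only ever logs in as the invoice's owner passes against both versions, which is why the buggy one can ship unnoticed.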

The Old Guard Is Overhyped

Here's the kicker: null pointer exceptions, the runtime face of the null reference that Tony Hoare famously called his "billion-dollar mistake," accounted for only 1.7% of the corpus. TypeScript and other modern type systems appear to have largely neutered this class of bug, yet it still dominates developers' mental models of what can go wrong.

Authyness vs. Race-Condition-Ness

Detail mapped these clusters along two axes: "authyness" (do bugs involve improper user access?) and "race-condition-ness" (do they require specific timing or sequencing to surface?). The resulting spectrum reveals why these bugs persist—they're invisible during normal QA because most testing is single-player, smoke-testing features in isolation rather than probing permission boundaries or temporal edge cases.
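
A deterministic sketch can show why single-player testing misses timing bugs. It models a check-then-act race on a single-use coupon, using a generator's `yield` to mark the point where another request could run. The coupon scenario and all names (`redeem`, `run_sequential`, `run_interleaved`) are assumptions made for illustration.

```python
def redeem(coupon, log, user):
    """Check-then-act in two steps; `yield` marks where another request may run."""
    if coupon["used"]:
        log.append((user, "rejected"))
        return
    yield  # the request pauses here, e.g. while waiting on a database write
    coupon["used"] = True
    log.append((user, "redeemed"))


def run_sequential(users):
    """One request at a time, the way a single-player smoke test exercises it."""
    coupon, log = {"used": False}, []
    for user in users:
        for _ in redeem(coupon, log, user):
            pass
    return log


def run_interleaved(users):
    """Every request passes the check before any request performs the write."""
    coupon, log = {"used": False}, []
    requests = [redeem(coupon, log, user) for user in users]
    for request in requests:
        next(request, None)  # run up to the pause point
    for request in requests:
        for _ in request:  # resume and finish the write
            pass
    return log
```

With two users, the sequential run redeems the coupon once and rejects the second request, while the interleaved run redeems it twice. The code is identical in both cases; only the ordering differs.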

Why These Bugs Slip Through

Think about what has to happen for one of these bugs to reach production: it must evade the engineer writing it, slip past any AI co-pilot generating code, survive linters and tests, and escape code review. That's a gauntlet—yet they still ship because they're not crashes; they're silent behavioral deviations that only manifest under specific conditions.

Making Illegible Bugs Legible

Detail's prescription is twofold: architect systems so bugs fail loudly instead of silently (their example is "deny by default," where a missing authorization rule returns a 403 rather than allowing access), and build agents specifically trained to detect these patterns. The first step is knowing what to look for, and that's now quantified.
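
One way to picture the first half of that prescription is a router that refuses any route with no declared policy. This is a minimal sketch under stated assumptions: the `Router` class, its `add` and `dispatch` methods, and the policy callables are hypothetical and do not come from any particular framework.

```python
from typing import Callable, Dict, Optional, Tuple

Handler = Callable[[str], str]
Policy = Callable[[str], bool]


class Router:
    """Hypothetical router where every route must declare an access policy."""

    def __init__(self) -> None:
        self._routes: Dict[str, Tuple[Handler, Optional[Policy]]] = {}

    def add(self, path: str, handler: Handler, policy: Optional[Policy] = None) -> None:
        self._routes[path] = (handler, policy)

    def dispatch(self, path: str, user: str) -> Tuple[int, str]:
        if path not in self._routes:
            return 404, "not found"
        handler, policy = self._routes[path]
        # Deny by default: a route with no policy is refused, not allowed.
        if policy is None or not policy(user):
            return 403, "forbidden"
        return 200, handler(user)
```

Under this design, forgetting the policy makes the route return 403 for every user, including legitimate ones, so the first happy-path test fails and the omission is caught before release.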

Key Takeaways

Broken authorization/access control is the dominant bug category, not crashes or null pointers
Race conditions, incomplete state updates, and other silent failures reach production because they don't crash anything
Null pointer exceptions represent only 1.7% of issues—modern type systems are winning that fight
"Deny by default" architecture makes missing auth checks fail loudly where tests can catch them

The Bottom Line

This data should reframe how we think about AI-generated code quality. We're not dealing with random entropy; we're repeating the same 21 mistakes at scale, and until our tooling specifically targets these patterns, they'll keep shipping. If you're vibecoding without automated auth testing in your pipeline, you're likely shipping vulnerabilities.
