Robert C. Martin recently dropped a framework on X that makes most AI-assisted coding look like glorified autocomplete. His agent pipeline doesn't ask an all-knowing model to "just build the thing." Instead, it runs your vague requirements through six disciplined transformation stages—each more formal than the last, each requiring less human babysitting until the Architect agent sits there running mutation tests while you grab coffee.
The Six-Stage Pipeline
The system transforms informal specs into verified production code through a strict sequence: Informal Specs → Hard Specs (Specifier Agent) → Gherkin Feature Files (Specifier Agent again) → Acceptance Tests + Unit Tests + Implementation (Coder Agent) → Complexity Control + Property Tests (Refactorer Agent) → Mutation Testing (Architect Agent). The key constraint? Each agent only sees output from the previous stage. The Coder never reads your informal notes—it gets Gherkin or nothing. If the Gherkin is ambiguous, it asks for clarification rather than guessing.
Where Humans Actually Matter
Human involvement is front-loaded where judgment counts: you write the initial informal specs, review and correct the hard task list, spot-check the generated Gherkin scenarios. After that? The pipeline runs unattended. You return when the Architect finishes its mutation passes. This isn't about removing humans—it's about putting human attention exactly where it moves the needle and letting CPU cycles handle the rest.
Mutation Testing: Where It Gets Serious
The Architect Agent runs two distinct mutation testing passes, both CPU-intensive by design. First, Stryker.NET mutates the C# source code directly—flipping > to >=, changing + to -, inverting boolean conditions—then fires the full test suite at each mutant. A surviving mutant (one that passes tests despite an injected defect) means a coverage gap. The Architect writes the exact test needed to kill it. Second, Gherkin mutation modifies the feature files themselves—changing values in Given/Then steps, removing scenarios—to verify every scenario actually exercises code. A Gherkin mutation that survives means a step binding is broken or the scenario was never wired up.
Complexity Control and Property Tests
Before mutation testing, the Refactorer Agent enforces cyclomatic complexity below 6 using CRAP (Change Risk Anti-Patterns) metrics via dotnet-crap. A CalculateEffectivePrice method with complexity 8 gets refactored into three separate methods—IsDiscountActive, ApplyDiscount, and the main method—each scoring 3 or below. The Refactorer then adds FsCheck property-based tests that verify invariants across arbitrary inputs, like "effective price is never negative" holding true for all possible basePrice and discountAmount combinations.
Why This Architecture Works
The pipeline treats ambiguity as technical debt. Each transformation stage removes a layer of fuzziness: informal → hard specs eliminates vagueness; hard specs → Gherkin strips implicit context; Gherkin → acceptance tests removes implementation assumptions; unit tests cover untested internals; mutation tests eliminate false confidence. By the time output returns, code is verified at every layer—not because an agent promised correctness, but because the system structurally cannot advance without passing each gate.
Key Takeaways
- Agents only see outputs from the previous stage—no context leaks between transformation steps
- Human oversight concentrates at the front (requirements judgment) and back (final review)
- Mutation testing catches what code coverage misses: tests that pass despite injected defects
- CRAP complexity enforcement prevents clever-but-unmaintainable solutions
- Property-based tests with FsCheck verify invariants across unbounded input spaces
The Bottom Line
This isn't AI coding—it's formalisation-as-a-service. Raw compute is the price of structural guarantees, and that trade is worth it for code you can trust without re-reading every line yourself. If you're still prompting a single agent to "build the discount system" and hoping for the best, you're doing it wrong.