Commercial electronic design automation (EDA) tools for simulation, formal verification, and synthesis cost semiconductor teams millions annually. The SystemVerilog specification is public (IEEE 1800-2017) and the compiler infrastructure exists in CIRCT, but nobody had bridged the gap between 'we can parse Verilog' and 'we can actually run a testbench.' Until now. A single engineer at Normal Computing used AI agents to land 2,968 commits on a CIRCT fork over 43 days in January and February 2026, adding a full verification stack: event-driven simulator, VPI/cocotb integration, UVM runtime support, bounded model checking, logic equivalence checking, and mutation testing. The result is 580,430 lines across 3,846 files, with real-world protocol testbenches running end-to-end.
The Numbers Tell the Story
The velocity curve says a lot about how AI-assisted development works at scale. Week one started at ~25 commits per day as the agent figured out the architecture. By week seven (February 10-16), the pace had reached 124 commits per day. That was not because the AI got smarter, but because the later work was more mechanical: regression infrastructure, test harnesses, quality gates. Test files grew from 987 to 4,229 (a 4.3x increase), with a sharp inflection on February 6 when the formal and mutation testing suites came online. The commit breakdown by category shows where the engineering effort went: formal verification leads at 652 commits, followed by docs and iteration logs at 521, Verilog frontend at 461, mutation testing at 372, and simulation engine work at 367. The "docs and iteration logs" bucket reflects a 1,554-iteration engineering log tracking every AI interaction cycle, because building complex systems with agents means documenting the reasoning, not just the output.
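As a quick sanity check on the figures above, the arithmetic can be reproduced in a few lines of Python. This is only a sketch: the numbers are the ones quoted in this article, and the variable names are illustrative rather than taken from the project.

```python
# Figures quoted in the article; variable names are illustrative only.
TOTAL_COMMITS = 2968
DAYS = 43
TEST_FILES_START, TEST_FILES_END = 987, 4229

commits_by_category = {
    "formal verification": 652,
    "docs and iteration logs": 521,
    "Verilog frontend": 461,
    "mutation testing": 372,
    "simulation engine": 367,
}

# Overall average pace across the whole period.
average_per_day = TOTAL_COMMITS / DAYS

# Growth factor of the test suite.
test_growth = TEST_FILES_END / TEST_FILES_START

# How much of the total the five named categories cover.
listed = sum(commits_by_category.values())
shares = {name: count / TOTAL_COMMITS for name, count in commits_by_category.items()}

print(f"average commits/day: {average_per_day:.1f}")
print(f"test file growth: {test_growth:.1f}x")
print(f"commits in listed categories: {listed} of {TOTAL_COMMITS}")
for name, share in shares.items():
    print(f"  {name}: {share:.0%}")
```

The overall average works out to about 69 commits per day, which sits between the ~25 of week one and the 124 of week seven. The five named categories cover 2,373 of the 2,968 commits, so roughly a fifth of the work falls outside the buckets listed here.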
The Agent Stack
AI attribution breaks down to 40% Claude Opus 4.5, 14% Claude Opus 4.6, and 46% Codex models. Early work used Opus 4.5 with a custom StopHook to maintain continuity across long sessions, a hack that Opus 4.6's team mode made obsolete. The trickier finding: Codex version 5.2 had coordination issues, frequently reverting parallel changes made by other agents. Version 5.3 resolved this, and its xhigh reasoning mode proved "particularly effective on complex debugging." The project also relied on a custom auto-continue utility to keep sessions running without manual babysitting.
Real Hardware, Real Verification
The real test isn't synthetic benchmarks—it's whether the simulator can handle verification IP built for commercial tools. Mirafra's open-source AVIPs (Advanced Verification IP) are complete UVM environments for APB, AHB, AXI4, SPI, JTAG, I2S, and I3C protocols. According to Normal Computing, no other open-source simulator could compile a single one of these testbenches—but their fork ran each to completion, including coverage collection that "matched the expected number." They also validated against NVIDIA's CVDP benchmark (783 hardware design tasks) and verified OpenTitan's prim_count module with bounded model checking up to 20 cycles.
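To make "bounded model checking up to 20 cycles" concrete, here is a toy sketch of the idea in Python. It is not the fork's implementation: production checkers unroll the design into a SAT/SMT problem, whereas this sketch simply enumerates states explicitly. The counter design, the properties, and every name below are hypothetical, and none of them model OpenTitan's prim_count.

```python
def bounded_check(init, step, inputs, prop, bound):
    """Check `prop` on every state reachable within `bound` cycles.

    Returns None if the property holds within the bound, otherwise the
    shortest input trace that leads to a violating state.
    """
    frontier = {init: ()}  # state -> one input trace that reaches it
    seen = {init}
    for _cycle in range(bound + 1):
        # Check all states at the current depth.
        for state, trace in frontier.items():
            if not prop(state):
                return trace
        # Expand one more cycle, trying every input value.
        next_frontier = {}
        for state, trace in frontier.items():
            for value in inputs:
                nxt = step(state, value)
                if nxt not in seen:
                    seen.add(nxt)
                    next_frontier[nxt] = trace + (value,)
        frontier = next_frontier
    return None


COUNT_MAX = 15  # hypothetical 4-bit counter


def counter_step(count, enable):
    # Saturating increment: stops at COUNT_MAX instead of wrapping.
    if enable and count < COUNT_MAX:
        return count + 1
    return count


def never_overflows(count):
    return 0 <= count <= COUNT_MAX


def never_reaches_12(count):
    return count != 12


ok = bounded_check(0, counter_step, (0, 1), never_overflows, bound=20)
cex = bounded_check(0, counter_step, (0, 1), never_reaches_12, bound=20)
shallow = bounded_check(0, counter_step, (0, 1), never_reaches_12, bound=5)
```

Here `ok` is None because the property holds at every depth up to 20, while `cex` is a trace of twelve enable pulses that drives the counter to 12. `shallow` is also None, which illustrates the main caveat of the technique: a bounded check only proves the absence of violations within the bound, so a bug deeper than the bound goes unseen.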
Performance Reality Check
Let's be straight about this: circt-sim in interpret mode is still 100-1,000x slower than Verilator or commercial simulators. The JIT improvements they added show promise (a 2.1x speedup on I2S, 18% on simpler benchmarks), but most code still falls back to the interpreter. They're exploring three paths forward: coroutine-based compilation (a clean abstraction, but LLVM's coroutine machinery fights hardware process semantics), state-machine transformation at the IR level (no dependency on the coroutine machinery, but a complex rewrite), and block-level JIT targeting hot basic blocks like clock toggles and driver loops.
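The difference between the first two paths can be illustrated with a loose analogy in Python. The sketch below writes the same clock-toggling process twice: once as a generator, where the language runtime saves the suspension point, and once as a hand-written state machine, where that saved state is explicit. It says nothing about how CIRCT or LLVM actually lower processes, and the scheduler and all names are hypothetical.

```python
import heapq


def clock_coroutine(signals, half_period, toggles):
    """Process as a coroutine: each `yield` suspends for the given delay."""
    for _ in range(toggles):
        signals["clk"] ^= 1
        yield half_period


class ClockStateMachine:
    """The same process with its suspended state stored explicitly."""

    def __init__(self, signals, half_period, toggles):
        self.signals = signals
        self.half_period = half_period
        self.remaining = toggles

    def resume(self):
        # Returns the delay until the next wake-up, or None when finished.
        if self.remaining == 0:
            return None
        self.remaining -= 1
        self.signals["clk"] ^= 1
        return self.half_period


def run(resume, signals):
    """Minimal event loop: pop the next wake-up time, resume, reschedule."""
    trace = []
    queue = [0]  # wake-up times for the single process
    while queue:
        now = heapq.heappop(queue)
        delay = resume()
        if delay is None:
            break
        trace.append((now, signals["clk"]))
        heapq.heappush(queue, now + delay)
    return trace


def resume_generator(gen):
    # Adapt a generator to the same resume() protocol as the state machine.
    return lambda: next(gen, None)


sig_a = {"clk": 0}
sig_b = {"clk": 0}
trace_a = run(resume_generator(clock_coroutine(sig_a, 5, 4)), sig_a)
trace_b = run(ClockStateMachine(sig_b, 5, 4).resume, sig_b)
```

Both versions produce the same trace of (time, clk) pairs. The generator is shorter because the runtime tracks where the process left off. The state machine needs no such runtime support, but every suspension point has to be rewritten by hand (or by a compiler pass) into explicit fields and branches, which is the trade-off the article describes.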
The Bottom Line
The EDA industry has operated for decades on the assumption that verification tools require large, well-funded teams to build. This experiment suggests that assumption may be obsolete, or at least worth stress-testing. A single engineer with AI assistance built a functional verification stack in weeks instead of years. That's not a replacement for commercial simulators hardened by production use, but it is evidence that the cost of building verification tools may have dropped by an order of magnitude. The gap between 'possible' and 'practical' just got a lot smaller.