Code Metal Wants AI to Generate Code That Proves Itself Correct

When your translated CUDA kernel controls avionics or your Rust rewrite powers semiconductor tooling, 'it compiles' isn't a correctness guarantee — it's a starting point. Code Metal is building AI-driven code translation systems for exactly these high-stakes environments, and they're not stopping at compilation checks.

The Testing Problem

For decades, the standard answer to 'does this code work?' has been: write tests, run them, check results. But as Edsger Dijkstra observed over fifty years ago, program testing can show the presence of bugs but never their absence. When Code Metal translates a C++ financial system into Rust or converts M files into VHDL for embedded hardware, they need to prove behavioral equivalence, not just hope the test suite is comprehensive enough.

Code Metal's approach uses what they call a 'full spectrum' of software-assurance techniques. Lightweight methods like differential testing, property-based testing, static analysis, and type checking run continuously during generation to catch semantic mismatches early. But when correctness requirements are strictest (aerospace contracts, defense certifications, automotive safety compliance), they escalate to formal verification for mathematically rigorous proofs that the translated system behaves identically to the original.
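To make the lightweight end of that spectrum concrete, here is a minimal sketch of differential testing in Python. It is an illustration only, not Code Metal's tooling: the two functions are hypothetical stand-ins for an original routine and its translation, and the 8-bit saturating add is an assumed example chosen because the bug is easy to see.

```python
import random
from typing import Callable, Optional, Tuple

BinaryOp = Callable[[int, int], int]


def reference_saturating_add(a: int, b: int) -> int:
    """Hypothetical 'original' code: 8-bit unsigned add that clamps at 255."""
    total = a + b
    return 255 if total > 255 else total


def translated_saturating_add(a: int, b: int) -> int:
    """Hypothetical 'translated' code with a semantic mismatch: it wraps
    around on overflow instead of clamping."""
    return (a + b) & 0xFF


def differential_test(
    reference: BinaryOp,
    candidate: BinaryOp,
    trials: int = 1000,
    seed: int = 0,
) -> Optional[Tuple[int, int]]:
    """Run both implementations on the same random 8-bit inputs.

    Returns the first input pair on which they disagree, or None if no
    disagreement was found in the sampled inputs.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.randrange(256), rng.randrange(256)
        if reference(a, b) != candidate(a, b):
            return (a, b)
    return None


if __name__ == "__main__":
    counterexample = differential_test(
        reference_saturating_add, translated_saturating_add
    )
    print("first disagreement:", counterexample)
```

A returned pair is hard evidence of a mismatch. A result of None only means no sampled input exposed one, which is Dijkstra's point and the reason the strictest cases escalate to formal verification.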

AI Scales Generation; Formal Methods Scale Trust

The key insight driving their architecture is a clean separation: AI handles generation (proposing implementations, searching large solution spaces), while formal verification systems handle trust. The proof checker remains authoritative regardless of how many incorrect attempts the model generates. As Code Metal puts it in their research post, 'AI scales generation; formal methods scale trust.'

This matters increasingly as AI systems churn out hundreds of thousands, potentially millions, of lines of code. Traditional review processes can't keep pace. By integrating verification artifacts directly into the generation pipeline, Code Metal aims to produce what they call 'evidence' rather than assertions. The goal isn't 'trust us, the model said it's correct.' It's 'here's the proof object. Verify it yourself.'
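The separation described above can be sketched as a generate-and-check loop. This is a toy model under stated assumptions, not Code Metal's pipeline: the candidate list is a hypothetical stand-in for model output, and exhaustive comparison over every 8-bit input pair stands in for a real proof checker, which only works here because the input domain is small and finite.

```python
from itertools import product
from typing import Callable, Iterator, Optional, Tuple

Impl = Callable[[int, int], int]


def spec(a: int, b: int) -> int:
    """Assumed specification: 8-bit unsigned saturating addition."""
    return min(a + b, 255)


def trusted_checker(candidate: Impl) -> bool:
    """Stand-in for a proof checker: compares the candidate against the
    spec on all 65,536 input pairs, so acceptance covers the whole domain."""
    return all(
        candidate(a, b) == spec(a, b)
        for a, b in product(range(256), repeat=2)
    )


def untrusted_generator() -> Iterator[Tuple[str, Impl]]:
    """Hypothetical stand-in for an AI model: proposes candidates,
    some of them wrong, with no guarantee about any of them."""
    yield "wraparound", lambda a, b: (a + b) & 0xFF
    yield "unclamped", lambda a, b: a + b
    yield "clamped", lambda a, b: a + b if a + b <= 255 else 255


def first_verified(
    candidates: Iterator[Tuple[str, Impl]],
    checker: Callable[[Impl], bool],
) -> Optional[str]:
    """Return the name of the first candidate the checker accepts.

    The generator's opinion never enters the decision; only the checker's
    verdict does.
    """
    for name, impl in candidates:
        if checker(impl):
            return name
    return None


if __name__ == "__main__":
    print("accepted:", first_verified(untrusted_generator(), trusted_checker))
```

The design point is that wrong proposals cost time but not trust: the two faulty candidates are rejected by the checker, and nothing is accepted on the generator's say-so.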

The Loris D'Antoni Coup

Today marks another signal of intent: Professor Loris D'Antoni from UC San Diego is joining as Code Metal's first 'Code Metal Scholar' while maintaining his position as a Jacobs Faculty Scholar and full professor. D'Antoni's research portfolio reads like a roadmap for verified AI code generation: specification-aligned LLMs that generate provably correct outputs, techniques like ChopChop (presented at POPL 2026) for constraining model generation toward verifiable programs, and personalized compiler synthesis using program synthesis and LLMs.

Prior to joining Code Metal, D'Antoni served as an Amazon Scholar at AWS designing formal methods for access control policy verification. His academic background spans over a decade of leading research in program verification and program synthesis, the science of generating programs guaranteed to satisfy specifications.

'My goal as a researcher has always been to help people write software they can trust,' D'Antoni said in Code Metal's announcement. 'Code Metal is combining formal methods and AI to build a new generation of trusted software translation systems.'

Key Takeaways

Code Metal targets domains (aerospace, defense, semiconductors, automotive) where correctness failures mean recalls, certification issues, or mission failure
Their pipeline integrates validation throughout generation rather than treating verification as post-processing
The company explicitly separates AI's role in generation from formal methods' role in establishing trust guarantees
D'Antoni's research on specification-aligned LLMs directly addresses the question: how do we ensure LLM outputs are provably correct?

The Bottom Line

This is what verified AI code generation actually looks like — not 'move fast and break things,' but rigorous proof obligations, machine-checked correctness, and zero tolerance for semantic drift. If Code Metal can deliver on this promise at scale, they're not just building a translation company; they're building the verification layer that AI-generated software has desperately needed.
