Shridhar Shah, a Senior Software Engineer on the AI team at Cisco, has turned the decades-old dream of self-improving software into a working demo, and he did it in roughly 150 lines of Python running on his laptop with no API key required.
The Old Dream vs. New Reality
The classic "Gödel Machine" concept, proposed by Jürgen Schmidhuber, had one killer flaw: before a program could change any line of its own code, it had to mathematically prove the change would help. Proving that about real-world code is basically impossible, which kept the idea theoretical. Then came 2025 and the Darwin Gödel Machine paper by Zhang, Hu, Lu, Lange, and Clune (arXiv:2505.22954). They dropped the "prove it first" requirement and replaced it with something every engineer already does: try a change, run tests, keep what works. Using this approach, their coding agent went from solving 20% to 50% of the tasks on SWE-bench, a benchmark built from real GitHub issues.
How It Works
Shah's implementation has three parts: an agent (a bag of tiny skills such as uppercasing text or reversing strings), a test suite with known correct answers, and a loop that adds one skill at a time to fix failing tasks. The critical detail is that it saves every improved version, so it can branch off from an earlier one later instead of getting stuck on a bad path. The core rule is brutally simple: if score(new_version) > score(old_version): keep(new_version). Make an edit, run the tests, and keep the edit only if the score went up. Apart from the saved versions, that is the whole mechanism.
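To make the loop concrete, here is a minimal sketch of the three parts described above. It is not Shah's code: the skill table, the task list, and the names SKILLS, TASKS, run, score, and improve are all assumptions made for illustration, and the parent-selection rule (always branch from the best version in the archive) is a simplification.

```python
# Hypothetical skill table: names and behaviours are illustrative only.
SKILLS = {
    "upper": str.upper,
    "reverse": lambda s: s[::-1],
}

# Each task names the skill it needs, an input, and the known correct answer.
TASKS = [
    ("upper", "abc", "ABC"),
    ("reverse", "abc", "cba"),
]


def run(agent, skill, text):
    """Apply a skill if the agent has it; otherwise return the text unchanged."""
    return SKILLS[skill](text) if skill in agent else text


def score(agent):
    """Count the tasks whose output matches the expected answer."""
    return sum(run(agent, skill, text) == expected
               for skill, text, expected in TASKS)


def improve(max_steps=10):
    """Grow the agent one skill at a time, keeping only edits that raise the score."""
    archive = [frozenset()]  # every kept version, oldest first
    for _ in range(max_steps):
        parent = max(archive, key=score)  # assumption: branch from the best version
        for name in SKILLS:
            if name in parent:
                continue
            child = parent | {name}
            if score(child) > score(parent):  # the keep-if-better rule
                archive.append(child)
                break
        else:
            break  # no single edit improves the score, so stop
    return archive


print([score(agent) for agent in improve()])  # → [0, 1, 2]
```

Because every kept version stays in the archive, a fuller version of this sketch could pick an older parent and branch from it, which is the escape hatch the article describes.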
Small Fixes Unlock Big Ones
Here's where it gets interesting. One skill the agent adds—"normalize inputs" (trim whitespace)—does nothing by itself on any task. But earlier it learned a "title-case" skill that kept breaking on messy text like " the quick fox ". The moment cleanup gets added, two previously stuck tasks start passing simultaneously. This is the whole point in miniature: boring fixes become stepping stones for bigger ones. The agent isn't just adding features—it's improving its ability to improve itself.
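The stepping-stone effect can be reproduced in a few lines. The sketch below is an assumption-laden illustration, not the repository's code: normalize, title_case, and the three tasks are hypothetical, and skills are modelled as a simple pipeline applied in order.

```python
def normalize(text):
    """Hypothetical cleanup skill: trim surrounding whitespace."""
    return text.strip()


def title_case(text):
    """Hypothetical title-case skill built on str.title()."""
    return text.title()


# Illustrative tasks: (raw input, expected output).
TASKS = [
    ("hello world", "Hello World"),
    (" the quick fox ", "The Quick Fox"),
    ("  lazy dog", "Lazy Dog"),
]


def run(skills, text):
    """Apply each skill in order."""
    for skill in skills:
        text = skill(text)
    return text


def score(skills):
    """Count tasks whose final output matches the expected answer."""
    return sum(run(skills, raw) == expected for raw, expected in TASKS)


print(score([normalize]))              # → 0: cleanup alone passes nothing
print(score([title_case]))             # → 1: messy inputs still fail
print(score([normalize, title_case]))  # → 3: two stuck tasks pass at once
```

In this sketch the cleanup skill earns nothing on an empty agent, so a strict keep-if-better rule would only accept it once title-casing is already in place.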
Try It Yourself
The code is live on GitHub at github.com/Shridhar-2205/living-software/01-self-rewriting-agent. Run python demo_cli.py and watch the score climb from 1/8 to 8/8 in real time. The "edits" come from a fixed list of safe skills, so no generated or untrusted code ever runs. That keeps the demo safe to run while still showing the self-improvement loop at work.
Key Takeaways
- Darwin Gödel Machine replaces mathematical proof with simple test-based verification
- Three components: bag-of-skills agent + tests + improvement loop that keeps only successful changes
- Runs locally in under a second on commodity hardware—no GPU cluster required
- Research paper showed 20% → 50% SWE-bench improvement at full scale
- Small "boring" fixes can unlock multiple stuck tasks simultaneously
The Bottom Line
This is where AI development is heading: programs that improve themselves while they run, with a simple test telling them what "better" looks like. Shah's 150-line demo proves you don't need a research lab or massive compute budget to start experimenting with self-modifying code. The question isn't whether this becomes mainstream—it's how fast you'll be left behind if you're not paying attention.