The Regex Problem Nobody Wants to Talk About
A developer writing at mbmccoy.dev has had enough. In a post that's been percolating through Hacker News, the author makes a simple but important request: Dear LLMs, stop using regex to parse code. Just stop. "Parsers are for parsing code," they write, and it's hard to argue with that logic when you've spent hours debugging production failures caused by regex edge cases.
Why Regex Breaks on Code
The core issue is structural. Regular expressions were not designed for the grammar of a programming language: classic regular expressions describe flat patterns and cannot keep track of arbitrarily deep nesting. Nested delimiters, escaped quotes, and the fuzzy boundary between identifiers and keywords routinely break naive regex approaches. The author points out that even validating something as seemingly simple as an IP address proves tricky with regular expressions. If that basic task trips up developers, there is little hope of parsing actual code syntax without introducing brittle edge cases that will bite you in production at 2 AM.
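To make the escaped-quote problem concrete, here is a minimal sketch in Python. It is not taken from the author's post: the sample line, the naive pattern, and both helper function names are assumptions chosen for illustration. It compares a quote-to-quote regex with the standard library's `tokenize` module, which already knows Python's lexical rules.

```python
import io
import re
import tokenize

# Hypothetical input: one string literal containing escaped quotes,
# followed by a comment that also contains quote characters.
SOURCE = 'greeting = "say \\"hi\\" now"  # not a "string"\n'


def strings_via_regex(source):
    """Naive approach: everything between one double quote and the next."""
    return re.findall(r'"[^"]*"', source)


def strings_via_tokenizer(source):
    """Let Python's own tokenizer decide what counts as a string literal."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.STRING]


if __name__ == "__main__":
    print("regex:    ", strings_via_regex(SOURCE))
    print("tokenizer:", strings_via_tokenizer(SOURCE))
```

On this input the regex returns three fragments, splitting the real literal at its escaped quotes and also picking up the quoted word inside the comment, while the tokenizer returns the single literal. The pattern could be patched to handle backslashes, but each patch tends to expose the next edge case, such as triple-quoted strings or prefixes.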
ANTLR: A Better Hammer
So what's the alternative? The author points to ANTLR (ANother Tool for Language Recognition), a parser generator that can build grammars for languages ranging from Ada to Zig. "You don't need deep compiler expertise to get a usable parser and syntax tree," they note, and that's the key insight here. Building language parsers used to require specialized knowledge that most application developers simply didn't have time to acquire. But in 2026, with LLMs writing much of the glue code, it's become surprisingly practical to spin up a proper ANTLR-based parser for common tasks like SQL interpretation.
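What a syntax tree buys you is easiest to see in a small example. The sketch below does not use ANTLR, since an ANTLR parser depends on generated lexer and parser classes that cannot be shown in a self-contained snippet. As a stand-in it uses Python's built-in `ast` module, and the sample source and function names are assumptions for illustration. The idea carries over: once you have a tree, you ask structural questions rather than textual ones.

```python
import ast
import re

# Hypothetical source to analyse: "eval(" appears in a comment, inside a
# string literal, and once as a real function call.
SOURCE = '''
# eval("in a comment")
message = "please never eval(user_input)"
result = eval(expression)
'''


def eval_lines_via_regex(source):
    """Textual search: flags every line that contains 'eval('."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if re.search(r"\beval\(", line)
    ]


def eval_lines_via_ast(source):
    """Structural search: flags only call nodes whose callee is the name eval."""
    tree = ast.parse(source)  # parses only; nothing in SOURCE is executed
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
    ]


if __name__ == "__main__":
    print("regex:", eval_lines_via_regex(SOURCE))
    print("ast:  ", eval_lines_via_ast(SOURCE))
```

The regex reports lines 2, 3, and 4, while the tree walk reports only line 4, the one real call. A parser generated from an ANTLR grammar would expose a comparable tree for whatever language the grammar describes, though its node types and traversal API differ from Python's `ast`.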
The Shift From Over-Engineering to Reasonable Solution
Six months ago, reaching for ANTLR instead of regex would have been "insane over-engineering" according to the author. Building a full language parser to avoid a gnarly regex hack? Nobody had time for that. But the landscape has shifted. LLMs can help scaffold the boilerplate, generate the grammar bindings, and handle the glue code that makes parsers integrate smoothly into existing codebases. What was once a specialist skill is becoming accessible to everyday developers who just need reliable code parsing without the regex footguns.
The Demo That Proves It's Possible
The author updated their post on May 20, 2026, with a demo for a simple SQL parser task using this approach. This is no longer theoretical. It is a working example showing that the ANTLR route has become tractable for developers who aren't compiler engineers. The message is clear: if you're reaching for regex to parse structured code in 2026, you're probably doing it wrong.
Key Takeaways
- Regex struggles with nested delimiters, escaped characters, and keyword-versus-identifier ambiguity in code
- ANTLR supports grammars from Ada to Zig and doesn't require compiler expertise to use effectively
- LLMs can now help scaffold parsers, making proper parsing more accessible than ever before
- What was "insane over-engineering" six months ago is now a practical option, according to the author
The Bottom Line
The next time your AI coding assistant suggests using regex for something that looks remotely like structured code, push back. Point it toward ANTLR or similar parser generators. We've spent decades accumulating regex debt from exactly this kind of lazy shortcut—let's not keep compounding the interest.