Difftron Brings Semantic Code Diffs to Emacs—Built Entirely by AI

Kevin Lynagh just released Difftron, a structural diffing tool for Emacs that parses code into semantic entities (functions, types, classes) and matches them between revisions instead of showing you line-by-line garbage. The whole thing was vibe-coded in roughly 24 hours using Codex with GPT-5.5 High through his $20/month ChatGPT Plus subscription. That's not a typo. One developer and one AI assistant produced a Rust binary that parses code into semantic chunks, paired with a Magit-style Emacs interface for browsing the results.

How Difftron Actually Works

The concept is straightforward: a Rust backend parses source files into semantic entities according to language-specific rules (using Rust Analyzer's ra_ap_syntax crate for Rust and arborium for Clojure and TypeScript), then matches those entities across the two revisions by type and name. Matched entities are shown side by side; unmatched ones are flagged as added or removed. Lynagh describes this as barely scratching the surface of tree-diffing complexity, but adds that 'just this basic scheme combined with a rich user interface already feels miles better than the standard file-and-line-based diffs I've been using.' The UI follows Magit's philosophy: everything is interactive via mnemonic key sequences, hierarchy levels collapse and expand for overview-to-detail navigation, and you can jump straight to the corresponding source file in your worktree.
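A minimal Python sketch of the matching step as described: pair entities by (kind, name), then report matched pairs, removals, and additions. This is not Difftron's implementation (which is written in Rust); the `Entity` type and `match_entities` function are hypothetical, and the sketch assumes parsing has already happened and that each (kind, name) key is unique within a revision.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    kind: str  # e.g. "fn", "struct", "enum" (hypothetical labels)
    name: str  # e.g. "parse"
    text: str  # source text of the entity in this revision


def match_entities(
    old: List[Entity], new: List[Entity]
) -> Tuple[List[Tuple[Entity, Entity]], List[Entity], List[Entity]]:
    """Pair entities across two revisions by (kind, name).

    Returns (matched, removed, added). Matched pairs would be shown side by
    side; entities present in only one revision are removals or additions.
    Assumes (kind, name) is unique within each revision.
    """
    new_by_key = {(e.kind, e.name): e for e in new}
    matched, removed = [], []
    for e in old:
        partner = new_by_key.get((e.kind, e.name))
        if partner is None:
            removed.append(e)  # no same-kind, same-name entity in the new revision
        else:
            matched.append((e, partner))
    old_keys = {(e.kind, e.name) for e in old}
    added = [e for e in new if (e.kind, e.name) not in old_keys]
    return matched, removed, added
```

Keying on (kind, name) means a function renamed between revisions shows up as one removal plus one addition, consistent with Lynagh's remark that this basic scheme barely scratches the surface of tree diffing.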

Vibe-Coded and Open-Sourced Anyway

Lynagh wrestled with whether to release a vibe-coded project publicly. His concerns were familiar: how to distinguish code he thought hard about from code he prompted and reviewed, or prompted and only black-box tested? How permissive should he be with PRs for languages he doesn't personally use? He landed on transparency over perfection: crediting the AI as co-author in commits, stating upfront whether a project is a 'useful tool' or a 'personal playground,' and generally increasing his luck surface area. His Whispertron dictation app went through the same hesitation cycle, and he's glad he released it after hearing from several people who use it daily.

The Real Talk on LLM Determinism

The newsletter pivots to what might be its most important contribution: a framework for thinking about agent harnesses. Lynagh argues that LLMs, unlike human developers, can't learn from their mistakes and have no creative spark for rigid procedures to snuff out, so treating them as deterministic cogs is appealing; doing that reliably, however, takes more than shoving instructions into the context window. He cites Graydon Hoare's Not Rocket Science Rule ('automatically maintain a repository of code that always passes all tests') and points out that no matter how much markdown you write demanding 'You MUST run test.sh before committing,' there's still a chance the model will commit anyway, or delete the failing test to make the suite pass.
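The Not Rocket Science Rule can be sketched as a gate in which the harness, not the model, decides whether the main line advances. In this hypothetical Python sketch, a repository is modeled as an in-memory dict from path to contents, `integrate` and `run_tests` are illustrative names, and a change is a set of file overwrites (deletions are left out for brevity).

```python
from typing import Callable, Dict

# Hypothetical in-memory repository: file path -> file contents.
Tree = Dict[str, str]


def integrate(main: Tree, change: Tree, run_tests: Callable[[Tree], bool]) -> Tree:
    """Return the new main tree under the Not Rocket Science Rule.

    The change is applied to a copy of main and tested there; main only
    advances if the tests pass, so it never holds failing code.
    """
    candidate = {**main, **change}  # new dict; main itself is never mutated
    if run_tests(candidate):
        return candidate
    return main
```

Because `run_tests` is supplied by the harness rather than read from files the agent can edit, deleting a failing test inside the change does not let it slip through.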

The Feedback Loop Problem

Relying on CI to catch failures wastes tokens, time, and potentially entire agent sessions. Lynagh references Stripe's work on coding agents: 'We seek to shift feedback left... any lint step that would fail in CI is best enforced in the IDE.' He explores three strategies for shortening the loop: running tests after every tool call (spurious failures from unfinished work pollute the context), running tests when the agent finishes (which needs careful handling of multi-commit workflows), or giving agents a constrained 'git commit tool' that always runs the tests. The last option mirrors a git pre-commit hook minus the --no-verify escape hatch.
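A rough Python sketch of the third strategy: the harness hands the agent a commit function that always runs the test suite first and offers no way to skip it. The names `make_commit_tool` and `subprocess_runner` and the `./test.sh` command are hypothetical, and the command runner is injectable so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, List, Tuple

# A runner executes a command and returns (exit code, combined output).
Runner = Callable[[List[str]], Tuple[int, str]]


def subprocess_runner(cmd: List[str]) -> Tuple[int, str]:
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def make_commit_tool(test_cmd: List[str], run: Runner = subprocess_runner):
    """Return the agent's only way to commit: tests always run first.

    Deliberately offers no flag to skip the tests (no --no-verify equivalent).
    """

    def commit(message: str) -> str:
        code, output = run(test_cmd)
        if code != 0:
            # Report the failure back to the agent in the same turn,
            # instead of letting it surface later in CI.
            return f"Commit refused: tests failed (exit {code}).\n{output}"
        for cmd in (["git", "add", "-A"], ["git", "commit", "-m", message]):
            code, output = run(cmd)
            if code != 0:
                return f"{' '.join(cmd[:2])} failed (exit {code}).\n{output}"
        return "Committed: tests passed."

    return commit
```

In a real harness you would register `commit` through your agent framework's tool API (not shown, since that varies by framework) and withhold direct shell access to `git commit`; otherwise the model can simply route around the tool.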

Correct by Construction vs Runtime Exceptions

Lynagh draws an analogy to static versus dynamic language strategies: should you give LLMs only tools that make illegal operations impossible (like a structural editor where 'illegal states are unrepresentable'), or let them do whatever they want and raise exceptions at runtime? Take renaming a method, where the invariant is that behavior must not change: the first strategy means granting access only to LSP's deterministic rename operation; the second means arbitrary shell access plus an AST comparison afterward. He acknowledges that writing runtime asserts is often easier than designing a type system, but notes that models are already trained on generic tools, which may make custom safe tools less effective.

Key Takeaways

Difftron suggests vibe-coded projects can ship genuinely useful tools when you guide the AI with domain expertise (such as which Rust crates to use) and clear success conditions
LLM harnesses need deterministic enforcement mechanisms—context window instructions don't cut it for maintaining invariants
The 'shift left' principle applies to agent feedback loops: catching failures immediately beats discovering them in CI after many minutes of wasted work
Choosing between correct-by-construction tools versus runtime validation is an empirical question balancing setup cost against model-plus-harness performance

The Bottom Line

Difftron is a glimpse at what happens when you combine deep domain knowledge with AI pair programming—but the real insight is Lynagh's framework for thinking about agent harnesses. If you're building anything that trusts LLMs to modify code, you need deterministic guardrails or you'll spend your days unwinding the chaos. Context window instructions are vibes; enforcement mechanisms are infrastructure.
