If you've jumped into vibecoding or AI-assisted coding, you know the thrill. Tell your agent what you want, and it gets done—elves hard at work in the terminal, shipping features while you sip coffee. That magic fades fast once your project grows beyond toy-sized, though. Suddenly your agent starts hedging like a politician: "I see the problem clearly! But then again, it could be something entirely different." Sound familiar? Before you switch models or blame the AI, read on—you might be the bottleneck.
When Vibecoding Goes Sideways
Developer Carlo Capasa documented his journey into AI-assisted coding with GLM and Codex, and it's a cautionary tale worth studying. After he built out features on an interactive project, things started breaking. Every small change consumed massive token budgets. Agents gave increasingly vague responses. Refactoring attempts went nowhere. Explicit instructions got ignored. The code stubbornly stayed broken no matter how many tokens he threw at the problem. His first instinct was the one most developers would have: try a different model. OpenAI's Codex offered its own brand of uselessness: "I have tightened the input area, but this is not the complete task as the core portion was left for later consideration." He considered Opus but suspected the problem wasn't the models.
The Real Culprit: Your Problem Definition
After significant refactoring and deep analysis, Capasa realized what was happening: his own implicit instructions were sabotaging him. When working with coding agents on an evolving codebase, he had effectively been saying: "Make sweeping changes to my codebase! Keep the tests running! Don't change the tests! Keep the style!" Those constraints don't just conflict; they actively work against each other.
Why Agents Treat Tests Like Gospel
Here's what your agent is actually trained for: surgical edits. It reads your codebase, grasps the style, and delivers changes quickly, provided you have solid architecture, good test coverage, and want small modifications. That training rewards leaving the existing code structure alone, so every test becomes ground truth. When you're doing exploratory coding or architectural refactoring, this creates what Capasa calls "test hell." The temporary spaghetti code you tossed together because it worked? That's the next session's gospel now. Those brittle tests that never should have shipped? They count. Your agent's training is actively working against you when you need flexibility most.
What Human Programmers Actually Do
When human engineers need to make large architectural changes, they follow a different playbook: write up a design document outlining what should change, explicitly declare which tests are fair game for removal or modification, and test the result, not the journey there. Capasa's breakthrough came when he told Codex: "We are doing an architectural refactor. Put all the old tests away. We'll reinstate them later." Suddenly his agent was back to its old productive self. GLM stopped the funny dance. Once the conflicting constraints were removed, both agents knew what they were supposed to accomplish.
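One way to make that playbook repeatable is to write the brief down as data before the session starts. The sketch below is only an illustration, not Capasa's actual tooling: the `RefactorBrief` class, its field names, and the example goal and file paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RefactorBrief:
    """Hypothetical container for the instructions handed to a coding agent."""

    goal: str
    tests_set_aside: list[str] = field(default_factory=list)  # may be parked or rewritten
    tests_kept: list[str] = field(default_factory=list)  # must still pass at the end
    done_when: list[str] = field(default_factory=list)  # checks on the result, not the journey

    def render(self) -> str:
        """Turn the brief into plain text that can be pasted into a session."""
        lines = ["We are doing an architectural refactor.", f"Goal: {self.goal}"]
        sections = [
            ("Tests set aside for now (reinstate later)", self.tests_set_aside),
            ("Tests that must still pass", self.tests_kept),
            ("Done when", self.done_when),
        ]
        for title, items in sections:
            if items:  # skip empty sections so the brief stays short
                lines.append(f"{title}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


# Example values are made up for illustration.
brief = RefactorBrief(
    goal="Split input handling out of the rendering loop",
    tests_set_aside=["tests/test_input_legacy.py"],
    tests_kept=["tests/test_save_format.py"],
    done_when=["The app starts and accepts keyboard input"],
)
print(brief.render())
```

Listing the set-aside tests and the kept tests separately forces the conflict into the open: if a test appears in neither list, the agent is left to guess, which is exactly the situation described above.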
Key Takeaways
- Coding agents excel at surgical edits, not architectural overhauls—know when to use which mode
- Your implicit problem definition (keep tests passing, don't touch tests, maintain style) creates impossible constraints
- When refactoring, explicitly tell agents to set aside legacy tests rather than maintaining all of them simultaneously
- Tests are for nailing down what you already have—they're not universal goods for every phase of development
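Setting legacy tests aside can be done mechanically rather than by asking the agent to ignore them. The following is a minimal sketch under stated assumptions: the helper names `park_tests` and `reinstate_tests` are hypothetical, test files are assumed to follow the `test_*.py` naming pattern, and the parked directory is assumed to sit outside both the tests directory and whatever paths your test runner collects.

```python
import shutil
import tempfile
from pathlib import Path


def park_tests(tests_dir: Path, parked_dir: Path) -> list[Path]:
    """Move every test_*.py file out of tests_dir, preserving relative paths."""
    moved = []
    # sorted() materializes the matches first, so files are not moved mid-walk.
    for src in sorted(tests_dir.rglob("test_*.py")):
        dest = parked_dir / src.relative_to(tests_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved


def reinstate_tests(tests_dir: Path, parked_dir: Path) -> list[Path]:
    """Move parked test files back to where they came from."""
    restored = []
    for src in sorted(parked_dir.rglob("test_*.py")):
        dest = tests_dir / src.relative_to(parked_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        restored.append(dest)
    return restored


# Demonstration in a throwaway directory; the layout is made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    tests, parked = root / "tests", root / "parked_tests"
    (tests / "ui").mkdir(parents=True)
    (tests / "ui" / "test_input.py").write_text("def test_old():\n    assert True\n")
    moved = park_tests(tests, parked)
    print([p.relative_to(root).as_posix() for p in moved])
    restored = reinstate_tests(tests, parked)
    print([p.relative_to(root).as_posix() for p in restored])
```

Because the relative paths are preserved, reinstating the tests after the refactor is a single call, and version control shows exactly which files were parked in the meantime.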
The Bottom Line
Don't blame your coding agent when it seems helpless. Blame the problem definition you gave it. Before starting a session, ask yourself: am I asking for surgery or reconstruction? Because those require completely different approaches—and so do your instructions to AI tools.