Picture this: It's 3 AM, and you get an alert that one user session just burned through 47,000 tokens in a single conversation. Your ReAct-style customer support agent called the same search_knowledge_base tool 73 times in a row, with slightly different queries each time, never once deciding to stop. If you've built any tool-using agent, this nightmare scenario probably sounds uncomfortably familiar.

What's Actually Happening Inside the Loop

A typical agent loop follows a simple pattern: the model generates an action, you execute it, append the result to context, and ask what comes next—repeating until the model says it's done. The failure mode lives in that 'until done' condition, and three things commonly go wrong. First, models have no concept of 'I've already tried this.' If history shows ten failed searches, the agent often interprets that as 'search harder' rather than 'this approach isn't working.' Second, tool errors are silent or ambiguous—returning an empty list could mean 'no results found' or 'the tool is broken,' and the model can't distinguish between them. Third, stop conditions are implicit; many implementations only terminate when the model produces a final-answer message, with nothing forcing that to happen.
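To make the failure mode concrete, here is a minimal sketch of the naive loop described above. The action shape (a dict with "type", "tool", "args", and "answer" keys) and the next_action callable standing in for the model are assumptions made for illustration, not any particular framework's API.

```python
def naive_agent_loop(next_action, tools):
    """Naive loop: the only exit is the model choosing to emit a final answer."""
    history = []
    while True:  # nothing here forces termination
        action = next_action(history)  # hypothetical stand-in for a model call
        if action["type"] == "final":
            return action["answer"], history
        # An empty list returned here is ambiguous: it could mean
        # "no results" or "the tool is broken", and the model cannot tell.
        observation = tools[action["tool"]](**action["args"])
        history.append((action, observation))
```

If next_action never returns a "final" action, this function never returns, which is the situation the rest of this article is about.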

The Fix: Explicit State, Hard Limits, Structured Feedback

The solution isn't a better model—it's better architecture. Here's what actually works in production. Set hard step limits so the loop terminates after MAX_STEPS regardless of what the model decides; for triage tasks, eight steps usually suffice while research workflows might need up to twenty. Implement action deduplication by hashing each (tool, args) pair before execution—if you've seen it before, return a synthetic observation telling the model it's repeating rather than running the tool again. Structure your error envelopes so tools return typed results with status fields like 'no_results' versus 'error' versus 'ok', giving the model clear signals to make better decisions.
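The three fixes can be combined in one loop. The following is a sketch under stated assumptions rather than a definitive implementation: the ToolResult envelope, the action dict shape, the next_action callable standing in for the model, and the extra "duplicate" status are all hypothetical names chosen for this example.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable

MAX_STEPS = 8  # the triage budget suggested above; research flows may need more


@dataclass
class ToolResult:
    """Typed envelope so the model can tell 'no_results' from 'error'."""
    status: str  # "ok", "no_results", "error", or (our addition) "duplicate"
    data: Any = None
    message: str = ""


def action_key(tool: str, args: dict) -> str:
    """Stable hash of a (tool, args) pair; sort_keys ignores argument order."""
    payload = json.dumps([tool, args], sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_agent(next_action: Callable[[list], dict], tools: dict,
              max_steps: int = MAX_STEPS) -> dict:
    history: list = []
    seen: set = set()
    for _ in range(max_steps):  # hard limit: the loop cannot outlive max_steps
        action = next_action(history)
        if action["type"] == "final":
            return {"status": "done", "answer": action["answer"], "history": history}
        key = action_key(action["tool"], action["args"])
        if key in seen:
            # Synthetic observation: do not run the tool again.
            result = ToolResult(
                "duplicate",
                message="You already ran this exact call. Try a different "
                        "approach or give a final answer.",
            )
        else:
            seen.add(key)
            try:
                result = tools[action["tool"]](**action["args"])
            except Exception as exc:  # unknown tool or tool failure
                result = ToolResult("error", message=str(exc))
        history.append((action, result))
    # Limit reached: hand back what was gathered instead of raising.
    return {"status": "step_limit", "answer": None, "history": history}
```

With a model that keeps issuing the same search, the tool runs once, every later step receives the "duplicate" observation, and the loop ends with status "step_limit" after MAX_STEPS iterations.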

Detecting Oscillation Beyond Exact Duplicates

Exact-duplicate detection catches the obvious cases, but agents get creative. Your agent might call search('authentication errors'), then search('auth errors'), then search('login failures'): different wording, essentially the same query, so the hashes never match. A simple ProgressTracker class using a sliding window can help: track the last N tool calls and flag the agent as stuck if they all hit the same tool. This is cruder than semantic similarity via embeddings, and it can also flag a legitimate run of calls to one tool, but it catches roughly 80% of oscillation cases without requiring a separate model.
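One possible shape for such a tracker is sketched below. The class name comes from the text above; the method names and the default window of 5 are assumptions for illustration.

```python
from collections import deque


class ProgressTracker:
    """Flags the agent as stuck when the last `window` calls all hit one tool."""

    def __init__(self, window: int = 5):
        self.window = window
        self.recent = deque(maxlen=window)  # old entries fall off automatically

    def record(self, tool_name: str) -> None:
        self.recent.append(tool_name)

    def is_stuck(self) -> bool:
        # Only judge once the window is full, to avoid flagging early steps.
        return len(self.recent) == self.window and len(set(self.recent)) == 1
```

Call record() after each tool execution and check is_stuck() before the next model call; when it returns True, inject a "you are repeating yourself" observation or end the loop. The window size trades sensitivity against false alarms on tasks that legitimately need several calls to one tool.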

Why Frameworks Don't Solve This for You

Most popular agent frameworks give you a max_iterations parameter and call it a day. That's the floor of what you need, not the ceiling. If you're building anything beyond a demo, you also need per-tool quotas instead of just global step limits, logging that captures the full action-observation trail for post-mortems, a mechanism to inject 'you've already tried this' context back into the model, and a graceful exit path that returns a partial answer when limits hit rather than throwing an exception. The community-maintained Agent-Learning-Hub on GitHub covers these patterns in more depth, including academic papers on planning and reflection that explain why naive ReAct loops fail in the first place.
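Per-tool quotas and a graceful exit path are small to build yourself. The sketch below is one way to do it; ToolQuota, try_consume, graceful_exit, and the shape of the returned dict are hypothetical names for this example, not part of any framework.

```python
from collections import Counter


class ToolQuota:
    """Caps how many times each tool may run in one conversation."""

    def __init__(self, limits: dict, default_limit: int = 5):
        self.limits = limits              # e.g. {"search_knowledge_base": 3}
        self.default_limit = default_limit
        self.used = Counter()

    def try_consume(self, tool: str) -> bool:
        """Reserve one call for `tool`; return False if its quota is spent."""
        if self.used[tool] >= self.limits.get(tool, self.default_limit):
            return False
        self.used[tool] += 1
        return True


def graceful_exit(reason: str, observations: list) -> dict:
    """Build a partial answer from what was gathered instead of raising."""
    return {
        "status": "partial",
        "reason": reason,
        "findings": [o for o in observations if o],  # drop empty observations
    }
```

In the loop, call try_consume() before executing a tool; when it returns False, either feed the model an observation saying the quota is exhausted or return graceful_exit(...) so the caller still receives whatever was found.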

Prevention Habits That Actually Work

After enough 3 AM pages, you'll develop these habits naturally, or you can adopt them proactively. Log every action and observation with timestamps so you have full traces when things go wrong. Set token budgets per conversation and enforce them server-side; don't trust the agent to police itself. Write tools that return semantically useful errors like 'No results for query X; try a more general term' instead of empty arrays. Test with adversarial prompts designed to confuse the agent and verify it bails out cleanly. Track tool-call entropy: if the entropy of the tool-call distribution over recent steps drops toward zero during a conversation, the agent is concentrating on a single tool, which is a leading indicator of stuck behavior.
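Tool-call entropy can be computed as the Shannon entropy of the tool names seen in a recent window of steps. The function below is a minimal sketch; the function name is hypothetical, and the window length and any alert threshold are left to you.

```python
import math
from collections import Counter


def tool_call_entropy(tool_names: list) -> float:
    """Shannon entropy (in bits) of the distribution of tool names."""
    if not tool_names:
        return 0.0
    total = len(tool_names)
    counts = Counter(tool_names)
    # H = sum over tools of p * log2(1 / p), where p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

A window containing a single repeated tool scores 0 bits, while an even split between two tools scores 1 bit. Comparing the value for the last few steps against the value earlier in the same conversation gives a cheap signal that the agent has narrowed onto one tool.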

Key Takeaways

  • Agent loops fail from missing state, missing feedback, or missing limits—not bad models
  • Hard step limits (MAX_STEPS) guarantee termination regardless of model decisions
  • Hash (tool, args) pairs to detect and flag duplicate actions before execution
  • Structured error envelopes with status fields help models make smarter next steps
  • Most frameworks only give you the floor—per-tool quotas, logging, and graceful exits are on you

The Bottom Line

The model isn't broken—it's doing exactly what your prompt and architecture told it to do. Fix the architecture and the loops disappear. Accept that 'let the model decide when to stop' isn't a strategy; you're writing the loop, so own the termination logic.