Teaching LLM Agents to Remember: Stigmergy Math Fixes Stateless Tool Selection

LLM agents are hemorrhaging tokens on capability definitions they never use, and the problem is getting worse as tool catalogues grow. A new paper from Sebastian Hanke proposes a thin coordination layer that borrows mechanics from ant colony behavior to create a feedback loop between past successes and future selections. The core insight: today's agent loops don't learn from outcomes, so every wrong tool call becomes tuition paid again on the next structurally similar task.

What Stigmergy Actually Means

The paper leans hard into the biological analogy before getting technical. Stigmergy is how ant colonies find short paths to food without a map, a leader, or any individual that understands the problem. Ants deposit pheromone as they walk. Shorter paths are completed more often, so pheromone accumulates faster there, while trails on longer paths fade from disuse. The colony converges on efficient routes because the environment itself carries the memory of what worked. French zoologist Pierre-Paul Grassé coined the term in 1959 while studying termite nest construction. His definition fits in one sentence: the state of the work guides the next action. A termite that finds a half-built pillar builds on it, reading traces left by predecessors who may be long dead. The intelligence sits in the protocol and the environment, not in any individual actor.
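The positive-feedback loop described above shows up clearly in a toy version of the classic double-bridge setup. This is a minimal sketch that does not come from the paper: the path lengths, the evaporation rate `rho`, and the deposit rule (1/length per trip) are illustrative assumptions.

```python
import random


def simulate_double_bridge(lengths=(1.0, 2.0), ants=200, rho=0.05, seed=0):
    """Toy double-bridge model (illustrative parameters, not from the paper).

    Each ant picks a path with probability proportional to its pheromone.
    All trails evaporate by (1 - rho) per trip, and the chosen path gains
    1 / length, so shorter trips reinforce their trail faster.
    Returns each path's final share of total pheromone.
    """
    rng = random.Random(seed)          # seeded for reproducible runs
    pheromone = [1.0] * len(lengths)   # no initial preference
    for _ in range(ants):
        # The environment's memory drives the choice, not any ant's knowledge.
        path = rng.choices(range(len(lengths)), weights=pheromone)[0]
        pheromone = [p * (1.0 - rho) for p in pheromone]   # evaporation
        pheromone[path] += 1.0 / lengths[path]             # deposit
    total = sum(pheromone)
    return [p / total for p in pheromone]


shares = simulate_double_bridge()
print(f"short path: {shares[0]:.2f}, long path: {shares[1]:.2f}")
```

No ant ever compares path lengths; the shorter path simply gets reinforced at a higher rate, and evaporation lets the unrewarded trail fade.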

Why Current Agent Loops Are Stateless

Anthropic's own platform measurements show 58 tools consuming roughly 55,000 tokens of context before a single user question arrives, and a community field report describes real setups exceeding 140,000 tokens of tool definitions. Every capability definition stays in context on every step and is billed whether or not the agent uses it. The selection mechanism compounds this waste. Retrieval-augmented tool selection, MCP routers, deferred loading, and progressive disclosure all reduce standing cost, but each operates as fixed configuration once deployed: the outcome of a past call doesn't change what surfaces next time. Wrong choices produce failed invocations, retry rounds, and error returns that load into context without contributing to the solution, and none of that failure feeds back into future decisions.
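The standing-cost arithmetic is simple but worth making explicit. This sketch uses the 58-tool / 55,000-token figure quoted above; the ten-step loop is a hypothetical length, and it deliberately ignores prompt caching, which the design section returns to.

```python
def standing_tool_cost(definition_tokens: int, steps: int) -> int:
    """Input tokens spent re-sending tool definitions across an agent loop,
    ignoring any prompt-cache discount: the full block is billed every step."""
    return definition_tokens * steps


# From the text: 58 tools, roughly 55,000 tokens of definitions.
tools, block_tokens = 58, 55_000
per_tool = block_tokens / tools   # average size of one definition
steps = 10                        # hypothetical loop length for illustration

print(f"~{per_tool:.0f} tokens per tool definition")
print(f"{standing_tool_cost(block_tokens, steps):,} definition tokens over {steps} steps")
```

At roughly 950 tokens per definition, even a tool the agent never calls costs about that much on every uncached step.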

The Design: Path Learning Over Tool Narrowing

Hanke proposes modeling capability sequences as a directed graph whose edges carry pheromone values, essentially learned trails between capabilities that led to successful outcomes. When an agent completes a task successfully, the path it took is reinforced; unused edges decay over time through lazy exponential evaporation. Selection uses Thompson sampling over the decayed evidence, which makes results deterministic for a given seed while preserving controlled exploration. The critical technical wrinkle is prompt caching. Modern hosts keep tool definitions in a cached prefix that reaches high hit rates, so a stable tool block is billed at a fraction of its nominal token weight. Any per-step edit to that block, whether narrowing the visible set or reordering it, invalidates the cache from the edit point onward and bills full input cost again. This changes the optimization target entirely: narrowing the standing set trades documented standing cost for recurring cache-miss cost, so it is no longer an unambiguous win. The paper's answer is to keep every capability visible so the cache stays warm, and to surface a learned path as a small piece of per-turn context after the cache breakpoint. That guidance cuts the failed exploration that precedes the one correct call, which is where the real token waste concentrates.
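The graph mechanics above (reinforcement, lazy exponential evaporation, seeded Thompson sampling) can be sketched in a few dozen lines. This is a hypothetical illustration, not the paper's implementation: representing "pheromone" as decayed success/failure counts, using a Beta-Bernoulli posterior for Thompson sampling, and the `half_life` parameter and capability names are all assumptions.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Edge:
    successes: float = 0.0
    failures: float = 0.0
    last_update: float = 0.0  # logical time of the last decay/update


class PheromoneGraph:
    """Directed graph over capabilities. Each edge carries decaying success and
    failure evidence, a hypothetical stand-in for the paper's pheromone values."""

    def __init__(self, half_life: float = 50.0, seed: int = 0):
        self.decay_rate = math.log(2) / half_life
        self.edges: dict[tuple[str, str], Edge] = {}
        self.rng = random.Random(seed)  # seeded, so selection is reproducible

    def _decayed(self, key: tuple[str, str], now: float) -> Edge:
        """Lazy evaporation: decay is applied only when an edge is touched,
        based on the time elapsed since its last update."""
        edge = self.edges.setdefault(key, Edge(last_update=now))
        factor = math.exp(-self.decay_rate * (now - edge.last_update))
        edge.successes *= factor
        edge.failures *= factor
        edge.last_update = now
        return edge

    def reinforce(self, path: list[str], success: bool, now: float) -> None:
        """Deposit evidence on every edge of a completed capability sequence."""
        for src, dst in zip(path, path[1:]):
            edge = self._decayed((src, dst), now)
            if success:
                edge.successes += 1.0
            else:
                edge.failures += 1.0

    def select_next(self, current: str, candidates: list[str], now: float) -> str:
        """Thompson sampling: draw from Beta(1 + s, 1 + f) for each candidate
        edge and pick the highest draw; unexplored edges keep a uniform prior."""
        best, best_draw = candidates[0], -1.0
        for cand in candidates:
            edge = self._decayed((current, cand), now)
            draw = self.rng.betavariate(1.0 + edge.successes, 1.0 + edge.failures)
            if draw > best_draw:
                best, best_draw = cand, draw
        return best


# Hypothetical capability names for illustration.
graph = PheromoneGraph(half_life=50.0, seed=42)
for t in range(20):
    graph.reinforce(["search_docs", "read_file", "edit_file"], success=True, now=float(t))
    graph.reinforce(["search_docs", "run_shell"], success=False, now=float(t))
print(graph.select_next("search_docs", ["read_file", "run_shell"], now=20.0))
```

Decaying lazily means an idle edge costs nothing until it is read, and because the sampler's randomness comes from a seeded generator, replaying the same history with the same seed reproduces the same selections.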

Security Review and Open Source

The system has undergone a security review against its design, but a controlled empirical study confirming the central token-reduction hypothesis is still pending. One preliminary self-run data point shows significant token reduction alongside a cold-start success regression; the paper treats it as a mechanism check rather than evidence of efficacy. An open-source implementation accompanies the release under its own product branding, kept separate from the mechanistic description in the paper. The architecture is described as a hexagonal core behind two ports, with a proactive pre-call step inside the controlled loop that can surface learned-path guidance while leaving the cached tool block intact. Everything runs on the user's machine with no network egress, deterministic execution, and strict local-first boundaries. The evaluation protocol is specified for future replication: hypotheses are stated, ablation axes are defined (configuration switches turned off one at a time to isolate each component's contribution), baselines are established, and pivot conditions are documented. The paper is candid that it is a design document whose controlled study has yet to be run.
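A pre-call step that respects the cache can be sketched as core logic depending only on two ports. The source does not say what its two ports are, so `ToolCatalogPort` and `PathAdvisorPort`, the in-memory adapters, and the guidance wording are all hypothetical; the point is only that the tool block is rendered identically every turn while guidance lands after it.

```python
from typing import Optional, Protocol, Sequence


class ToolCatalogPort(Protocol):
    """Hypothetical port: the full tool definitions in a stable order."""
    def definitions(self) -> Sequence[str]: ...


class PathAdvisorPort(Protocol):
    """Hypothetical port: a learned capability path for a task, if any."""
    def suggest(self, task: str) -> Optional[Sequence[str]]: ...


def pre_call_step(catalog: ToolCatalogPort, advisor: PathAdvisorPort,
                  task: str) -> tuple[str, str]:
    """Core logic sees only the ports. Returns (cacheable prefix, per-turn suffix)."""
    # Never narrowed or reordered, so the prefix stays byte-identical across turns.
    prefix = "\n".join(catalog.definitions())
    lines = []
    path = advisor.suggest(task)
    if path:
        # Learned-path guidance goes after the cache breakpoint only.
        lines.append("Suggested capability path: " + " -> ".join(path))
    lines.append(f"Task: {task}")
    return prefix, "\n".join(lines)


# Hypothetical in-memory adapters standing in for real infrastructure.
class StaticCatalog:
    def __init__(self, defs: Sequence[str]):
        self._defs = list(defs)

    def definitions(self) -> Sequence[str]:
        return self._defs


class DictAdvisor:
    def __init__(self, paths: dict[str, list[str]]):
        self._paths = paths

    def suggest(self, task: str) -> Optional[Sequence[str]]:
        return self._paths.get(task)


catalog = StaticCatalog(["search_docs: search project docs",
                         "read_file: read a file",
                         "edit_file: apply an edit"])
advisor = DictAdvisor({"fix typo": ["search_docs", "read_file", "edit_file"]})
prefix_a, suffix_a = pre_call_step(catalog, advisor, "fix typo")
prefix_b, suffix_b = pre_call_step(catalog, advisor, "summarize repo")
print(prefix_a == prefix_b)  # True: only the suffix differs between turns
```

Swapping either adapter (for example, replacing `DictAdvisor` with something backed by a pheromone graph) leaves the core and the cacheable prefix untouched, which is the property the hexagonal layout is meant to protect.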

Key Takeaways

Tool definitions consume roughly 55,000 tokens for 58 tools, and over 140,000 in some reported setups, before user input arrives, and they are billed on every step regardless of use
Current selection mechanisms are stateless: past failures don't inform future choices, so the same wrong turns recur on structurally similar tasks
Under prompt caching, per-step tool narrowing trades standing cost for cache-miss cost, so it is no longer an unambiguous win; the paper recommends path learning instead, keeping the cached tool block intact while reducing exploration waste
The approach maps ant foraging mechanics (pheromone reinforcement plus decay) directly onto capability sequences as a directed graph problem
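The caching tradeoff behind the third takeaway can be made concrete with a toy cost model. This is a hypothetical sketch: the cache-read multiplier (0.1 of nominal), the ten-step loop, and the 200-token guidance size are assumptions, and the model ignores cache-write premiums, cache expiry, and the failed-exploration waste that the paper actually targets. Only the 55,000-token block size comes from the text.

```python
def stable_block_cost(block_tokens, steps, cache_multiplier, guidance_tokens=0):
    """Tool block kept byte-identical: the first step pays full price, later
    steps read the cached prefix at cache_multiplier of nominal; per-turn
    guidance sits after the breakpoint and is billed in full every step."""
    prefix = block_tokens + block_tokens * cache_multiplier * (steps - 1)
    return prefix + guidance_tokens * steps


def narrowed_block_cost(narrowed_tokens, steps):
    """Visible set edited every step: each edit invalidates the cache from the
    edit point, so in this toy model every step pays full price."""
    return narrowed_tokens * steps


def break_even_size(block_tokens, steps, cache_multiplier, guidance_tokens=0):
    """Per-step narrowed block size at which both strategies cost the same."""
    return stable_block_cost(block_tokens, steps, cache_multiplier, guidance_tokens) / steps


# 55,000 comes from the text; the other inputs are hypothetical.
block, steps, multiplier, guidance = 55_000, 10, 0.1, 200
stable = stable_block_cost(block, steps, multiplier, guidance)
print(f"stable block + guidance: {stable:,.0f} input tokens")
print(f"narrowing wins on standing cost only below "
      f"~{break_even_size(block, steps, multiplier, guidance):,.0f} tokens per step")
```

With these inputs the stable strategy bills about 106,500 input tokens over ten steps, so per-step narrowing only pays off on standing cost if the visible set shrinks below roughly 10,650 tokens on every step. That is why the comparison stops being a clear win for narrowing once caching enters the picture.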

The Bottom Line

This is builder infrastructure aimed at a real operational pain point, not another theoretical framework waiting for someone else to implement. The stigmergy transfer is elegant precisely because it doesn't try to make agents smarter; it makes the environment remember what works. Whether the token savings materialize and justify the complexity will depend on catalogue size and task repetition rates, but for anyone running MCP setups at scale, this is worth bookmarking until the empirical data lands.
