If you've built a multi-agent workflow and watched your main AI agent's context window balloon to 80,000 tokens while subagents sit idle, you already know something went wrong. According to a deep-dive on DEV.to from developer wonderlab, the root cause is almost always the same: unclear division of labor between the Orchestrator (the main Agent) and its Subagents. The pattern recurs in production systems, and performance suffers as soon as agents start taking on each other's responsibilities.

Where Most Multi-Agent Designs Break Down

The article identifies three and only three jobs for an Orchestrator: Decide, Dispatch, and Collect. That's it. The Orchestrator reads workflow state, determines the next step, spawns subagents with task prompts, then collects their output files before updating state. It does NOT analyze bugs, write code, query logs, read raw files, or modify business data. Those tasks belong to specialized subagents running in isolated sessions. When developers violate this boundary—feeding 100k-line crash logs directly into the main Agent's context—they're not making the system smarter. They're bloating context and burning tokens on work that should be delegated.
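The Decide, Dispatch, Collect loop described above can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the step names, the state layout, and the `spawn` callable (a stand-in for whatever actually launches a subagent) are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Callable, Optional

# Hypothetical step order for illustration; the article does not prescribe one.
STEPS = ["analyze", "patch", "verify"]


def decide(state: dict) -> Optional[str]:
    """Decide: read workflow state and return the next step, or None when done."""
    for step in STEPS:
        if step not in state["completed"]:
            return step
    return None


def dispatch(step: str, workdir: Path, spawn: Callable[[str, Path], None]) -> Path:
    """Dispatch: hand a task prompt and an output path to a subagent."""
    output_path = workdir / f"{step}.json"
    prompt = f"Perform step '{step}'. Write a JSON result to {output_path}."
    spawn(prompt, output_path)  # stand-in for the real subagent launcher
    return output_path


def collect(step: str, output_path: Path, state: dict) -> dict:
    """Collect: read the subagent's output file and update workflow state."""
    result = json.loads(output_path.read_text())
    # Keep only the routing summary; raw details stay in the output file.
    state["results"][step] = {"passed": result["passed"]}
    state["completed"].append(step)  # "completed" here means the step produced output
    return state


def run(workdir: Path, spawn: Callable[[str, Path], None]) -> dict:
    """Loop over Decide, Dispatch, Collect until no step remains or one fails."""
    state = {"completed": [], "results": {}}
    while (step := decide(state)) is not None:
        output_path = dispatch(step, workdir, spawn)
        state = collect(step, output_path, state)
        if not state["results"][step]["passed"]:
            break
    return state
```

Note that nothing in the loop inspects logs, code, or business data; the Orchestrator only touches state and the small result summary each subagent leaves behind.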

Subagent Design: The Input Completeness Problem

The piece lays out three hard principles for subagent architecture. First, task prompts must contain everything a subagent needs, with no references to 'previous analysis results' or implicit dependencies on conversation history. Each subagent runs in isolation with no access to the main Agent's session context, so any background it needs must be written explicitly into the prompt. Second, output schemas are contracts: missing fields or wrong data types break routing logic downstream. Third, failures must be structured: a subagent should always write an output file even when its task fails, using {"passed": false, "error": "..."}, so the Orchestrator can distinguish a genuine failure (an error file exists) from a timeout (no file ever appears).
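The second and third principles can be sketched as a pair of helpers, one on each side of the output file. This is an illustrative sketch under stated assumptions: the `summary` field, the `REQUIRED_FIELDS` contract, and the status labels are hypothetical; only the {"passed": false, "error": "..."} shape comes from the article.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical output contract: field name -> required type.
REQUIRED_FIELDS = {"passed": bool, "summary": str}


def run_subagent_task(task: Callable[[], dict], output_path: Path) -> None:
    """Subagent side: always leave an output file, even when the task raises."""
    try:
        result = task()
    except Exception as exc:
        result = {"passed": False, "error": f"{type(exc).__name__}: {exc}"}
    output_path.write_text(json.dumps(result))


def read_result(output_path: Path) -> dict:
    """Orchestrator side: classify what the subagent left behind."""
    if not output_path.exists():
        return {"status": "no_output"}  # nothing written: timeout or hard kill
    try:
        data = json.loads(output_path.read_text())
    except json.JSONDecodeError:
        return {"status": "invalid", "reason": "not valid JSON"}
    if not isinstance(data, dict):
        return {"status": "invalid", "reason": "top level is not an object"}
    if data.get("passed") is False and "error" in data:
        return {"status": "failed", "error": data["error"]}  # structured failure
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            return {"status": "invalid", "reason": f"bad or missing field: {field}"}
    return {"status": "ok", "data": data}
```

With this split, routing logic can branch on four distinct outcomes (ok, failed, invalid, no_output) instead of guessing from a missing or malformed file.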

Fan-Out/Fan-In Concurrency Control

The article gets into the messy reality of running multiple subagents concurrently. For fan-out patterns (one trigger spawning N concurrent agents), two constraints are non-negotiable: each subagent writes to a unique output file path, and the Orchestrator must wait for all results before proceeding. No async runtime? A polling loop with deadline checking works fine. The trickier decision is fan-in strategy. Use fail-fast when all branch results are required—missing any one makes the whole batch meaningless (like gathering data from distributed sources). Use collect-all when partial success suffices, like running three code-fix candidates where you only need one passing test suite to proceed.
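A polling-based wait and the two fan-in strategies might look like the sketch below. It assumes each subagent writes JSON with a `passed` field to its own path; the function names, the default poll interval, and the choice to treat a half-written file as still pending are assumptions for illustration. Both strategies are applied after the wait finishes, so "fail-fast" here means failing the whole batch, not aborting branches that are still running.

```python
import json
import time
from pathlib import Path
from typing import Dict, List, Optional


def wait_for_outputs(
    paths: List[Path], timeout_s: float, poll_interval_s: float = 0.5
) -> Dict[Path, Optional[dict]]:
    """Poll until every output file is readable or the deadline passes.

    Returns {path: parsed JSON, or None if no readable file appeared in time}.
    """
    deadline = time.monotonic() + timeout_s
    pending = set(paths)
    results: Dict[Path, Optional[dict]] = {}
    while pending:
        for path in list(pending):
            if not path.exists():
                continue
            try:
                results[path] = json.loads(path.read_text())
            except json.JSONDecodeError:
                continue  # file may still be mid-write; retry on the next poll
            pending.discard(path)
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(poll_interval_s)
    for path in pending:
        results[path] = None  # timed out
    return results


def fan_in_fail_fast(results: Dict[Path, Optional[dict]]) -> List[dict]:
    """All branches required: any missing or failed branch fails the batch."""
    bad = [p for p, r in results.items() if r is None or not r.get("passed")]
    if bad:
        raise RuntimeError(f"{len(bad)} of {len(results)} branches failed or timed out")
    return [r for r in results.values() if r is not None]


def fan_in_collect_all(results: Dict[Path, Optional[dict]]) -> List[dict]:
    """Partial success suffices: return every passing branch (possibly none)."""
    return [r for r in results.values() if r is not None and r.get("passed")]
```

Unique paths can be as simple as one file per branch index, for example `workdir / f"fix_{i}.json"`, so that no two subagents ever write to the same file.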

Context Isolation as Quality Assurance

Perhaps the most counterintuitive insight: extra context isn't helpful for subagents, it's noise. The main Agent holds full workflow history: every file content, every raw output, every intermediate decision. Passing all of this to a subagent writing one patch doubles token cost and dilutes its focus with irrelevant historical baggage. Subagents should know only what's in their task prompt and the agreed output path. If you catch yourself thinking 'the subagent needs to understand the background,' treat that as a design smell: background belongs explicitly in the task prompt, not assumed from invisible history.
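One way to enforce this is to build every task prompt from an explicit list of state keys, so that nothing reaches the subagent by accident. The sketch below is an assumption-laden illustration: the state keys, the prompt layout, and both function names are hypothetical.

```python
from typing import Dict, List


def select_background(workflow_state: Dict[str, str], keys: List[str]) -> Dict[str, str]:
    """Copy only the named keys; a missing key is an error, not a silent gap."""
    missing = [k for k in keys if k not in workflow_state]
    if missing:
        raise KeyError(f"missing background for prompt: {missing}")
    return {k: workflow_state[k] for k in keys}


def build_task_prompt(instruction: str, background: Dict[str, str], output_path: str) -> str:
    """Assemble a self-contained prompt from explicitly chosen background only."""
    lines = [instruction, "", "Background:"]
    for key, value in background.items():
        lines.append(f"- {key}: {value}")
    lines += ["", f"Write your JSON result to: {output_path}"]
    return "\n".join(lines)


# Usage: the raw log stays with the workflow; only two short facts travel.
state = {
    "raw_crash_log": "stack frame ... " * 1000,  # large, never sent to the subagent
    "root_cause": "null pointer in session cache lookup",
    "target_file": "cache/session.py",
}
prompt = build_task_prompt(
    "Write a patch that fixes the root cause below.",
    select_background(state, ["root_cause", "target_file"]),
    "out/patch.json",
)
```

Because the key list is explicit, adding background becomes a visible, reviewable change to the prompt rather than an invisible dependency on session history.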

Key Takeaways

  • Orchestrator does three things: Decide (read state), Dispatch (spawn subagents), Collect (read outputs). Nothing else.
  • Subagent prompts must be self-contained—no implicit dependencies on conversation history that doesn't exist in isolated sessions.
  • Fan-out requires unique output files per agent and explicit wait logic before proceeding to fan-in.
  • Choose fail-fast for all-or-nothing requirements; choose collect-all for solution-space problems where any passing result suffices.

The Bottom Line

This isn't architecture theater—it's the difference between systems that scale and systems that collapse under token costs. If your main Agent is reading raw files or managing business logic, you've built a monolith with extra steps. Draw the line, enforce the schema contracts, and let your Orchestrator do what it was designed for: coordination, not computation.