Why I Coordinate 21 AI Agents Instead of Using One
Most people use AI as a single assistant. I split the work across 21 specialized agents, each with defined responsibilities, quality gates, and a communication relay. Here's why, and what it actually looks like in practice.
A single AI agent will confidently approve its own code with the same reasoning it used to write it. That’s the whole problem.
Specialization changes the output in concrete ways. The architect agent designs APIs by thinking about contracts, versioning, and how consumers will use the interface — it never sees implementation details. The engineer agent implements against those contracts without second-guessing the architecture. The reviewer agent reads the code fresh, with no knowledge of what the engineer intended, only what the code actually does. It catches things the author is blind to: unused imports that indicate abandoned approaches, test assertions that verify the mock instead of the behavior, error handling that silently swallows exceptions.
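The separation described above can be sketched as role-scoped context: each agent only ever loads the artifacts its role permits. This is a minimal illustration, not the author's actual configuration; the role names and artifact labels are assumptions.

```python
# Hypothetical role-scoped context map. Each agent role may load only
# the artifacts listed for it; everything else is invisible by design.
# Labels are illustrative, not taken from the author's real system.
ROLE_CONTEXT = {
    "architect": ["api_contracts", "consumer_requirements"],  # never implementation details
    "engineer":  ["api_contracts", "assigned_story"],         # implements against the contract
    "reviewer":  ["pull_request_diff"],                       # reads the code fresh, no intent
    "qa":        ["built_artifact", "test_suite"],            # runs tests independently
}

def context_for(role: str) -> list[str]:
    """Return the only artifacts an agent of this role may load."""
    return ROLE_CONTEXT[role]

# The reviewer sees the diff and nothing else, which is exactly why it
# catches what the author is blind to.
assert "assigned_story" not in context_for("reviewer")
```

The point of the sketch is that blindness is enforced structurally, not requested politely: the reviewer cannot consult the engineer's intent because that artifact never enters its context.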
The agents don’t read each other’s code directly. They communicate through a message relay system built on MCP (Model Context Protocol). The planner assigns a story. The engineer picks it up, writes the code, opens a PR, and sends a relay message saying “ready for review.” The reviewer gets that message, pulls the PR, reviews it, and either approves or sends feedback back through the relay. The QA agent runs tests independently. Each agent has its own worktree, its own context, its own CLAUDE.md with role-specific instructions. They share a codebase but never share a context window.
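The relay flow above can be approximated with a small message-queue sketch. The author's system runs over MCP; this stand-in uses plain in-process queues, and the message kinds, field names, and PR number are all hypothetical.

```python
from dataclasses import dataclass, field
from collections import defaultdict, deque

# Hypothetical relay message. The real system speaks MCP; the shape
# here (sender, recipient, kind, payload) is an assumption for
# illustration only.
@dataclass
class RelayMessage:
    sender: str
    recipient: str
    kind: str           # e.g. "story_assigned", "ready_for_review", "review_feedback"
    payload: dict = field(default_factory=dict)

class Relay:
    """Per-agent inboxes: agents share a codebase, never a context window."""

    def __init__(self) -> None:
        self.inboxes: dict[str, deque] = defaultdict(deque)

    def send(self, msg: RelayMessage) -> None:
        self.inboxes[msg.recipient].append(msg)

    def receive(self, agent: str):
        """Pop the next message for this agent, or None if the inbox is empty."""
        inbox = self.inboxes[agent]
        return inbox.popleft() if inbox else None

# Engineer finishes a story and signals the reviewer through the relay,
# not by sharing context. PR number and branch name are made up.
relay = Relay()
relay.send(RelayMessage("engineer", "reviewer", "ready_for_review",
                        {"pr": 142, "branch": "feature/relay-handoff"}))
msg = relay.receive("reviewer")
```

The design choice worth noting: the reviewer learns *that* a PR is ready and *where* it is, but reconstructs *what the code does* entirely from the diff, which preserves the fresh-eyes property the article is built on.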
Across 5 concurrent projects — StockPot (React Native inventory management), QuantBot (Python/ML quantitative trading), Dungeon Crawler (JavaScript roguelike), Meet in the Middle (Svelte collaborative meeting planner), and Maze Solver (algorithm visualization) — the system has delivered 400+ stories, generated over 10,000 tests, and maintained a test pass rate that rarely dips below 95%. The 21 agents include planners, architects, engineers, reviewers, QA specialists, and a DBA, spread across project-specific teams.
My role has shifted completely. I haven't written application code in weeks. Last Tuesday I caught myself reviewing a PR for architectural fit and realized I hadn't opened a code editor in two weeks. That shift, from writing code to designing systems that write code, turned out to be the whole point. Now I make design decisions: which features to prioritize, which architectural patterns to use, when to refactor versus push forward. I review PRs for strategic alignment rather than syntax. I tune the agent configurations when I notice quality drifting. It's the difference between being a developer and being an engineering manager, except my team works in 200,000-token sessions and never needs a standup meeting.
The system isn’t free of problems. Context drift is real — agents need persistent configuration files and strict startup protocols, or they lose their role identity as the context window fills up. Coordination overhead is real too — relay messages, handoff files, pipeline drain protocols all cost tokens and add moving parts. And tracing multi-agent coordination issues is genuinely complex. When something surfaces, you’re reading through four different agent logs to reconstruct the sequence of events and understand where the process needs refinement.
But the core tradeoff holds: the overhead of coordination is cheaper than the cost of blind spots. A single agent will ship code that looks correct to itself 100% of the time. Twenty-one agents with genuine separation of concerns will catch the issues that matter — the ones that only appear when someone else looks at your work. The AI just makes it possible to run the whole team from a single laptop — and to discover, in real-time, whether your management instincts are any good when execution speed is no longer the bottleneck.