Most people use Claude Code like a very smart assistant. You give it a task, it works through it, you get a result. That works fine for plenty of things. But once you start building real automation systems, you hit a ceiling pretty quickly.
The ceiling is sequential execution. One thing at a time. Task finishes, next task starts. For simple workflows that's fine. For anything with real complexity, processing batches of data, running checks across multiple systems, breaking a large build into parallel workstreams, it's slow and brittle.
Claude Code sub-agents change that. They let you run multiple AI tasks in parallel, coordinate them from a master agent, and build systems that actually scale. This is a practitioner guide to how they work, how we've used them at AMPL on real builds, and where they'll bite you if you're not careful.
What Sub-Agents Are in Claude Code
A sub-agent in Claude Code is a child process. A separate Claude instance that a parent agent spawns to handle a specific piece of work. The parent (sometimes called the orchestrator) coordinates everything. The sub-agents do the actual tasks.
Think of it like a manager and a team. The manager doesn't do every task personally. They delegate, track progress, and pull the results together. Sub-agents are the team.
In Claude Code, this happens through the Task tool. When a parent agent calls Task, it creates a new Claude instance with its own context window, its own tool access, and a specific instruction set. That sub-agent runs to completion and returns a result. The parent can spawn dozens of these simultaneously.
Each sub-agent is genuinely independent. It doesn't share memory with its siblings. It has its own view of the world, defined by what the parent passes to it. That independence is both the power and the complexity. More on that later.
Why Parallelisation Changes What's Possible
The difference between sequential and parallel task execution
Sequential execution is the default. You ask Claude to process ten files. It processes file one, then file two, then file three. The total time is roughly ten times the time for one file.
Parallel execution with sub-agents flips that. You spawn ten sub-agents simultaneously, each handling one file. The total time is roughly the time it takes to process one file. The rest happens concurrently.
For small tasks the difference is marginal. For anything at volume, batch processing, multi-system checks, large codebases, it's the difference between something that runs in thirty seconds and something that runs in five minutes. That matters a lot in production.
We ran into this on a client data processing build. They had a weekly batch of incoming records that needed validation against three separate external systems. Sequential execution was taking around twelve minutes per run. Moving to sub-agents, one agent per record batch with parallel validation checks, got that to under two minutes. Same logic, same checks, just not waiting in a queue.
When sub-agents add speed vs when they add complexity
Sub-agents are not always the right answer. To be honest, for simple workflows they add overhead without much benefit.
Sub-agents make sense when:
Tasks are genuinely independent (no task needs the output of another to start)
You're processing multiple items of the same type (files, records, checks)
The individual tasks are substantial enough that spawning overhead is worth it
You need to hit a time constraint that sequential execution can't meet
Sub-agents add complexity without clear benefit when:
Tasks are sequential by nature (step B needs step A's output)
You're doing a single, contained piece of work
The coordination logic is more complex than the tasks themselves
You don't have good error handling in place yet (parallel failures are harder to debug)
Basically: start simple, add parallelisation when you hit a concrete bottleneck. Don't architect for scale you don't have yet.
How to Set Up Sub-Agent Workflows in Claude Code
Here's how to actually configure and invoke sub-agents in Claude Code, step by step:
Define the orchestrator agent. This is your parent Claude instance. Its job is to break the problem down, spawn sub-agents with the right context, collect results, and handle failures. Write its system prompt to reflect that coordination role explicitly.
Identify the parallelisable units. What are the discrete chunks of work that can run independently? Files, records, systems to check, code modules. Whatever the natural unit is for your use case.
Craft the sub-agent instruction set. Each sub-agent needs a clear, self-contained brief. What exactly is it doing? What does success look like? What should it return? Be precise. Sub-agents don't ask clarifying questions.
Call the Task tool from the orchestrator. The orchestrator invokes Task with the sub-agent's instructions and any context it needs. Do this in a loop or list to spawn multiple agents simultaneously.
Collect and validate results. The orchestrator waits for sub-agents to complete and assembles the outputs. Build validation logic here. Don't assume every sub-agent succeeded.
Handle failures before proceeding. Check which sub-agents returned errors. Decide whether to retry, skip, or surface the failure. Don't let one bad result silently corrupt downstream work.
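The spawn-collect-validate loop those steps describe can be sketched in Python. Note the assumptions here: `run_subagent` is a hypothetical stand-in for a Task call, and the thread pool just mimics the parallel execution Claude Code handles for you.

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(instructions: str, context: str) -> dict:
    # Hypothetical stand-in for a Claude Code Task call: a real build
    # would spawn a sub-agent here and block until it returns.
    return {"status": "success", "output": f"processed {context}"}

def orchestrate(items: list[str], instructions: str) -> list[dict]:
    # Spawn one sub-agent per item; the pool mimics Claude Code
    # running the Task calls concurrently.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(run_subagent, instructions, item) for item in items]
        results = [f.result() for f in futures]
    # Validate before proceeding: never assume every sub-agent succeeded.
    failures = [r for r in results if r["status"] != "success"]
    if failures:
        raise RuntimeError(f"{len(failures)} sub-agent(s) failed")
    return results
```

The shape is the whole point: build the list of units, spawn everything at once, then gate on validation before any downstream work touches the results.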
Spawning sub-agents from a master task
The mechanics are straightforward. In your orchestrator's logic, you build a list of tasks and call Task for each one. Claude Code handles the actual parallelisation. You just need to structure the calls correctly.
The key thing to get right is scope. Each sub-agent call should include everything the agent needs and nothing it doesn't. A bloated context slows things down. A thin context causes hallucinated assumptions. Find the minimum necessary information for each task and pass exactly that.
One thing that trips people up: the orchestrator's token limit still applies to the sum of all sub-agent responses it needs to process. If you're spawning twenty agents that each return 2,000 tokens, that's 40,000 tokens coming back into the orchestrator. Plan for that. Summarise sub-agent outputs where you can rather than returning everything verbatim.
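That arithmetic is worth checking before you spawn anything. A minimal sketch, where the remaining-context figure is illustrative rather than a real Claude limit:

```python
def fits_return_budget(num_agents: int, tokens_per_response: int,
                       remaining_context: int) -> bool:
    # The orchestrator has to absorb every sub-agent response, so the
    # sum of all returns must fit its remaining context window.
    return num_agents * tokens_per_response <= remaining_context
```

If the check fails, either summarise sub-agent outputs, batch the agents into waves, or write results to files and return only references.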
Passing context between agents
Sub-agents don't share memory. They can't see what their sibling agents are doing or have done. Everything they know comes from what the orchestrator passes them at the point of spawning.
In practice this means your orchestrator needs to be thoughtful about what context each sub-agent receives. For most parallel workloads this is clean. Each agent gets its own data slice and works on it independently. The complexity comes when you need agents to be aware of each other.
One pattern that works: run a first wave of sub-agents, collect their results, then pass a summary of those results to a second wave. Sequential waves with parallel execution within each wave. It's not true full parallelism but it handles most real-world dependency patterns.
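The wave pattern can be sketched like this; `run_subagent` is a hypothetical stand-in for a parallel Task call, and a plain loop stands in for the concurrent execution to keep the sketch readable:

```python
def run_subagent(brief: str) -> str:
    # Hypothetical stand-in for a parallel Task call.
    return f"result of {brief}"

def run_wave(briefs: list[str]) -> list[str]:
    # Each brief would go to its own sub-agent, all running at once.
    return [run_subagent(b) for b in briefs]

def two_wave(first_briefs: list[str], second_template: str) -> list[str]:
    # Wave one runs in parallel, the orchestrator condenses the
    # results, then wave two receives that summary as context.
    wave_one = run_wave(first_briefs)
    summary = "; ".join(wave_one)
    return run_wave([second_template.format(summary=summary)])
```

The condensing step in the middle is what keeps the orchestrator's context under control between waves.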
What doesn't work: expecting agents to communicate in real time. They can't. If your task requires Agent A to update Agent B mid-execution, sub-agents aren't the right architecture for that part of the workflow.
Handling failures in parallel runs
This is where most sub-agent implementations fall over. Sequential workflows fail loudly. One thing breaks, everything stops, you fix it. Parallel workflows can fail quietly. One agent fails out of twelve, the orchestrator doesn't notice, and you get eleven results presented as twelve.
Build explicit failure handling from the start:
Every sub-agent should return a structured result that includes a success or failure status, not just its output
The orchestrator should check status before using results
Decide your retry policy upfront: automatic retry, manual review queue, or fail the whole batch
Log failures with enough context to diagnose them. Which agent, which input, what error
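The points above amount to a small result wrapper plus a status check in the orchestrator. A minimal sketch, with field names that are assumptions rather than any fixed format:

```python
from dataclasses import dataclass

@dataclass
class SubAgentResult:
    # Structured status alongside the output, not just the output.
    agent_id: str
    status: str          # "success" or "failure"
    output: str = ""
    error: str = ""
    input_ref: str = ""  # which input, for diagnosis

def collect_failures(results: list[SubAgentResult]) -> list[SubAgentResult]:
    # The orchestrator checks status before using any result.
    failures = [r for r in results if r.status != "success"]
    for f in failures:
        # Log which agent, which input, what error.
        print(f"{f.agent_id} failed on {f.input_ref}: {f.error}")
    return failures
```

Whatever your retry policy is, it hangs off the list this returns: retry the failed inputs, queue them for review, or abort the batch.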
We built a document processing system for a client that was running about forty sub-agents per batch. Early versions had no structured failure handling. When an agent failed, usually on a malformed input document, we'd get thirty-nine results and no indication anything had gone wrong. Adding a simple status wrapper to every sub-agent output fixed that entirely.
Real Use Cases Where Sub-Agents Win
Processing multiple files simultaneously
This is the most common use case and the easiest to implement. You have a folder of files, contracts, invoices, reports, code files, and you need to do the same analysis or transformation on each one.
Sequential processing makes every file wait for every other file. Sub-agents process them all at once. For ten files, the speed difference is noticeable. For a hundred files, it's the difference between a usable system and one nobody actually runs.
The pattern is clean: orchestrator reads the file list, spawns one sub-agent per file with the file content and processing instructions, collects results. If one file is malformed, that sub-agent fails and you know exactly which one without touching the others.
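That per-file isolation looks like this in sketch form. Reading the file stands in for the real work; in an actual build the content and instructions would go to a sub-agent via Task:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def process_file(path: Path) -> dict:
    # Stand-in for a sub-agent processing one file.
    try:
        text = path.read_text()
        return {"file": path.name, "status": "success", "chars": len(text)}
    except OSError as exc:
        # A malformed or missing file fails alone, with the failing
        # file named, without touching the rest of the batch.
        return {"file": path.name, "status": "failure", "error": str(exc)}

def process_folder(paths: list[Path]) -> list[dict]:
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(process_file, paths))
```

Because each result carries its filename and status, one bad file shows up as exactly one failure entry rather than poisoning the whole run.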
Running independent checks in parallel
A lot of validation and QA workflows involve checking the same thing against multiple systems or criteria. Does this record exist in System A? Does it match the data in System B? Does it pass the rules in System C?
Sequential: check A, wait, check B, wait, check C. Parallel: spawn three sub-agents simultaneously, get all three results back together.
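A sketch of the fan-out, with hypothetical check functions standing in for sub-agents that would each query one external system:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical checks: each would really be a sub-agent querying one
# system (CRM, compliance database, contracts system).
def check_crm(supplier: str) -> bool:
    return bool(supplier)

def check_compliance(supplier: str) -> bool:
    return supplier != "blocked-ltd"

def check_contracts(supplier: str) -> bool:
    return True

def validate_supplier(supplier: str) -> dict:
    checks = {"crm": check_crm, "compliance": check_compliance,
              "contracts": check_contracts}
    # All three checks are spawned at once; results come back together.
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = {name: pool.submit(fn, supplier) for name, fn in checks.items()}
        results = {name: f.result() for name, f in futures.items()}
    return {"supplier": supplier, "passed": all(results.values()),
            "checks": results}
```

Keeping per-check results in the output, rather than collapsing to a single pass/fail, means a rejection tells you which system objected.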
We built this for a client doing supplier onboarding. Each new supplier needed to be checked against their internal CRM, a compliance database, and their existing contracts system. Three separate checks, no dependency between them. Sequential was taking about forty seconds per supplier. Sub-agents brought that to roughly twelve seconds. Same checks, just not waiting in line.
Breaking a large build task into parallel workstreams
For code generation or complex document production, you can often split the work across multiple sub-agents handling different sections simultaneously. One agent writes the authentication module while another writes the data layer while a third writes the API routes.
This is more complex to coordinate because the outputs need to fit together. You need clear interface contracts defined before spawning. Each agent needs to know what the others will produce so they can write to that spec. The orchestrator then integrates the pieces.
It's genuinely harder than the file processing pattern. But for large builds with clear module boundaries, it dramatically reduces wall-clock time. The risk is integration failures. Agents that each did their job correctly but produced pieces that don't fit together. Spend time on the interface definitions before you spawn anything.
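One way to make those interface definitions concrete before spawning. The contract format and module names here are illustrative assumptions, not a Claude Code convention:

```python
# Interface contracts, defined before any agent is spawned.
CONTRACTS = {
    "auth": {"exports": ["login(username, password) -> Session"],
             "depends_on": []},
    "api_routes": {"exports": ["register_routes(app) -> None"],
                   "depends_on": ["auth"]},
}

def build_brief(module: str) -> str:
    # Each sub-agent's brief states what it must export and what its
    # dependencies export, so separately written pieces fit together.
    contract = CONTRACTS[module]
    deps = "; ".join(f"{d} exports {CONTRACTS[d]['exports']}"
                     for d in contract["depends_on"]) or "none"
    return (f"Build the {module} module. Export: {contract['exports']}. "
            f"Dependencies: {deps}.")
```

Every agent writes against the same frozen contract, so integration failures surface as contract violations you can point at, not vague mismatches.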
Common Mistakes When Running Multiple Agents
A few things that have caught us out and will likely catch you out too:
Context bloat. Passing the entire project context to every sub-agent because it feels safe. It's not. It's slow and expensive. Each sub-agent should get exactly what it needs. Be ruthless about this.
No failure handling. Covered above, but worth repeating. Silent failures in parallel workloads are genuinely dangerous. Always return a structured status from every sub-agent, always check it in the orchestrator.
Assuming sub-agents are free. Every sub-agent is a Claude API call. Spawning fifty agents on a task that could be done sequentially in reasonable time is just burning tokens. Run the numbers before you commit to a parallel architecture.
Building coordination complexity first. A lot of people try to build the orchestration layer before they have the sub-agent logic working. Get one sub-agent doing its task correctly before you think about running ten in parallel. Test the unit before you test the system.
Ignoring rate limits. If you're spawning many agents simultaneously against external APIs, not Claude itself but systems your agents interact with, you can hit rate limits that have nothing to do with Claude. Build in awareness of the external constraints your sub-agents will encounter.
Not logging agent boundaries. When something goes wrong in a parallel system, you need to know which agent produced which output. Log agent IDs and their inputs alongside their outputs from the start. Debugging a failed parallel run without this is painful.
FAQ
What is a sub-agent in Claude Code?
A sub-agent is a separate Claude instance spawned by a parent (orchestrator) agent to handle a specific piece of work. The parent coordinates the overall workflow and spawns sub-agents using the Task tool. Each sub-agent runs independently with its own context and tool access, completes its assigned task, and returns a result to the orchestrator. Multiple sub-agents can run simultaneously, which is what makes parallel task execution possible.
How many sub-agents can Claude Code run at once?
There's no hard published cap on concurrent sub-agents, but practical limits exist. Your API rate limits apply across all concurrent calls. Each sub-agent consumes tokens, and the orchestrator needs to handle all returned results within its own context window. In practice, most production builds work well with five to twenty concurrent sub-agents. Beyond that, you need careful context management and rate limit handling.
Can sub-agents communicate with each other directly?
No. Sub-agents are independent. They don't share memory or communicate in real time. Everything a sub-agent knows comes from what the orchestrator passes it at spawn time. If you need agents to share information, the orchestrator has to mediate: collect results from a first wave, synthesise them, then pass relevant information to the next wave. Direct agent-to-agent communication isn't part of the current architecture.
When should I use sub-agents vs a single agent in Claude Code?
Use sub-agents when you have genuinely independent tasks that benefit from parallel execution, processing multiple files, running independent checks across systems, or splitting a large build into separate workstreams. Stick with a single agent when tasks are sequential by nature, the work is a single contained job, or the coordination overhead would outweigh the speed benefit. Sub-agents add real complexity, so only reach for them when you have a concrete reason to.
How do I handle errors in a sub-agent workflow?
Build structured failure handling from the start. Every sub-agent should return a result object that includes a status field alongside its output, something like success or failure plus an error message. The orchestrator checks status before using any result. Decide your retry policy upfront: automatic retry on transient failures, manual review for data issues, or fail-fast for critical errors. Log each agent's inputs and outputs with enough detail to diagnose failures after the fact.
Are Claude Code sub-agents expensive to run?
Each sub-agent is a separate Claude API call, so costs add up proportionally with the number of agents you spawn. The efficiency argument is about time, not cost. You'll use roughly the same total tokens whether you run tasks sequentially or in parallel, but you'll get results faster. Where parallel execution does save money is in complex tasks where a single long sequential run uses a large context window. Splitting work across focused sub-agents can be more token-efficient than one sprawling task.
Where to Go From Here
Claude Code sub-agents are one of the more powerful capabilities in the current AI toolset, and they're still underused because the written guidance on them is thin. Most teams either don't know the pattern exists or try it once, hit an unexpected failure, and go back to sequential execution.
The basics aren't complicated: orchestrator spawns sub-agents, sub-agents do work in parallel, orchestrator assembles results. The real skill is in the architecture decisions. What to parallelise, how much context to pass, what failure handling to build in before you need it.
If you're building production automation with Claude Code and want to know whether sub-agents are the right fit for your specific workflow, we're happy to take a look. Book a free audit at amplconsulting.ai and we'll tell you exactly where parallelisation would help and where it's just adding complexity you don't need.