Enable Agent Teams
Agent Teams is an experimental research preview in Claude Code. It requires Opus 4.6 or later and is disabled by default. You enable it through your settings file.
Open (or create) ~/.claude/settings.json and add the feature flag:
```json
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
```
If the file already has other settings, merge the env block into what’s there. Don’t overwrite existing keys.
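As a sketch of what a correct merge looks like, suppose your file already sets a `model` key (the value here is just a placeholder; any existing keys work the same way). The merged result keeps everything that was there and adds the `env` block alongside it:

```json
{
  "model": "opus",
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
```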
Alternatively, you can set it as a shell environment variable before launching Claude Code:
```bash
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
```
The `settings.json` approach persists across sessions; the environment variable is convenient for one-off testing. Either way, restart Claude Code after making the change.
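If you go the environment-variable route, it's worth confirming the flag is actually visible to the shell that will launch Claude Code. A minimal check:

```shell
# Set the flag for this shell session, then confirm it is visible
# before launching Claude Code.
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1

if [ "${CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS:-}" = "1" ]; then
  echo "agent teams flag is set"
else
  echo "agent teams flag is NOT set"
fi
```

Remember the variable only applies to the shell where you exported it; a Claude Code instance started from another terminal won't see it.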
Verify the Feature Is Active
Start a new Claude Code session in any project directory:
```bash
claude
```
Once the session is running, type a simple check:
```
Are agent teams enabled?
```
Claude will confirm whether the feature flag is active. You should see a response like:
```
Yes, Agent Teams is enabled. I can create teams of parallel agents
that coordinate through a shared task list and direct messaging.
```
If the feature is not detected, double-check your settings.json path (~/.claude/settings.json) and make sure you restarted Claude Code after editing it.
You can also verify by asking Claude to list its available tools. The TeamCreate and SendMessage tools should appear in the list when Agent Teams is active.
Prepare Your Project for Teams
Agent Teams teammates each load your project’s CLAUDE.md file when they start. This is your primary control surface for setting team-wide rules. Any conventions, forbidden patterns, or quality bars you want every agent to follow belong here.
Before launching a team, add a section to your CLAUDE.md covering team-relevant conventions:
```markdown
## Team Conventions

- Run `npm run build` before reporting any task as complete
- Do not modify files outside your assigned scope
- If you encounter ambiguity, message the lead instead of guessing
- All new functions must include JSDoc comments
```
These rules apply to every teammate automatically, so you don’t need to repeat them in the team-creation prompt.
Teammates also inherit your MCP servers, skills, and any sub-agent definitions in .claude/agents/. The practical layering is:
- CLAUDE.md: durable, team-wide rules (quality bars, forbidden patterns, build commands)
- Team-creation prompt: role-specific scope, bias, and coordination rules
- Sub-agents in `.claude/agents/`: inner-loop tools that individual teammates can delegate to within their own sessions
This separation keeps your team prompts focused on roles and coordination rather than repeating project fundamentals.
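For that third layer, a sub-agent definition is a markdown file with YAML frontmatter. A minimal sketch, saved as `.claude/agents/code-searcher.md` (the name, description, and tool list here are illustrative, not part of this tutorial's project):

```markdown
---
name: code-searcher
description: Read-only helper that finds definitions and usages in the repo
tools: Read, Grep, Glob
---

Locate the code relevant to the query and report file paths and line
ranges. Never modify files.
```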
Design the Team in Plan Mode
Don’t jump straight to creating a team. Use plan mode first to design the decomposition. Plan mode runs in a single session (cheap) and gives you a checkpoint before committing real tokens.
Start by entering plan mode:
```
/plan
```
Then describe what you want the team to accomplish. For a test run, a code review is the lowest-risk option because it doesn’t modify any files:
```
Plan an agent team to review our authentication module for production
readiness. I want 3 reviewers looking at the same code from different
angles: security, reliability, and maintainability. They should
challenge each other's findings before producing a final report.
```
Claude will produce a structured plan showing:
- The teammates it would create (names, roles, biases)
- The scope each teammate will review
- The coordination rules (when to debate, how to resolve disagreements)
- The expected output format
Review the plan carefully. Check that:
- Roles are genuinely different. Security, reliability, and maintainability produce orthogonal findings. “Frontend reviewer” and “UI reviewer” do not.
- File scopes don’t overlap for build tasks. For review tasks, overlap is expected and intentional.
- Coordination rules are explicit. The plan should specify what triggers debate and how it resolves.
Once satisfied, approve the plan and exit plan mode. You now have a validated decomposition to hand to the team.
Launch and Run the Team
With the plan approved, create the team by giving the lead a detailed prompt. Here is a concrete example for the adversarial review pattern:
```
Create an agent team to review the comments system
(infrastructure/comments/stack.js, the Lambda handler, and API Gateway
integration) for production readiness.

Spawn 3 teammates:

1. Security reviewer
   - Focus: input validation, auth headers, IAM permissions, injection vectors
   - Bias: assume the code is hostile until proven otherwise

2. Reliability reviewer
   - Focus: error handling, cold-start behavior, DynamoDB retry logic,
     observability gaps
   - Bias: assume every external dependency will fail

3. Maintainability reviewer
   - Focus: naming, test coverage, CDK construct reuse, documentation
   - Bias: assume a new engineer will own this code in 6 months

Coordination rules:
- After initial review, each teammate reads the others' findings
- Any finding rated "critical" by one and "low" or "non-issue" by another
  triggers a debate round via SendMessage
- Teammates must push back with reasoning, not just agree
- Debate ends on consensus or after 2 rounds with no movement
- Lead produces a final report: agreed issues, unresolved disagreements
  (both arguments), and a recommended fix list
```
The lead will parse this prompt, create the three teammates, and begin orchestrating. You’ll see status updates as each teammate starts its review.
Monitor Each Agent in Real Time
Once the team is running, you have several ways to watch what’s happening.
**Check the task list.** Ask the lead for a status update at any time:

```
What's the current status of all teammates?
```
The lead will report which tasks are in progress, which are complete, and which are blocked. You’ll see output like:
```
Team status:
- Security reviewer: ✓ Initial review complete, 4 findings submitted
- Reliability reviewer: In progress, reviewing error handling paths
- Maintainability reviewer: ✓ Initial review complete, 6 findings submitted

Debate round triggered: Security finding #2 (critical) conflicts with
Reliability assessment (non-issue). Debate in progress.
```
**Message a specific teammate directly.** You don’t have to go through the lead. Address a teammate by name:

```
@security-reviewer What's your most critical finding so far?
```
The teammate will respond with its current analysis, independent of what the lead has aggregated.
**Watch the debate unfold.** When a debate round triggers, the lead reports the exchange. You can ask for the full thread:

```
Show me the debate between security and reliability on finding #2.
```
This surfaces the actual arguments each agent made, the evidence they cited, and whether severity changed as a result.
**Check the shared task list on disk.** Team state lives at `~/.claude/teams/{team-name}/`. You can inspect the task files directly in another terminal:

```bash
ls ~/.claude/teams/
cat ~/.claude/teams/comments-review/tasks.json
```
This shows the raw task state without going through the lead, which is useful if you want to script monitoring or track progress externally.
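As a starting point for scripted monitoring, here is a hedged sketch. It assumes tasks.json contains one `"status"` field per task; the file's actual schema isn't documented in this tutorial, so inspect yours and adjust the grep patterns before relying on it:

```shell
# Summarize team progress from a tasks.json file without going through
# the lead. Assumes one "status" field per task, with each field on its
# own line (as in pretty-printed or line-delimited JSON).
task_progress() {
  tasks_file="$1"
  done_count=$(grep -c '"status": "completed"' "$tasks_file")
  total=$(grep -c '"status"' "$tasks_file")
  echo "$done_count/$total tasks completed"
}
```

Run it as `task_progress ~/.claude/teams/comments-review/tasks.json`, or wrap it in a loop with `sleep` for live updates.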
Read the Final Synthesis
When all teammates finish their reviews and any debate rounds resolve, the lead produces a synthesis. Ask for it explicitly if it hasn’t appeared:
```
Give me the final report.
```
The synthesis follows the structure you defined in the coordination rules. For an adversarial review, expect:
**Agreed findings** listed by severity with the reasoning that survived debate. These are your highest-confidence items because multiple agents with different biases converged on the same conclusion.

**Unresolved disagreements** where two rounds of debate produced no movement. The lead records both positions. These are the most interesting findings, because they represent genuine ambiguity in the code. A security concern that a reliability reviewer can’t dismiss (and vice versa) is exactly the kind of finding a single-agent review would never surface.

**Recommended action list** prioritized by severity and effort. The lead draws on all three reviewers’ inputs to order the fixes.
The metric to track across runs is the severity delta: how many findings changed severity during debate rounds. If the number is zero every time, the adversarial configuration isn’t adding value over a single-agent review. If findings regularly shift in severity, the debate clause is earning its tokens.
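One way to compute that delta, sketched under the assumption that you export each report's findings to an `id,severity` CSV sorted by id (a format this tutorial doesn't define; adapt it to whatever your lead actually emits):

```shell
# Count findings whose severity changed between the pre-debate and
# post-debate reports. Both inputs are "id,severity" CSVs sorted by id.
severity_delta() {
  join -t, "$1" "$2" | awk -F, '$2 != $3 { n++ } END { print n + 0 }'
}
```

`severity_delta before.csv after.csv` then prints the number of findings whose severity moved during debate.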
Save the report, apply the fixes, and use the severity delta to decide whether to run the team again on your next review.