Build a Multi-Agent System with LangGraph: Supervisors, Swarms, and Handoffs (NCP-AAI Module 7)

Module 7 of 14 | 20 min read | D1: 15% | D2: 15%

This is Module 7 of NCP-AAI Mastery, a free 14-module course that takes you from your first agent to NVIDIA-certified. Start at Module 1 or browse the full syllabus.

The Scout you finished in Module 6 is a one-man band. One agent searches, reads, and writes; one prompt carries both search etiquette and citation rules; and one context window holds all of it: raw page text next to the research plan, next to tool schemas, next to half a draft. Run it on a real research question and watch the quality slide: around the fourth source, the citation instructions stop landing, the model starts writing before it’s done reading — and verification never happens, because it doesn’t exist yet. Nothing is broken. The instructions are simply buried — a junk-drawer context makes every instruction compete with everything else in it.

This module is the pivot of the series. We split the work across a team of specialists — Searcher, Reader, Fact-checker, Writer — coordinated by a supervisor, and Scout produces its first complete, cited research report. The course pitch said “multi-agent.” Today it becomes true.

In this module

  • You’ll learn to:
    • Justify when to move from one agent to a team — and when not to (coordination costs).
    • Compare supervisor, swarm, and hierarchical topologies on control, auditability, latency, and token cost — the way Domain 1 scenarios ask.
    • Implement a supervisor by hand in LangGraph: StateGraph + Command + routing by tool calling, with handoffs across four specialists.
    • Design handoffs: shared state vs. message passing, and what the supervisor does when a worker fails.
    • Differentiate MCP (agent↔tools) from A2A (agent↔agent) — which protocol for which boundary.
  • You’ll build: Scout becomes a team — a supervisor coordinating Searcher, Reader, Fact-checker, and Writer to produce its first complete cited report.
  • Exam domains covered: D1, Agent Architecture and Design (15%), and D2, Agent Development (15%). Together the two domains account for 30% of the exam, the heaviest exam coverage of any module in the series.
  • Prerequisites: Module 2 (ScoutState, LangGraph), Module 3 (architecture patterns, workflow-first rule), Module 4 (Planner), Module 5 (checkpointer), Module 6 (RAG + search_sources). NVIDIA and Tavily API keys configured.

Where you are

  • ✅ Modules 1–6 — vocabulary, first agent, architecture, planning, memory, RAG
  • 👉 Module 7 — Multi-Agent Systems (you are here)
  • ⬜ Modules 8–14 — evals, guardrails, deployment, observability, NVIDIA stack, capstone, the exam

Scout before: a single agent that plans, searches, reads, and answers — one brain, one crowded context. Scout after: a supervisor coordinating four specialists over shared state, ending in a Markdown report with [n] citations. The architecture you designed in Module 3 is the one you build today.

Why one agent isn’t enough (and when it still is)

Three forces push a system from one agent to a team.

Specialization — each agent gets one focused prompt for one job. Scout’s Module 6 prompt juggled search etiquette and citation format at once — and verification had no prompt at all; the Fact-checker’s prompt now says one thing, loudly. Focused prompts fail less — and when they fail, you know which instruction failed.

Parallelism — independent subtasks run simultaneously (two specialists reading different sources at once). Mentioned for the taxonomy — LangGraph supports fan-out (dispatching one node invocation per work item, in parallel) with its Send API — but Scout’s pipeline is sequential and stays that way today.

Context isolation — each agent sees only what its step needs. The strongest argument of the three, and the one Module 6 made for us: the do-everything Scout drags raw page dumps into the same window where it’s trying to follow citation rules. The Writer doesn’t need search-tool schemas; the Searcher doesn’t need 40 KB of fetched HTML. Isolation isn’t tidiness — it’s the difference between instructions that land and instructions that drown.

The counterweight is coordination costs: every handoff — the transfer of control plus context from one agent to another — is at least one extra LLM call, extra latency, and a brand-new failure mode (a bad handoff upstream becomes confident nonsense downstream). A Scout run jumps from 6–10 LLM calls in Module 6 to 10–20 today — you pay for the team before it produces anything. Module 3’s rule — workflow first, agent if necessary — gets its sequel: single agent first, team if necessary. Scale out on evidence of the single-agent ceiling, not because the diagram looks better.
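As a back-of-the-envelope check on those numbers, here is a small sketch. It assumes (as an illustration, not a measurement from the lab) that planning costs three calls, that each specialist hop costs one supervisor routing call plus one worker call, and that the final finish decision costs one more routing call. The function name and its parameters are hypothetical.

```python
def estimate_llm_calls(hops: int, planning_calls: int = 3, calls_per_worker: int = 1) -> int:
    """Rough LLM-call count for one supervisor-style run.

    Illustrative assumptions: every hop costs one routing call plus the
    worker's own calls, and finishing costs one last routing call.
    """
    routing_calls = hops + 1                 # one per dispatch, plus the final "finish"
    worker_calls = hops * calls_per_worker   # each specialist's own LLM turn(s)
    return planning_calls + routing_calls + worker_calls


# Happy path: Searcher, Reader, Fact-checker, Writer = 4 hops.
happy_path = estimate_llm_calls(hops=4)
# One extra hop (for example a re-dispatch) adds at most two calls here.
with_extra_hop = estimate_llm_calls(hops=5)
print(happy_path, with_extra_hop)  # 12 14
```

Under these assumptions even the shortest possible team run sits inside the 10–20 range, before a single worker needs more than one LLM turn.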

Topologies: supervisor, swarm, and hierarchies

Module 3 named the team patterns; this module builds one. Three canonical shapes:

A supervisor is a central agent that decides, turn by turn, which specialist acts next — typically by tool calling — and collects their results. Maximum control and auditability: every handoff passes through one loggable point. The price: the supervisor is a bottleneck, and each hop bills a routing call.

A swarm has no chief — agents hand control directly to each other, peer to peer. Flexible when conversations should flow between personas, but no single point sees the whole run: reconstructing “who decided what” means stitching together every peer’s view.

Hierarchical orchestration is supervisors of supervisors — each subtree owns a domain, a top-level supervisor delegates between them. The supervisor pattern applied recursively; it earns its keep only at the scale of tens of specialists. Build it when you have departments, not before.

| | Supervisor | Swarm | Single agent |
|---|---|---|---|
| Control flow | Central agent routes every turn | Peers hand off directly | One LLM loops with tools |
| Auditability | Best: one decision point logs all handoffs | Worst: trajectory scattered across peers | Good: one transcript |
| Latency overhead | High: every hop adds a routing call | Variable: no router, but unpredictable paths | Lowest of the three |
| Token cost | High: routing calls + per-specialist context | High: duplicated context across peers | Baseline |
| Failure isolation | Strong: supervisor detects and contains a failed worker | Weak: a lost peer derails the run silently | None: one context, one failure domain |
| Exam scenario cue | “audit trail required”, “distinct stages”, “central control” | “open-ended peer collaboration”, “fluid persona handoffs” | “simple task, one domain” (or a chain, if steps are fixed) |

Scout takes the supervisor: a research pipeline has distinct specialist stages with a natural order, and a research report demands an audit story — which agent produced which source, which claim got which verdict. A swarm would scatter exactly the trace we need; a single agent is the ceiling we just hit. Here is the graph you’ll build — the reference diagram for the rest of the series:

flowchart TD
    Q([question]) --> P[Planner — M4]
    P <--> C[Critic — M4]
    P --> AP["approve plan
    (auto-true placeholder)"]
    AP --> SUP{{supervisor}}
    SUP -- "Command(goto=...)" --> SE[Searcher]
    SUP --> RD[Reader]
    SUP --> FC[Fact-checker]
    SUP --> WR[Writer]
    SE --> SUP
    RD --> SUP
    FC --> SUP
    WR --> SUP
    SUP -- finish --> R([cited report])

Every specialist reports back to the supervisor; nothing moves without a routing decision. That’s the cost — and the audit trail. (Command(goto=...) is LangGraph’s routing-plus-update return object — defined in the Handoffs section below.)

The supervisor-vs-swarm shape difference in one glance — a star against a mesh:

flowchart LR
    subgraph sup ["Supervisor: star"]
        S0((supervisor)) <--> A1[agent A]
        S0 <--> A2[agent B]
        S0 <--> A3[agent C]
    end
    subgraph sw ["Swarm: mesh"]
        B1[agent A] <--> B2[agent B]
        B2 <--> B3[agent C]
        B3 <--> B1
    end

Handoffs: shared state vs. message passing

A handoff, recall, transfers control plus context — who runs next, and what they get to know. Both halves are design decisions, and the second one has two schools:

| | Shared state | Message passing |
|---|---|---|
| Mechanism | All agents read/write one state object | Each agent receives a dedicated payload |
| Context isolation | Partial: agents can see everything; discipline keeps them focused | Maximum: an agent knows only its payload |
| Plumbing | Cheap: the state schema already exists | You design and version every payload format |
| Audit story | One state history tells the whole run | Reconstructed from N payload logs |
| Scout’s choice | ScoutState: simple, auditable, checkpointed since M5 | Revisit if specialists ever live in separate services |

Scout uses shared state: every specialist reads ScoutState and returns updates to it — the stateful orchestration you’ve run since Module 2, now at team scale. Context isolation still happens, but inside each node: the Searcher builds its prompt from the plan and ignores fetched page bodies; the Writer reads sources and claims and never sees a tool schema. The state carries everything; each prompt samples it.
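One way to picture “the state carries everything; each prompt samples it” is a per-role view over the shared state. The sketch below is an illustration only: the `PROMPT_FIELDS` mapping and the `context_for` helper are hypothetical names, and the lab's nodes may assemble their prompts differently.

```python
# Hypothetical mapping: which shared-state fields each specialist's prompt reads.
PROMPT_FIELDS = {
    "searcher": ("question", "plan"),
    "writer": ("question", "plan", "sources", "claims"),
}


def context_for(role: str, state: dict) -> dict:
    """Return only the slice of shared state that this role's prompt needs."""
    return {key: state[key] for key in PROMPT_FIELDS[role] if key in state}


state = {
    "question": "What is the Nemotron Coalition?",
    "plan": {"objective": "Describe the coalition"},
    "sources": [{"url": "https://example.com/a", "content": "long page body"}],
    "claims": [],
    "messages": ["[supervisor] -> searcher: find sources"],
}

searcher_view = context_for("searcher", state)  # no page bodies in the prompt
writer_view = context_for("writer", state)      # sources and claims, no audit log
```

The state object itself is never trimmed; only what reaches each prompt is.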

In LangGraph 1.x, the idiomatic handoff is a node returning Command(goto=..., update=...) — one object that both routes control and writes state. The supervisor decides the goto target by tool calling — a Module 2 skill reused at the team level: one “tool” per specialist plus a finish tool, and the model’s tool call is the routing decision.
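A minimal sketch of “the tool call is the routing decision”, written without LangGraph so it runs on its own. The `Route` dataclass is a stand-in for `Command(goto=..., update=...)`, and the tool-call shape (a dict with `name` and `args`) and the `instruction` argument are assumptions rather than the lab's exact types.

```python
from dataclasses import dataclass, field

# One routing "tool" per specialist, plus finish.
ROUTES = {
    "assign_searcher": "searcher",
    "assign_reader": "reader",
    "assign_fact_checker": "fact_checker",
    "assign_writer": "writer",
    "finish": "END",
}


@dataclass
class Route:
    """Stand-in for LangGraph's Command: where to go and what to write to state."""
    goto: str
    update: dict = field(default_factory=dict)


def route_from_tool_call(tool_call: dict) -> Route:
    """Translate the model's tool call into one routing-plus-update object."""
    target = ROUTES[tool_call["name"]]  # an unknown tool raises KeyError: fail loudly
    instruction = tool_call.get("args", {}).get("instruction", "")
    note = f"[supervisor] -> {target}: {instruction}"
    return Route(goto=target, update={"messages": [note]})


decision = route_from_tool_call(
    {"name": "assign_reader", "args": {"instruction": "Read the candidate sources."}}
)
print(decision.goto)  # reader
```

Nothing is executed by the "tool" itself; the call only names the next node and carries the instruction that gets logged.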

The last design decision is failure. What does the supervisor do when the Searcher comes back with zero sources? Three bad answers: crash (one flaky search call kills a 15-call run), ignore it (the Writer fabricates a report from nothing), retry forever (a token furnace with delegation). Scout’s policy, in numbers: one re-dispatch with a reformulated instruction, then degrade gracefully, so the Writer produces a report that opens with a warning instead of pretending. Retries are policy, not judgment: the supervisor’s code enforces the cap, and no LLM call is spent deciding whether to retry. That’s objective 2.4 (retry logic and graceful failure recovery), implemented where the exam expects you to know it belongs.
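The policy can be sketched in a few lines of plain Python. This is an illustration under assumptions: a loop stands in for the graph re-entering the supervisor, and `dispatch_with_cap` and the two toy workers are hypothetical, not the lab's code.

```python
from typing import Callable, Optional, Tuple

MAX_WORKER_RETRIES = 1  # one re-dispatch, then degrade


def dispatch_with_cap(
    worker: Callable[[str], list], instruction: str, retry_instruction: str
) -> Tuple[list, Optional[str]]:
    """Call a worker, re-dispatch at most MAX_WORKER_RETRIES times, then degrade.

    Returns (results, warning); warning is None when the worker delivered.
    """
    attempts = 0
    current = instruction
    while attempts <= MAX_WORKER_RETRIES:
        attempts += 1
        results = worker(current)
        if results:
            return results, None
        current = retry_instruction  # reformulated instruction for the retry
    return [], f"worker returned nothing after {attempts} attempts"


seen = []


def always_empty(instruction: str) -> list:
    seen.append(instruction)
    return []


def second_time_lucky(instruction: str) -> list:
    return ["https://example.com/a"] if instruction == "try broader queries" else []


failed, warning = dispatch_with_cap(always_empty, "find sources", "try broader queries")
recovered, no_warning = dispatch_with_cap(second_time_lucky, "find sources", "try broader queries")
```

The cap lives in code, so the worst case is bounded at two worker calls, and the warning string is what a degraded report would open with.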

Interop protocols: MCP for tools, A2A for agents

Two open protocols now standardize the two boundaries of an agentic system — and confusing them is this module’s classic exam trap.

MCP (Model Context Protocol) standardizes the agent↔tools/context boundary: an MCP server exposes tools, resources, and prompts over JSON-RPC (a JSON request/response protocol), and any MCP-capable client consumes them without custom glue. Anthropic donated MCP to the Agentic AI Foundation under the Linux Foundation in late 2025, and it has been adopted across the ecosystem; the current spec version is 2025-11-25. The lab’s optional annex turns Scout’s web_search into an MCP server and plugs it back in through an adapter: about ten lines show that the tool no longer cares which framework calls it. The code is in the lab repo.
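To make “over JSON-RPC” concrete, here is a sketch of the request a client sends when it asks an MCP server to run a tool. The `tools/call` method and its `name`/`arguments` parameters follow the MCP specification; the `web_search` argument names are an assumption based on how the lab calls that tool, and the transport and initialization handshake are left out.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids, unique within this session


def mcp_tool_call(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Argument names below are assumed, not taken from the lab's MCP annex.
wire = mcp_tool_call("web_search", {"query": "nemotron coalition gtc 2026", "max_results": 5})
decoded = json.loads(wire)
```

Any MCP-capable client can produce this message, which is why the server side no longer needs to know which framework sits behind the client.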

A2A (Agent2Agent protocol) standardizes the agent↔agent boundary: how independent agents (different vendors, frameworks, companies) discover each other and delegate work. An agent publishes an agent card describing its skills and endpoint; a client agent discovers it, sends a task, and tracks its lifecycle. Google originally developed A2A and donated it to the Linux Foundation; spec 1.0 is current. A2A stays conceptual in this course (the spec is stable, the SDKs are young), and the exam tests the concept, not the code.

| | MCP | A2A |
|---|---|---|
| Standardizes | Agent ↔ tools, resources, prompts | Agent ↔ agent delegation |
| Who talks to whom | One agent ↔ its capability servers | Peer agents across systems and vendors |
| Key primitives | Server, client, tools/resources/prompts | Agent card, discovery, tasks |
| Governance | Linux Foundation (Agentic AI Foundation) | Linux Foundation |
| Maturity (June 2026) | Production-grade, massive adoption | Spec 1.0 stable, SDKs still young |
| Use it when | Wiring an agent to internal tools/data once, reusably | Making your agent collaborate with someone else’s |

Per the official study guide, objective 1.3 — configure agent-to-agent communication protocols for collaboration — is exactly this section plus the handoffs you just designed. The memorable version: MCP connects an agent to its hands; A2A connects it to its colleagues. The reason both exist: without protocols, every agent×tool and agent×agent pairing is a custom integration — an N×M explosion the industry has solved before, the same way, with standards.
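The N×M point is simple arithmetic, sketched here with made-up numbers (five agents and eight tools are illustrative, not figures from the course):

```python
def integrations_without_protocol(agents: int, tools: int) -> int:
    """Every agent needs its own custom adapter for every tool."""
    return agents * tools


def integrations_with_protocol(agents: int, tools: int) -> int:
    """Each agent implements the client side once; each tool the server side once."""
    return agents + tools


custom = integrations_without_protocol(5, 8)
standard = integrations_with_protocol(5, 8)
print(custom, standard)  # 40 13
```

Adding a ninth tool costs five new adapters in the first world and one server in the second.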

Hands-on lab: build it

Objective: transform Module 6’s single-agent Scout into a supervisor + four specialists that produces its first complete cited report.

Observable result: uv run python -m scout "your question" prints the team trajectory — which agent takes the floor, in what order — then a Markdown report with [n] citations mapped to sources. The smoke tests pass offline; SCOUT_LIVE_TESTS=1 runs one real end-to-end team run. Full code: module-07/.

Step 1 — Extend the state (three fields, on schedule)

The course’s frozen field calendar introduces claims, report, and plan_approved in this module. As with every schema change since Module 2, the schema only ever grows: fields are never renamed or removed (the term “append-only” is reserved for the reducer channels below). The extended state:

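# module-07/scout/state.py (excerpt)
import operator
from typing import Annotated, Literal, TypedDict

# ResearchPlan (the plan type) is defined elsewhere in the package; its import is not shown in this excerpt.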
Verdict = Literal["supported", "contested", "unverified"]  # frozen vocabulary

class Claim(TypedDict):
    statement: str
    supporting_sources: list[int]  # 1-based indices into sources — the [n] map
    verdict: Verdict

class ScoutState(TypedDict):
    question: str
    messages: Annotated[list, operator.add]  # now the TEAM's audit log
    plan: ResearchPlan | None
    plan_iterations: int
    sources: Annotated[list, operator.add]  # M6 — append-only: [n] never shifts
    claims: list[Claim]        # M7 — the Fact-checker's verdicts
    report: str                # M7 — the Writer's cited Markdown
    plan_approved: bool        # M7 — auto-True placeholder
                               # M9 will replace this with a human approval interrupt

One subtlety worth reading twice: sources keeps Module 6’s append-only reducer. A registered source’s [n] citation number is its position in that list — append-only is what keeps every number valid for the whole run. The consequence: the Searcher can’t park half-empty entries there for the Reader to fill in later. It hands its candidates over through the audit log instead, and only complete, read sources ever get appended — by the Reader, and by the Fact-checker for knowledge-base evidence.
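Why append-only keeps every [n] valid can be shown without LangGraph. With `Annotated[list, operator.add]`, each node's returned update is combined with the existing value by list concatenation; the sketch below applies that same operation by hand. The URLs and the `citation_number` helper are hypothetical.

```python
import operator

# Stand-in for the sources channel: each update is concatenated onto the
# existing list, so positions that were already assigned never move.
sources = []
sources = operator.add(sources, [{"url": "https://example.com/a"}])  # Reader, first page
sources = operator.add(sources, [{"url": "https://example.com/b"}])  # Reader, second page


def citation_number(url: str, registered: list) -> int:
    """[n] is the 1-based position of the source in the list."""
    for position, source in enumerate(registered, start=1):
        if source["url"] == url:
            return position
    raise KeyError(url)


before = citation_number("https://example.com/a", sources)
sources = operator.add(sources, [{"url": "https://example.com/c"}])  # later evidence
after = citation_number("https://example.com/a", sources)
newest = citation_number("https://example.com/c", sources)
```

A placeholder entry that was later rewritten or removed would break this guarantee, which is why only complete sources are appended.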

Step 2 — Four specialists, four focused prompts

Each specialist is a plain LangGraph node in scout/agents/: one focused system prompt, one job, and a one-line report-back appended to messages so the supervisor (and you) can read the trajectory. The Searcher shows the shape — an LLM turn for judgment, code for execution:

# module-07/scout/agents/searcher.py — the focused prompt
SEARCHER_PROMPT = (
    "You are Scout's Searcher. Your only job is to find candidate sources: "
    "turn the research plan and the supervisor's instruction into at most "
    f"{MAX_QUERIES} web search queries. Two or three targeted queries are "
    "usually enough — precision beats volume; do not chase marginal sources. "
    "Respond with ONLY a JSON array of query strings, no prose, e.g. "
    '["nemotron coalition gtc 2026"].'
)
# module-07/scout/agents/searcher.py — the node (excerpt)
def searcher_node(state: ScoutState) -> dict:
    """Run the queries, dedupe by URL, publish the new candidates."""
    queries, note = _proposed_queries(state)
    known = {source["url"] for source in state.get("sources") or []}
    candidates: list[dict] = []
    errors: list[str] = []
    for query in queries:
        try:
            results = tools.web_search(query, max_results=MAX_RESULTS_PER_QUERY)
        except (httpx.HTTPError, RuntimeError, json.JSONDecodeError) as exc:
            # Narrated, never silent: the supervisor reads this and decides.
            errors.append(f"query '{query}' failed: {exc}")
            continue
        for result in results:
            if result["url"] in known or len(candidates) >= MAX_CANDIDATES:
                continue
            known.add(result["url"])
            candidates.append({"url": result["url"], "title": result["title"]})
    summary = f"{len(candidates)} new candidate sources from {len(queries)} queries"
    if note:
        summary += f" ({note})"
    if errors:
        summary += "; " + "; ".join(errors)
    if candidates:
        # The handoff to the Reader: a parseable line in the audit log —
        # NOT half-empty entries in the append-only sources[] channel.
        summary += "\n" + agents.publish_candidates(candidates)
    return {"messages": [agents.report_back("searcher", summary)]}

Downstream: the Reader reads the pending candidates back from the audit log (agents.candidates_from), fetches each with Module 6’s fetch_page, ingests it through the RAG pipeline, scores a simple reliability_score heuristic (HTTPS, trusted domains, substantial content — a number to argue with, not a truth), and appends one complete Source entry per page it managed to read; failed fetches are narrated, never registered. The Fact-checker extracts the 3–5 load-bearing claims, cross-checks each against the ingested corpus via search_sources, and assigns a verdict from the frozen vocabulary: supported, contested, or unverified; evidence retrieved from a page read in a past session is appended to sources[] (neutral reliability) so every supporting_sources index points at a citable entry. The Writer gets the cleanest context of anyone — numbered sources, verdict-tagged claims, the plan objective — and produces the report with [n] markers and a References section mapping each [n] to its full URL.
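The candidate handoff through the audit log can be sketched as an encode/decode pair. The function names mirror the lab's `agents.publish_candidates` and `agents.candidates_from`, but the bodies below are guesses for illustration: the `candidates: ` prefix is inferred from the shape of the run log, messages are modeled as plain strings, and the lab may track which candidates are still pending differently.

```python
import json

PREFIX = "candidates: "  # assumed line prefix, inferred from the run log's shape


def publish_candidates(candidates: list) -> str:
    """Encode candidates as one parseable line for the audit log."""
    return PREFIX + json.dumps(candidates)


def candidates_from(messages: list) -> list:
    """Return the most recently published candidates, or [] if none exist."""
    for message in reversed(messages):
        for line in message.splitlines():
            if line.startswith(PREFIX):
                return json.loads(line[len(PREFIX):])
    return []


log = [
    "[supervisor] -> searcher: find sources",
    "[searcher] 1 new candidate sources from 1 queries\n"
    + publish_candidates([{"url": "https://example.com/a", "title": "A"}]),
]
pending = candidates_from(log)
```

Because the handoff is a line in the log rather than an entry in sources[], an unread candidate never occupies a citation number.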

Step 3 — The supervisor: tool calling decides, Command moves

The supervisor offers the model five “tools” — assign_searcher, assign_reader, assign_fact_checker, assign_writer, finish — none of which execute anything. The tool call is the routing decision, translated into a Command:

# module-07/scout/supervisor.py (excerpt)
def supervisor_node(state: ScoutState) -> Command:
    """One routing decision: policy first (code), judgment second (LLM)."""
    if _turns(state) >= MAX_TEAM_TURNS:
        return _degrade(state, f"turn cap ({MAX_TEAM_TURNS}) reached")
    failure = _empty_handed(state)
    if failure:
        worker, attempts = failure
        if attempts <= MAX_WORKER_RETRIES:
            # Retries are policy, not judgment: deterministic, capped, free.
            return _dispatch(worker, RETRY_INSTRUCTIONS[worker])
        return _degrade(state, f"{worker} returned nothing after {attempts} attempts")
    target, instruction = _decide(state)
    if target == "finish":
        if not state.get("report"):
            # No report, no finish — code overrides the model here.
            fallback = _fallback_route(state)
            return _dispatch(fallback, "Finish was premature — complete your stage.")
        return Command(goto=END, update={"messages": [_note(f"finish: {instruction}")]})
    return _dispatch(target, instruction)

Read the layering: the LLM exercises judgment (who’s next, with what instruction); code enforces policy (turn caps, retry caps, “no finish without a report”). Both halves print into messages, so the trajectory reads like a meeting transcript.
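For reference, the five routing tools might be declared like this. The sketch assumes an OpenAI-compatible chat API that accepts function-style tool schemas; the descriptions and the single `instruction` parameter are assumptions, not the lab's exact definitions.

```python
SPECIALISTS = ("searcher", "reader", "fact_checker", "writer")


def routing_tool(name: str, description: str) -> dict:
    """Build one function-style tool schema whose only argument is an instruction."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    "instruction": {
                        "type": "string",
                        "description": "What the next step should accomplish.",
                    }
                },
                "required": ["instruction"],
            },
        },
    }


# Four assign_* tools plus finish; none of them executes anything.
ROUTING_TOOLS = [
    routing_tool(f"assign_{worker}", f"Hand the next turn to the {worker}.")
    for worker in SPECIALISTS
] + [routing_tool("finish", "End the run; the instruction summarizes the outcome.")]

tool_names = [tool["function"]["name"] for tool in ROUTING_TOOLS]
```

Adding a specialist then means one more entry in `SPECIALISTS` and one more node in the graph.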

Step 4 — Wire the team graph

graph.py stays construction-and-routing only, per the Module 3 split. The Planner loop from Module 4 runs first, untouched; a tiny approve node stamps the plan; then the supervisor takes over. Workers don’t route — each has one static edge back to the supervisor:

# module-07/scout/graph.py (excerpt)
builder.add_node("approve", supervisor.approve_plan_node)
builder.add_node(
    "supervisor",
    supervisor.supervisor_node,
    # Command(goto=...) targets, declared for rendering and validation.
    destinations=("searcher", "reader", "fact_checker", "writer", END),
)
for name, node in WORKERS.items():
    builder.add_node(name, node)
    builder.add_edge(name, "supervisor")  # workers report back, never route
builder.add_edge(START, "planner")
builder.add_conditional_edges(
    "planner", route_after_planner, {"critic": "critic", "approve": "approve"}
)
builder.add_edge("critic", "planner")
builder.add_edge("approve", "supervisor")
return builder.compile(checkpointer=checkpointer)

The approve node is three lines that matter: it sets plan_approved = True automatically, with the comment # M9 will replace this with a human approval interrupt — the seam is built before the feature.
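A node of that shape could look like the sketch below. It is a guess at the placeholder, not the lab's `approve_plan_node`: the state is modeled as a plain dict and the audit-log entry as a plain string.

```python
def approve_plan_node(state: dict) -> dict:
    """Auto-approve the plan and narrate the decision in the audit log."""
    # M9 will replace this with a human approval interrupt
    note = "[approve] plan auto-approved (placeholder)"
    return {"plan_approved": True, "messages": [note]}


update = approve_plan_node({"plan": {"objective": "Describe the coalition"}, "plan_approved": False})
```

Because the node already exists as its own step between planner and supervisor, swapping the automatic `True` for a human decision later changes one function, not the wiring.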

Step 5 — Run it and read the trajectory

cd module-07
uv run python -m scout "What is the Nemotron Coalition announced at GTC 2026?"
[thread] scout-3aee69cc  [user] default
[planner] plan v1: Determine the nature and announcement details of the Nemotron Coalition presented at GTC 2026. — 3 steps
[critic] Plan critique (address every point in your revision):
- Duplicate coverage: Steps 2 and 3 both aim to extract the same publicly available details …
[planner] plan v2: Determine the nature and announcement details of the Nemotron Coalition presented at GTC 2026. — 4 steps
[approve] plan auto-approved (placeholder — Module 9 adds the human interrupt)
[supervisor] → searcher: Search for information about the Nemotron Coalition announced at GTC 2026.
[searcher] 6 new candidate sources from 3 queries
candidates: [{"url": "https://hcompany.ai/h-joins-nemotron-coalition", "title": "Nemotron Coalition - H Company"}, …]
[supervisor] → reader: Read the following candidate sources …, retrieve their content, and assign relevance scores to each.
[reader] 4 fetched and ingested (31 new chunks); reliability 0.7–0.7
Source 1: Confirms the coalition announcement at GTC 2026; promotional press release from H Company …
Source 4: Describes eight AI labs collaborating under the Nemotron coalition; …
[supervisor] → fact_checker: Extract the specific claims made about the Nemotron Coalition …, evaluate each claim for factual accuracy, and assign a verdict …
[fact-checker] 5 claims: 2 supported / 0 contested / 3 unverified
[supervisor] → writer: Draft a concise report summarizing the announced details …, citing verified claims and sources, and noting any unverified statements.
[writer] report drafted (897 chars, cites 3 sources)
[supervisor] finish: The Nemotron Coalition was announced at GTC 2026 … The report cites three sources and summarizes the verified claims.

=== Report ===

> **Report: Nemotron Coalition announced at GTC 2026**

- The coalition was announced during NVIDIA GTC 2026, held in Taipei from June 1–4, 2026. [1,3]
- NVIDIA stated the coalition was launched on March 16, 2026. [1] — *unverified*
- H Company joined the coalition on June 4, 2026 in Taipei. [1,4]
- The coalition includes eight founding AI companies such as Black Forest Labs, Cursor, LangChain, Mistral AI, and Perplexity. [1,4] — *unverified*
- Its purpose is to build open frontier models by combining expertise, computing power, and data resources. [1]

## References
[1] https://hcompany.ai/h-joins-nemotron-coalition
[3] https://www.linkedin.com/posts/genai-works_nvidiapartner-nvidiagtc-activity-7439644576417574912-FQOU
[4] https://www.tomshardware.com/tech-industry/artificial-intelligence/nvidias-nemoclaw-coalition-brings-eight-ai-labs-together-to-build-open-frontier-models

That trajectory is the deliverable, as much as the report: every handoff, instruction, and verdict in one readable log, the supervisor’s audit story, live. (Real output, June 2026; long lines are truncated with “…”, and your sources and counts will differ.) Note the unverified verdicts in the tally: the Fact-checker flags what its evidence can’t confirm, and that tally is your trust signal. (Note also that the Writer marked only two of the three unverified claims in its bullets, exactly the kind of drift Module 8’s evals will measure.)

Step 6 — Test it

cd ..                                                # back to the repo root
uv run pytest module-07/tests/                       # offline: scripted team run
SCOUT_LIVE_TESTS=1 uv run pytest module-07/tests/    # + 1 real end-to-end run

The offline suite replays a scripted full pipeline (fake LLM, fake search, fake fetch) and pins the contracts: state fields on calendar, verdicts from the frozen vocabulary, report contains [1], all four specialists visited, supervisor caps enforced, inherited files byte-identical to Module 6. The live test asserts the same shape against the real stack — and the Module 1–6 suites stay green on this tree, per the cumulative rule.
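Two of those contracts (verdicts from the frozen vocabulary, citations that resolve to a source) can be expressed as a small checker. This is a sketch of the idea, not the lab's test suite; `cited_numbers` and `contract_violations` are hypothetical helpers.

```python
import re

VERDICTS = {"supported", "contested", "unverified"}  # the frozen vocabulary
CITATION = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")   # matches [1] and [1,3]


def cited_numbers(report: str) -> set:
    """Collect every citation number used anywhere in the report."""
    numbers = set()
    for group in CITATION.findall(report):
        numbers.update(int(part) for part in group.split(","))
    return numbers


def contract_violations(report: str, claims: list, source_count: int) -> list:
    """Return human-readable violations; an empty list means the contracts hold."""
    problems = []
    cited = cited_numbers(report)
    if not cited:
        problems.append("report has no [n] citations")
    problems += [f"[{n}] has no source" for n in sorted(cited) if not 1 <= n <= source_count]
    problems += [
        f"unknown verdict: {claim['verdict']}"
        for claim in claims
        if claim["verdict"] not in VERDICTS
    ]
    return problems


report = "- Announced at GTC 2026. [1,3]\n\n## References\n[1] https://example.com/a\n[3] https://example.com/c\n"
claims = [{"statement": "Announced at GTC 2026.", "supporting_sources": [1, 3], "verdict": "supported"}]
ok = contract_violations(report, claims, source_count=3)
bad = contract_violations("No citations here. [7]", [{"verdict": "maybe"}], source_count=3)
```

In a pytest suite each of these would become an assertion on the scripted run's final state.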

Try it yourself (no solution provided):

  1. Add a fifth specialist — a Reviewer that rereads the Writer’s report against the claims and either approves it or sends one revision instruction back through the supervisor. One node, one tool schema, one routing rule: if your Step 4 wiring made that hard, revisit it.
  2. Set MAX_TEAM_TURNS = 2 in supervisor.py and rerun. Watch the degradation path produce a warned, partial report instead of a crash — then put it back.

Exam corner

What the exam tests here. Per the official study guide, this module owns two objectives: 1.3 — configure agent-to-agent communication protocols for collaboration (handoffs, MCP vs. A2A) — and 1.5 — orchestrate multi-agent workflows and coordination (the supervisor pattern, end to end). It also deepens four introduced earlier: 1.6 (stateful orchestration — ScoutState now coordinates a team), 1.8 (adaptability and scalability — coordination costs, when to add or remove an agent), 2.3 (connecting tools and external systems — the MCP boundary), and 2.4 (error handling — bounded retries and graceful degradation in the supervisor). Domains 1 and 2 are 15% each on the official blueprint; questions are scenarios.

Quiz — answers after question 5.

  1. A bank builds a document pipeline with four distinct stages: classify, extract, compliance cross-check, summarize. Regulation requires a complete audit trail of which component decided what, and when. Latency is flexible. Best topology?

    • A) A swarm — peer handoffs keep the stages flexible
    • B) A supervisor routing four stage specialists — every handoff through one loggable point
    • C) A single agent with tools for all four stages
    • D) Four independent agents writing to a shared queue, no coordinator
  2. A startup converts inbound emails into structured tickets: extract fields, classify priority, emit JSON. The steps never vary; volume is high and the budget tight. The team proposes a supervisor with three specialist agents. What should you recommend?

    • A) Approve it — multi-agent is the more scalable architecture
    • B) A prompt chain orchestrated by code — fixed steps need no agent at all
    • C) A swarm instead — it removes the supervisor’s routing cost
    • D) A single ReAct agent with twelve tools to keep options open
  3. Two integration tasks: (i) connect your agent to internal databases and a ticketing API through one standard interface; (ii) let your agent delegate work to a partner company’s agent built on a different framework. Which protocol goes where?

    • A) A2A for both — it’s the newer standard
    • B) MCP for both — tools and agents are both “capabilities”
    • C) MCP for (i) — the agent↔tool boundary; A2A for (ii) — the agent↔agent boundary
    • D) A2A for (i), MCP for (ii)
  4. In a supervisor system, the Searcher sometimes returns zero sources, and the supervisor keeps re-dispatching it until the run times out. The best fix?

    • A) Remove the error handling so the failure crashes early and loudly
    • B) Unbounded retries with exponential backoff — it will succeed eventually
    • C) Cap re-dispatches (one retry with a reformulated instruction), then degrade gracefully to a report that carries a warning
    • D) Let the Reader run web searches itself whenever sources are empty
  5. After moving from one agent to a five-agent supervisor system, tokens per request grew 8× and latency 4×. Traces show ~14 supervisor turns per run, and every worker receives the full conversation history. Diagnosis and mitigation?

    • A) The model is too small for five agents — upgrade it
    • B) Coordination overhead: uncapped supervisor turns and full-context handoffs — cap the turns and hand each worker only the state slice its step needs
    • C) Expected multi-agent behavior — nothing to mitigate
    • D) Replace the supervisor with a swarm to eliminate routing calls

Answers.

  1. B. “Complete audit trail” is the supervisor’s cue: one decision point logs every handoff. A swarm scatters the trajectory (worst audit story); C rebuilds the junk-drawer context; D has no end-to-end trace at all.
  2. B. Fixed, enumerable steps are a workflow, not an agent problem: Module 3’s boundary, retested here from the multi-agent side. A and C add coordination costs for zero benefit; D adds non-determinism to a deterministic task.
  3. C. MCP standardizes the agent↔tools/context boundary; A2A standardizes agent↔agent delegation across vendors. The exam’s favorite trap in this module is swapping them, so anchor on the boundary, not the buzzword.
  4. C. Bounded retry + graceful degradation is objective 2.4 in one line. A trades availability for nothing; B is the token furnace; D quietly duplicates the Searcher’s role inside the Reader, and role boundaries exist so failures stay diagnosable.
  5. B. Both symptoms are coordination costs, not model capacity: routing turns multiply calls, and full-history handoffs multiply context tokens per call. The fix is a turn cap and context isolation per worker. D replaces the overhead with an audit problem.

Traps to avoid:

  • MCP vs. A2A. The number-one confusion of this module. MCP: agent ↔ tools and context. A2A: agent ↔ agent. If the scenario says “another team’s agent” or “another vendor’s agent,” it’s A2A; if it says “internal tools/data,” it’s MCP.
  • Multi-agent as the default answer. The exam rewards the simplest architecture that satisfies all the constraints. “More powerful” is never a justification; an unmet constraint is.
  • Terminology. The coordinating agent is a supervisor; orchestration is the concept of coordinating multi-agent workflows. “Orchestrator agent” in an answer option is a distractor, not a synonym.

Key takeaways

  • Context isolation — not headcount — is the real argument for multi-agent: focused prompts over a junk-drawer context.
  • Multi-agent costs before it pays: every handoff adds calls, latency, and a failure mode. Single agent first, team if necessary — and justify the team with evidence.
  • Supervisor = central control and the best audit story; swarm = flexible peer handoffs and the worst one; hierarchies are supervisors of supervisors, for genuinely large systems.
  • A handoff transfers control and context; Scout hands off through shared state (ScoutState), with Command(goto=..., update=...) as the LangGraph mechanism and tool calling as the supervisor’s decision-maker.
  • Let the LLM judge (who acts next, with what instruction); let code enforce policy (turn caps, one bounded retry, graceful degradation with a warning).
  • MCP standardizes agent↔tools; A2A standardizes agent↔agent. Different boundaries, different protocols — and the exam will test that you know which is which.
  • Scout now turns a question into a plan, sources, verdicts, and a cited report — the course’s promise, running on your laptop.

Keep going


Your team produces reports — but are they actually good? Module 8 builds the eval harness that proves it.


References

  • Multi-agent — the current LangChain guide to multi-agent patterns (subagents, handoffs, routers).
  • Graph API — LangGraph’s StateGraph and Command reference — the mechanics behind this module’s supervisor.
  • MCP specification (2025-11-25) — the current Model Context Protocol spec: hosts, clients, servers, and the tools/resources/prompts trio.
  • A2A protocol documentation — the Agent2Agent protocol (spec 1.0, Linux Foundation): agent cards, discovery, and task lifecycle.
  • What Are Multi-Agent Systems? — NVIDIA’s glossary entry, an official study-guide reading for Domain 1.
  • NCP-AAI certification page — the official blueprint; Agent Architecture and Design and Agent Development are 15% each.