Agent Architecture: Patterns, Trade-offs, and When Not to Build an Agent (NCP-AAI Module 3)

This is Module 3 of NCP-AAI Mastery, a free 14-module course that takes you from your first agent to NVIDIA-certified. Start at Module 1 or browse the full syllabus.

Last year I reviewed a system that summarized support tickets: six agents in a peer-to-peer swarm — a triager, two summarizers, a tone checker, a router, a “quality agent”. It worked, occasionally. Latency ran to minutes; token spend was roughly ten times what the task needed. And when a summary came out wrong, nobody could say which agent had decided what — control had hopped between peers five times with no trace. The replacement, shipped a month later: a three-step prompt chain. Faster, cheaper, debuggable in one read.

Nobody on that team was a bad engineer. They’d just never been asked the question this module drills: does this need an agent at all? The exam asks it relentlessly — Agent Architecture and Design is 15% of your score, tested in scenarios. By the end you’ll answer like an architect: pattern named, trade-offs weighed, decision written.

In this module

You’ll learn to:
- Name and compare the canonical patterns: prompt chain, router, single agent, supervisor, hierarchy, swarm.
- Apply a trade-off grid — latency, cost, reliability, auditability, scalability — to choose a pattern in a scenario, the way Domain 1 questions ask.
- Decide workflow vs. agent, and identify the anti-patterns of over-agentified systems.
- Write an architecture design doc — the artifact that survives the meeting.
You’ll build: Scout’s architecture design doc — and a refactor of the Module 2 graph to match the target architecture, without changing its behavior.
Exam domains covered: D1 — Agent Architecture and Design — 15% of the exam.
Prerequisites: Modules 1–2 (your ReAct graph runs); NVIDIA API key configured.

Where you are

✅ Module 1 — What Is Agentic AI? — vocabulary, landscape, first NIM call
✅ Module 2 — Build Your First AI Agent — ReAct loop, tool calling, first graph
👉 Module 3 — Agent Architecture (you are here)
⬜ Modules 4–14 — cognition, memory, RAG, multi-agent, evals, guardrails, deployment, the exam

Scout before: a single-agent ReAct loop with web_search, everything in one file. Scout after: same behavior — but the graph is reorganized along the pattern it will grow into, and a design doc traces the route to Module 13. The only module whose main increment is a document, deliberately: the exam tests this domain in scenarios, not code.

Workflows first: chains and routers

Module 1 placed systems on a spectrum from a single LLM call to a multi-agent team. Architecture starts at the humble end, which wins more often than demos suggest.

Prompt chaining is the simplest composition: the output of one LLM call becomes the input of the next, and the order of steps is fixed in your code. The model decides words; your code decides what happens next — always. Routing adds one decision: a classifier (often a small, cheap model) dispatches the input to one of several specialized paths — each path itself a call or a chain. And when steps must share evolving data — accumulated results, retries, verdicts — you add stateful orchestration: a typed state object threaded through the steps. You know this one: it’s LangGraph’s StateGraph from Module 2, minus the loop. The study guide files these under objective 1.6, with logic trees — branching if/else structures in code, exactly what a router’s dispatch table is.
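
A router's dispatch table can be sketched in a few lines. This is an illustration only: `classify` is a keyword stand-in for the small classifier model, and the path names and handlers are hypothetical. What it shows is that the branch is chosen by code from a label, not by the model.

```python
from typing import Callable


# Hypothetical specialized paths; in a real system each would be an LLM call or a chain.
def billing_path(text: str) -> str:
    return f"[billing] {text}"


def technical_path(text: str) -> str:
    return f"[technical] {text}"


def general_path(text: str) -> str:
    return f"[general] {text}"


# The dispatch table is the logic tree: a fixed mapping from label to path.
DISPATCH: dict[str, Callable[[str], str]] = {
    "billing": billing_path,
    "technical": technical_path,
}


def classify(text: str) -> str:
    """Stand-in for a small, cheap classifier model (assumption: keyword rules)."""
    lowered = text.lower()
    if "invoice" in lowered or "refund" in lowered:
        return "billing"
    if "error" in lowered or "crash" in lowered:
        return "technical"
    return "general"


def route(text: str) -> str:
    """Code picks the branch from the label; unknown labels fall back to a default path."""
    handler = DISPATCH.get(classify(text), general_path)
    return handler(text)
```

Because the mapping is plain data, each branch can be unit-tested on its own, and adding a path is one new entry in `DISPATCH`.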

Why use these when agents exist? Three properties agents can’t match:

- Determinism: the same input takes the same path. You can unit-test a chain.
- Predictable cost and latency: three steps means three calls. Always.
- Trivial debugging: a linear trace; the failing step has a name.

Concrete numbers: as a chain, that ticket pipeline (extract → summarize → format) is exactly 3 calls per ticket, every time. The six-agent version averaged 19 calls, and no two runs took the same path.
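
The "exactly 3 calls" property is easy to see in code. A minimal sketch, assuming the model is any callable that maps a prompt to text; `CountingLLM` is a hypothetical stub, not a real client, and the prompts are placeholders.

```python
from typing import Callable


def run_chain(ticket: str, llm: Callable[[str], str]) -> str:
    """Fixed order: extract -> summarize -> format. The code sequences the steps."""
    facts = llm(f"Extract the key facts from this ticket:\n{ticket}")
    summary = llm(f"Summarize these facts in two sentences:\n{facts}")
    return llm(f"Format this summary for the support dashboard:\n{summary}")


class CountingLLM:
    """Stub model that counts calls (assumption: stands in for a real LLM client)."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"out{self.calls}"


llm = CountingLLM()
result = run_chain("Customer cannot log in after password reset.", llm)
print(llm.calls)  # 3
```

Whatever the ticket contains, the call count stays at three, which is what makes the cost and latency of a chain predictable.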

The shapes you’re choosing between:

flowchart LR
    subgraph chain ["Prompt chain"]
        direction LR
        A1[call 1] --> A2[call 2] --> A3[call 3]
    end
    subgraph router ["Router"]
        direction TB
        B0{classifier} --> B1[path A]
        B0 --> B2[path B]
    end
    subgraph agent ["Single agent"]
        direction LR
        C1[LLM decides] -->|tool call| C2[tools]
        C2 -->|observation| C1
    end
    subgraph sup ["Supervisor"]
        direction TB
        D0[supervisor] <--> D1[specialist A]
        D0 <--> D2[specialist B]
    end

The pattern gallery, left to right in order of increasing autonomy — and increasing cost, latency, and failure modes.

The first two are workflows: code controls the flow. The moment the model starts controlling the flow, you’ve crossed into agent territory.

The single agent: ReAct as an architecture decision

In Module 2 you built a ReAct loop and learned its mechanics. Strip the mechanics away and look at what you actually decided: in that loop, the LLM controls the flow — whether to call a tool, which one, how many times, when to stop. That, not the presence of an LLM or a while loop, defines an agent, and the exam leans on that boundary hard.

The study guide’s job description expects three system temperaments. A reactive system decides its next action from the current observation — sense, act, repeat; ReAct is reactive by construction. A deliberative system plans first — a multi-step plan toward a goal, then execution; plan-and-execute agents are the LLM incarnation, and Scout grows one in Module 4. A hybrid system layers both — where Scout ultimately lands, like most production agents.

When is a single reactive agent the right architecture, not a stepping stone?

- The task is open-ended (steps can’t be enumerated in advance) but lives in one domain.
- The toolbox is small: under roughly ten tools, the model picks well from descriptions; past that, selection degrades and schemas crowd the context.
- No parallelism required: one transcript, one thing at a time.
- One context window holds the working state of the whole task.

Scout checks every box — one domain, one tool, sequential work. That’s why it stays a single agent until Module 7: the question “when does it stop sufficing” gets answered with criteria, not vibes.

Scaling out: supervisor, hierarchy, swarm

When a task outgrows one agent — distinct skills, parallel work, crowded contexts — there are three canonical ways to arrange a team. The exam tests the taxonomy and trade-offs here; the build comes in Module 7.

A supervisor is a central agent that routes work to specialist agents and collects their results: it decides who acts next; the specialists decide how to act. Every handoff passes through one point — one place to log, audit, and intervene. Typical use: a pipeline with distinct stages, exactly Scout’s future (a supervisor coordinating Searcher, Reader, Fact-checker, and Writer). Terminology, because the exam is picky: supervisor names the agent; orchestration names the concept of coordinating multi-agent workflows — there is no “orchestrator agent” in the blueprint’s vocabulary.
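
The "one place to log, audit, and intervene" property can be sketched without any framework. This is not the Module 7 implementation: `decide_next` stands in for the supervisor's LLM routing call, and the two specialists are hypothetical stubs.

```python
from typing import Callable, Optional

Specialist = Callable[[dict], dict]


def run_supervisor(
    state: dict,
    decide_next: Callable[[dict], Optional[str]],
    specialists: dict[str, Specialist],
    max_hops: int = 10,
) -> tuple[dict, list[str]]:
    """Central loop: the supervisor picks who acts next and logs every handoff."""
    handoffs: list[str] = []
    for _ in range(max_hops):
        name = decide_next(state)  # in a real system: an LLM routing call
        if name is None:  # the supervisor decides the work is done
            break
        handoffs.append(name)  # the single audit point
        state = specialists[name](state)  # the specialist decides how to act
    return state, handoffs


# Hypothetical stand-ins for two specialists.
def searcher(state: dict) -> dict:
    return {**state, "sources": ["doc-1"]}


def writer(state: dict) -> dict:
    return {**state, "answer": f"Based on {state['sources']}"}


def decide_next(state: dict) -> Optional[str]:
    """Rule-based stand-in for the supervisor's routing decision."""
    if "sources" not in state:
        return "searcher"
    if "answer" not in state:
        return "writer"
    return None


final_state, handoffs = run_supervisor(
    {"question": "q"}, decide_next, {"searcher": searcher, "writer": writer}
)
print(handoffs)  # ['searcher', 'writer']
```

Every handoff lands in one list, so "who acted, in what order" is answered by a single log. The `max_hops` cap plays the same role as an iteration cap on a single agent.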

A hierarchy is supervisors of supervisors: each subtree owns a domain, a top-level supervisor delegates between subtrees. It exists for genuinely large systems — tens of specialists, organized like departments — and two levels of routing means two levels of latency, cost, and misrouting: it earns its keep only at scale.

A swarm has no chief: agents hand control directly to each other, peer to peer. It shines when a conversation should flow between personas without a dispatcher — support migrating between a billing and a technical specialist. The price is steep: no single point sees the whole run, and tracing “who decided what” means stitching together every peer’s view.

The reference table to internalize for scenario questions:

Pattern	Who controls the flow	Latency	Cost	Debuggability / auditability	Typical use
Prompt chain	Your code, fixed order	Lowest, constant	N calls, fixed	Linear trace — trivial	Fixed multi-step transforms
Router	Your code + one classifier call	Low, constant	Fixed + 1 cheap call	One labeled branch decision	Triage into specialized paths
Single agent	The LLM (loop, tool choice)	Variable	Variable, capped	One transcript to read	Open-ended task, one domain, few tools
Supervisor	Supervisor LLM routes to specialists	High (every hop bills a routing call)	High	Central choke point — best multi-agent audit story	Distinct specialist stages + traceability
Hierarchy	Supervisors of supervisors	Highest	Highest	Auditable per level; deep traces	Tens of specialists, domain subtrees
Swarm	Peers hand off to peers	Variable	High	Hardest — no central view	Fluid persona handoffs, no natural chief

We implement the supervisor pattern hands-on in Module 7. Here, what matters is reading a scenario and naming the right row.

Choosing: the trade-off grid

Scenario questions hand you constraints and four plausible architectures. Score them against five criteria — the same five your design docs argue from.

Latency. Every LLM call in the critical path adds seconds. A chain has a constant number; an agent a variable number; a supervisor bills a routing call on every hop. A tight SLA works against every added agent.

Cost. Tokens × calls. The multipliers compound quietly: more agents means more calls, but also more context per call — each specialist re-reads its instructions and state, and reasoning models spend thinking tokens on every turn.
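
A back-of-the-envelope version of "tokens × calls", as a sketch. The call counts are the ones from the ticket example earlier in this module; the per-call token figures are invented purely for illustration and are not measurements.

```python
def tokens_per_run(calls: int, context_tokens: int, output_tokens: int) -> int:
    """Rough token bill for one run: every call re-reads its context and emits output."""
    return calls * (context_tokens + output_tokens)


# Invented per-call figures: the multi-agent version is assumed to carry more context per call.
chain_tokens = tokens_per_run(calls=3, context_tokens=1_000, output_tokens=300)
swarm_tokens = tokens_per_run(calls=19, context_tokens=2_000, output_tokens=300)

print(chain_tokens, swarm_tokens)  # 3900 43700
```

The point is the shape of the formula: both factors grow when agents are added, so the bill grows faster than the agent count alone suggests.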

Reliability. LLM steps compose their error rates: if each step is right 95% of the time, five dependent steps are right about 77% of the time (0.95⁵). Every added agent is another decision point, and errors travel — a bad handoff upstream becomes confident nonsense downstream. Fewer dependent LLM decisions is a reliability strategy.
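
The arithmetic behind that 77% can be checked directly. A small sketch, assuming each step fails independently of the others (a simplification; real errors can be correlated):

```python
def chain_success_rate(per_step: float, steps: int) -> float:
    """Probability that every dependent step succeeds, assuming independent errors."""
    return per_step ** steps


for n in (1, 3, 5, 10):
    print(n, round(chain_success_rate(0.95, n), 3))
# 1 0.95
# 3 0.857
# 5 0.774
# 10 0.599
```

At ten dependent steps the same 95% per-step accuracy gives an end-to-end success rate of about 60%.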

Auditability. Who decided what, when, from what evidence? A chain logs itself. A supervisor concentrates every decision at one inspectable point. A swarm scatters the story across peers. If the scenario mentions compliance or “explain the decision,” weight this criterion heavily — it usually decides the answer.

Scalability and adaptability — objective 1.8 by name. Not “does it handle load” but “can the architecture absorb the next capability without a rewrite?” Adding a specialist to a supervisor is a new node plus a routing rule; adding anything to a do-everything agent means re-tuning one giant prompt. The lab’s refactor prepares exactly this: Scout’s next organs must plug in, not bolt on.

One more decision belongs to architecture, though its implementation waits for Module 6: what shape is your knowledge in? A vector store retrieves by semantic similarity — the right default for unstructured documents. A knowledge graph stores entities and typed relationships, which is what relational reasoning needs: multi-hop questions like “which services depend on a library that depends on X?” follow edges, not embeddings — similarity search can’t hop. Related entities, dependencies, multi-hop → knowledge graph; find relevant passages → vectors (objective 1.7; Scout builds its retrieval in Module 6).
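
To make "follow edges, not embeddings" concrete, here is a minimal sketch of a multi-hop query over explicit dependency edges. The component names and the dictionary layout are hypothetical; a real knowledge graph would live in a graph store, not a Python dict.

```python
from collections import deque

# Hypothetical dependency edges: each key depends on every item in its list.
DEPENDS_ON = {
    "checkout": ["payments-lib"],
    "payments-lib": ["crypto-lib"],
    "search": ["index-lib"],
    "reports": ["checkout"],
}


def affected_by(component: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Everything that directly or transitively depends on `component`."""
    # Invert the edges: for each dependency, record who depends on it.
    dependents: dict[str, list[str]] = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)

    # Breadth-first traversal along the inverted edges.
    seen: set[str] = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(affected_by("crypto-lib", DEPENDS_ON)))
# ['checkout', 'payments-lib', 'reports']
```

`reports` never mentions `crypto-lib` anywhere, so no similarity score would connect them; the answer only appears by hopping three edges.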

The surface your agent presents — chat, API, or a human approval point — is an architecture decision too (objective 1.1): pausing for plan approval needs an interruptible graph, plumbing you design for, not sprinkle on. Scout’s interrupt arrives in Module 9, its API in Module 10.

With constraints on the table, the decision falls out of a few questions:

flowchart TD
    Q1{"Can you write the steps<br/>down in advance?"} -->|yes| W["Workflow — chain,<br/>router. No agent."]
    Q1 -->|no| Q2{"One domain,<br/>< ~10 tools,<br/>no parallel work?"}
    Q2 -->|yes| SA["Single agent (ReAct)"]
    Q2 -->|no| Q3{"Distinct specialist stages?<br/>Central audit/control point<br/>needed?"}
    Q3 -->|yes| SUP["Supervisor<br/>(hierarchy if subtrees<br/>of specialists)"]
    Q3 -->|no| SW["Swarm — only if fluid<br/>peer handoffs ARE<br/>the requirement"]

Walk it top to bottom, stop at the first fit. Most real systems stop in the first two boxes.
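
The same decision flow can be written as a plain function, which is handy for checking your reading of a scenario. This is a sketch with hypothetical parameter names; it assumes that a yes to either half of the third question (distinct stages, or a central audit point) is enough to pick the supervisor.

```python
def choose_pattern(
    steps_known_in_advance: bool,
    one_domain: bool,
    tool_count: int,
    needs_parallel_work: bool,
    distinct_specialist_stages: bool,
    needs_central_audit: bool,
) -> str:
    """Walk the questions top to bottom and stop at the first fit."""
    if steps_known_in_advance:
        return "workflow (chain or router)"
    if one_domain and tool_count < 10 and not needs_parallel_work:
        return "single agent (ReAct)"
    if distinct_specialist_stages or needs_central_audit:
        return "supervisor (hierarchy if subtrees of specialists)"
    return "swarm (only if fluid peer handoffs are the requirement)"


# A fixed weekly report: the steps can be written down, so no agent is needed.
print(choose_pattern(True, True, 2, False, False, False))
# workflow (chain or router)
```

The order of the `if` statements carries the argument: the cheaper pattern is always tested first.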

Anti-patterns: when not to build an agent

Four failure shapes account for most agentic wreckage — and most wrong-but-tempting options in exam questions.

Agent-washing. A while loop around an LLM call isn’t an agent — and neither is an if after one. The term cuts both ways: calling a plain workflow an “agent”, or — the expensive direction — building an agent, with its variable cost and non-determinism, for a task whose steps you could have written down in advance. The test is always: who controls the flow? If your code does, it’s a workflow; name it and ship it proudly.

Premature multi-agent. Reaching for a specialist team before one agent has actually hit a wall. The coordination tax is real: routing calls, duplicated context, handoff failures, compounded errors. If the specialists never disagree and never run in parallel, they’re one agent wearing different hats — at six times the price. Scale out on evidence of the single-agent ceiling, not because the diagram looks better.

Autonomy without observability. An agent that acts but leaves no trace is undeployable — when (not if) it misbehaves, you have nothing to debug and nothing to show the auditor. No trace, no prod. Scout gets tracing in Module 11; the decision that traces must exist is made here, in the design doc.
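
What "a trace must exist" means at its smallest: every step records what it received and what it returned. The sketch below is not the tracing Scout gets in Module 11; `TRACE` is a hypothetical in-memory list standing in for a real trace backend, and `summarize` is a placeholder for an LLM step.

```python
import functools
import time
from typing import Any, Callable

TRACE: list[dict[str, Any]] = []  # assumption: in-memory stand-in for a trace backend


def traced(step_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that records inputs, output, and duration of each successful call."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            started = time.perf_counter()
            result = fn(*args, **kwargs)
            TRACE.append(
                {
                    "step": step_name,
                    "args": args,
                    "kwargs": kwargs,
                    "result": result,
                    "seconds": time.perf_counter() - started,
                }
            )
            return result

        return wrapper

    return decorator


@traced("summarize")
def summarize(text: str) -> str:
    return text[:20]  # placeholder for an LLM call


summarize("Customer cannot log in after password reset.")
print(TRACE[-1]["step"])  # summarize
```

This version records only calls that return; a production trace would also capture exceptions and correlate the steps of one run under a shared identifier.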

The do-everything agent. Thirty tools, one prompt trying to be researcher, coder, and support rep at once. Tool selection degrades, the context saturates with schemas, every new capability makes every old one slightly worse. The opposite failure from premature multi-agent — same fix: match structure to the task’s seams.

Hands-on lab: build it

Architecture is tested in scenarios on the exam, not in code — so this lab makes you do architecture: write Scout’s design doc, then refactor the Module 2 graph to match it. The full lab lives in module-03/ of the labs repo.

Objective: produce docs/scout-design.md and reorganize the code so Scout can grow toward its target without a rewrite.

Observable result: the design doc is filled in, and uv run python -m scout.graph "…" behaves exactly like Module 2 — same trace, same kind of cited answer, all smoke tests green including Modules 1–2’s.

Step 1 — Write the design doc

The template (docs/scout-design-template.md), deliberately short:

# Architecture Design Doc — <system name>
1. Problem & success criteria      — what must work, measurably
2. Pattern chosen & alternatives rejected — the decision AND the road not taken
3. State & data contracts          — what's frozen, what may change
4. Failure modes & mitigations     — failure → detection → mitigation
5. Evolution path                  — how it grows WITHOUT a rewrite

Fill it for Scout’s target: a supervisor coordinating Planner, Searcher, Reader, Fact-checker, and Writer over shared state. Justify the pattern with this module’s grid — the exercise has teeth: auditable end to end kills the swarm, distinct specialist stages outgrow a single agent at Module 7, unpredictable control flow kills the fixed pipeline. Section 2’s rejection table is the most valuable thing you’ll write this module — it stops the next person from re-litigating the architecture. A filled reference ships in the repo (docs/scout-design.md); compare after you’ve written yours, not before.

Step 2 — Refactor: make the code match the doc

Module 2’s graph works, but one file holds everything: nodes, prompt, routing, construction, CLI. None of that blocks today’s agent; all of it blocks the evolution path you just wrote. Three moves, no behavior change:

Move 1 — call parameters into config.py. The rule was already “the model name lives only in config.py”; the refactor extends it to every call parameter:

# module-03/scout/config.py (excerpt)
MAX_TOKENS = 8192  # Nemotron 3 reasons before answering — keep headroom
MAX_ITERATIONS = 6  # hard stop: an agent loop without a cap is a token furnace
MAX_RETRIES = 2  # capped: past 2 you're burning tokens on a broken loop

Move 2 — node logic into nodes.py (new). agent_node, the streaming collector, tools_node, and the system prompt move out of graph.py. The dividing line: a node decides WHAT one step does; the graph decides WHO runs WHEN.

Move 3 — graph.py keeps only construction. Routing and wiring, nothing else:

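# module-03/scout/graph.py (after: node logic now imported, not defined)
from langgraph.graph import END, START, StateGraph

from . import config  # assumed import path: call parameters live in config.py
from .nodes import SYSTEM_PROMPT, agent_node, tools_node
from .state import ScoutState  # assumed import path: state.py holds the frozen contract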

def route_after_agent(state: ScoutState) -> str:
    """Routing is a graph concern: nodes never decide who runs next."""
    llm_turns = sum(m["role"] == "assistant" for m in state["messages"])
    last = state["messages"][-1]
    if last.get("tool_calls") and llm_turns < config.MAX_ITERATIONS:
        return "tools"
    return END

def build_graph():
    builder = StateGraph(ScoutState)
    builder.add_node("agent", agent_node)   # canonical name: the future
    builder.add_node("tools", tools_node)   # base of the M7 specialists
    builder.add_edge(START, "agent")
    builder.add_conditional_edges("agent", route_after_agent, {"tools": "tools", END: END})
    builder.add_edge("tools", "agent")
    return builder.compile()

build_graph() becomes the single public constructor — tests call it, Module 10’s API will call it, nothing ever wires its own copy. One hard rule, enforced by the smoke tests: ScoutState gains no fields. The refactor reorganizes; it does not extend. If your refactor needs a new field, your refactor is wrong.
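
One way a smoke test could pin the "gains no fields" rule is to compare the state's declared fields against a frozen set. This is a sketch, not the repo's actual test: the `ScoutState` below is a stand-in (only `messages` is visible in the routing code above), and `FROZEN_FIELDS` and `assert_state_frozen` are hypothetical names.

```python
from typing import TypedDict


class ScoutState(TypedDict):
    """Stand-in for the real state.py; the real field list is not shown here."""

    messages: list[dict]


# Assumption: the field set as recorded at the end of Module 2.
FROZEN_FIELDS = frozenset({"messages"})


def assert_state_frozen(state_cls: type, frozen: frozenset[str]) -> None:
    """Fail if the state schema gained or lost a field relative to the frozen set."""
    current = frozenset(state_cls.__annotations__)
    added = current - frozen
    removed = frozen - current
    assert not added and not removed, f"added={sorted(added)} removed={sorted(removed)}"


assert_state_frozen(ScoutState, FROZEN_FIELDS)  # passes: the schema is unchanged
```

A check like this fails loudly the moment a refactor sneaks in a new field, and the failure message names the field.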

Untouched, deliberately: state.py (frozen contract), llm.py, ask.py, and react_manual.py — byte-identical as a Module 2 comparison piece. A museum exhibit; the living architecture reads config.py.

Step 3 — Verify: same behavior, better bones

cd module-03
uv run python -m scout.graph "What did NVIDIA announce at GTC 2026?"
cd ..  # back to the repo root: the test paths below are relative to it
uv run pytest module-01/tests/ module-02/tests/ module-03/tests/

Same trace as Module 2, all three suites green — Module 3’s tests pin the new layout (nodes/graph separated, parameters only in config, design doc filled) and the non-change (state untouched, inherited files byte-identical). A refactor that breaks an old test changed behavior somewhere.

Try it yourself (no solution provided):

Stress the evolution path on paper: list which files Module 4’s Planner will touch (a planner node, plan + plan_iterations fields). If your answer edits an existing function in nodes.py, revisit step 2.
Add one row to your design doc’s rejection table — a hierarchy of supervisors — and reject it for today’s Scout in two sentences, in the grid’s vocabulary.

In production

The design doc you just wrote has a corporate name: an Architecture Decision Record (ADR) — same anatomy, kept in the repo, superseded rather than edited. Mature teams gate agent projects on a design review before the first line of code, for an unsentimental reason: an agent architecture refactored late is a triple rewrite — the state schema, every prompt that references it, and the eval suite that scored the old behavior. I’ve watched a “quick pivot” from single-agent to supervisor consume six weeks for exactly that reason. In regulated sectors, auditability isn’t a preference — “show me why the system decided this” is a legal requirement, and architectures that can’t answer don’t ship. My working rule, in numbers: no system grows past three agents until an automated eval harness exists (Scout’s arrives in Module 8). Past that point you’re scaling quality you can’t measure — which means scaling failure modes you can’t see.

Exam corner

What the exam tests here. Per the official study guide, Domain 1 (15%) expects you to: implement reasoning-and-action frameworks like ReAct (1.2 — built in Module 2, placed in the taxonomy here); orchestrate multi-agent workflows (1.5 — supervisor vs. hierarchy vs. swarm); apply logic trees, prompt chains, and stateful orchestration (1.6 — and know when they beat an agent); integrate knowledge graphs for relational reasoning (1.7); and ensure the architecture’s adaptability and scalability (1.8). Questions are scenarios: constraints in, “choose the best architecture” out.

Quiz — answers after question 5.

1. Every Monday, a system pulls last week’s sales records, validates them against three fixed rules, and produces a report. The steps never vary. Best architecture?
- A) A single ReAct agent with data-access tools, to adapt to data issues
- B) A supervisor with extraction, validation, and reporting specialists
- C) A prompt chain: extract → validate → report, orchestrated by code
- D) A swarm, so validation and reporting can hand off flexibly
2. A bank’s loan-document system has distinct stages (classify, extract, cross-check, summarize); compliance requires a centralized audit trail of every decision and handoff. Best topology?
- A) A swarm: peer handoffs maximize flexibility between stages
- B) A supervisor routing to stage specialists, all handoffs through one point
- C) A single agent with 25 tools covering all four stages
- D) Four independent chains in parallel, no coordination
3. A team built five agents for FAQ answering: query, rephrase, search, answer, checker. It’s slow, expensive, and wrong answers are hard to localize, because each agent builds on the previous one’s output. Diagnosis?
- A) Too few agents: a sixth should verify the checker
- B) Premature multi-agent: sequential specialists that never diverge are a pipeline, and errors compound across hops; collapse to a router plus retrieval, or a single agent
- C) Wrong framework: swapping frameworks would fix the latency
- D) The model is too small for five agents
4. An assistant must answer within a 3-second SLA on a fixed token budget; the task is open-ended within one domain and needs 3 tools. Which fits?
- A) A supervisor with three single-tool specialists, for cleaner separation
- B) A hierarchy, so future growth is already structured
- C) A single ReAct agent with the 3 tools and a tight iteration cap
- D) A swarm of two agents to parallelize tool calls
5. A system answers “which downstream services are affected if component X fails?” over thousands of interconnected components, which requires multi-hop traversal across explicit dependency relationships. Which knowledge foundation?
- A) A vector store: embeddings capture the dependencies implicitly
- B) A bigger context window: paste the full dependency list per query
- C) Fine-tuning the model on the dependency data
- D) A knowledge graph: typed relationships support multi-hop relational reasoning that similarity search can’t follow

Answers.

1. C. Fixed, enumerable steps are the signature of a workflow: deterministic, testable, three calls every time. A and B re-introduce variable cost and non-determinism for zero benefit; the trap is that they sound more capable. D is this module’s opening anecdote.
2. B. “Centralized audit trail” decides it: a supervisor passes every handoff through one loggable point. A swarm scatters control, which is the worst audit story of all. C saturates one context with 25 tools; D has no end-to-end trail at all.
3. B. Five sequential specialists whose routing never varies are a prompt chain wearing agent costumes: five compounding error points, five context bills. The fix is simplification, not another checking layer (A adds one more error point).
4. C. The SLA and budget eliminate every multi-agent option, because each supervisor hop bills a routing call. One domain + 3 tools is single-agent territory; the iteration cap makes worst-case latency predictable.
5. D. Dependency questions are relational, multi-hop reasoning: you follow typed edges, not similarity. Vectors find similar text, not connected entities; B doesn’t scale; C bakes today’s dependencies into stale weights.

Traps to avoid:

The impressive-architecture trap. Scenarios often include a plausible multi-agent option satisfying most constraints. The intended answer satisfies all of them, as simply as possible — complexity is never a tiebreak in its own favor.
Supervisor vs. orchestration. Supervisor = the coordinating agent. Orchestration = the concept of coordinating multi-agent workflows. There’s no “orchestrator agent” in the blueprint’s vocabulary.
“It calls an LLM, so it’s an agent.” An agent is defined by the LLM controlling the flow — looping, choosing tools, deciding when to stop. An LLM call inside an if is a workflow step. “Agentic” systems described as fixed pipelines are testing exactly this boundary.
“Scalable architecture” ≠ handles more load. Objective 1.8’s scalability means absorbing the next capability without a rewrite — a new specialist node, not more requests per second. Load belongs to Domain 4.
Vectors find similar text; knowledge graphs follow typed edges. A multi-hop, relational question (“which services depend on…”) is a graph question — no amount of top-k tuning makes similarity search hop.

Key takeaways

Workflow first, agent if necessary: if you can write the steps down in advance, a chain or router beats an agent on cost, latency, determinism, and debuggability.
An agent is defined by who controls the flow — the LLM looping and choosing actions — not by the presence of an LLM call.
Know the six patterns and their trade-offs — and that supervisor wins when auditability matters.
Every added agent adds latency, cost, and a failure mode — LLM error rates compose across dependent steps, and handoffs compound them.
Reactive systems act on observations (ReAct); deliberative systems plan first; hybrids layer both — exam vocabulary, straight from the job description.
Multi-hop relational questions need a knowledge graph; “find relevant passages” needs a vector store — a decision made before any RAG code exists.
The exam rewards the simplest architecture that meets all the constraints — and a design doc with rejected alternatives is the artifact that scales.

Keep going

Scout has an architecture on paper; next, we give it a brain — planning, reasoning, and self-correction.

References

What Are Multi-Agent Systems? — NVIDIA glossary; official study-guide reading for Domain 1, with the centralized/decentralized/hierarchical orchestration comparison.
Agentic AI in the Factory — NVIDIA white paper; official study-guide reading on agentic workflows as production services.
Workflows and agents — LangGraph’s guide to prompt chaining, routing, and agents (1.x), with runnable examples of every pattern in this module.
Building Effective AI Agents — Anthropic’s essay behind the workflows-vs-agents framing and the “start simple” discipline.
NCP-AAI certification page — the official blueprint; Agent Architecture and Design is weighted at 15%.