Cognition: How Agents Plan, Reason, and Self-Correct (NCP-AAI Module 4)
This is Module 4 of NCP-AAI Mastery, a free 14-module course that takes you from your first agent to NVIDIA-certified.
Scout’s ReAct loop handles narrow questions well. So I gave it a broad one:
“Compare the EU and US regulatory approaches to AI agents.” It searched
“EU AI regulation” — reasonable. Then “EU AI Act agents”. Then “EU AI Act
autonomous systems”, and two more rewordings of the same search. Fourteen
tool calls later — several per turn — it hit the iteration cap and printed
the line you wrote in Module 2: “Stopped after 6 iterations without a final answer.” The transcript it left behind was deep on the EU side and
had given the US side a single search. At no point did anything in that
loop ask the question every human researcher asks first: what are the
parts of this task?
An agent that acts without a plan burns tokens on redundant work and silently misses entire halves of the job. This module closes that gap: by the end, Scout decomposes the question into a structured plan, critiques its own plan once, then executes — and you’ll measure exactly what that critique pass buys.
In this module
- You’ll learn:
- Apply reasoning frameworks (chain-of-thought and task decomposition) and weigh what each one buys you (objective 5.2).
- Engineer a multi-step planning strategy: ReAct vs. plan-and-execute, chosen on latency, cost, and auditability (5.3).
- Implement a Planner node that turns a question into a structured, validated research plan living in graph state (5.3, 5.4).
- Build a one-iteration reflection loop and measure what it adds on five test questions (5.5).
- Budget reasoning: when deliberation is worth its token cost, and when it isn’t.
- You’ll build: Scout’s Planner node — question → structured research plan, self-critiqued once before execution.
- Exam domains covered: D5 — Cognition, Planning, and Memory — 10% of the exam. This module covers the planning half; Module 5 covers memory.
- Prerequisites: Modules 1–3 (your refactored graph runs); NVIDIA API key configured, plus your Tavily key from Module 2. compare_plans.py is the only part that runs without the Tavily key (it still needs your NVIDIA key).
Where you are
- ✅ Module 1 — What Is Agentic AI? — vocabulary, landscape, first NIM call
- ✅ Module 2 — Build Your First AI Agent — ReAct loop, tool calling, first graph
- ✅ Module 3 — Agent Architecture — patterns, trade-offs, the design doc
- 👉 Module 4 — Cognition: planning, reasoning, self-correction (you are here)
- ⬜ Modules 5–14 — memory, RAG, multi-agent, evals, guardrails, deployment, the exam
Scout before: a single-agent ReAct loop, reorganized along the design
doc but still purely reactive — it acts on the last observation, nothing
more. Scout after: a planner → critic → planner → executor graph.
The plan is a typed object in state, critiqued and revised once before a
single search runs.
From reactive to deliberative: why agents plan
Module 3 named the three system temperaments from the study guide’s job description; this module builds the second one. Per the official study guide, a reactive system decides its next action from the current observation — sense, act, repeat. A deliberative system constructs a multi-step plan toward a goal before executing it. A hybrid system layers both: deliberate planning above, reactive execution below — which is exactly where Scout lands today.
Why does reactivity fail on broad tasks? Every decision in a ReAct loop is local — picked from the transcript so far. Nothing in the loop owns the global shape of the task. On a narrow question, local decisions suffice; on “compare two regulatory regimes,” the loop optimizes each next search and never notices that the US half of the comparison is starving.
The fix is the oldest tool in engineering: task decomposition — breaking a complex task into smaller, independently verifiable subtasks before solving any of them. Decomposition buys three things:
- Coverage: enumerating the facets up front is the only reliable way to notice one is missing — before the run, not after.
- Focus: each subtask gets a context window about its job, instead of one crowded transcript juggling every thread at once.
- Verifiability: a subtask with its own expected output can be checked off — “found the 2025 EU enforcement actions” either happened or didn’t.
Decomposed steps are also what Scout’s specialists will divide between themselves when it becomes a team in Module 7.
Here’s the opening question, decomposed the way Scout’s Planner will learn to do it:
- Inventory the EU rules for AI agents — expected output: the instruments and their key obligations.
- Inventory the US approach — expected output: federal and state-level actions, and what they regulate.
- Compare enforcement and scope — expected output: a side-by-side of who enforces what, and on whom.
- Collect open disagreements — expected output: where the regimes conflict for an agent operating in both.
Four steps, each verifiable, jointly covering the question. The fourteen-call spiral gave step 2 a single search and never completed it.
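The decomposition above can be held as plain data, which turns the coverage check into a set difference. This is a minimal sketch, not the lab's schema: the `Subtask` class and its `facets` tag are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    goal: str
    expected_output: str
    facets: frozenset  # hypothetical coverage tags, not part of the lab's schema


# The four steps from the list above, tagged by the side(s) they cover.
PLAN = [
    Subtask("Inventory the EU rules for AI agents",
            "the instruments and their key obligations", frozenset({"EU"})),
    Subtask("Inventory the US approach",
            "federal and state-level actions, and what they regulate", frozenset({"US"})),
    Subtask("Compare enforcement and scope",
            "a side-by-side of who enforces what, and on whom", frozenset({"EU", "US"})),
    Subtask("Collect open disagreements",
            "where the regimes conflict for an agent operating in both", frozenset({"EU", "US"})),
]


def missing_facets(steps, required):
    """Return the required facets that no step claims to cover."""
    covered = set().union(*(step.facets for step in steps))
    return set(required) - covered


print(missing_facets(PLAN, {"EU", "US"}))      # set()
print(missing_facets(PLAN[:1], {"EU", "US"}))  # {'US'}
```

The second call is the spiral in miniature: a plan that only inventories the EU side reports the US facet as missing before a single search runs.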
Reasoning frameworks: chain-of-thought and beyond
The study guide files two reasoning frameworks under objective 5.2, and the exam expects you to keep them apart.
Chain-of-thought (CoT) is prompting a model to reason step by step in text before committing to an answer — reasoning only, no actions. It improves multi-step answers because each intermediate conclusion conditions the next, and it leaves a trace you can read. But nothing in a chain of thought touches the world: no search, no tool, no observation.
ReAct, which you built in Module 2,
interleaves that reasoning with tool actions and feeds each observation
back into the next thought. That’s the whole distinction, and it’s worth
one slow sentence because the exam loves it: CoT reasons; ReAct reasons
and acts. A transcript with Action: and Observation: lines is
ReAct; a transcript that’s pure Thought: from question to answer is
chain-of-thought, however impressive the thinking looks. A third family
from the blueprint, logic trees and prompt chains — branching and
fixed-sequence structures where your code controls the flow — was
Module 3’s territory (objective 1.6), and remains the right answer when
the steps are known in advance.
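The transcript test is mechanical enough to write down. A rough sketch, assuming transcripts label their lines with the literal prefixes `Thought:`, `Action:` and `Observation:`; the function name is hypothetical and real traces may use other markers.

```python
import re

# Any Action: or Observation: label marks contact with the outside world.
ACTION_OR_OBSERVATION = re.compile(r"\b(?:Action|Observation):")


def classify_transcript(transcript: str) -> str:
    """Label a transcript by whether it ever acts or observes."""
    if ACTION_OR_OBSERVATION.search(transcript):
        return "ReAct"
    return "chain-of-thought"


cot = "Thought: 17 x 3 is 51.\nThought: 51 + 9 is 60.\nAnswer: 60."
react = ("Thought: I need the 2025 figure.\n"
         "Action: web_search('EU AI Act fines 2025')\n"
         "Observation: three results\n"
         "Thought: now I can compare.")

print(classify_transcript(cot))    # chain-of-thought
print(classify_transcript(react))  # ReAct
```

Note what the check ignores: the number or quality of `Thought:` lines. Only actions and observations move a transcript into the ReAct column.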
Reasoning isn’t free, and that bill has a name: the reasoning budget —
the tokens, latency, and money you allow a system to spend thinking before
it acts. You’ve been paying it since Module 1: Nemotron 3 Nano is a
reasoning model, its thinking tokens count against max_tokens, and the
labs run with MAX_TOKENS = 8192 because smaller budgets returned empty
answers with finish_reason="length" — the reasoning consumed everything.
The budget exists at the system level too: Scout’s deliberation phase
costs three extra LLM calls (draft, critique, revision) before a single
search runs. On a multi-faceted research question, those calls pay for
themselves in avoided redundant searches; on “what’s the capital of
Australia,” they’re pure waste. Budgeting reasoning is the skill — not
maximizing it.
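The "three extra calls" figure falls out of simple counting. A small sketch of that arithmetic, assuming one LLM call per planner pass and one per critique, and ignoring any validation retries; the function name is hypothetical.

```python
def deliberation_calls(planner_passes: int) -> int:
    """LLM calls spent before the executor starts.

    One call per planner pass, plus one critic call between each pair of
    consecutive planner passes. Validation retries are not counted.
    """
    if planner_passes < 1:
        raise ValueError("at least one planner pass is needed to have a plan")
    critic_calls = planner_passes - 1
    return planner_passes + critic_calls


print(deliberation_calls(2))  # 3 (draft, critique, revision)
print(deliberation_calls(3))  # 5
```

Each extra reflection iteration adds two calls, which is the cost side of the decision this section asks you to make.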
Planning patterns: ReAct vs. plan-and-execute
Objective 5.3 asks you to engineer planning strategies for sequential and multi-step decision-making — in practice, to choose between two patterns and defend the choice.
You know the first: ReAct is interleaved planning — the agent plans exactly one step ahead, executes it, and lets the observation reshape everything that follows. The second is plan-and-execute: the agent drafts a complete multi-step plan up front, then executes the steps, returning to the planner only if something forces a change. Side by side:
flowchart TB
subgraph react ["ReAct — interleaved"]
direction TB
R1[reason: what next?] --> R2[act: one tool call]
R2 --> R3[observe the result]
R3 -->|not done| R1
R3 -->|done| R4([answer])
end
subgraph pae ["Plan-and-execute"]
direction TB
P1[plan the whole task] --> P2[execute steps 1..n]
P2 --> P3{step failed or<br/>surprised you?}
P3 -->|yes| P4[replan the remainder]
P4 --> P2
P3 -->|no| P5([answer])
end
Left: every observation can redirect the run. Right: the global shape is fixed first; observations only redirect through an explicit replan.
The reference table for scenario questions — read the constraint in the question, find the row that decides:
| Criterion | ReAct (interleaved) | Plan-and-execute | Hybrid (plan + replanning) |
|---|---|---|---|
| Time to first action | Immediate — first tool call in one LLM turn | Delayed — the full plan is drafted first | Delayed at start, adaptive after |
| Token cost | Variable; redundant work on broad tasks | Planning calls up front, then focused execution | Highest worst case: plan + execute + replans |
| Adaptability to surprises | Best — every observation can change course | Worst — a stale plan executes blindly | Good — replans on defined triggers |
| Auditability / traceability | One transcript; intent stays implicit | The plan is an explicit, reviewable artifact before any action | Plan plus recorded replans: full decision history |
| Human approval | Hard — nothing exists to approve before actions start | Natural — pause after planning, approve, execute | Natural at plan time; re-approve on replan |
| Typical use | Narrow, unpredictable tasks in one domain | Multi-faceted research and reports; compliance contexts | Long-running production agents in changing environments |
Two rows decide most exam scenarios. Auditability: a plan that exists before execution is a reviewable artifact — you can log it, diff it against what actually ran, and show an auditor why the system did what it did; a ReAct transcript only reveals intent after the fact. And human approval: you cannot approve what doesn’t exist yet. Plan-and-execute creates a natural pause between planning and acting; in Module 9, a human will approve Scout’s plan at exactly that seam.
Don’t over-rotate, though: plan-and-execute does not dominate ReAct “because it plans.” It pays its planning calls even when the task didn’t need them, and when reality diverges from the plan — a source is gone, a step’s premise was wrong — a pure plan-executor either executes nonsense or pays again to replan. Unpredictable, narrow tasks remain ReAct territory. That’s why production systems converge on the hybrid column, and why Scout becomes one today — hybrid in the study guide’s layered sense: deliberate plan above, reactive executor below. The table’s replanning triggers are the production upgrade described in the In-production note; today’s Scout freezes its plan after one revision.
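One way to rehearse the table is to write its deciding rows as a function. This is a sketch of this module's rules of thumb, not an official rubric; the parameter names are hypothetical, and real scenarios carry constraints a four-flag function cannot see.

```python
def choose_pattern(*, multi_faceted: bool, needs_pre_approval: bool,
                   needs_audit_trail: bool, environment_changes_mid_run: bool) -> str:
    """Map scenario constraints to a planning pattern, following the table above."""
    # An explicit plan is what gives coverage, a reviewable artifact, and an approval seam.
    wants_explicit_plan = multi_faceted or needs_pre_approval or needs_audit_trail
    if not wants_explicit_plan:
        return "ReAct"
    if environment_changes_mid_run:
        # A frozen plan would go stale, so add replanning on defined triggers.
        return "hybrid"
    return "plan-and-execute"


# A narrow lookup with no compliance constraint.
print(choose_pattern(multi_faceted=False, needs_pre_approval=False,
                     needs_audit_trail=False, environment_changes_mid_run=True))
# ReAct

# A compliance workflow where a human signs off before any action.
print(choose_pattern(multi_faceted=True, needs_pre_approval=True,
                     needs_audit_trail=True, environment_changes_mid_run=False))
# plan-and-execute
```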
One more exam term lives here. Coordinating a multi-step task through a
typed, inspectable state object — which steps exist, what each produced,
where the run currently stands — is stateful orchestration (objective
5.4 — the blueprint also files the term under 1.6, where Module 3 used it
as a workflow tool; 5.4 is the task-coordination facet you build here).
The plan can’t live in a prompt string or a local variable: it lives
in ScoutState, next to the messages, where every node, every test, and
every future module can read it. The payoff is concrete: when step 4 of 6
fails mid-run, a system that kept its plan and its per-step results in
state can replan the remainder; a system that didn’t starts over.
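The step-4-of-6 payoff can be shown with a few lines of bookkeeping. A minimal sketch, assuming per-step results are stored by step id; `RunState` and `steps_to_replan` are hypothetical names, and the lab's ScoutState does not carry these fields yet.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunState:
    step_ids: list                                # the plan, in execution order
    results: dict = field(default_factory=dict)   # step id -> output of a completed step
    failed_step: Optional[int] = None             # the step that interrupted the run, if any


def steps_to_replan(state: RunState) -> list:
    """Everything without a recorded result still needs work."""
    return [step for step in state.step_ids if step not in state.results]


state = RunState(step_ids=[1, 2, 3, 4, 5, 6])
for done in (1, 2, 3):
    state.results[done] = f"output of step {done}"
state.failed_step = 4

print(steps_to_replan(state))  # [4, 5, 6]
```

Without `results`, the only honest answer to "what is left?" is the full list, which is the start-over case described above.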
Self-correction: reflection loops and their limits
First drafts are mediocre — for plans as much as prose. Reflection is the self-correction pattern: a model (or a separate critic role) reviews an output and produces a revision informed by the critique. The actor/critic split matters: a model grades its own work generously, while a critic role with explicit instructions — find the gaps, the redundancies, the unverifiable steps, and do not rewrite, do not praise — produces critique you can act on. The Reflexion paper (see references) formalized the deeper point: verbal self-feedback works best when grounded in real signals from the environment, not just the model re-reading itself.
Scout reflects on the plan, not the final answer — deliberately. A plan is small, structured, and nothing has been spent executing it: a critique that adds a missing subtopic costs one cheap revision. The same critique after execution costs a re-run of every search the bad plan caused. Criticize upstream, where fixes are cheap.
How many iterations? One. The first critique catches the structural misses
— the absent subtopic, the two steps that are really one, the step whose
“expected output” nothing could verify. A second pass mostly rewords the
first. I cap reflection at one iteration in Scout — beyond that you’re
paying double for synonyms. Uncapped reflection fails in predictable ways:
self-congratulation (the critic blesses the draft and adds nothing),
paraphrase loops (each revision restates the last with fresher
vocabulary), and the unbounded version — “loop until the critic is
satisfied” — an infinite loop with an API bill, because the critic’s
satisfaction is not a stop criterion you control. Scout’s stop criterion
is a counter in graph state, plan_iterations, checked by a routing edge.
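Stripped of the graph, a capped reflection loop is a counter and three callables. The sketch below is plain Python with stub functions standing in for the LLM calls; `reflect` and its parameters are hypothetical names, not the lab's API.

```python
from typing import Callable, Tuple


def reflect(draft: Callable[[], str],
            critique: Callable[[str], str],
            revise: Callable[[str, str], str],
            max_iterations: int = 2) -> Tuple[str, int]:
    """Draft once, then critique and revise until the counter hits the cap.

    The stop criterion is the counter alone; the critic's opinion never
    decides when the loop ends.
    """
    plan = draft()
    iterations = 1
    while iterations < max_iterations:
        feedback = critique(plan)
        plan = revise(plan, feedback)
        iterations += 1
    return plan, iterations


critic_calls = []


def stub_draft() -> str:
    return "v1: EU rules, US approach"


def stub_critique(plan: str) -> str:
    critic_calls.append(plan)
    return "no step covers enforcement"


def stub_revise(plan: str, feedback: str) -> str:
    return f"{plan} | revised for: {feedback}"


final_plan, passes = reflect(stub_draft, stub_critique, stub_revise)
print(passes, len(critic_calls))  # 2 1
```

Swapping `while iterations < max_iterations` for "while the critic still complains" is exactly the unbounded version: the exit condition moves out of your code and into the model's output.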
Here is the whole module as one graph — exactly what you build in the lab:
flowchart LR
Q([question]) --> P[planner]
P -->|"iteration 1: draft plan"| C[critic]
C -->|critique| P
P -->|"iteration 2: revised plan"| A[agent]
A -->|tool call| T[tools]
T -->|observation| A
A --> E([cited answer])
One planner node, visited twice: the conditional edge sends iteration 1 to the critic and iteration 2 to the executor. The executor is Module 2–3’s ReAct loop, untouched.
Hands-on lab: build it
Objective, in one sentence: add a Planner node that turns the question into
a structured, Pydantic-validated research plan, add a one-iteration
reflection loop, and measure the improvement on five test questions.
The full lab lives in module-04/ of the labs repo.
Observable result: uv run python -m scout.run "How did EU AI regulation evolve in 2025?" prints the draft plan, the critique, and the
revised plan, then runs the familiar ReAct executor with the plan in its
context. uv run python compare_plans.py prints a before/after table on
five questions.
Step 1 — The plan schemas (frozen from here on)
Everything starts with the contract. In scout/planner.py:
from pydantic import BaseModel, ValidationError


class PlanStep(BaseModel):
    """One verifiable unit of research work. Frozen course-wide in module 04."""
    id: int
    goal: str
    search_queries: list[str]
    expected_output: str


class ResearchPlan(BaseModel):
    """The Planner's structured output. Frozen course-wide in module 04."""
    objective: str
    steps: list[PlanStep]
    open_questions: list[str]
(Pydantic, if it’s new to you, is the standard Python validation library:
a BaseModel parses JSON into a typed object, and a failed parse raises a
ValidationError naming every bad field.)
These two models are frozen for the rest of the course: the human who
approves a plan in Module 9 and the API that returns one in Module 10
consume exactly these fields. expected_output is the field that makes a
step verifiable — it names the evidence that completes the step, and
it’s the first thing the critic checks.
Step 2 — The state grows by exactly two fields
ScoutState follows the course’s field calendar — added to, never
reshaped. Module 4 adds the plan and the loop’s counter:
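import operator
from typing import Annotated, TypedDict

from scout.planner import ResearchPlan  # Step 1 schema; import path assumed from scout/planner.py


class ScoutState(TypedDict):
    question: str
    messages: Annotated[list, operator.add]
    # Module 04: the Planner's structured output, and the reflection-loop
    # counter that drives the planner -> critic -> planner routing.
    plan: ResearchPlan | None
    plan_iterations: int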
plan_iterations in state — rather than a variable inside some function —
is stateful orchestration in one line: the loop’s progress is part of the
run’s inspectable record, readable by routing edges, tests, and you at
2 a.m. The smoke test pins the state to exactly these four fields.
Step 3 — The Planner node, and the error path you keep
The Planner prompts for JSON only and validates the response against the schema. The interesting part is the failure path — explicit, taught, and capped at one retry:
def _generate_plan(prompt: str) -> ResearchPlan:
    raw = llm.complete(prompt, system_prompt=PLANNER_SYSTEM,
                       max_tokens=config.MAX_TOKENS)
    try:
        return ResearchPlan.model_validate_json(_extract_json(raw))
    except ValidationError as exc:
        retry = (f"{prompt}\n\nYour previous attempt failed validation:\n{exc}\n"
                 "Fix those exact errors and return ONLY the corrected JSON object.")
        raw = llm.complete(retry, system_prompt=PLANNER_SYSTEM,
                           max_tokens=config.MAX_TOKENS)
        return ResearchPlan.model_validate_json(_extract_json(raw))
The retry feeds the model the exact Pydantic error — models fix named
mistakes far more reliably than “try again.” A second failure raises and
stops the run: executing a malformed plan is worse than not executing.
(_extract_json slices the outermost {…} block — reasoning models like
to wrap JSON in prose. Some OpenAI-compatible endpoints accept a
JSON-schema response_format; support varies by model and provider, so
the lab takes the path that works everywhere.)
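The lab's `_extract_json` is not shown here, so the following is only a guess at its shape: a stdlib-only sketch of "slice the outermost {…} block", under a different name to keep it clearly hypothetical.

```python
import json


def extract_json_object(text: str) -> str:
    """Return the substring from the first '{' to the last '}' (inclusive).

    Assumption: the model emits exactly one JSON object, possibly wrapped
    in prose. Prose that itself contains braces would defeat this slice.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return text[start:end + 1]


raw = 'Here is the plan:\n{"objective": "x", "steps": []}\nLet me know if it helps.'
print(json.loads(extract_json_object(raw)))  # {'objective': 'x', 'steps': []}
```

Slicing does not validate anything. It only removes the wrapper so the schema check in `_generate_plan` sees JSON instead of a greeting.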
The node reads plan_iterations to pick its mode — 0 means draft;
otherwise revise against the critique found in the transcript — and every
pass returns plan_iterations + 1: that counter is what the routing edge
in Step 4 reads. On its final pass it appends the rendered plan to
messages: that’s the handover to the executor, which follows the plan as
context. The handover message also caps the executor at three broad web
searches — the plan says what to find, not how many searches to run, and
the whole run still has to fit under MAX_ITERATIONS = 6. Step-by-step
orchestration of individual plan steps is Module 7’s job.
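The mode switch and the counter bump described above fit in a few lines. This is a sketch of the node's control flow only, with a stub in place of the LLM call; `planner_pass`, `generate`, and the handover wording are hypothetical, and the plan is a plain string here rather than a ResearchPlan.

```python
def planner_pass(state, generate, max_plan_iterations=2):
    """One visit to the planner node: draft on the first pass, revise afterwards."""
    mode = "draft" if state["plan_iterations"] == 0 else "revise"
    plan = generate(mode, state)
    update = {"plan": plan, "plan_iterations": state["plan_iterations"] + 1}
    if update["plan_iterations"] >= max_plan_iterations:
        # Final pass: hand the plan to the executor as a user-role message.
        update["messages"] = [{"role": "user",
                               "content": f"Follow this research plan:\n{plan}"}]
    return update


def fake_generate(mode, state):
    """Stand-in for the LLM-backed draft/revise call."""
    return f"plan ({mode})"


first = planner_pass({"plan_iterations": 0}, fake_generate)
second = planner_pass({"plan_iterations": first["plan_iterations"]}, fake_generate)
print(first["plan"], "messages" in first)    # plan (draft) False
print(second["plan"], "messages" in second)  # plan (revise) True
```

Returning an update dict instead of mutating the state mirrors how the lab's nodes return partial state, as `critic_node` does in Step 4.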
Step 4 — The critic, and the loop that stops
The critic reads state["plan"] and writes a short, prefixed critique
message — it never rewrites the plan:
def critic_node(state: "ScoutState") -> dict:
    critique = llm.complete(
        f"Research question: {state['question']}\n\n"
        f"Draft plan:\n{state['plan'].model_dump_json(indent=2)}",
        system_prompt=CRITIC_SYSTEM,
        max_tokens=config.MAX_TOKENS,
    )
    return {"messages": [{"role": "user",
                          "content": f"{CRITIQUE_PREFIX}\n{critique}"}]}
Why role: "user" for LLM-generated text? Because for the next LLM call, the
critique — like the plan handover in Step 3 — is an instruction to follow, not one
of the model’s own past turns. And the role is load-bearing: route_after_agent
counts only "assistant" turns against MAX_ITERATIONS = 6, so critique and
handover must not consume the executor’s budget.
That prefix is what makes the critique findable again: in revision mode,
the Planner locates Step 3’s “critique in the transcript” by taking the
latest message that starts with CRITIQUE_PREFIX. The critic’s system
prompt bans praise and demands 3–6 bullets naming missing subtopics,
redundant steps, and unverifiable expected outputs. The loop closes with
one conditional edge reading the counter:
def route_after_planner(state: ScoutState) -> str:
    if state["plan_iterations"] < config.MAX_PLAN_ITERATIONS:
        return "critic"
    return "agent"
MAX_PLAN_ITERATIONS = 2 joins the other caps in config.py: the draft
plus exactly one revision. Never “until the critic is satisfied.”
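The prefix lookup the Planner performs in revision mode might look like the sketch below. The helper name is hypothetical, and the prefix value is an assumption copied from the sample run's critique header; the lab's actual constant may differ.

```python
from typing import Optional

# Assumed value, taken from the critique header in the sample output.
CRITIQUE_PREFIX = "Plan critique (address every point in your revision):"


def latest_critique(messages: list) -> Optional[str]:
    """Return the body of the most recent message that starts with the prefix."""
    for message in reversed(messages):
        content = message.get("content") or ""
        if isinstance(content, str) and content.startswith(CRITIQUE_PREFIX):
            return content[len(CRITIQUE_PREFIX):].strip()
    return None


transcript = [
    {"role": "user", "content": "How did EU AI regulation evolve in 2025?"},
    {"role": "user", "content": f"{CRITIQUE_PREFIX}\n- No step covers enforcement actions"},
]

print(latest_critique(transcript))      # - No step covers enforcement actions
print(latest_critique(transcript[:1]))  # None
```

Scanning from the end matters once the cap is raised in the exercises: with several critiques in the transcript, the revision should answer the newest one.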
Step 5 — Wire it and run it
build_graph() puts the deliberation in front of the untouched executor:
builder.add_edge(START, "planner")
builder.add_conditional_edges("planner", route_after_planner,
                              {"critic": "critic", "agent": "agent"})
builder.add_edge("critic", "planner")
# agent -> tools -> agent: the module-02/03 ReAct loop, unchanged
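To convince yourself the wiring stops where it should, you can trace the deliberation phase without LangGraph at all. A framework-free sketch: it reuses the routing rule from Step 4, and `trace_deliberation` is a hypothetical helper that only simulates which node runs next.

```python
MAX_PLAN_ITERATIONS = 2  # same value the lab keeps in config.py


def route_after_planner(state: dict) -> str:
    """Same rule as the lab's conditional edge, on a plain dict."""
    if state["plan_iterations"] < MAX_PLAN_ITERATIONS:
        return "critic"
    return "agent"


def trace_deliberation() -> list:
    """Walk planner -> critic -> planner -> agent and record the visit order."""
    state = {"plan_iterations": 0}
    visited = []
    node = "planner"
    while node != "agent":
        visited.append(node)
        if node == "planner":
            state["plan_iterations"] += 1  # every planner pass bumps the counter
            node = route_after_planner(state)
        else:
            node = "planner"               # the critic always hands back to the planner
    visited.append("agent")
    return visited


print(trace_deliberation())  # ['planner', 'critic', 'planner', 'agent']
```

The loop terminates because the counter only grows and the route compares it to a constant. Nothing the critic writes can keep it alive.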
The CLI moves to scout/run.py, the canonical entry point from now on:
cd module-04
uv run python -m scout.run "How did EU AI regulation evolve in 2025?"
=== Plan v1 (draft) ===
Objective: Trace how EU AI regulation changed during 2025
1. Identify the AI Act milestones that took effect in 2025
queries: EU AI Act 2025 implementation timeline, ...
expected: dated list of provisions entering into force
...
=== Critique ===
Plan critique (address every point in your revision):
- No step covers enforcement actions, only the rules themselves
- Steps 2 and 3 overlap: both search for guidance documents
...
=== Plan v2 (revised) ===
...
[agent] tool_call: web_search({"query": "..."})
[tools] 5 results
...cited answer streams here...
Step 6 — Measure the reflection pass
Still inside module-04/:
uv run python compare_plans.py
The script runs draft → critique → revise on the five questions in
test_questions.json (no executor, no search key needed) and prints three
counting heuristics per plan — steps, distinct queries, steps with an
expected output — plus a guided inspection checklist:
Q | steps | distinct queries | steps w/ expected
---------------------------------------------------------
1 | 4 -> 5 | 7 -> 9 | 4 -> 5
2 | 4 -> 4 | 8 -> 8 | 4 -> 4
...
Numbers count; they don’t judge. Question 2’s unchanged row could be a
solid draft — or a critic that paraphrased. The checklist makes you look:
did the revision add a missing subtopic, merge redundant steps, sharpen
expected outputs — or just reword? There’s deliberately no LLM-as-judge
here; automated quality judging is Module 8’s topic. The budget is ~15 LLM
calls (the free tier allows 40 req/min as of June 2026; llm.complete()
backs off on 429).
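The three counting heuristics are simple enough to reproduce by hand. The sketch below is not the lab's `compare_plans.py`: it assumes a plan dumped to a plain dict with the Step 1 field names, and the case-insensitive query comparison is my own choice.

```python
def plan_metrics(plan: dict) -> dict:
    """Count steps, distinct queries, and steps with a non-empty expected output."""
    steps = plan.get("steps", [])
    distinct_queries = {
        query.strip().lower()
        for step in steps
        for query in step.get("search_queries", [])
    }
    with_expected = sum(1 for step in steps if step.get("expected_output", "").strip())
    return {
        "steps": len(steps),
        "distinct_queries": len(distinct_queries),
        "steps_with_expected": with_expected,
    }


draft = {
    "objective": "demo",
    "steps": [
        {"id": 1, "goal": "a", "search_queries": ["EU AI Act timeline"],
         "expected_output": "dated list"},
        {"id": 2, "goal": "b", "search_queries": ["eu ai act timeline", "AI Act guidance"],
         "expected_output": ""},
    ],
    "open_questions": [],
}

print(plan_metrics(draft))
# {'steps': 2, 'distinct_queries': 2, 'steps_with_expected': 1}
```

The demo plan shows why the numbers only count: the duplicate query is caught, but nothing here can say whether "dated list" is a good expected output.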
Step 7 — Verify
cd .. # back to the repo root
uv run pytest module-04/tests/ # offline, no API calls
SCOUT_LIVE_TESTS=1 uv run pytest module-04/tests/ # + 1 real graph run
And the cumulative rule of the repo, from the root:
uv run pytest module-01/tests/ module-02/tests/ module-03/tests/ module-04/tests/
Try it yourself (no solution provided):
- Add a criticality signal: give PlanStep an optional criticality: int = 1 field in your copy and teach the Planner prompt to set it. The smoke test pinning the frozen schema will object, so keep the experiment on a branch, and notice why the course freezes contracts.
- Pay for a second opinion: set MAX_PLAN_ITERATIONS = 3 and re-run compare_plans.py. Compare what v3 adds over v2 against what v2 added over v1, and what it cost. Diminishing returns, measured on your own runs. Same caveat as exercise 1: the smoke test pins MAX_PLAN_ITERATIONS == 2, so branch and revert after.
Exam corner
What the exam tests here. Per the official study guide, Domain 5 (10%) expects you to: apply reasoning frameworks — chain-of-thought and task decomposition (5.2); engineer planning strategies for sequential and multi-step decision-making (5.3); manage stateful orchestration to coordinate complex tasks (5.4 — this module’s half; the “knowledge retention” half belongs to memory); and adapt reasoning strategies based on experience and feedback (5.5). Objective 5.1 — memory mechanisms — is the other half of this domain and is covered in Module 5. Questions are scenarios: a task with constraints in, “choose the right reasoning or planning approach” out.
Quiz — answers after question 5.
1. A pharma company’s research agent compiles regulatory submissions. Compliance requires that every research step be reviewed and signed off by a human before any data is gathered. Which approach fits?
- A) ReAct, so the agent can adapt its searches as it learns
- B) Plan-and-execute: draft the full plan up front, pause for review, then execute the approved steps
- C) Chain-of-thought prompting before each individual tool call
- D) More reflection iterations, so the plan improves itself before acting
2. An agent must research “the impact of remote work on commercial real estate, public transit, and downtown retail.” Which decomposition is correct?
- A) One step: “research the impact of remote work thoroughly”
- B) Three steps split by sub-domain, each with its own queries and a checkable expected output, plus a final synthesis step
- C) Steps split by tool: one step per search engine the agent can use
- D) Twelve steps, several of which repeat the same queries with different wording, to guarantee coverage
3. A team runs six self-critique iterations on every draft plan. Quality scores plateau after the first pass; cost roughly doubles with each additional one. What’s the right adjustment?
- A) Increase to ten iterations — quality will eventually improve
- B) Replace the critic with a larger, more expensive model
- C) Cap reflection at one iteration and ground further corrections in external signals (tool results, validation errors)
- D) Remove the critique entirely — reflection never adds value
4. A model’s output reads: “Thought: I need the 2025 figure. Action: web_search(‘EU AI Act fines 2025’). Observation: three results… Thought: now I can compare.” What is this?
- A) Chain-of-thought — the output shows explicit thoughts
- B) ReAct — reasoning interleaved with actions and observations
- C) Plan-and-execute — the steps were decided up front
- D) Reflection — the model is critiquing its own reasoning
5. A plan-and-execute agent fails at step 4 of 6: the source it needed is offline. To replan only the remaining work without redoing steps 1–3, what must the system have kept?
- A) Nothing — restarting from scratch is always cleaner
- B) Structured run state: the plan, which steps completed with their outputs, and the failure that interrupted step 4
- C) The original user prompt, which contains everything needed
- D) The model’s internal reasoning tokens from the failed step
Answers.
1 — B. “Sign off before any data is gathered” requires an artifact
that exists before execution — only plan-and-execute produces one. A
starts acting immediately, with nothing to approve; C reasons without
producing a reviewable plan; D improves a plan but never pauses for the
required human.
2 — B. Good decomposition produces independently verifiable subtasks
that jointly cover the question, plus synthesis. A isn’t a decomposition;
C splits by mechanics instead of meaning; D’s redundant steps are the
token-burning spiral this module opened with.
3 — C. Plateauing quality with compounding cost is the signature of
ungrounded reflection. The fix is the cap plus external feedback — A and B
pay more for the same plateau; D throws away the genuinely valuable first
pass.
4 — B. Action: and Observation: lines are the ReAct signature —
reasoning and acting. Pure chain-of-thought never touches a tool.
5 — B. Replanning the remainder requires the plan, the completed
steps with their results, and the failure — exactly what stateful
orchestration keeps in graph state. A wastes three steps of paid work; C
and D hold no record of execution progress.
Traps to avoid:
- CoT vs. ReAct. Chain-of-thought reasons; ReAct reasons and acts through tools. Visible thoughts don’t make a transcript ReAct — actions and observations do.
- “Plan-and-execute always beats ReAct because it plans.” It buys auditability and approval points, and pays for them in adaptability and upfront calls. On narrow, unpredictable tasks, interleaved ReAct wins.
- Reflection as fact-checker. Self-critique without external feedback cannot fix factual errors — the model only sees what it already knows. Reflection reshapes; tools, validation, and evals verify.
Key takeaways
- Reactive systems act on the current observation; deliberative systems plan first; hybrids — like Scout from today — plan deliberately and execute reactively.
- Decompose before acting: independently verifiable subtasks with expected outputs are what make coverage checkable and spirals avoidable.
- Chain-of-thought reasons without acting; ReAct interleaves reasoning with tool actions — the exam tests the boundary, transcripts in hand.
- ReAct is adaptive but opaque; plan-and-execute is auditable and approvable but rigid; production agents hybridize.
- Reflection earns its cost once: critique the plan (cheap to fix), cap the loop with a counter in state, and never loop “until satisfied.”
- The plan lives in graph state, not in a prompt — stateful orchestration is what makes replanning, approval, and debugging possible.
- Reasoning has a budget — thinking tokens and deliberation calls — and spending it is a decision, not a default.
Keep going
Scout can now plan — but it forgets everything between sessions. Module 5 gives it memory: checkpointers, persistence, and long-term recall.
References
- Understanding the Planning of LLM Agents: A Survey. The survey behind this module’s taxonomy (task decomposition, plan selection, reflection); recommended reading in the official study guide for Domain 5.
- ReAct: Synergizing Reasoning and Acting in Language Models. The paper that defined interleaved reasoning and acting, and the vocabulary that half of this module’s comparisons stand on.
- Reflexion: Language Agents with Verbal Reinforcement Learning. Self-correction through verbal feedback, and why grounding that feedback in real signals matters.
- LangGraph Graph API. Current (1.x) documentation for StateGraph, state schemas, and the conditional edges the lab’s reflection loop is built from.
- NCP-AAI certification page. The official blueprint; Cognition, Planning, and Memory is weighted at 10%.