What Is Agentic AI? From LLM Calls to Autonomous Agents (NCP-AAI Module 1)

This is Module 1 of NCP-AAI Mastery, a free 14-module course that takes you from your first agent to NVIDIA-certified. Start at Module 1 or browse the full syllabus.

Somewhere right now, a team is shipping three chained prompts and calling it an “agent.” Across the hall, a developer who wants to build a real one is staring at six frameworks, two interop protocols, and a wall of vendor blogs, with no grid to tell them apart. The word “agent” is everywhere, which means it no longer means much — and yet the NCP-AAI exam tests precise distinctions: workflow or agent, reactive or deliberative, MCP or A2A.

Preparation resources for that exam are scarce, mostly paid, and mostly shallow question dumps. This course takes the other road: free, code-first, covering 100% of the official blueprint, written by an engineer who took the exam. By the end of this module, you’ll say exactly what separates an LLM call, a workflow, an agent, and a multi-agent system — and you’ll have made your first Nemotron call, the building block everything else stands on.

In this module

You’ll learn to:
- Distinguish an LLM call, an LLM workflow, an agent, and a multi-agent system — and classify any system you’re shown.
- Identify the four capabilities that make a system agentic: reasoning, tool calling, memory, and autonomy.
- Map the 2026 agentic landscape: frameworks, the MCP and A2A protocols, and the NVIDIA stack (NIM, Nemotron, NeMo).
- Explain what the NCP-AAI certification validates and how this course covers 100% of its blueprint.
You’ll build: Your dev environment plus Scout’s very first building block: a question-to-answer call to Nemotron through a hosted NIM endpoint — no agent yet, on purpose.
Exam domains covered: D1 — Agent Architecture and Design — 15% of the exam (foundations; Modules 3 and 7 complete it).
Prerequisites: Python 3.12 (uv installs it for you), comfort with a terminal, and at least one prior call to an LLM API. No agent experience required. You’ll create a free build.nvidia.com account during the lab.

The series: one system, fourteen modules

The promise of this course fits in one sentence: build a production-grade multi-agent system, module by module, and walk into the NCP-AAI exam ready.

The system has a name. Scout is a multi-agent research assistant: give it a question, approve its research plan, and it searches the web, reads and cross-checks sources, then writes a cited report — with full tracing, evals, and guardrails.

Here is where Scout ends up:

flowchart TB
    Q[User question] --> PL[Planner]
    PL --> H{Plan approved?<br/>human-in-the-loop}
    H -- approved --> S[Searcher]
    S --> RD[Reader]
    RD --> FC[Fact-checker]
    FC --> W[Writer]
    W --> R[Cited report]
    subgraph X[Cross-cutting, added module by module]
        M[(Memory)]
        RAG[(RAG store)]
        T[Tracing + evals]
        G[Guardrails]
    end

Scout’s target architecture, which you’ll build across Modules 1 to 13.

The path, one increment per module (each opens with the exam domains it covers):

👉 Module 1 — What Is Agentic AI? — vocabulary, the 2026 landscape, your first NIM call (you are here)
⬜ Module 2 — Build Your First AI Agent — the agent loop by hand, then in LangGraph
⬜ Module 3 — Agent Architecture — patterns, trade-offs, when not to build an agent
⬜ Module 4 — Cognition — how agents plan, reason, and self-correct
⬜ Module 5 — Agent Memory — state, persistence, long-term recall
⬜ Module 6 — Knowledge Integration — RAG pipelines for agents
⬜ Module 7 — Multi-Agent Systems — supervisors, swarms, and handoffs
⬜ Module 8 — Evaluating AI Agents — metrics, LLM-as-judge, tuning
⬜ Module 9 — Guardrails and Human Oversight — safe agents by design
⬜ Module 10 — Deploying AI Agents — from notebook to production API
⬜ Module 11 — Running Agents in Production — observability, cost, maintenance
⬜ Module 12 — The NVIDIA Agentic Stack — NIM, NeMo, and Nemotron in practice
⬜ Module 13 — Capstone — ship Scout v1.0
⬜ Module 14 — The NCP-AAI Exam — strategy, mock exam, and my debrief

The code accumulates: the LLM client you write in today’s lab is the one Scout’s agent team still uses in Module 13.

From LLM Calls to Autonomous Agents: A Spectrum of Control

Every system you’ll be shown — in a design review or an exam question — sits on a four-rung ladder, and one question sorts them all: who owns the control flow? Your code decides the next step: workflow. The model decides: agent. That single question answers more exam scenarios than any other idea in this course.

Single LLM call. One prompt in, one completion out. No steps to own, so the control-flow question doesn’t arise. Most language tasks need nothing more — summarize an email, extract fields, rewrite a paragraph.

LLM workflow. A chain or router where your code fixes the sequence of steps and LLM calls fill in the slots. It exists because real tasks usually take several steps, and a hardcoded pipeline is predictable, cheap, and easy to debug. Example: a support pipeline that always summarizes the ticket, routes it by product line, then drafts a reply.

Agent. An agent (in exam parlance, also an agentic workflow) is a system where the model decides the next step in a loop toward a goal: which action, with what input, and whether it’s done. It exists because some tasks can’t be enumerated in advance — you don’t know step 3 until you’ve seen the result of step 2. Example: a research assistant that decides what to search next based on what the last search returned.

Multi-agent system. A multi-agent system is several specialized agents coordinating on one goal. It exists for specialization (a focused prompt beats a do-everything prompt), parallelism, and context isolation. Example: a searcher finds sources, a reader extracts claims, a fact-checker verifies them, a writer assembles the report — exactly what Scout becomes in Module 7.

Rung	Who decides the next step	Predictability	Cost / latency	Debuggability	Example
Single LLM call	Nobody — one step	High	Lowest	Easy	Summarize an email
LLM workflow	Your code	High	Low, fixed	Easy — steps are enumerable	Fixed support-ticket pipeline
Agent	The model, in a loop	Medium–low	Variable; unbounded without caps	Harder — every run takes a different path	Research assistant choosing its searches
Multi-agent system	Several models plus a coordination layer	Lowest	Highest	Hardest — interactions emerge at runtime	Specialized team producing a cited report

Each rung up buys flexibility and pays in predictability, cost, and debuggability — a trade-off that becomes a full decision framework in Module 3.

What appears at the agent rung — and only there — is the loop:

flowchart LR
    A[Single LLM call] --> B[LLM workflow<br/>code owns control flow]
    B --> C[Agent<br/>model owns control flow]
    C --> D[Multi-agent system]
    subgraph L[The agentic loop]
        P[Perceive] --> Rn[Reason] --> Ac[Act] --> O[Observe] --> P
    end
    C -.-> L

The agent perceives its situation, reasons about it, acts (usually by calling a tool), observes the result, and goes around again until the goal is met. You’ll build that loop yourself in Module 2.
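To make the control-flow question concrete, here is a minimal sketch with no real LLM involved. Everything in it is hypothetical and invented for illustration: the `TOOLS` table, `run_workflow`, `run_agent`, and the scripted `fake_model` that stands in for a real model. The only thing worth looking at is who picks the next step.

```python
from typing import Callable

# Hypothetical tools: plain functions keyed by name.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query!r}",
    "read": lambda url: f"text of {url}",
}


def run_workflow(ticket: str) -> list[str]:
    """Workflow: the code fixes the order of steps; nothing is decided at runtime."""
    steps = ["summarize", "route", "draft_reply"]
    return [f"{step}({ticket})" for step in steps]


def fake_model(goal: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM: returns (action, input) based on what it has seen so far."""
    if not observations:
        return "search", goal
    if len(observations) == 1:
        return "read", "https://example.com/source"
    return "finish", f"answer based on {len(observations)} observations"


def run_agent(goal: str, max_steps: int = 5) -> str:
    """Agent: the model picks each next action in a loop, up to a hard step cap."""
    observations: list[str] = []
    for _ in range(max_steps):
        action, arg = fake_model(goal, observations)  # perceive + reason
        if action == "finish":
            return arg
        observations.append(TOOLS[action](arg))  # act, then observe
    raise RuntimeError("step cap reached without finishing")
```

The step cap is the design choice to notice: once the model owns the control flow, the code's remaining job is to bound it.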

The Four Capabilities of an Agent

Strip any agent to its skeleton and you find the same four capabilities. The exam expects you to name them, spot which one a described system is missing, and know what each costs.

Reasoning is the ability to plan and decompose: take a goal, break it into steps, revise the steps when reality disagrees. The canonical pattern here is ReAct — a reasoning-and-action framework where the model alternates between thinking out loud and taking an action, each thought conditioned on the last observation. You’ll implement it from scratch in Module 2.

Tool calling is the structured mechanism by which a model requests an action — a web search, an API call, a database query — and gets the result back as context. Without tools, a model can only talk about the world; with them, it can act on it.
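As a rough illustration of what "structured" means here, the sketch below uses the OpenAI-style chat-completions shapes: a JSON-schema description of the tool that the model sees, a tool call whose arguments arrive as a JSON string, and a `tool` message carrying the result back. The `web_search` body is a hypothetical stub (Module 2 builds the real one), and `REGISTRY` and `execute_tool_call` are names invented for this example.

```python
import json


def web_search(query: str) -> str:
    """Hypothetical stub standing in for a real search tool."""
    return f"stub results for {query!r}"


# What the model is shown: an OpenAI-style "tools" entry with a JSON schema.
WEB_SEARCH_SCHEMA = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

REGISTRY = {"web_search": web_search}


def execute_tool_call(tool_call: dict) -> dict:
    """Run one model-requested call and wrap the result as a 'tool' message."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])  # arguments are a JSON string
    if name not in REGISTRY:
        result = f"error: unknown tool {name!r}"
    else:
        result = REGISTRY[name](**args)
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": result}
```

Note that the model never runs anything itself: it emits a request, your code executes it, and the result goes back into the conversation as context.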

Memory is what the agent knows beyond the current prompt. Short-term memory is the working state of one session: the conversation so far, intermediate results. Long-term memory survives across sessions: user preferences, accumulated facts. Module 5 builds both into Scout.
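The split between the two kinds of memory can be sketched in a few lines. This is not how Module 5 implements it; `ShortTermMemory`, `LongTermMemory`, and the JSON-file storage are assumptions chosen to keep the example small.

```python
import json
from pathlib import Path


class ShortTermMemory:
    """Working state of one session: lives in RAM, gone when the process exits."""

    def __init__(self) -> None:
        self.messages: list[dict[str, str]] = []

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})


class LongTermMemory:
    """Facts that survive across sessions: persisted to a JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> dict[str, str]:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def remember(self, key: str, value: str) -> None:
        facts = self.load()
        facts[key] = value
        self.path.write_text(json.dumps(facts, indent=2), encoding="utf-8")
```

A second `LongTermMemory` object pointed at the same file sees what the first one stored, which is the whole difference: one memory dies with the session, the other outlives it.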

Autonomy is the degree to which the system acts without a human signing off each step. It’s a dial, not a switch — and mature systems keep a human in the loop at the decisions that matter. Scout will pause and ask you to approve its research plan before spending tokens executing it (Module 9).
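One way to picture the dial is an approval gate between planning and execution. The sketch below is illustrative only: `execute_plan` and `ask_human` are hypothetical names, and the real human-in-the-loop mechanism arrives in Module 9.

```python
from typing import Callable


def execute_plan(plan: list[str], approve: Callable[[list[str]], bool]) -> list[str]:
    """Run the plan only if the approver says yes; otherwise do nothing."""
    if not approve(plan):
        return []
    return [f"done: {step}" for step in plan]


def ask_human(plan: list[str]) -> bool:
    """Console approver: show the numbered plan and wait for y/n."""
    print("\n".join(f"{i}. {step}" for i, step in enumerate(plan, 1)))
    return input("Approve this plan? [y/N] ").strip().lower() == "y"
```

Passing `ask_human` keeps a person at the decision that matters; passing `lambda plan: True` turns the dial to full autonomy. The executor itself does not change.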

Each capability maps to a concrete piece of Scout:

Capability	Built in	What Scout gains
Reasoning	Modules 2 and 4	The agent loop, then a Planner that turns questions into research plans
Tool calling	Module 2	`web_search` — Scout’s first tool
Memory	Module 5	Resumable sessions and user preferences that survive restarts
Autonomy	Modules 7 and 9	A full agent team, with a human approving the plan before it runs

Reactive, deliberative, hybrid

One more piece of vocabulary, because the exam uses it: per the official study guide, the agentic AI professional must master reactive, deliberative, and hybrid systems. A reactive system maps perception directly to action — no internal model, no planning, fast and predictable. A deliberative system builds an internal representation and plans several steps before acting. A hybrid system layers deliberative planning over reactive execution — what most production agents, Scout included, end up being.
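The three styles are easier to tell apart side by side. The functions below are toy illustrations with invented names and thresholds, not definitions from the study guide.

```python
from typing import Callable


def reactive(cpu_percent: float) -> str:
    """Reactive: perception maps straight to action, with no plan and no state."""
    return "page_oncall" if cpu_percent > 90 else "noop"


def deliberative(goal: str) -> list[str]:
    """Deliberative: build a multi-step plan before acting on anything."""
    return [f"search: {goal}", f"read top sources on {goal}", f"write report on {goal}"]


def hybrid(goal: str, step_failed: Callable[[str], bool]) -> list[str]:
    """Hybrid: plan up front, then react to each step's outcome while executing."""
    log = []
    for step in deliberative(goal):
        if step_failed(step):
            log.append(f"retry: {step}")  # a reactive rule fires mid-plan
        else:
            log.append(f"ok: {step}")
    return log
```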

The 2026 Agentic Landscape: Frameworks, Protocols, and the NVIDIA Stack

You have the vocabulary; here’s the map — first the frameworks you’d build an agent with:

Framework	Identity in one line	Consider it when
LangGraph	Explicit state graphs; the de facto production standard	You want full control over the control flow — our choice for this course
CrewAI	Role-based “crews” of agents	You’re prototyping a role-playing team fast
OpenAI Agents SDK	Lightweight and young (still 0.x)	You’re all-in on the OpenAI ecosystem
Google ADK	Agent toolkit with A2A interop built in	You’re building in the Google ecosystem
PydanticAI	Type-safe agents, Pydantic developer experience	You want validation-first plumbing
smolagents	Minimalist, from Hugging Face	You want tiny code-first agents

One absence from that table: AutoGen is retired — Microsoft replaced it with the Microsoft Agent Framework, and AG2 survives as a community fork. If a tutorial teaches AutoGen, check its date.

Why LangGraph for this course? An explicit state graph makes every node and edge — every decision about who owns the control flow — visible on the page, exactly what you need while learning. And it’s what the industry actually deploys, so nothing you practice here is a toy dialect.

Frameworks build one agent system; protocols let systems interoperate. Two matter in 2026. MCP (Model Context Protocol) is the open standard for connecting an agent to tools and data sources, governed by the Linux Foundation’s Agentic AI Foundation since late 2025. A2A (Agent2Agent) is the open standard for making independent agents — different vendors, different frameworks — talk to each other. Keep the distinction sharp, because the exam does: MCP plugs tools and context into an agent; A2A connects agents to each other. Both get full treatment in Module 7 — MCP with working code, A2A at the concept level the exam tests.
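To give the MCP side of that distinction some texture: MCP messages travel in JSON-RPC 2.0 envelopes, and `tools/list` and `tools/call` are the methods a client uses to discover and invoke tools. The sketch below only builds those two request payloads. It skips the `initialize` handshake and the transport entirely, and the `mcp_request` helper and the `web_search` tool name are assumptions made for the example. A2A is left out on purpose; Module 7 covers it.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def mcp_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request, the envelope MCP messages use."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Ask the server which tools it offers, then invoke one of them.
list_tools = mcp_request("tools/list")
call_tool = mcp_request(
    "tools/call",
    {"name": "web_search", "arguments": {"query": "agentic AI"}},  # hypothetical tool
)
```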

Finally, the platform layer — the reason “NVIDIA” is in this certification’s name. NVIDIA Inference Microservices (NIM) are packaged model endpoints: the same container and API whether NVIDIA hosts the model or you self-host it on your own GPUs. Nemotron is NVIDIA’s open model family, now in its third generation (December 2025) with nano, super, and ultra variants tuned for agentic workloads. NeMo is the umbrella for the agent-lifecycle tooling — Guardrails, Retriever, Evaluator, and friends — and the NeMo Agent Toolkit profiles and optimizes agent workflows across frameworks. At GTC 2026, NVIDIA also launched the Nemotron Coalition — Mistral AI, LangChain, Cursor, and Perplexity among the labs co-developing the next open Nemotron generation. The deep dive is Module 12; what matters today: this course uses hosted NIM endpoints from Module 1 on, so you’ll touch the exam’s NVIDIA Platform domain in every lab.

The NCP-AAI Certification — and How This Course Gets You There

The credential’s full name is NVIDIA-Certified Professional: Agentic AI (NCP-AAI). The facts, all as of June 2026, per the official certification page: 60–70 questions in 120 minutes, $200, online proctored through Certiverse, English only, valid two years. The passing score is not published — any “70%” threshold you read on a third-party site is speculation.

The blueprint has ten domains. Here’s how this course covers them:

#	Exam domain	Weight	Covered in
D1	Agent Architecture and Design	15%	Modules 1 (here), 3, 7
D2	Agent Development	15%	Modules 2, 7, 12 — plus every lab
D3	Evaluation and Tuning	13%	Module 8
D4	Deployment and Scaling	13%	Module 10
D5	Cognition, Planning, and Memory	10%	Modules 4, 5
D6	Knowledge Integration and Data Handling	10%	Module 6
D7	NVIDIA Platform Implementation	7%	Module 12 — and NIM in every lab from today
D8	Run, Monitor, and Maintain	5%	Module 11
D9	Safety, Ethics, and Compliance	5%	Module 9
D10	Human-AI Interaction and Oversight	5%	Modules 9, 13

Weights follow the official certification web page as of June 2026; the PDF study guide lists slightly different numbers for D4 and D8 — Module 14 deals with that wrinkle.

The “Professional” in the title is real: NVIDIA recommends 1–2 years of AI/ML experience and hands-on agentic projects. That’s why this course is built around Scout rather than flashcards — by Module 13 you’ll have a production-shaped agentic project, not just vocabulary about one.

How to work the series: one module is a ~12-minute read, a lab, and a 5-question exam-style quiz. The code accumulates strictly in order, so do the labs in sequence. The full mock exam lands in Module 14, after I’ve sat the real thing.

Hands-On Lab: Your Environment and Your First Nemotron Call

Objective: a ready dev environment and your first Nemotron call through a hosted NIM endpoint — the LLM building block every later module reuses.

Observable result: at the end, uv run python -m scout.ask "What is agentic AI?" prints a Nemotron answer in your terminal, and the module’s smoke tests pass.

This lab builds no agent — no loop, no tools, no state. That’s the point: you need to feel the bare LLM call before Module 2 gives it agency.

Step 1 — Create your build.nvidia.com account and API key

Sign up at build.nvidia.com — free, no credit card, free inference credits at signup (more than enough for every lab in this course). Open any model page, click Get API Key, and copy the key (it starts with nvapi-). It goes in exactly one place — a .env file — never in code, never in git.

Step 2 — Clone the labs repo and configure the environment

You need uv, which manages Python 3.12 for you:

git clone https://github.com/dupuis1212/agentic-ai-course-labs.git
cd agentic-ai-course-labs
uv sync                  # installs the pinned stack from uv.lock
cp .env.example .env     # then paste your key in

Your .env contains one line: NVIDIA_API_KEY=nvapi-.... It’s gitignored; the committed .env.example is the template. It lives at the repo root and stays there — one .env serves every module, so don’t create per-module copies.

Step 3 — `config.py`: one home for the model name

The scout/ package is born here, and its first file is the most boring and most important:

# module-01/scout/config.py
MODEL = "nvidia/nemotron-3-nano-30b-a3b"
BASE_URL = "https://integrate.api.nvidia.com/v1"
API_KEY_ENV = "NVIDIA_API_KEY"

# Plan B if build.nvidia.com quotas ever tighten: OpenRouter serves
# Nemotron 3 in :free variants. Swap these three values; nothing else moves.
# MODEL = "nvidia/nemotron-3-nano-30b-a3b:free"
# BASE_URL = "https://openrouter.ai/api/v1"
# API_KEY_ENV = "OPENROUTER_API_KEY"

This is the only place in the whole course where the model name lives — a frozen contract every later module imports, enforced by one of the smoke tests. config.py also owns load_env(), the ten-line hand-rolled .env loader — no python-dotenv dependency, so you can read exactly what touches your environment.
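For a sense of what a hand-rolled loader of that size can look like, here is a sketch. It is not the repo's `load_env()`: the signature, the default path, and the rule that already-set variables win are all assumptions, so read the real file in the repo for the actual behavior.

```python
import os
from pathlib import Path


def load_env(path: Path = Path(".env")) -> None:
    """Copy KEY=VALUE lines from a .env file into os.environ.

    Variables already set in the real environment win over the file.
    """
    if not path.exists():
        return
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())
```

Because the loader does no quote handling, a line like `NVIDIA_API_KEY="nvapi-..."` would keep the quotes as part of the value, which is one reason to write the key bare.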

Step 4 — `llm.py`: the client, with a 429 backoff

NIM endpoints speak the OpenAI API, so the client is the standard openai SDK pointed at BASE_URL. The completion helper adds the two things you’d otherwise write on day two:

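# module-01/scout/llm.py (core; full file in the repo)
import time

from openai import RateLimitError

from . import config

# get_client() and MAX_RETRIES are defined in the full file.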
DEFAULT_SYSTEM_PROMPT = (
    "You are Scout, a research assistant. Answer factually and concisely."
)

def complete(question: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT,
             max_tokens: int = 1024) -> str:
    client = get_client()
    for attempt in range(MAX_RETRIES + 1):
        try:
            response = client.chat.completions.create(
                model=config.MODEL,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": question},
                ],
                max_tokens=max_tokens,
            )
        except RateLimitError:
            if attempt == MAX_RETRIES:
                raise
            # NIM free tier: 40 req/min — back off here
            time.sleep(2**attempt)
            continue
        content = response.choices[0].message.content
        if not content:
            raise RuntimeError(
                "Empty completion "
                f"(finish_reason={response.choices[0].finish_reason}) — "
                "raise max_tokens."
            )
        return content

Two constraints worth naming. First, the free tier allows 40 requests per minute per account (as of June 2026), hence the exponential backoff on HTTP 429 — and the retry covers rate limits only, on purpose: in a lab, a network failure should crash loudly, not hide behind a catch-all. Second, Nemotron 3 is a reasoning model: it thinks before it answers, and those hidden reasoning tokens count against max_tokens. Set the budget too low and the entire allowance goes to reasoning — content comes back empty with finish_reason="length". Keep max_tokens at 512 or more, and fail loudly. One dial we deliberately leave alone: temperature, the sampling-randomness knob — the NIM default serves for the whole course; Module 8 shows when that dial becomes a tuning lever (and pins its judge at temperature 0).
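The backoff arithmetic is worth seeing on its own. The sketch below reproduces the loop shape of `complete()` offline, under stated assumptions: `MAX_RETRIES = 3` is a guess (the real value lives in the repo), `FakeRateLimit` stands in for `openai.RateLimitError`, and `backoff_delays` and `call_with_backoff` are hypothetical helpers. The sleep function is injected so the example runs instantly.

```python
from typing import Callable

MAX_RETRIES = 3  # assumption: the real value lives in the repo's llm.py


class FakeRateLimit(Exception):
    """Stand-in for openai.RateLimitError so the sketch runs offline."""


def backoff_delays(max_retries: int = MAX_RETRIES) -> list[int]:
    """Seconds slept before each retry: 2**0, 2**1, 2**2, ..."""
    return [2**attempt for attempt in range(max_retries)]


def call_with_backoff(fn: Callable[[], str], sleep: Callable[[float], None],
                      max_retries: int = MAX_RETRIES) -> str:
    """Same loop shape as complete(): retry rate limits only, then re-raise."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except FakeRateLimit:
            if attempt == max_retries:
                raise
            sleep(2**attempt)
    raise AssertionError("unreachable")
```

With three retries the waits are 1, 2, and 4 seconds, so a call that keeps hitting the limit fails after about 7 seconds of sleeping. Any other exception propagates immediately, which matches the "crash loudly" intent. In real use you would pass `time.sleep` as the `sleep` argument.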

Step 5 — Ask your first question

scout/ask.py is a twenty-line CLI around complete() — short enough to show whole:

# module-01/scout/ask.py — the entire file
"""Scout, day one: ask a question, print a Nemotron answer. No agent yet.

Usage:
    uv run python -m scout.ask "What is agentic AI?"
"""

import sys

from . import llm


def main() -> None:
    if len(sys.argv) < 2:
        print('usage: uv run python -m scout.ask "your question"')
        raise SystemExit(2)
    question = " ".join(sys.argv[1:])
    print(llm.complete(question))


if __name__ == "__main__":
    main()

Run it:

cd module-01
uv run python -m scout.ask "What is agentic AI?"

**Agentic AI** refers to artificial-intelligence systems that can autonomously
set goals, plan actions, and execute them in their environment without
continuous human supervision—exhibiting what is often described as "agency."
Unlike passive tools that merely respond to inputs, agentic AI can:

1. **Formulate sub-goals** toward a higher-level objective.
...

(The raw ** asterisks are normal: the model emits Markdown, and your terminal shows it as plain text.)

That’s Scout v0: question in, answer out, and the model decided nothing except the words. Remember how this feels — in Module 2 the same model starts choosing actions, and the difference is unmistakable.

Step 6 — Run the smoke tests

cd ..                                                # back to the repo root
uv run pytest module-01/tests/                       # offline: contract + imports
SCOUT_LIVE_TESTS=1 uv run pytest module-01/tests/    # + 1 real API call

The convention holds for every module: tests that hit the network carry a live marker, are skipped by default, activate with SCOUT_LIVE_TESTS=1, and skip cleanly when NVIDIA_API_KEY is missing. CI stays green and free; your laptop proves the real thing.
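The gating rule itself fits in one small function. This is a sketch of the convention as described, not the repo's test plumbing: `live_skip_reason` is a hypothetical helper, and the exact skip messages are invented.

```python
import os
from typing import Mapping, Optional


def live_skip_reason(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return why live tests should be skipped, or None if they may run."""
    if environ.get("SCOUT_LIVE_TESTS") != "1":
        return "live tests are opt-in: set SCOUT_LIVE_TESTS=1"
    if not environ.get("NVIDIA_API_KEY"):
        return "NVIDIA_API_KEY is not set"
    return None
```

One plausible way to wire it in is a `pytest_collection_modifyitems` hook in `conftest.py` that adds `pytest.mark.skip(reason=...)` to every test carrying the `live` marker whenever this function returns a reason. The repo may do it differently.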

If something fails, it’s almost always one of three things:

- 401 Unauthorized: the key isn’t reaching the client. Check that .env sits at the repo root and the line reads NVIDIA_API_KEY=nvapi-... with no quotes and no spaces.
- 404 Not Found: the model ID doesn’t match what the endpoint serves. Copy it back from config.py verbatim; the offline tests pin it.
- RuntimeError: Empty completion: reasoning tokens ate the budget. Reread Step 4 and raise max_tokens.

Try it yourself (no solution provided — that’s the point):

- Point config.py at nvidia/nemotron-3-super-120b-a12b, ask the same question, and compare tone and latency. Then switch back (forget, and the offline tests, which pin the nano ID, will remind you).
- Add a --system option to scout/ask.py that overrides the system prompt, and see how much one sentence changes the answer. (argparse replaces the sys.argv parsing in a few lines; the exercise is the prompt, not the flag plumbing.)

In production

What changes when this stops being a lab? Four things:
- Capacity: the free tier’s 40 req/min per account is fine for one developer, but a product needs dedicated capacity, or a self-hosted NIM, which is the same container and API on your own GPUs (Module 12).
- Secrets: .env is a dev convenience, not a pattern; production keys live in a secret manager, rotate on a schedule, and never appear in code, images, or logs.
- Model versions: a hosted model can change under your feet, so pin what you can and re-run your evals on every swap (that’s Module 8’s job).
- Configuration: this is why the model name lives in config.py from day one. I treat it like a database connection string: configuration, never code. When (not if) you swap models, it’s a one-line diff plus an eval run, instead of a grep through the codebase at 11 pm.

Exam corner

What the exam tests here. Per the official blueprint, D1 (Agent Architecture and Design) at the foundations level: classify a described system as call, workflow, agent, or multi-agent; use the reactive / deliberative / hybrid vocabulary correctly; and know which interop problem MCP solves versus A2A. To be honest about coverage: this module gives you D1’s vocabulary — the architecture-pattern objectives (logic trees, stateful orchestration, knowledge graphs, scalability) arrive in Module 3, and multi-agent coordination in Module 7.

Quiz — answers after question 5.

1. A pipeline processes every incoming ticket the same way: an LLM call summarizes it, an API call fetches the customer’s account, a second LLM call drafts a reply. The order never changes. What is this system?
- A) An autonomous agent: it uses both an LLM and a tool
- B) A multi-agent system: it has two LLM calls
- C) An LLM workflow: the code owns the control flow
- D) A reactive agent
2. A research assistant reasons well and calls tools competently, but every session starts from zero: users re-explain their project and preferences each time. Which agent capability is missing, and what does its absence cost?
- A) Reasoning: it can’t decompose tasks
- B) Tool calling: it can’t reach external data
- C) Autonomy: it needs human approval for each step
- D) Memory: nothing persists across sessions, so personalization is impossible
3. An incident bot pages the on-call engineer the instant a metric crosses a threshold. No plan, no model of the situation: stimulus, then action. Per the study guide’s vocabulary, this system is:
- A) Deliberative
- B) Reactive
- C) Hybrid
- D) Agentic
4. A team wants (a) a standard way to plug internal databases and tools into their agent, and (b) their agent to collaborate with a partner company’s agent built on a different framework. Which protocols apply?
- A) MCP for both
- B) A2A for both
- C) MCP for (a), A2A for (b)
- D) A2A for (a), MCP for (b)
5. Which use case most justifies an agent rather than a workflow?
- A) Converting a nightly batch of invoices to CSV through the same four steps
- B) An open-ended competitive analysis where relevant sources and follow-up questions can’t be known in advance
- C) Translating a fixed set of support macros into five languages
- D) Routing emails into five folders based on sender domain

Answers:
1. C. Two LLM calls and a tool call, but the code fixes the sequence: that’s a workflow. (A is the exam’s favorite bait; D misuses “reactive,” which describes an architecture style, not “code that calls an API.”)
2. D. Reasoning and tool calling are demonstrably present; what’s described is the absence of long-term memory.
3. B. Direct perception-to-action mapping with no internal planning is the definition of reactive. Add a planning layer on top and you’d have a hybrid.
4. C. MCP standardizes agent-to-tools/data connections; A2A standardizes agent-to-agent collaboration. They’re complementary, not interchangeable.
5. B. When steps can’t be enumerated upfront, the model must own the control flow; that’s what an agent is for. A, C, and D are enumerable: workflows beat agents there on cost, predictability, and debuggability.

Traps to avoid:

- Tool calling ≠ agent. A function call in a hardcoded flow is automation. The exam loves this distinction; expect at least one scenario built on it.
- MCP and A2A are not interchangeable. MCP: agent ↔ tools and context. A2A: agent ↔ agent. Pick by which boundary the problem sits on.
- “Multi-agent is always better” is false. Every rung up the autonomy ladder costs predictability, money, and debuggability; the spectrum table is the reference.

Key takeaways

- The sorting question for any system: who owns the control flow, your code (workflow) or the model (agent)?
- The spectrum runs call → workflow → agent → multi-agent; each rung buys flexibility and pays in predictability, cost, and debuggability.
- An agent stands on four capabilities: reasoning, tool calling, memory, and autonomy. Exam questions often describe a system missing exactly one.
- MCP connects an agent to tools and data; A2A connects agents to each other.
- Reactive reacts, deliberative plans, hybrid layers both. That’s the study guide’s vocabulary, so use it.
- Workflow when you can, agent when you must. The full decision framework is Module 3.

Keep going

Want the full NCP-AAI question bank (150+ exam-style questions) and the next module in your inbox? Subscribe here — it’s free, like everything in this series.

Next, things get real: in Module 2 you’ll write the agent loop by hand — about 80 lines — then rebuild it in LangGraph and feel exactly what the framework buys you.


References

- NCP-AAI certification page: exam format, domains, and weights; the official study guide (doc 4230000, SEP25) is linked from this page.
- build.nvidia.com: hosted NIM endpoints and your free API key.
- NVIDIA Nemotron: the Nemotron 3 model family.
- “What are Multi-Agent Systems?” (NVIDIA Glossary): recommended reading for the exam’s architecture domain.
- “ReAct: Synergizing Reasoning and Acting in Language Models”: the founding paper behind the pattern you’ll implement in Module 2.