A Visual Guide

The AI Agent Stack

From Models to Multi-Agent Systems

An AI agent is not a new kind of model. It is a model API call wrapped with instructions, context, tools, permissions, and a loop that keeps working until the task is done.

1.0 · The Core

The Model (LLM)

A Large Language Model is an external API — a remote inference service. On its own it is stateless: no memory, no file access, no ability to act.

The model sits on the right. Your code calls it. Everything else you see in this guide is the wrapper around that call.
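Statelessness is easiest to see in a toy sketch. The `call_model` function below is an invented stand-in for a real inference API, not any vendor's SDK: each invocation sees only the text you pass in, with no session object and no hidden state between calls.

```python
# Toy stand-in for a remote model API: text in, text out.
# Illustrative only -- a real call would be an HTTP request to a provider.
def call_model(prompt: str) -> str:
    # The "model" here just echoes a canned reply; the point is the
    # signature: no session object, no memory between calls.
    return f"(predicted reply to: {prompt!r})"

# Two calls are completely independent -- the second knows nothing
# about the first unless you resend that text yourself.
first = call_model("My name is Ada.")
second = call_model("What is my name?")  # the model cannot know
```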

1.1 · The Prompt

You Send a Prompt

You send text to the model API 1. The model receives your question as a sequence of tokens — nothing more.

1.2 · The Response

It Predicts a Reply

The model predicts the most likely next tokens 2. Text in, text out. No tools, no memory, no side effects.

Stateless: Each API call is independent. The model knows facts, reasoning, and code patterns from training — but it cannot look things up or take actions.
2.0 · The Runtime

The Agent Runtime

Between you and the model sits the agent runtime — a local process that assembles context, calls the model API, and interprets what comes back.

At its core is the Execution Engine: the component that receives your request and coordinates everything that follows.

2.1 · The Request

Your Question Enters the Runtime

You ask: “What is the weather in Boston?” 1. This time the question goes to the Execution Engine — not directly to the model.

2.2 · Context

The Model Prompt

The Context Assembly layer 2 gathers system instructions, conversation history, and configuration into a single model prompt — and sends it to the model 3.

The model prompt is what the model actually sees. Context assembly is the process that builds it. Change what goes in, change the behavior.

Notice the History box inside the engine — it holds every user message and model response from this session. Each new prompt includes the full conversation so far, giving the model continuity across turns.
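As a sketch (the function and variable names here are made up for illustration), context assembly is mostly list-building: system instructions first, then the full history, then the new user turn.

```python
def assemble_prompt(system: str, history: list[dict], user_msg: str) -> list[dict]:
    """Build the model prompt: instructions + full history + new turn."""
    return [
        {"role": "system", "content": system},
        *history,                                  # every prior turn, replayed
        {"role": "user", "content": user_msg},
    ]

history = [
    {"role": "user", "content": "What is the weather in Boston?"},
    {"role": "assistant", "content": "Sunny, 72F."},
]
prompt = assemble_prompt("You are a helpful assistant.", history, "And tomorrow?")
# The model sees every prior turn -- that replay is what creates continuity.
```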

Framing: The same base model becomes an assistant, a reviewer, or a planner — depending on what the runtime puts in the model prompt.
2.3 · Tools

MCP · Universal Plugins

MCP servers expose tool schemas — the runtime injects these into the context assembly so the model knows what tools are available.

This is how the model discovers it can check the weather: the tool signature is right there in the prompt.

How models discover tools: Tool schemas are text in the context. The model reads them and knows what it can ask for. MCP standardizes the format so connectors are portable across products.
2.4 · Tool Call

The Model Asks for Help

The model sees the weather tool in its context. Instead of guessing, it returns a structured tool call 4: weather("Boston").

This is not a function the model runs — it is a request sent back to the Execution Engine.
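The shape of such a tool call might look like the following (field names are illustrative; real runtimes follow the provider's or MCP's exact schema). The key point is that the model emits structured data, and the engine parses it — nothing has been executed yet.

```python
import json

# What the model returns instead of prose: a request, not an execution.
model_output = '{"tool": "weather", "arguments": {"city": "Boston"}}'

call = json.loads(model_output)
tool_name = call["tool"]        # "weather"
args = call["arguments"]        # {"city": "Boston"}
# The Execution Engine now decides whether and how to run this.
```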

2.5 · MCP Call

The Engine Calls MCP

The Execution Engine routes the tool call to the right MCP server 5. The server executes the request and returns real data.

2.6 · Result

Data Back to Model

The tool result 6 is fed back into the context and sent to the model. The model now has facts it could not have known from training alone.

2.7 · The Answer

Grounded Response

With real data in context, the model produces a factual answer 7 and the runtime delivers it to you.
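Steps 2.4 through 2.7 can be stitched into one hedged sketch of the engine's dispatch loop. Everything here is a stub invented for illustration — the fake model asks for the weather tool once, then answers from the returned data.

```python
import json

def fake_model(context: list[str]) -> str:
    """Stub model: requests the weather tool once, then answers."""
    if not any("tool_result" in m for m in context):
        return json.dumps({"tool": "weather", "arguments": {"city": "Boston"}})
    return "It is 72F and sunny in Boston."

def weather_tool(city: str) -> str:
    return f"tool_result: {city} 72F sunny"    # stand-in for an MCP server

context = ["user: What is the weather in Boston?"]
reply = fake_model(context)
call = json.loads(reply)                        # model returned a tool call
context.append(weather_tool(**call["arguments"]))   # engine executes it
answer = fake_model(context)                    # re-invoke with the result
```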

Execution Boundary: The model decides what to do. The engine actually does it — locally, on your machine — then feeds the result back. This is where a chat UI becomes an agent.
3.0 · Instructions

The Instruction Layer

To the left of the runtime sits the instruction layer — project-level files that define identity, rules, and behavior. These are injected into the context assembly on every call.

The engine now also shows Memory — persistent storage that survives across sessions. While History resets when you start a new conversation, Memory lets the agent recall user preferences, project context, and past decisions indefinitely. The Context Assembly layer pulls in relevant memories and includes them in the Model Prompt, so the model can act on information from previous sessions.

3.1 · Persona

AGENTS.md

Instruction files 1 get loaded into context assembly. Claude Code uses CLAUDE.md. OpenAI Codex uses AGENTS.md. Cursor uses .cursor/rules.

These files define role, standards, workflow expectations, and guardrails. Swapping the instruction layer turns the same model into a security reviewer, staff engineer, or product writer.

Your Rules, Your Files: Instruction files live on your machine, not in the model. The runtime injects them into every prompt.
3.2 · Skills

Reusable Workflows

A skill registry 2 lists available workflows in the context — names and descriptions the model can read. When relevant, the model makes a tool call to load the full skill file into context.

On-Demand Loading: The registry is text in the prompt. Loading a skill is a tool call — the content comes back as a tool result, just like any other tool.
3.3 · New Prompt

A Real Request

You ask: “Run my tests” 3. The runtime assembles instructions, tool schemas, and the skill registry into context, then sends everything to the model.

3.4 · Skill Load

The Model Loads a Skill

The model sees /test in the skill registry. It returns a tool call 4: load_skill("/test"). The engine reads the skill file and returns its content — instructions for running and reporting tests.
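A minimal sketch of this registry pattern (the registry contents and the `load_skill` helper are invented for illustration): the prompt carries only names and descriptions, and the full instructions arrive later as a tool result.

```python
# Registry text injected into the prompt: names + descriptions only.
SKILL_REGISTRY = {
    "/test": "Run the project's test suite and report results.",
    "/review": "Review a diff against the project's style guide.",
}

# Full skill bodies live on disk; here an in-memory stand-in.
SKILL_FILES = {
    "/test": "Run `pytest --cov`, then summarize failures and coverage.",
}

def load_skill(name: str) -> str:
    """Tool the engine exposes: returns the skill's full instructions."""
    return SKILL_FILES[name]

registry_text = "\n".join(f"{k}: {v}" for k, v in SKILL_REGISTRY.items())
skill_body = load_skill("/test")    # what comes back as a tool result
```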

3.5 · Bash Call

Shell Execution

The test skill tells the model to run tests. The model calls bash("pytest --cov") 5 and the engine executes the command locally, returning stdout and stderr as the tool result.

This is what separates a chatbot from an agent: the ability to run real commands on your machine.
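A `bash` tool reduces to running a subprocess and returning its output as text. Below is a hedged sketch using Python's standard library; a real runtime would add sandboxing, permission checks, and output truncation on top.

```python
import subprocess

def bash(command: str, timeout: int = 60) -> str:
    """Run a shell command locally; return stdout+stderr as the tool result."""
    proc = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return proc.stdout + proc.stderr

result = bash("echo 42 tests passed")
```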

3.6 · Done

Test Results

The model reads the test output, summarizes the results, and reports back 6: all 42 tests passing with 94% coverage. One prompt, one skill, one bash call.

Skill + Bash = Power: Skills are reusable workflows. Bash gives raw system access. Together they let the agent do real work — not just answer questions.
4.0 · Becoming an Agent

The Agentic Loop

You have already seen every piece: the runtime, the model prompt, tools, instructions, and skills. Now they work together. The persona tells the model: plan your approach, take an action, observe the result, decide if you are done.

The model still handles one prompt and one prediction at a time. After each response, the Execution Engine feeds the result back and calls the model again. The loop is not inside the model — it is the engine re-invoking it.
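On the engine side, this is literally a `while`-style loop around the model call. The sketch below uses a stub model that "finishes" after two actions — every name here is invented for illustration.

```python
def stub_model(context: list[str]) -> str:
    """Pretend model: acts twice, then declares the task done."""
    actions_taken = sum(1 for m in context if m.startswith("observed:"))
    return "DONE" if actions_taken >= 2 else f"act:step-{actions_taken + 1}"

def run_agent(task: str, max_turns: int = 10) -> list[str]:
    context = [f"user: {task}"]
    for _ in range(max_turns):              # the loop lives in the engine
        reply = stub_model(context)         # one prompt, one prediction
        if reply == "DONE":                 # model decides it is finished
            break
        context.append(f"observed: result of {reply}")  # act, then observe
    return context

trace = run_agent("Implement the dashboard")
```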

Same Pieces, New Behavior: Nothing was added. The instructions changed. Plan → act → observe → repeat, until the task is complete.
4.1 · Prompt

A Complex Request

You say: “Implement the dashboard.” 1 The request enters the engine with instructions, tool schemas, and skill registry already in context.

4.2 · Plan

Gather Requirements

The model’s first move: call Jira 2 to fetch the ticket specs. The MCP server returns the requirements. With specs in context, the model forms a plan and starts executing.

4.3 · Build

Write the Code

With specs in context, the model starts building 3. It calls writeFile for index.html and app.js. Each call goes to the engine, executes, and returns. The model keeps going.

4.4 · Test

Run Tests — Fail

The model runs npm test 4. Two tests fail. This is where the loop matters: the model does not stop or ask you what to do. It reads the error, decides what to fix, and continues.

4.5 · Fix & Verify

The Loop in Action

The model fixes app.js 5, re-runs the tests, and this time they all pass. Each iteration is still one prompt → one prediction. The runtime feeds the result back and the model decides: “am I done?”

4.6 · Done

Feature Complete

All tests pass. The model reports back 6: dashboard implemented, 3 files created, 8 tests passing. One prompt from you. Multiple rounds of plan → act → observe → decide internally.

Same Model, New Behavior: The model did not gain new capabilities. The instructions told it to keep going until done. The runtime handled the re-invocation. The loop is infrastructure, not intelligence.
5.0 · Orchestration

Multiple Agents, One Goal

So far, one agent, one loop. But real tasks need coordination: implement a feature, review the code, run the tests. An orchestrator delegates sub-tasks to specialized agents.

Each agent is still one runtime, one loop, one model call at a time. The orchestrator decides who works on what.

5.1 · The Swarm Pattern

Specialized Agents, Coordinated

Each agent gets a different persona from its instruction file: one implements features, another reviews code, a third runs tests. Same model, different instructions, different behavior.

The orchestrator delegates tasks, collects results, and re-plans. Each sub-agent runs its own agentic loop — plan, act, observe, repeat — then reports back.
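In code the pattern is small: the orchestrator is an agent whose "tool" is delegating to other agents. The personas and the `run_agent` stand-in below are invented for illustration — each would be a full runtime with its own loop.

```python
PERSONAS = {
    "implementer": "You write features.",
    "reviewer": "You review code for defects.",
    "tester": "You run tests and report results.",
}

def run_agent(role: str, task: str) -> str:
    """Stand-in for a full runtime + agentic loop; returns the report."""
    persona = PERSONAS[role]     # same model, different instructions
    return f"[{role}] ({persona}) completed: {task}"

def orchestrate(goal: str) -> list[str]:
    plan = [("implementer", f"build {goal}"),
            ("reviewer", f"review {goal}"),
            ("tester", f"test {goal}")]
    # Delegate each sub-task to a specialized agent and collect results.
    return [run_agent(role, task) for role, task in plan]

reports = orchestrate("the dashboard")
```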

Same Building Blocks: A multi-agent system is just more of what you’ve already seen: runtimes, contexts, models, and tools. The orchestrator is itself an agent — it just delegates instead of executing.
6.0 · The Full Picture

Where Does Everything Run?

Here is the complete picture: multiple agent runtimes, each with its own context and tools, talking to models and connecting to MCP servers. But where does each piece actually live?

6.1 · Your Machine

The Local Perimeter

The agent runtimes, tools, and all execution happen on your machine. Your code, your files, your bash — nothing leaves unless you allow it.

Local Execution: The runtime reads your files, runs your tests, and writes your code. The model never sees your filesystem directly — only what the runtime sends in context.
6.2 · Sandboxing

Isolated Execution

Some agents run in a sandbox — a restricted environment with no network access, limited filesystem scope, and controlled permissions. This is governance built into the runtime, not bolted on.

Blast Radius: Sandboxing limits what an agent can break. If it goes off-script, the damage stops at the sandbox boundary. Human-in-the-loop approvals gate anything outside.
6.3 · Model Hosting

Three Places for Models

The model API can run in three places: a cloud provider (Claude, GPT, Gemini), a local model on your machine (Ollama, llama.cpp), or a private cloud (AWS Bedrock, Azure OpenAI).

The runtime doesn’t care which — it sends the same API call. What changes is where your prompts go.
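Because the API shape is the same, switching hosts is often little more than a base-URL change plus a policy decision. The endpoints below illustrate the pattern and are not authoritative URLs.

```python
# Same client code, three deployment choices -- only the endpoint differs.
# URLs are illustrative of the pattern, not real or current endpoints.
HOSTS = {
    "cloud":   {"base_url": "https://api.provider.example/v1",
                "data_leaves_perimeter": True},
    "local":   {"base_url": "http://localhost:11434/v1",
                "data_leaves_perimeter": False},
    "private": {"base_url": "https://bedrock.internal.example/v1",
                "data_leaves_perimeter": False},
}

def pick_host(require_sovereignty: bool) -> str:
    """Choose the first host that satisfies the data-sovereignty policy."""
    for name, cfg in HOSTS.items():
        if not require_sovereignty or not cfg["data_leaves_perimeter"]:
            return name
    raise ValueError("no host satisfies policy")

choice = pick_host(require_sovereignty=True)
```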

Data Sovereignty: Cloud models mean your prompts leave your perimeter. Local or private cloud keeps everything inside your network. Choose based on your security requirements.
6.4 · Local Tools

MCP on Your Machine

Local MCP servers run alongside your agent. They connect to databases, GitLab, internal APIs — services inside your network. Data stays local; only tool results enter context.

But “local” doesn’t always mean contained. A web-search MCP runs locally but makes Google API calls over the internet. A Puppeteer MCP fetches external URLs. The server is local — the data it touches may not be.

Watch the Arrows: A local MCP can still call external APIs. “Local” describes where the server process runs, not where the data goes. Audit the outbound calls, not just the install location.
6.5 · Cloud Services

Remote MCP Servers

Some MCP servers are cloud-hosted — GitHub, Jira, Slack. Your local agent calls them via API. The connection crosses your perimeter, so permissions and scoping matter.

The MCP protocol standardizes auth and scoping: each server declares what it can do, and the runtime enforces which agents can call which tools.

6.6 · Attack Surface

Where Are the Risks?

1. The agent can read .env files, SSH keys, and API tokens from your filesystem. Credentials are one tool call away.

2. Local MCPs can call external APIs. A “local” web-search MCP sends queries over the internet. Data crosses your perimeter.

3. Remote MCP servers hold your API tokens and receive your data. Third-party services see what you send them.

4. Cloud models receive every prompt. Where are logs stored? Is your data used for training? Who has access?

Evaluate Before You Deploy: Every numbered risk is a governance decision. Sandboxing, human-in-the-loop approvals, audit logs, and local models are your mitigations.
Implementation

Same Blocks, Different Products

Every modern AI coding tool uses the same building blocks. What changes is the model layer, the instruction system, and where execution actually happens: in your editor, on your machine, or in a remote sandbox.

Claude Code Anthropic · CLI

Model: Claude models
Instructions: CLAUDE.md
Runtime: Local terminal agent with subagents and tool permissions.
Key Strength: Strong terminal workflow with explicit memory and delegation primitives.

Codex OpenAI · CLI / Cloud

Model: GPT-5.4
Instructions: AGENTS.md and runtime policy
Runtime: Local CLI or cloud task sandboxes, depending on surface.
Key Strength: Strong coding and agentic execution across tools and computer-use workflows.

Cursor Anysphere · IDE

Model: Multi-model selection
Instructions: .cursor/rules
Runtime: In-editor agent plus optional remote background agents.
Key Strength: Deep IDE integration, codebase indexing, and async agents.

Gemini Code Assist Google · IDE / CLI

Model: Gemini family
Instructions: Workspace context and agent-mode prompts
Runtime: IDE assistant plus Gemini CLI agent loop.
Key Strength: Tight Google ecosystem integration and cited code assistance.

OpenCode Open Source · Terminal / App

Model: Any configured provider API or local model
Instructions: Config, prompts, and selected tools
Runtime: Its own agent wrapper; not just a thin skin over another coding agent CLI.
Key Strength: Provider-agnostic runtime with direct model selection and local execution.

Copilot GitHub · IDE / CLI / Cloud

Model: Multi-model (GPT-5.x, Claude, Gemini) with auto-selection
Instructions: .github/copilot-instructions.md and repo context
Runtime: IDE agent mode, CLI agent, or autonomous cloud agent in sandboxed GitHub Actions.
Key Strength: Platform-native: assign an Issue and get a PR back from a cloud sandbox.
Direct Model Path: OpenCode makes the distinction explicit: you connect provider APIs, choose a model, and its local runtime wraps that model with prompts, tools, and execution logic. The agent is the wrapper, not the raw model.
Platform-Native Path: Copilot inverts the execution model: assign a GitHub Issue and an autonomous agent spins up in a sandboxed Actions environment, explores the repo, writes code, runs tests, and opens a PR. The IDE and CLI surfaces run locally but can offload work to this cloud agent.
Governance

Before You Deploy

Five questions every team should answer before putting agents into production.

1

Data Sovereignty

Where is the model running, and who stores the logs of my prompt history?

2

Tool Permissions

What is the 'blast radius' if the agent goes rogue? Does it have write access?

3

Auditability

Can I replay every action the agent took in a verifiable log?

4

Approval Gates

Where are the 'human-in-the-loop' checkpoints for high-risk actions (and beware of approval fatigue)?

5

Instruction Drift

How do I detect, correct, and prevent the agent deviating from the role scoped in its core AGENTS.md?

Summary

Models are Engines: Predictions based on patterns, not autonomous thought.
Tools are Hands: The bridge from digital thought to physical/system action.
Instructions are Policy: Defining the 'Who' and 'What' of agentic behavior.
Loops are Agency: The difference between a chatbot and a coworker.
Build with Intent. Evaluate with Rigor.