From Models to Multi-Agent Systems
An AI agent is not a new kind of model. It is a model API call wrapped with instructions, context, tools, permissions, and a loop that keeps working until the task is done.
A Large Language Model is an external API — a remote inference service. On its own it is stateless: no memory, no file access, no ability to act.
The model sits on the right. Your code calls it. Everything else you see in this guide is the wrapper around that call.
You send text to the model API 1. The model receives your question as a sequence of tokens — nothing more.
The model predicts the most likely next tokens 2. Text in, text out. No tools, no memory, no side effects.
Between you and the model sits the agent runtime — a local process that assembles context, calls the model API, and interprets what comes back.
At its core is the Execution Engine: the component that receives your request and coordinates everything that follows.
You ask: “What is the weather in Boston?” 1. This time the question goes to the Execution Engine — not directly to the model.
The Context Assembly layer 2 gathers system instructions, conversation history, and configuration into a single model prompt — and sends it to the model 3.
The model prompt is what the model actually sees. Context assembly is the process that builds it. Change what goes in, change the behavior.
Notice the History box inside the engine — it holds every user message and model response from this session. Each new prompt includes the full conversation so far, giving the model continuity across turns.
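The assembly step above can be sketched in a few lines. This is an illustrative shape, not any specific runtime's API: the runtime rebuilds the full prompt every turn from system instructions plus the entire session history.

```python
# Sketch of context assembly: each turn, the runtime rebuilds the model
# prompt from system instructions plus every prior turn in History.
# All names here are illustrative.

def assemble_prompt(system_instructions, history, user_message):
    """Build the message list the model actually sees."""
    messages = [{"role": "system", "content": system_instructions}]
    messages.extend(history)                        # every prior turn
    messages.append({"role": "user", "content": user_message})
    return messages

history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]
prompt = assemble_prompt(
    "You are a helpful agent.", history, "What is the weather in Boston?"
)
# The model sees four messages: system, two history turns, the new question.
```

Because the full history is re-sent on every call, the model's "continuity" is entirely a property of what the runtime chooses to include.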
MCP servers expose tool schemas — the runtime injects these into the context assembly so the model knows what tools are available.
This is how the model discovers it can check the weather: the tool signature is right there in the prompt.
The model sees the weather tool in its context. Instead of guessing, it returns a structured tool call 4: weather("Boston").
This is not a function the model runs — it is a request sent back to the Execution Engine.
The Execution Engine routes the tool call to the right MCP server 5. The server executes the request and returns real data.
The tool result 6 is fed back into the context and sent to the model. The model now has facts it could not have known from training alone.
With real data in context, the model produces a factual answer 7 and the runtime delivers it to you.
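The whole round trip (steps 4 through 7) can be condensed into a sketch. The model and weather server below are stand-ins with made-up data; only the control flow — tool call out, result back in, second model call — reflects the mechanism described above.

```python
import json

# Sketch of the weather round-trip: the model emits a structured tool
# call, the engine executes it, and the result re-enters the context
# for a second model call. All names and data are illustrative stubs.

def model(context):
    if "tool_result" in context:
        return {"type": "answer", "text": "It is 38°F and cloudy in Boston."}
    return {"type": "tool_call", "name": "weather",
            "arguments": {"city": "Boston"}}

def weather_server(city):
    # Stand-in for a real MCP server returning real data.
    return json.dumps({"city": city, "temp_f": 38, "sky": "cloudy"})

context = "user: What is the weather in Boston?"
step1 = model(context)                          # structured tool call (4)
result = weather_server(**step1["arguments"])   # engine routes it (5)
context += f"\ntool_result: {result}"           # result enters context (6)
step2 = model(context)                          # grounded answer (7)
```

Note that the model never executes anything: both calls are pure text-in, text-out; the engine does all the routing and execution in between.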
To the left of the runtime sits the instruction layer — project-level files that define identity, rules, and behavior. These are injected into the context assembly on every call.
The engine now also shows Memory — persistent storage that survives across sessions. While History resets when you start a new conversation, Memory lets the agent recall user preferences, project context, and past decisions indefinitely. The Context Assembly layer pulls in relevant memories and includes them in the Model Prompt, so the model can act on information from previous sessions.
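A minimal version of that History/Memory split can be sketched as follows — History is just in-process state, while Memory is anything that persists to disk. The key-value file below is an illustrative stand-in, not how any particular runtime stores memories.

```python
import json, os, tempfile

# Sketch: History lives in the process and dies with the session;
# Memory persists to disk and survives restarts. Illustrative only.

class Memory:
    def __init__(self, path):
        self.path = path

    def recall(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def remember(self, key, value):
        data = self.recall()
        data[key] = value
        with open(self.path, "w") as f:
            json.dump(data, f)

path = os.path.join(tempfile.mkdtemp(), "memory.json")
Memory(path).remember("preferred_test_runner", "pytest")

# A brand-new "session" reconstructs the same memories from disk,
# ready to be included in the next model prompt:
memories = Memory(path).recall()
```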
Instruction files 1 get loaded into context assembly. Claude Code uses CLAUDE.md. OpenAI Codex uses AGENTS.md. Cursor uses .cursor/rules.
These files define role, standards, workflow expectations, and guardrails. Swapping the instruction layer turns the same model into a security reviewer, staff engineer, or product writer.
A skill registry 2 lists available workflows in the context — names and descriptions the model can read. When relevant, the model makes a tool call to load the full skill file into context.
You ask: “Run my tests” 3. The runtime assembles instructions, tool schemas, and the skill registry into context, then sends everything to the model.
The model sees /test in the skill registry. It returns a tool call 4: load_skill("/test"). The engine reads the skill file and returns its content — instructions for running and reporting tests.
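The registry mechanism above can be sketched directly. The point of the design: every prompt carries only the cheap listing (names and descriptions), and the full skill body enters context only when the model asks for it. Names and contents below are illustrative.

```python
# Sketch of a skill registry: the prompt carries names + descriptions
# only; the full body is loaded on demand via a tool call.
# Skill names and contents are illustrative.

SKILLS = {
    "/test": {
        "description": "Run the project's test suite and report results",
        "body": "Run `pytest --cov`. Summarize failures before fixing.",
    },
}

def registry_listing():
    """What goes into every prompt: names and descriptions only."""
    return {name: s["description"] for name, s in SKILLS.items()}

def load_skill(name):
    """Executed by the engine when the model requests a skill."""
    return SKILLS[name]["body"]

listing = registry_listing()       # small, always in context
body = load_skill("/test")         # large, loaded only when relevant
```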
The test skill tells the model to run tests. The model calls bash("pytest --cov") 5 and the engine executes the command locally, returning stdout and stderr as the tool result.
This is what separates a chatbot from an agent: the ability to run real commands on your machine.
The model reads the test output, summarizes the results, and reports back 6: all 42 tests passing with 94% coverage. One prompt, one skill, one bash call.
You have already seen every piece: the runtime, the model prompt, tools, instructions, and skills. Now they work together. The persona tells the model: plan your approach, take an action, observe the result, decide if you are done.
The model still handles one prompt and one prediction at a time. After each response, the Execution Engine feeds the result back and calls the model again. The loop is not inside the model — it is the engine re-invoking it.
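That engine-side loop can be sketched in a few lines. The `fake_model` below stands in for the real model API; everything else — the loop, the tool execution, the feedback — is the engine's job, exactly as described above.

```python
# Sketch of the agentic loop: the loop lives in the engine, not the
# model. Each iteration is one prompt in, one prediction out.
# fake_model and run_tool are illustrative stand-ins.

def fake_model(context):
    # Decides based on what the engine has fed back so far.
    if "tool_result: tests passed" in context:
        return {"type": "final", "text": "Done: tests passing."}
    return {"type": "tool_call", "name": "run_tests", "args": {}}

def run_tool(name, args):
    return "tests passed"                  # stand-in for real execution

def agent_loop(user_request, max_turns=5):
    context = f"user: {user_request}"
    for _ in range(max_turns):
        action = fake_model(context)       # one prompt -> one prediction
        if action["type"] == "final":
            return action["text"]
        result = run_tool(action["name"], action["args"])
        context += f"\ntool_result: {result}"  # engine feeds result back
    return "gave up"

answer = agent_loop("Run my tests")
```

The `max_turns` cap is worth noting: real runtimes bound the loop so a confused model cannot iterate forever.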
You say: “Implement the dashboard.” 1 The request enters the engine with instructions, tool schemas, and skill registry already in context.
The model’s first move: call Jira 2 to fetch the ticket specs. The MCP server returns the requirements. With specs in context, the model forms a plan and starts executing.
With specs in context, the model starts building 3. It calls writeFile for index.html and app.js. Each call goes to the engine, executes, and returns. The model keeps going.
The model runs npm test 4. Two tests fail. This is where the loop matters: the model does not stop or ask you what to do. It reads the error, decides what to fix, and continues.
The model fixes app.js 5, re-runs the tests, and this time they all pass. Each iteration is still one prompt → one prediction. The runtime feeds the result back and the model decides: “am I done?”
All tests pass. The model reports back 6: dashboard implemented, 3 files created, 8 tests passing. One prompt from you. Multiple rounds of plan → act → observe → decide internally.
So far, one agent, one loop. But real tasks need coordination: implement a feature, review the code, run the tests. An orchestrator delegates sub-tasks to specialized agents.
Each agent is still one runtime, one loop, one model call at a time. The orchestrator decides who works on what.
Each agent gets a different persona from its instruction file: one implements features, another reviews code, a third runs tests. Same model, different instructions, different behavior.
The orchestrator delegates tasks, collects results, and re-plans. Each sub-agent runs its own agentic loop — plan, act, observe, repeat — then reports back.
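Structurally, the orchestrator is just another loop one level up, as this sketch suggests. The personas and the stubbed `run_agent` are illustrative; in a real system each sub-agent would run the full agentic loop from the previous section.

```python
# Sketch: an orchestrator delegates sub-tasks to agents that share a
# model but carry different instruction files. Names are illustrative.

PERSONAS = {
    "implementer": "You write features.",
    "reviewer": "You review code for defects.",
    "tester": "You run and report tests.",
}

def run_agent(persona, task):
    # Each sub-agent would run its own full agentic loop; stubbed here.
    assert persona in PERSONAS
    return f"[{persona}] completed: {task}"

def orchestrate(feature):
    plan = [
        ("implementer", f"implement {feature}"),
        ("reviewer", f"review {feature}"),
        ("tester", f"test {feature}"),
    ]
    # Delegate each sub-task, collect results for re-planning.
    return [run_agent(role, task) for role, task in plan]

results = orchestrate("dashboard")
```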
Here is the complete picture: multiple agent runtimes, each with its own context and tools, talking to models and connecting to MCP servers. But where does each piece actually live?
The agent runtimes, tools, and all execution happen on your machine. Your code, your files, your bash — nothing leaves unless you allow it.
Some agents run in a sandbox — a restricted environment with no network access, limited filesystem scope, and controlled permissions. This is governance built into the runtime, not bolted on.
The model API can run in three places: a cloud provider (Claude, GPT, Gemini), a local model on your machine (Ollama, llama.cpp), or a private cloud (AWS Bedrock, Azure OpenAI).
The runtime doesn’t care which — it sends the same API call. What changes is where your prompts go.
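The "runtime doesn't care" claim is concrete in code: only the endpoint changes, never the payload. The URLs below are illustrative placeholders, not real provider endpoints (though the localhost port echoes Ollama's common default).

```python
# Sketch: the runtime's call shape is identical across deployments;
# only the endpoint differs. URLs are illustrative placeholders.

ENDPOINTS = {
    "cloud":   "https://api.example-provider.com/v1/chat",
    "local":   "http://localhost:11434/v1/chat",   # Ollama-style default port
    "private": "https://bedrock.example.internal/v1/chat",
}

def build_request(deployment, messages):
    """Same payload everywhere; deployment only selects the URL."""
    return {"url": ENDPOINTS[deployment], "json": {"messages": messages}}

msgs = [{"role": "user", "content": "hi"}]
cloud = build_request("cloud", msgs)
local = build_request("local", msgs)
# Identical payloads, different destinations: the privacy question is
# purely about where the URL points.
```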
Local MCP servers run alongside your agent. They connect to databases, GitLab, internal APIs — services inside your network. Data stays local; only tool results enter context.
But “local” doesn’t always mean contained. A web-search MCP runs locally but makes Google API calls over the internet. A Puppeteer MCP fetches external URLs. The server is local — the data it touches may not be.
Some MCP servers are cloud-hosted — GitHub, Jira, Slack. Your local agent calls them via API. The connection crosses your perimeter, so permissions and scoping matter.
The MCP protocol standardizes auth and scoping: each server declares what it can do, and the runtime enforces which agents can call which tools.
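Enforcement at the runtime boundary can be sketched like this. The policy table and error handling are illustrative — this is not MCP's wire format — but the principle matches the text: the check happens in the engine, before any call reaches a server.

```python
# Sketch: the runtime checks an agent's scope before a tool call ever
# reaches an MCP server. Policy table and names are illustrative.

POLICY = {
    "reviewer-agent":    {"read_file", "search_code"},          # read-only
    "implementer-agent": {"read_file", "write_file", "bash"},
}

def dispatch(agent, tool, execute):
    """Gatekeeper: only scoped tools are ever executed."""
    if tool not in POLICY.get(agent, set()):
        raise PermissionError(f"{agent} may not call {tool}")
    return execute()

ok = dispatch("implementer-agent", "write_file", lambda: "written")

try:
    dispatch("reviewer-agent", "write_file", lambda: "written")
    blocked = False
except PermissionError:
    blocked = True   # the reviewer's write attempt never executed
```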
1. The agent can read .env files, SSH keys, and API tokens from your filesystem. Credentials are one tool call away.
2. Local MCPs can call external APIs. A “local” web-search MCP sends queries over the internet. Data crosses your perimeter.
3. Remote MCP servers hold your API tokens and receive your data. Third-party services see what you send them.
4. Cloud models receive every prompt. Where are logs stored? Is your data used for training? Who has access?
Every modern AI coding tool uses the same building blocks. What changes is the model layer, the instruction system, and where execution actually happens: in your editor, on your machine, or in a remote sandbox.
Claude Code: CLAUDE.md
OpenAI Codex: AGENTS.md and runtime policy
Cursor: .cursor/rules
GitHub Copilot: .github/copilot-instructions.md and repo context

Five questions every team should answer before putting agents into production.
Where is the model running, and who stores the logs of my prompt history?
What is the 'blast radius' if the agent goes rogue? Does it have write access?
Can I replay every action the agent took in a verifiable log?
Where are the 'human-in-the-loop' checkpoints for high-risk actions (and beware of approval fatigue)?
How do I detect, correct, and prevent the agent deviating from the role scoped in its AGENTS.md?