How AI Agents Actually Work — APIs, Tools, and the Loop That Ties Them Together
This article was generated by AI. The accuracy of the content is not guaranteed, and we accept no responsibility for any damages resulting from use of this article. By continuing to read, you agree to the Terms of Use.
- Target audience: Engineers with programming experience but new to AI agent development
- Prerequisites: Basic knowledge of REST APIs and JSON
- Reading time: 15 minutes
Overview
The term “AI agent” is everywhere. But when you actually sit down to write the code, a surprisingly simple structure emerges.
An AI agent is, at its core, an LLM calling tools in a loop. Anthropic’s official guide defines it plainly: “Agents are typically just LLMs using tools based on environmental feedback in a loop” [1]. That single sentence captures the entire architecture.
Understanding this architecture requires stacking three layers. First, the Chat Completion API — the basic interface for talking to an LLM. Then Tool Use — the mechanism that lets an LLM reach into the outside world. Finally, the agent loop — a repeating structure where the LLM receives tool results and autonomously decides its next action. The difference between a chatbot and an agent comes down to the presence of this loop.
This article focuses on concepts and mechanisms rather than implementation details. The goal is to understand the “raw shape” of LLM APIs, independent of any particular framework. Anthropic themselves recommend “starting with direct LLM API usage” rather than reaching for a framework first [1].
Layer 1: The Chat Completion API — The Basics of Talking to an LLM
The API Is Stateless
The first thing that surprises newcomers to LLM APIs is that the API is stateless. Web interfaces like ChatGPT or Claude make conversations appear continuous, but the API itself has no memory.
You must resend the entire conversation history with every API request. When a web app or CLI tool appears to “remember” the conversation, it’s because the application is storing the history and sending it to the API each time.
Third turn: the request includes the full conversation so far.

```json
{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "system": "Reply concisely with code examples.",
  "messages": [
    {"role": "user", "content": "How do I sort a list in Python?"},
    {"role": "assistant", "content": "You can use sorted() or list.sort()..."},
    {"role": "user", "content": "How about reverse order?"},
    {"role": "assistant", "content": "Pass reverse=True..."},
    {"role": "user", "content": "Is it a stable sort?"}
  ]
}
```
This follows the same design philosophy as HTTP. Each request is self-contained; the server holds no state.
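On the application side, this statelessness means something has to hold the history between calls. The following is a minimal sketch of such a holder, not any SDK's actual API: the `Conversation` class and its method names are hypothetical, and no network call is made. It only shows that each request body is rebuilt from the full stored history.

```python
from typing import Dict, List


class Conversation:
    """Client-side history store (hypothetical helper); the API remembers nothing."""

    def __init__(self, model: str, system: str, max_tokens: int = 1024) -> None:
        self.model = model
        self.system = system
        self.max_tokens = max_tokens
        self.messages: List[Dict[str, str]] = []

    def add(self, role: str, content: str) -> None:
        """Record one turn; role is 'user' or 'assistant'."""
        self.messages.append({"role": role, "content": content})

    def build_request(self) -> dict:
        """Return a full request body that includes every prior turn."""
        return {
            "model": self.model,
            "max_tokens": self.max_tokens,
            "system": self.system,
            "messages": list(self.messages),  # copy: later turns don't alter this body
        }


convo = Conversation("claude-sonnet-4-6", "Reply concisely with code examples.")
convo.add("user", "How do I sort a list in Python?")
convo.add("assistant", "You can use sorted() or list.sort()...")
convo.add("user", "How about reverse order?")
request = convo.build_request()
print(len(request["messages"]))  # 3
```

Every call to `build_request()` produces a body that grows with the conversation, which is why history length ties directly to cost.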
Message Structure — Roles and Content
API messages consist of two elements: role and content [2].
| Role | Purpose | Set by |
|---|---|---|
| system | Defines the LLM’s behavior (*) | Developer |
| user | Input from the user | End user |
| assistant | The LLM’s response | LLM (or developer via prefill) |
* The system role is implemented differently across APIs. In Anthropic’s API, it exists as a separate parameter outside the message array. In OpenAI’s API, it’s included as a role: "system" message within the array. In either case, it’s invisible to end users but has a significant impact on LLM behavior.
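The difference in where the system prompt lives can be illustrated with two small helpers. This is a sketch under the assumptions stated in the note above; the function names `to_openai_style` and `to_anthropic_style` are made up for illustration, and the dictionaries are only the relevant fragment of a request, not a complete body.

```python
from typing import Dict, List


def to_openai_style(system: str, messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Place the system prompt inside the message array as the first entry."""
    return [{"role": "system", "content": system}] + list(messages)


def to_anthropic_style(system: str, messages: List[Dict[str, str]]) -> dict:
    """Keep the system prompt as a separate field next to the message array."""
    return {"system": system, "messages": list(messages)}


history = [{"role": "user", "content": "Is it a stable sort?"}]
openai_messages = to_openai_style("Reply concisely.", history)
anthropic_body = to_anthropic_style("Reply concisely.", history)
print(openai_messages[0]["role"])        # system
print(anthropic_body["messages"] == history)  # True
```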
Content isn’t limited to plain text. Major LLM APIs can handle images, documents, tool invocations, and tool results as content blocks [2]. This structure forms the foundation of Tool Use.
The Context Window — The LLM’s Working Memory
There’s an upper limit on how many tokens you can send to the API. This is called the context window. As of 2026, major models advertise windows anywhere from 128K to 10M tokens, but effective capacity is reportedly closer to 60–70% of the advertised maximum [3]. Near the upper limit, quality doesn’t degrade gradually; it tends to drop off suddenly.
Context window management is critical for agent development. As conversations grow, you need to summarize older messages or prune unnecessary information. Tool definitions themselves consume tokens, so including unused tools impacts both cost and quality.
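One way to picture the pruning described above is a budget check that counts tool definitions alongside the history. The sketch below is illustrative only: the 4-characters-per-token estimate is a crude assumption (real tokenizers differ), and `estimate_tokens` and `prune_history` are hypothetical helpers rather than part of any API.

```python
import json
from typing import Any, Dict, List


def estimate_tokens(text: str) -> int:
    """Crude heuristic (an assumption): roughly 4 characters per token."""
    return max(1, len(text) // 4)


def prune_history(
    messages: List[Dict[str, str]],
    tools: List[Dict[str, Any]],
    budget: int,
) -> List[Dict[str, str]]:
    """Drop the oldest messages until tool definitions plus history fit the budget."""
    # Tool definitions travel with every request, so they count against the budget.
    tool_cost = sum(estimate_tokens(json.dumps(tool)) for tool in tools)
    kept = list(messages)
    while len(kept) > 1:  # always keep the most recent message
        used = tool_cost + sum(estimate_tokens(m["content"]) for m in kept)
        if used <= budget:
            break
        kept.pop(0)  # discard the oldest turn first
    return kept


history = [
    {"role": "user", "content": "a" * 400},
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": "c" * 400},
]
trimmed = prune_history(history, tools=[], budget=250)
print(len(trimmed))  # 2
```

Dropping the oldest turn is the simplest possible policy. It can leave the history starting with an assistant message or separate a tool result from the call that produced it, so a real implementation would more likely drop turns in pairs or replace them with a summary.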
Layer 2: Tool Use — Reaching Into the Outside World
The Limits of “Just Generating Text”
With the Chat Completion API alone, an LLM is just a box that takes text in and puts text out. It doesn’t know today’s weather, the contents of your database, or the state of your file system.
Tool Use (also known as Function Calling) breaks through this limitation [4][5]. The terminology varies by provider, but the mechanism is nearly identical.
Defining Tools — Teaching the LLM Its Toolkit with JSON Schema
First, you tell the LLM what tools are available by providing JSON Schema definitions [4].
```json
{
  "name": "get_weather",
  "description": "Get the current weather for a given city",
  "input_schema": {
    "type": "object",
    "properties": {
      "city": {
        "type": "string",
        "description": "City name (e.g., Tokyo, New York)"
      }
    },
    "required": ["city"]
  }
}
```
A critical point: the LLM doesn’t execute the tool itself. All the LLM does is generate a request saying “please call this tool with these arguments.” Actual execution is the application’s responsibility.
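Because execution is the application's job, the application needs a way to map a requested tool name to real code. Here is a minimal sketch of that dispatch step. The registry layout, the `execute_tool` name, and the stand-in `get_weather` body are all assumptions for illustration; a real implementation would call an actual weather service and validate arguments against the full schema.

```python
from typing import Any, Callable, Dict, List


def get_weather(city: str) -> str:
    """Stand-in implementation; a real one would query a weather service."""
    return f"Sunny in {city}"


# Maps the tool name the LLM sees to the function the application runs.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}
# Mirrors the "required" list of each tool's JSON Schema.
REQUIRED_ARGS: Dict[str, List[str]] = {"get_weather": ["city"]}


def execute_tool(name: str, arguments: Dict[str, Any]) -> str:
    """Run the requested tool, returning errors as text the LLM can read."""
    if name not in TOOL_REGISTRY:
        return f"Error: unknown tool '{name}'"
    missing = [arg for arg in REQUIRED_ARGS.get(name, []) if arg not in arguments]
    if missing:
        return f"Error: missing required argument(s): {', '.join(missing)}"
    return str(TOOL_REGISTRY[name](**arguments))


print(execute_tool("get_weather", {"city": "Tokyo"}))  # Sunny in Tokyo
print(execute_tool("get_weather", {}))  # Error: missing required argument(s): city
```

Returning errors as strings rather than raising is a design choice in this sketch: it lets the error travel back to the LLM as a tool result so the model can correct its arguments.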
The Request-Response Cycle
Tool Use works through the following steps [4][5]:
```mermaid
sequenceDiagram
    participant App
    participant LLM
    participant Tool
    App->>LLM: Message + tool definitions
    LLM->>App: tool_use block
    App->>Tool: Execute tool
    Tool->>App: Result
    App->>LLM: Send tool_result
    LLM->>App: Final response
```
Step 2 is the heart of this mechanism. Instead of a normal response (stop_reason: "end_turn"), the LLM returns stop_reason: "tool_use" — a special stop reason meaning “I’ve paused text generation and I’m waiting for a tool to be executed.”
The application detects this signal, executes the specified tool, and sends the result back to the API as a tool_result. The LLM then generates its final response informed by the tool’s output.
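The hand-off from a tool request to a tool result can be sketched with plain dictionaries. The block shapes below (`tool_use` with `id`, `name`, `input`; `tool_result` with `tool_use_id`, `content`) follow Anthropic's Messages API as I understand it; the response is hand-written rather than returned by a real call, the id `toolu_01` is invented, and `build_followup_messages` is a hypothetical helper.

```python
from typing import Any, Callable, Dict, List

# Hand-written stand-in for an API response that requests a tool call.
response = {
    "role": "assistant",
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "Let me check the weather."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
         "input": {"city": "Tokyo"}},
    ],
}


def build_followup_messages(
    response: Dict[str, Any],
    run_tool: Callable[[str, Dict[str, Any]], str],
) -> List[Dict[str, Any]]:
    """Execute each requested tool and build the two messages to append."""
    results = []
    for block in response["content"]:
        if block["type"] == "tool_use":
            output = run_tool(block["name"], block["input"])
            # tool_use_id links this result to the request that asked for it.
            results.append({"type": "tool_result",
                            "tool_use_id": block["id"],
                            "content": output})
    return [
        {"role": "assistant", "content": response["content"]},  # echo the request
        {"role": "user", "content": results},                   # then the results
    ]


followup = build_followup_messages(response, lambda name, args: f"Sunny in {args['city']}")
print(followup[1]["content"][0]["tool_use_id"])  # toolu_01
```

Both messages are appended to the history before the next API call, so the LLM sees its own request followed by the matching result.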
How the LLM “Chooses” a Tool
Whether the LLM calls a tool depends on the combination of the user’s request and the tool’s description [4]. If a user asks “What’s the weather in Tokyo?” and the get_weather tool’s description says “Get the current weather for a given city,” the LLM will choose to invoke that tool.
In other words, a tool’s description is part of the prompt. Anthropic notes that “tool design deserves as much attention as prompt engineering” and that when building their SWE-bench agents, they “spent more optimization time on tools than overall prompts” [1].
The tool_choice parameter lets you control tool invocation behavior [4][5]:
| Setting | Behavior |
|---|---|
| auto (default) | LLM decides automatically |
| any / required | Must call at least one tool |
| Specific tool | Only the specified tool is called |
| none | No tools used |
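As a rough illustration of the settings in the table, the helper below builds a `tool_choice` value in the object form that Anthropic's API uses, to the best of my knowledge (`{"type": "auto"}`, `{"type": "any"}`, `{"type": "tool", "name": ...}`, `{"type": "none"}`). The function name `anthropic_tool_choice` is hypothetical, and the exact accepted values should be checked against the provider's current reference; OpenAI's API expresses the same ideas with different spellings, such as "required".

```python
from typing import Dict, Optional


def anthropic_tool_choice(mode: str, tool_name: Optional[str] = None) -> Dict[str, str]:
    """Build a tool_choice value; shape assumed from Anthropic's documentation."""
    if mode == "tool":
        if not tool_name:
            raise ValueError("mode 'tool' requires a tool_name")
        return {"type": "tool", "name": tool_name}  # force one specific tool
    if mode in ("auto", "any", "none"):
        return {"type": mode}
    raise ValueError(f"unsupported mode: {mode}")


print(anthropic_tool_choice("auto"))                 # {'type': 'auto'}
print(anthropic_tool_choice("tool", "get_weather"))  # {'type': 'tool', 'name': 'get_weather'}
```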
Layer 3: The Agent Loop — Autonomous Decision-Making on Repeat
The Dividing Line Between Chatbots and Agents
With the first two layers (API + Tool Use), you can build a chatbot that fetches external data to answer user questions. But that alone doesn’t make an agent.
The difference between a chatbot and an agent is the loop [1].
- Chatbot: User asks → LLM responds (may use tools) → Done. The user always decides the next action.
- Agent: User assigns a task → LLM plans, executes tools, evaluates results, and decides the next action on its own → Repeats until the task is complete.
The ReAct Pattern — Interleaving Thinking and Acting
The theoretical foundation for the agent loop is the ReAct (Reasoning + Acting) pattern, proposed in 2022 [6].
Before ReAct, there were two approaches, each with a limitation:
- Chain-of-Thought (CoT): Makes the LLM “reason,” but without access to external information, hallucination is a risk.
- Action-only: Calls tools, but without planning — it’s reactive and haphazard.
ReAct interleaves both, compensating for each approach’s weakness [6]:
```mermaid
flowchart TB
    T["Thought<br>Analyze the situation,<br>plan the next action"]
    A["Action<br>Call an external tool"]
    O["Observation<br>Receive the tool's result"]
    Done["Task complete"]
    T --> A
    A --> O
    O -->|Need more info| T
    O -->|Goal achieved| Done
```
Here’s a concrete example. Given the task “Look up the latest version of Python’s requests library and summarize the changes”:
- Thought: “First, let me check the latest version on PyPI.”
- Action: Call `search_pypi(package="requests")`
- Observation: “Latest version is 2.32.3, released May 2024.”
- Thought: “Got the version. Now let me check the CHANGELOG.”
- Action: Call `fetch_url(url="https://...")`
- Observation: Receives CHANGELOG content.
- Thought: “I have everything I need. Let me write the summary.”
- Final response: Outputs a summary of the changes.
Reasoning (Thought) guides action planning, and tool results (Observation) ground the reasoning in real information. On the interactive decision-making benchmark ALFWorld, the paper reports that ReAct outperformed imitation-learning and reinforcement-learning baselines by an absolute 34% in success rate [6].
The Agent Loop in Code
Translating the ReAct concept into implementation, the agent loop becomes surprisingly simple:
```python
# llm, tools, initial_task, and execute_tool are placeholders
MAX_ITERATIONS = 20  # Runaway prevention; the exact cap is arbitrary

messages = [initial_task]
for _ in range(MAX_ITERATIONS):
    response = llm.call(messages, tools)
    messages.append(response)  # Add LLM response to history
    if response.stop_reason == "end_turn":
        break  # Task complete
    if response.stop_reason == "tool_use":
        result = execute_tool(response.tool_call)
        messages.append(result)  # Add tool result to history
    # Loop back and let the LLM decide again
```
At each iteration, the LLM makes two judgments [1]:
- What to do next — which tool to call and with what arguments.
- Whether it’s done — has the task been completed?
The loop terminates on one of two conditions: the LLM returns stop_reason: "end_turn" (natural completion) or a maximum iteration count is reached (runaway prevention).
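To see both termination conditions in action without an API key, the loop can be run against a scripted stand-in for the LLM. Everything here is an assumption made for illustration: `run_agent`, `fake_llm`, and `fake_tool` are hypothetical names, the response dictionaries imitate the `stop_reason` and content-block shapes discussed earlier, and the canned observation reuses the ReAct example above.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def run_agent(
    call_llm: Callable[[List[Message]], Message],
    run_tool: Callable[[str, Dict[str, Any]], str],
    task: str,
    max_iterations: int = 10,
) -> List[Message]:
    """Loop until the LLM signals end_turn or the iteration cap is hit."""
    messages: List[Message] = [{"role": "user", "content": task}]
    for _ in range(max_iterations):
        response = call_llm(messages)
        messages.append({"role": "assistant", "content": response["content"]})
        if response["stop_reason"] == "end_turn":
            return messages  # natural completion
        if response["stop_reason"] != "tool_use":
            raise RuntimeError(f"unexpected stop_reason: {response['stop_reason']}")
        results = [
            {"type": "tool_result", "tool_use_id": block["id"],
             "content": run_tool(block["name"], block["input"])}
            for block in response["content"] if block["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("max_iterations reached without end_turn")  # runaway prevention


# Scripted stand-in for the LLM: one tool request, then a final answer.
script = iter([
    {"stop_reason": "tool_use", "content": [
        {"type": "tool_use", "id": "call_1", "name": "search_pypi",
         "input": {"package": "requests"}}]},
    {"stop_reason": "end_turn", "content": [
        {"type": "text", "text": "The latest version is 2.32.3."}]},
])


def fake_llm(messages: List[Message]) -> Message:
    return next(script)


def fake_tool(name: str, arguments: Dict[str, Any]) -> str:
    return "Latest version is 2.32.3, released May 2024."


history = run_agent(fake_llm, fake_tool, "Look up the latest version of requests")
print(len(history))  # 4: task, tool request, tool result, final answer
```

Swapping `fake_llm` for a function that calls a real API is the only change needed to turn this into a working agent skeleton, which is the sense in which the loop itself is the easy part.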
Workflows vs. Agents
Anthropic distinguishes two types of agentic systems [1]:
| | Workflow | Agent |
|---|---|---|
| Who’s in control | Code (developer) | LLM |
| Execution path | Predefined | Dynamically determined |
| Number of steps | Fixed | Variable |
| Best for | Predictable, well-defined tasks | Tasks requiring flexibility |
| Cost & risk | Lower | Higher (errors can compound) |
A workflow uses the LLM as a component in a fixed procedure. An agent lets the LLM decide the procedure itself. For example, “generate code → run tests → evaluate results” as a fixed pipeline is a workflow. “Read a GitHub Issue, identify the relevant files, apply a fix, and make the tests pass” is an agent.
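The workflow half of that contrast can be sketched in a few lines. In this illustrative example the three step functions are stubs with invented names and outputs (a real pipeline would call an LLM and a test runner); the point is only that the code, not the model, fixes the order and the number of steps.

```python
from typing import Callable, List


def run_workflow(task: str, steps: List[Callable[[str], str]]) -> str:
    """Code decides the path: every step runs exactly once, in order."""
    value = task
    for step in steps:
        value = step(value)
    return value


# Stub steps (assumptions); each would wrap an LLM call or a tool in practice.
def generate_code(spec: str) -> str:
    return f"code for: {spec}"


def run_tests(code: str) -> str:
    return f"tests passed on [{code}]"


def evaluate(report: str) -> str:
    return f"verdict: OK ({report})"


result = run_workflow("add two numbers", [generate_code, run_tests, evaluate])
print(result)  # verdict: OK (tests passed on [code for: add two numbers])
```

An agent replaces the fixed `steps` list with a loop in which the LLM picks the next step each time, which is where both the flexibility and the extra cost and risk come from.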
Anthropic advises clearly that for many applications, optimizing simple LLM calls is sufficient, and agents should only be introduced when the complexity is justified [1].
The Hard Part Is Outside the Loop
The agent’s structure, as we’ve seen, is remarkably simple. A while loop, an API call, tool execution — it’s a few dozen lines of code. In fact, if you ask an AI coding tool like Claude Code to “scaffold an agent,” you’ll have working code in minutes. The implementation of the structure is no longer the bottleneck.
But making that agent actually useful is a different story. What’s hard isn’t the loop itself — it’s the quality of the inputs you feed into the loop: the system prompt, the tool descriptions, and the context design.
System prompt tuning is the biggest challenge. The system prompt tells the agent who it is, what it should do, and what it must not do — but how you write these instructions dramatically changes the agent’s behavior. Too long, and critical instructions get buried. Too short, and the agent takes unexpected actions. Ambiguous wording leaves interpretation up to the LLM.
Tool descriptions are equally critical. As noted earlier, Anthropic “spent more optimization time on tools than overall prompts” [1]. If a description is poorly written, the LLM won’t select the right tool. If parameter descriptions are vague, it generates unintended arguments. Tool design is prompt engineering.
Context management is another hard problem. The conversation history grows with every loop iteration. Exhausting the context window causes quality to collapse, but there’s no definitive answer to what to keep and what to discard.
In short, the difficulty of agent development splits into two layers:
| | Infrastructure layer | Design layer |
|---|---|---|
| What you build | API client, loop structure, tool execution | System prompt, tool definitions, context strategy |
| Difficulty | Low (established patterns) | High (requires iteration) |
| Delegatable to AI? | Yes | Requires human judgment |
| Source of differentiation | No | Yes |
The infrastructure layer can be delegated to code generation tools. The differentiation lies in the design layer — “what to tell this agent, and how.” In Claude Code, for instance, you can change an agent’s behavior simply by writing behavioral rules and decision criteria in natural language in a skill file (SKILL.md). The same loop structure can produce vastly different quality agents depending on how the instructions are written.
Summary
The architecture of an AI agent can be understood as three stacked layers:
- Chat Completion API: Stateless request-response. The application manages conversation history.
- Tool Use: Define tools with JSON Schema, the LLM generates “invocation requests,” and the application executes them.
- Agent loop: Feed tool results back to the LLM and let it autonomously decide the next action, repeating until done.
An agent’s structure isn’t “complex technology” — it’s a simple loop. Writing the code can be done in minutes with an AI coding tool. But the “language design” that makes that loop work intelligently — system prompts, tool definitions, context management — is the real competitive battleground. Understanding the mechanism is the starting point; the real work lies beyond it.
References
References are listed in order corresponding to the citation numbers in the text.
[1] Building Effective Agents - Anthropic, Erik Schluntz & Barry Zhang (2024). [Reliability: High]
[2] Messages API Reference - Anthropic (2024-2026). [Reliability: High]
[3] Context Length Comparison: Leading AI Models in 2026 - Elvex (2026). [Reliability: Medium-High]
[4] Tool use with Claude - Overview - Anthropic (2024-2026). [Reliability: High]
[5] Function calling - OpenAI API - OpenAI (2024-2026). [Reliability: High]
[6] ReAct: Synergizing Reasoning and Acting in Language Models - Yao et al., ICLR 2023 (2022). Peer-reviewed conference paper. [Reliability: High]