How AI Agents Actually Work — APIs, Tools, and the Loop That Ties Them Together

Posted Apr 8, 2026

AI-Generated Content

This article was generated by AI. The accuracy of the content is not guaranteed, and we accept no responsibility for any damages resulting from use of this article. By continuing to read, you agree to the Terms of Use.

Target audience: Engineers with programming experience but new to AI agent development
Prerequisites: Basic knowledge of REST APIs and JSON
Reading time: 15 minutes

Overview

The term “AI agent” is everywhere. But when you actually sit down to write the code, a surprisingly simple structure emerges.

An AI agent is, at its core, an LLM calling tools in a loop. Anthropic’s official guide defines it plainly: “Agents are typically just LLMs using tools based on environmental feedback in a loop”¹. That single sentence captures the entire architecture.

Understanding this architecture requires stacking three layers. First, the Chat Completion API — the basic interface for talking to an LLM. Then Tool Use — the mechanism that lets an LLM reach into the outside world. Finally, the agent loop — a repeating structure where the LLM receives tool results and autonomously decides its next action. The difference between a chatbot and an agent comes down to the presence of this loop.

This article focuses on concepts and mechanisms rather than implementation details. The goal is to understand the “raw shape” of LLM APIs, independent of any particular framework. Anthropic itself suggests that developers start by using LLM APIs directly rather than reaching for a framework first¹.

Layer 1: The Chat Completion API — The Basics of Talking to an LLM

The API Is Stateless

The first thing that surprises newcomers to LLM APIs is that the API is stateless. Web interfaces like ChatGPT or Claude make conversations appear continuous, but the API itself has no memory.

You must resend the entire conversation history with every API request. When a web app or CLI tool appears to “remember” the conversation, it’s because the application is storing the history and sending it to the API each time.

  
Request body for the third turn, which includes the full conversation so far:
{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "system": "Reply concisely with code examples.",
  "messages": [
    {"role": "user", "content": "How do I sort a list in Python?"},
    {"role": "assistant", "content": "You can use sorted() or list.sort()..."},
    {"role": "user", "content": "How about reverse order?"},
    {"role": "assistant", "content": "Pass reverse=True..."},
    {"role": "user", "content": "Is it a stable sort?"}
  ]
}

This follows the same design philosophy as HTTP. Each request is self-contained; the server holds no state.
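To make the statelessness concrete, here is a minimal sketch of an application that holds the history the API does not. The `Conversation` class and the `fake_llm` stub are hypothetical names invented for illustration; a real client would replace the stub with an HTTP call to the provider.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the history client-side and resends all of it on every turn."""

    def __init__(self, send: Callable[[List[Message]], str]) -> None:
        self._send = send  # function that performs the actual API call
        self.messages: List[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        # The full history goes out each time; the server remembers nothing.
        reply = self._send(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_llm(messages: List[Message]) -> str:
    """Stand-in for a real API call: reports how many messages it received."""
    return f"received {len(messages)} message(s)"


chat = Conversation(fake_llm)
first = chat.ask("How do I sort a list in Python?")
second = chat.ask("How about reverse order?")
third = chat.ask("Is it a stable sort?")
```

By the third call the stub receives five messages, the same count as in the request body above. The growth is linear in the number of turns, which is why history management becomes a concern later on.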

Message Structure — Roles and Content

API messages consist of two elements: role and content².

Role	Purpose	Set by
`system`	Defines the LLM’s behavior (*)	Developer
`user`	Input from the user	End user
`assistant`	The LLM’s response	LLM (or developer via prefill)

* The system role is implemented differently across APIs. In Anthropic’s API, it exists as a separate parameter outside the message array. In OpenAI’s API, it’s included as a role: "system" message within the array. In either case, it’s invisible to end users but has a significant impact on LLM behavior.

Content isn’t limited to plain text. Major LLM APIs can handle images, documents, tool invocations, and tool results as content blocks². This structure forms the foundation of Tool Use.
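As a rough illustration of content blocks, the sketch below shows an assistant message that mixes text with a tool invocation. The block shapes follow Anthropic's Messages API as this article describes it, and other providers use different field names. The helper functions and the ID value are assumptions made up for the example.

```python
from typing import Any, Dict, List, Union

Content = Union[str, List[Dict[str, Any]]]

# An assistant turn that mixes text with a tool invocation request.
assistant_message = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Let me check the weather."},
        {
            "type": "tool_use",
            "id": "toolu_example_1",  # illustrative ID, not a real one
            "name": "get_weather",
            "input": {"city": "Tokyo"},
        },
    ],
}


def text_of(content: Content) -> str:
    """Return only the readable text, whether content is a string or blocks."""
    if isinstance(content, str):
        return content
    return "".join(b["text"] for b in content if b.get("type") == "text")


def blocks_of_type(content: Content, block_type: str) -> List[Dict[str, Any]]:
    """Return every block of the given type; a plain string has none."""
    if isinstance(content, str):
        return []
    return [b for b in content if b.get("type") == block_type]
```

Handling both the plain-string and the block-list form in one place keeps the rest of the application from having to care which one a given message uses.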

The Context Window — The LLM’s Working Memory

There’s an upper limit on how many tokens the model can process in a single request. This is called the context window. As of 2026, major models advertise context windows ranging from 128K to 10M tokens, but the range over which output quality holds up is reported to be only about 60–70% of the advertised maximum³. Near the upper limit, quality tends to drop off suddenly rather than degrade gradually.

Context window management is critical for agent development. As conversations grow, you need to summarize older messages or prune unnecessary information. Tool definitions themselves consume tokens, so including unused tools impacts both cost and quality.
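One simple pruning strategy is to keep only the most recent messages that fit a token budget. The sketch below assumes a crude estimate of four characters per token, which is a placeholder and not a real tokenizer. The function names are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    """Crude heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def prune_history(messages: List[Message], budget: int) -> List[Message]:
    """Keep the most recent messages whose estimated size fits the budget."""
    kept: List[Message] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    # Some APIs expect the history to start with a user turn.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


history = [
    {"role": "user", "content": "a" * 400},       # about 100 tokens
    {"role": "assistant", "content": "b" * 400},  # about 100 tokens
    {"role": "user", "content": "c" * 40},        # about 10 tokens
]
trimmed = prune_history(history, budget=120)
```

Dropping old turns loses information outright. Summarizing them instead is the other common option, at the cost of an extra LLM call. A production version would also need to keep each tool request together with its result.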

Layer 2: Tool Use — Reaching Into the Outside World

The Limits of “Just Generating Text”

With the Chat Completion API alone, an LLM is just a box that takes text in and puts text out. It doesn’t know today’s weather, the contents of your database, or the state of your file system.

Tool Use (also known as Function Calling) breaks through this limitation⁴⁵. The terminology varies by provider, but the mechanism is nearly identical.

Defining Tools — Teaching the LLM Its Toolkit with JSON Schema

First, you tell the LLM what tools are available by providing JSON Schema definitions⁴.

  
{
  "name": "get_weather",
  "description": "Get the current weather for a given city",
  "input_schema": {
    "type": "object",
    "properties": {
      "city": {
        "type": "string",
        "description": "City name (e.g., Tokyo, New York)"
      }
    },
    "required": ["city"]
  }
}

A critical point: the LLM doesn’t execute the tool itself. All the LLM does is generate a request saying “please call this tool with these arguments.” Actual execution is the application’s responsibility.
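The application side of this division of labor can be sketched as a registry that maps tool names to ordinary functions. Everything here is hypothetical: the `get_weather` body is a stub, and `TOOL_REGISTRY`, `REQUIRED_ARGS`, and `execute_tool` are names chosen for illustration.

```python
from typing import Any, Callable, Dict, List


def get_weather(city: str) -> str:
    """Hypothetical tool body; a real one would call a weather service."""
    return f"Sunny in {city}"


# Maps the tool name the LLM sees to the function the application runs.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {"get_weather": get_weather}

# Required arguments per tool, mirroring the "required" list in the schema.
REQUIRED_ARGS: Dict[str, List[str]] = {"get_weather": ["city"]}


def execute_tool(name: str, arguments: Dict[str, Any]) -> str:
    """Run a tool requested by the LLM and return its result as text."""
    if name not in TOOL_REGISTRY:
        return f"Error: unknown tool '{name}'"
    missing = [a for a in REQUIRED_ARGS.get(name, []) if a not in arguments]
    if missing:
        return f"Error: missing required argument(s): {', '.join(missing)}"
    return TOOL_REGISTRY[name](**arguments)
```

Returning errors as text rather than raising lets the application pass the problem back to the LLM, which can then retry with corrected arguments. This sketch does not guard against unexpected extra arguments. A fuller version would validate the input against the JSON Schema.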

The Request-Response Cycle

Tool Use works through the following steps⁴⁵:

sequenceDiagram
    participant App
    participant LLM
    participant Tool
    App->>LLM: Message + tool definitions
    LLM->>App: tool_use block
    App->>Tool: Execute tool
    Tool->>App: Result
    App->>LLM: Send tool_result
    LLM->>App: Final response

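The second step, in which the LLM returns a tool_use block, is the heart of this mechanism. Instead of a normal response (stop_reason: "end_turn"), the LLM returns stop_reason: "tool_use", a special stop reason meaning “I’ve paused text generation and I’m waiting for a tool to be executed.” The field names here follow Anthropic’s API; other providers use different names for the same signal.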

The application detects this signal, executes the specified tool, and sends the result back to the API as a tool_result. The LLM then generates its final response informed by the tool’s output.
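The detect, execute, and reply steps can be sketched as a single function. The response and message shapes below follow the Anthropic-style blocks used in this article and are written as plain dictionaries, not SDK objects. `answer_tool_request` and the sample data are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List


def answer_tool_request(
    response: Dict[str, Any],
    run_tool: Callable[[str, Dict[str, Any]], str],
) -> List[Dict[str, Any]]:
    """Turn a tool_use response into the two messages to append to history."""
    if response["stop_reason"] != "tool_use":
        return []  # normal reply; nothing to execute
    results = []
    for block in response["content"]:
        if block["type"] != "tool_use":
            continue
        output = run_tool(block["name"], block["input"])
        results.append(
            {"type": "tool_result", "tool_use_id": block["id"], "content": output}
        )
    return [
        {"role": "assistant", "content": response["content"]},  # echo the request
        {"role": "user", "content": results},  # supply the results
    ]


# Illustrative response in the shape described above (the ID is made up).
sample_response = {
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "Checking the weather."},
        {"type": "tool_use", "id": "toolu_example_1",
         "name": "get_weather", "input": {"city": "Tokyo"}},
    ],
}
new_messages = answer_tool_request(
    sample_response, lambda name, args: f"Sunny in {args['city']}"
)
```

The result is matched to its request by ID, so a response that asks for several tools at once can be answered in a single follow-up message.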

How the LLM “Chooses” a Tool

Whether the LLM calls a tool depends on the combination of the user’s request and the tool’s description⁴. If a user asks “What’s the weather in Tokyo?” and the get_weather tool’s description says “Get the current weather for a given city,” the LLM will very likely choose to invoke that tool.

In other words, a tool’s description is part of the prompt. Anthropic advises giving tool definitions as much prompt-engineering attention as the overall prompt, and notes that while building its SWE-bench agent, the team spent more time optimizing tools than the overall prompt¹.

The tool_choice parameter lets you control tool invocation behavior⁴⁵:

Setting	Behavior
`auto` (default)	LLM decides automatically
`any` (Anthropic) / `required` (OpenAI)	Must call at least one tool
Specific tool	Must call the specified tool
`none`	No tools used
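The same four settings are spelled differently by each provider. The sketch below shows the shapes as I understand them for Anthropic's Messages API and OpenAI's Chat Completions API. Treat them as assumptions to verify against the current documentation. The mode labels (`force_any`, `force_tool`) and the function name are hypothetical.

```python
from typing import Any, Dict, Optional, Union


def build_tool_choice(
    provider: str, mode: str, tool_name: Optional[str] = None
) -> Union[str, Dict[str, Any]]:
    """mode is one of 'auto', 'force_any', 'force_tool', 'none'."""
    if mode == "force_tool" and not tool_name:
        raise ValueError("force_tool requires tool_name")
    if provider == "anthropic":
        if mode == "force_tool":
            return {"type": "tool", "name": tool_name}
        # Assumed Anthropic spellings for the remaining modes.
        return {"type": {"auto": "auto", "force_any": "any", "none": "none"}[mode]}
    if provider == "openai":
        if mode == "force_tool":
            return {"type": "function", "function": {"name": tool_name}}
        # Assumed OpenAI spellings: plain strings rather than objects.
        return {"auto": "auto", "force_any": "required", "none": "none"}[mode]
    raise ValueError(f"unknown provider: {provider}")
```

For example, `build_tool_choice("anthropic", "force_tool", "get_weather")` yields an object naming the tool, while `build_tool_choice("openai", "force_any")` yields the string `"required"`.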

Layer 3: The Agent Loop — Autonomous Decision-Making on Repeat

The Dividing Line Between Chatbots and Agents

With the first two layers (API + Tool Use), you can build a chatbot that fetches external data to answer user questions. But that alone doesn’t make an agent.

The difference between a chatbot and an agent is the loop¹.

Chatbot: User asks → LLM responds (may use tools) → Done. The user always decides the next action.
Agent: User assigns a task → LLM plans, executes tools, evaluates results, and decides the next action on its own → Repeats until the task is complete.

The ReAct Pattern — Interleaving Thinking and Acting

The theoretical foundation for the agent loop is the ReAct (Reasoning + Acting) pattern, proposed in 2022⁶.

Before ReAct, there were two approaches, each with a limitation:

Chain-of-Thought (CoT): Makes the LLM “reason,” but without access to external information, hallucination is a risk.
Action-only: Calls tools, but without planning — it’s reactive and haphazard.

ReAct interleaves both, compensating for each approach’s weakness⁶:

flowchart TB
    T["Thought<br>Analyze the situation,<br>plan the next action"]
    A["Action<br>Call an external tool"]
    O["Observation<br>Receive the tool's result"]
    Done["Task complete"]

    T --> A
    A --> O
    O -->|Need more info| T
    O -->|Goal achieved| Done

Here’s a concrete example. Given the task “Look up the latest version of Python’s requests library and summarize the changes”:

Thought: “First, let me check the latest version on PyPI.”
Action: Call search_pypi(package="requests")
Observation: “Latest version is 2.32.3, released May 2024.”
Thought: “Got the version. Now let me check the CHANGELOG.”
Action: Call fetch_url(url="https://...")
Observation: Receives CHANGELOG content.
Thought: “I have everything I need. Let me write the summary.”
Final response: Outputs a summary of the changes.

Reasoning (Thought) guides action planning, and tool results (Observation) ground the reasoning in real information. With this synergy, ReAct outperformed imitation and reinforcement learning baselines by an absolute 34% in success rate on the interactive decision-making benchmark ALFWorld⁶.

The Agent Loop in Code

Translating the ReAct concept into implementation, the agent loop becomes surprisingly simple:

MAX_ITERATIONS = 20  # Runaway prevention; tune per task

# `llm`, `tools`, `initial_task`, and `execute_tool` are placeholders
messages = [initial_task]
for _ in range(MAX_ITERATIONS):
    response = llm.call(messages, tools)
    if response.stop_reason == "end_turn":
        break  # Task complete
    if response.stop_reason == "tool_use":
        tool_result = execute_tool(response.tool_call)
        messages.append(response)      # Add LLM response to history
        messages.append(tool_result)   # Add tool result to history
        # Loop back and let the LLM decide again

At each iteration, the LLM makes two judgments¹:

What to do next — which tool to call and with what arguments.
Whether it’s done — has the task been completed?

The loop terminates on one of two conditions: the LLM returns stop_reason: "end_turn" (natural completion) or a maximum iteration count is reached (runaway prevention).
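To see both termination conditions in action, here is a self-contained sketch that runs the loop against a scripted stand-in for the LLM, replaying the search-then-fetch trace from the ReAct example. The simplified response shape (`stop_reason`, `tool_call`, `text`) is an assumption made for brevity and is not any provider's actual format. The tools are stubs, and the URL is left elided as in the trace.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def run_agent(
    task: str,
    call_llm: Callable[[List[Message]], Dict[str, Any]],
    tools: Dict[str, Callable[..., str]],
    max_iterations: int = 10,
) -> str:
    """Loop until the LLM signals end_turn or the iteration cap is hit."""
    messages: List[Message] = [{"role": "user", "content": task}]
    for _ in range(max_iterations):
        response = call_llm(messages)
        if response["stop_reason"] == "end_turn":
            return response["text"]  # natural completion
        call = response["tool_call"]
        observation = tools[call["name"]](**call["args"])
        messages.append({"role": "assistant", "content": call})
        messages.append({"role": "user", "content": observation})
    return "Stopped: iteration limit reached"  # runaway prevention


def scripted_llm(messages: List[Message]) -> Dict[str, Any]:
    """Fake LLM: picks its next step from how much history it has seen."""
    step = len(messages)  # 1 = task only, 3 = one tool result, 5 = two
    if step == 1:
        call = {"name": "search_pypi", "args": {"package": "requests"}}
        return {"stop_reason": "tool_use", "tool_call": call}
    if step == 3:
        call = {"name": "fetch_url", "args": {"url": "https://..."}}
        return {"stop_reason": "tool_use", "tool_call": call}
    return {"stop_reason": "end_turn", "text": "Summary: " + messages[-1]["content"]}


tools = {
    "search_pypi": lambda package: f"Latest version of {package} is 2.32.3",
    "fetch_url": lambda url: "CHANGELOG content (placeholder)",
}
summary = run_agent("Summarize the latest requests release", scripted_llm, tools)
```

Calling `run_agent` with `max_iterations=2` against the same script stops before the final answer, which shows the cap working as a safety net.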

Workflows vs. Agents

Anthropic distinguishes two types of agentic systems¹:

	Workflow	Agent
Who’s in control	Code (developer)	LLM
Execution path	Predefined	Dynamically determined
Number of steps	Fixed	Variable
Best for	Predictable, well-defined tasks	Tasks requiring flexibility
Cost & risk	Lower	Higher (errors can compound)

A workflow uses the LLM as a component in a fixed procedure. An agent lets the LLM decide the procedure itself. For example, “generate code → run tests → evaluate results” as a fixed pipeline is a workflow. “Read a GitHub Issue, identify the relevant files, apply a fix, and make the tests pass” is an agent.
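For contrast with the agent loop, the fixed pipeline from the example above can be sketched as follows. The point is that the order of steps is written into the code and does not depend on what the LLM returns. `run_workflow` and the two stub callables are hypothetical stand-ins for an LLM call and a test runner.

```python
from typing import Callable, List


def run_workflow(
    spec: str,
    generate_code: Callable[[str], str],
    run_tests: Callable[[str], bool],
) -> List[str]:
    """Fixed pipeline: the code, not the LLM, decides the order of steps."""
    log: List[str] = []
    code = generate_code(spec)  # step 1: always generate
    log.append("generated")
    passed = run_tests(code)  # step 2: always test
    log.append("tested")
    log.append("accepted" if passed else "rejected")  # step 3: always evaluate
    return log


# Hypothetical stand-ins for an LLM call and a test runner.
steps = run_workflow(
    "add two numbers",
    generate_code=lambda spec: "def add(a, b): return a + b",
    run_tests=lambda code: "return a + b" in code,
)
```

The number of steps is the same on every run, which makes cost and failure modes easier to predict than in a loop where the LLM chooses when to stop.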

Anthropic advises clearly that for many applications, optimizing simple LLM calls is sufficient, and agents should only be introduced when the complexity is justified¹.

The Hard Part Is Outside the Loop

The agent’s structure, as we’ve seen, is remarkably simple. A while loop, an API call, tool execution — it’s a few dozen lines of code. In fact, if you ask an AI coding tool like Claude Code to “scaffold an agent,” you’ll have working code in minutes. The implementation of the structure is no longer the bottleneck.

But making that agent actually useful is a different story. What’s hard isn’t the loop itself — it’s the quality of the inputs you feed into the loop: the system prompt, the tool descriptions, and the context design.

System prompt tuning is the biggest challenge. The system prompt tells the agent who it is, what it should do, and what it must not do — but how you write these instructions dramatically changes the agent’s behavior. Too long, and critical instructions get buried. Too short, and the agent takes unexpected actions. Ambiguous wording leaves interpretation up to the LLM.

Tool descriptions are equally critical. As noted earlier, Anthropic reports spending more time optimizing tools than the overall prompt¹. If a description is poorly written, the LLM won’t select the right tool. If parameter descriptions are vague, it generates unintended arguments. Tool design is prompt engineering.

Context management is another hard problem. The conversation history grows with every loop iteration. Exhausting the context window causes quality to collapse, but there’s no definitive answer to what to keep and what to discard.

In short, the difficulty of agent development splits into two layers:

	Infrastructure layer	Design layer
What you build	API client, loop structure, tool execution	System prompt, tool definitions, context strategy
Difficulty	Low (established patterns)	High (requires iteration)
Delegatable to AI?	Yes	Requires human judgment
Source of differentiation	No	Yes

The infrastructure layer can be delegated to code generation tools. The differentiation lies in the design layer — “what to tell this agent, and how.” In Claude Code, for instance, you can change an agent’s behavior simply by writing behavioral rules and decision criteria in natural language in a skill file (SKILL.md). The same loop structure can produce vastly different quality agents depending on how the instructions are written.

Summary

The architecture of an AI agent can be understood as three stacked layers:

Chat Completion API: Stateless request-response. The application manages conversation history.
Tool Use: Define tools with JSON Schema, the LLM generates “invocation requests,” and the application executes them.
Agent loop: Feed tool results back to the LLM and let it autonomously decide the next action, repeating until done.

An agent’s structure isn’t “complex technology” — it’s a simple loop. Writing the code can be done in minutes with an AI coding tool. But the “language design” that makes that loop work intelligently — system prompts, tool definitions, context management — is the real competitive battleground. Understanding the mechanism is the starting point; the real work lies beyond it.

References

References are listed in order corresponding to the citation numbers in the text.

1. Building Effective Agents - Anthropic, Erik Schluntz & Barry Zhang (2024). [Reliability: High]
2. Messages API Reference - Anthropic (2024-2026). [Reliability: High]
3. Context Length Comparison: Leading AI Models in 2026 - Elvex (2026). [Reliability: Medium-High]
4. Tool use with Claude - Overview - Anthropic (2024-2026). [Reliability: High]
5. Function calling - OpenAI API - OpenAI (2024-2026). [Reliability: High]
6. ReAct: Synergizing Reasoning and Acting in Language Models - Yao et al., ICLR 2023 (2022). Peer-reviewed conference paper. [Reliability: High]

This post is licensed under CC BY 4.0 by the author.