
Protocols, Tools & Skills

Standardize agent integrations with MCP and A2A, design effective tools, and use progressive skill loading.

Why Protocols, Tools & Skills Are the Backbone of Agentic AI

Imagine you just inherited a codebase. Not a clean one — a sprawling, tangled AI script that calls APIs directly, hardcodes prompts inline, and has the agent's "reasoning" fused together with its "doing" in one enormous Python file. Sound familiar? If you've spent any time building AI-powered features, you've probably felt that creeping dread as the system grows: one more integration, one more capability, and the whole thing threatens to collapse under its own weight.

This lesson is about escaping that trap. It's about understanding how modern agentic AI systems are built not as monolithic scripts, but as composable, protocol-driven architectures — systems where agents can communicate reliably, discover and invoke tools without tight coupling, and load new skills on demand. These aren't just engineering niceties. They're the structural foundations that determine whether your agentic system scales gracefully or buckles the moment requirements change.

The Monolith Problem: When AI Scripts Outgrow Themselves

To understand why protocols, tools, and skills matter so deeply, it helps to understand what life looked like before them — and why so many early AI integrations still look that way today.

In the early days of "agent-ish" code, most developers wrote something like this:

## Early-style monolithic agent script
import openai
import requests

def run_agent(user_query: str) -> str:
    # Step 1: Ask the model what to do
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_query}
        ]
    )
    answer = response.choices[0].message.content

    # Step 2: If the model says to search, just... do it inline
    if "search" in answer.lower():
        search_result = requests.get(
            f"https://api.search.example.com?q={user_query}"
        ).json()
        # Mash the result back into a new prompt and call again
        answer = openai.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "user", "content": f"Results: {search_result}. Now answer: {user_query}"}
            ]
        ).choices[0].message.content

    return answer

This code works — until it doesn't. Add a second tool (say, a calculator or a database lookup) and you've doubled the complexity. Add a third agent that needs to collaborate with this one, and you're now writing custom serialization logic by hand. Change the search API and you're hunting through prompt strings. The logic for what the agent can do is fused with how it decides to do it, which is fused with how it communicates results. Everything is coupled to everything else.

This is the monolithic AI script pattern, and it's the architectural equivalent of building a house where the plumbing runs through the load-bearing walls.

💡 Real-World Example: Early LangChain applications often fell into this trap — chains that were powerful but brittle, where swapping one component required understanding the entire chain's internal state. The community learned this lesson the hard way, and it drove much of the architectural evolution toward more modular, protocol-first designs.

The Composability Revolution: Protocols as the New Foundation

The shift happening in agentic AI right now mirrors a transformation the software industry has lived through before: the move from tightly coupled monoliths to interoperable, contract-driven systems. Think of how REST and HTTP standardized web service communication, or how Docker standardized application packaging. In each case, a protocol — a shared set of rules about how things talk to each other — unlocked an explosion of composable tooling.

The same dynamic is playing out in agentic AI. Standards like MCP (Model Context Protocol) and A2A (Agent-to-Agent protocol) are emerging precisely because developers kept solving the same integration problem over and over again. Instead of every agent framework defining its own way to expose tools, or every multi-agent system inventing its own message-passing format, these protocols establish shared contracts. An agent that speaks MCP can connect to any MCP-compliant tool server. An agent built on A2A can collaborate with any A2A-compatible peer — regardless of which framework built it.

🎯 Key Principle: A protocol doesn't tell an agent what to do — it defines how agents and tools communicate so that any compliant component can interoperate with any other. This is the difference between a language and a speaker.

The Three-Layer Mental Model

Before diving into the specifics of MCP, A2A, tool design, and skill loading in the child lessons, it's worth establishing a mental model that will make all of those concepts click into place. Modern agentic architectures can be understood through three distinct layers, each answering a different fundamental question:

┌─────────────────────────────────────────────────────────┐
│                    AGENT SYSTEM                         │
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │         LAYER 3: SKILLS                         │   │
│  │   What does the agent know HOW to invoke?       │   │
│  │   (Loaded capabilities, composed workflows)     │   │
│  └──────────────────────┬──────────────────────────┘   │
│                         │ uses                         │
│  ┌──────────────────────▼──────────────────────────┐   │
│  │         LAYER 2: TOOLS                          │   │
│  │   What can the agent actually DO?               │   │
│  │   (Functions, APIs, services the agent calls)   │   │
│  └──────────────────────┬──────────────────────────┘   │
│                         │ communicated via              │
│  ┌──────────────────────▼──────────────────────────┐   │
│  │         LAYER 1: PROTOCOLS                      │   │
│  │   How do agents and tools TALK to each other?   │   │
│  │   (Message schemas, transport, auth contracts)  │   │
│  └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

Layer 1 — Protocols define how agents talk. This layer handles the mechanics of communication: what a message looks like, how capabilities are advertised, how errors are signaled, how sessions are established and torn down. MCP and A2A live here. A well-designed protocol layer means that the layers above it don't need to care about transport details — just like a web developer doesn't think about TCP packet assembly when making a fetch request.

Layer 2 — Tools define what agents do. A tool is a discrete, callable capability: search the web, run a SQL query, send an email, call a payment API, parse a PDF. Tools are the agent's hands — the mechanisms through which it affects the world outside its own reasoning loop. A well-designed tool is narrow, composable, and defined with enough schema information that an agent can discover and invoke it without human-written glue code.

Layer 3 — Skills define what agents know how to invoke. Skills are higher-order than individual tools — they represent learned or configured patterns of tool use. A skill might encode: "to answer a research question, first search three sources, then summarize, then cross-check for contradictions." Skills can be loaded progressively, which matters enormously for performance and cost: you don't want an agent loading the full suite of financial analysis skills when the user just asked for a weather update.
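The skill idea above can be sketched as a function that composes tools into a configured workflow. This is a minimal, hedged illustration: the tool callables (`search`, `summarize`, `cross_check`) are hypothetical stand-ins, not a real framework API.

```python
# Minimal sketch: a skill is a configured pattern of tool use.
# `search`, `summarize`, and `cross_check` are illustrative stand-ins.

def research_skill(question, *, search, summarize, cross_check):
    """Encodes the pattern: search three sources, summarize, cross-check."""
    sources = [search(question, source=s) for s in ("web", "news", "papers")]
    summaries = [summarize(result) for result in sources]
    return cross_check(summaries)

# The agent loads this skill only when a research task arrives,
# so a weather query never pays the cost of carrying it around.
```

Because the skill receives its tools as parameters, it stays independent of any particular tool implementation, which is exactly the layering the diagram describes.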

🧠 Mnemonic: Think P-T-S: Protocols are the pipes, Tools are the taps, Skills are the plumber who knows which tap to turn.

Why Standardization Is the Unlock

Here's a question worth sitting with: why does it matter whether agent integrations are standardized rather than just documented?

Documentation tells a human developer how to integrate something. Standardization tells a machine — another agent, a framework, an orchestrator — how to integrate something at runtime, without a human in the loop. That distinction is the entire game.

Consider what happens when tool discovery is standardized. An agent can query a tool registry and get back a structured description of available tools, their input/output schemas, their authentication requirements, and their usage constraints. The agent doesn't need a human to pre-configure which tools it can use — it can discover them dynamically. This is capability discovery, and it's only possible when there's a shared protocol for how tools advertise themselves.

## Example: Protocol-driven tool discovery (MCP-style concept)
## An agent dynamically discovers available tools at runtime

import asyncio
from typing import Any

# Note: `mcp_client` and `agent_llm` are assumed to be pre-configured
# framework objects; this example is conceptual, not runnable as-is.

async def discover_and_invoke_tool(
    mcp_server_url: str,
    task_description: str
) -> Any:
    """
    Demonstrates the protocol-driven pattern:
    1. Connect to an MCP server
    2. Discover what tools are available
    3. Let the agent pick the right tool based on the task
    """
    # Step 1: Discover tools via the protocol (not hardcoded!)
    available_tools = await mcp_client.list_tools(server_url=mcp_server_url)
    
    # Step 2: Tools come back as structured schema objects
    # Each tool describes itself: name, description, input schema
    tool_registry = {
        tool.name: tool for tool in available_tools
    }
    
    print(f"Discovered {len(tool_registry)} tools: {list(tool_registry.keys())}")
    # Output might be: ['web_search', 'calculator', 'send_email', 'read_file']

    # Step 3: The agent decides which tool fits the task
    # (In a real system, the LLM makes this decision using tool schemas)
    selected_tool = agent_llm.select_tool(
        task=task_description,
        available_tools=list(tool_registry.values())
    )
    
    # Step 4: Invoke via protocol — same call pattern regardless of which tool
    result = await mcp_client.invoke_tool(
        server_url=mcp_server_url,
        tool_name=selected_tool.name,
        arguments=selected_tool.suggested_args
    )
    
    return result

## No hardcoded tool integrations. No custom glue code per tool.
## Any MCP-compliant tool server plugs in automatically.

Notice what's absent from this code: there's no if tool_name == 'search': call_search_api() conditional logic. There's no manually maintained list of tools. The protocol handles discovery; the schema handles invocation. Add a new tool server, and the agent finds it automatically on the next connection.

💡 Pro Tip: This is exactly why the industry is converging on MCP and A2A rather than continuing to build proprietary integrations. Each custom integration is a one-off investment. Each protocol-compliant integration is an investment in an ecosystem — where every new tool server works with every protocol-compliant agent.

🤔 Did you know? Anthropic's Model Context Protocol (MCP), open-sourced in late 2024, was explicitly designed so that tool servers don't need to know anything about the agent using them, and agents don't need to know the implementation details of tool servers. The protocol is the only shared contract between them.

Tight Coupling vs. Loose Coupling: A Concrete Contrast

Let's make the architectural difference concrete. Here are two versions of the same multi-agent collaboration — one tightly coupled, one protocol-driven:

## ❌ TIGHT COUPLING: Agent A knows too much about Agent B
class ResearchAgent:
    def __init__(self):
        # ResearchAgent imports and instantiates SummaryAgent directly
        # Changing SummaryAgent's interface breaks ResearchAgent
        from summary_agent import SummaryAgent  
        self.summarizer = SummaryAgent(model="gpt-4", max_tokens=500)
    
    def research_and_summarize(self, topic: str) -> str:
        raw_data = self._search(topic)
        # Direct method call — coupled to SummaryAgent's exact API
        return self.summarizer.summarize(raw_data, style="bullet_points")

## ✅ LOOSE COUPLING: Agent A only knows the protocol contract
class ResearchAgent:
    def __init__(self, agent_registry_url: str):
        # ResearchAgent discovers collaborating agents via protocol
        self.registry = AgentRegistry(agent_registry_url)
    
    async def research_and_summarize(self, topic: str) -> str:
        raw_data = await self._search(topic)
        
        # Find any agent that advertises the 'summarization' capability
        # via the A2A protocol — could be any implementation
        summarizer = await self.registry.find_agent(
            capability="summarization",
            constraints={"style_support": ["bullet_points", "prose"]}
        )
        
        # Invoke via standardized A2A message format
        result = await summarizer.send_task({
            "action": "summarize",
            "content": raw_data,
            "style": "bullet_points"
        })
        return result.output

Wrong thinking: "The tight-coupled version is simpler — why add the abstraction?"

Correct thinking: The tight-coupled version is simpler today, for this use case, with these two specific agents. The protocol-driven version is simpler at scale — when you have dozens of agents, when SummaryAgent gets replaced with a better model, when you want to test ResearchAgent in isolation, or when a new summarization service becomes available.

How This Lesson Prepares You for What's Ahead

This section has laid out the why — the architectural reasoning that motivates everything else in this lesson and in the specialized child lessons that follow. Let's be explicit about what that roadmap looks like:

📋 Quick Reference Card: Lesson Roadmap

📚 Section 🎯 Core Question 🔧 Key Concepts
🔗 Agent Protocols What are the rules agents follow to communicate? Message schemas, contracts, MCP & A2A overview
🛠️ Tools & Skills How do agents act on the world? Tool definitions, skill loading, capability scopes
⚙️ Wiring It Together How do the three layers combine in practice? Pipeline patterns, code examples, integration flows
⚠️ Pitfalls What goes wrong and how to prevent it? Common mistakes, debugging patterns
🚀 What's Next How do child lessons build on this foundation? MCP deep dive, A2A deep dive, Tool Design, Skill Loading

When you move into the MCP child lesson, you'll see exactly how the protocol layer handles tool discovery, authentication, and streaming — concepts that will make immediate sense because you already understand why a protocol layer exists. When you hit the A2A child lesson, the pattern of agent-to-agent communication via standardized task messages will feel like a natural extension of what you've learned here. The Tool Design lesson will give you frameworks for defining tools that are genuinely useful to an LLM (hint: schema quality matters enormously). And Skill Loading will show you how to manage agent capabilities progressively — loading only what's needed, when it's needed.

Each of those lessons is a deep dive into one corner of the three-layer model. This lesson is the model itself.

The Bigger Picture: Why This Moment Matters

It's worth stepping back and appreciating the historical moment we're in. The patterns described in this lesson — protocol-driven agent communication, standardized tool interfaces, progressive skill loading — are not yet universal. Many production AI systems today are still the monolithic scripts we described at the start. The shift toward composable, protocol-driven architectures is happening, but it's not complete.

That means the developers who internalize these patterns now are not just keeping up — they're getting ahead of where the industry is heading. The organizations investing in MCP-compliant tool servers today are building an asset that will work with every future agent framework that adopts the protocol. The teams designing agents with clean skill-loading architectures today are avoiding the refactoring debt that their monolith-building counterparts will face in eighteen months.

🎯 Key Principle: In software, architectural decisions compound. A protocol-first design made today makes every future integration faster and cheaper. A tight-coupling decision made today makes every future change more expensive and risky.

💡 Mental Model: Think of the protocol layer as the USB standard. Before USB, every peripheral needed its own proprietary connector. After USB, any compliant device works with any compliant port. MCP and A2A are trying to be the USB of agentic AI — and the ecosystem is responding accordingly.

With this foundation in place — understanding the shift from monolithic scripts to composable architectures, the role of standardization, and the three-layer mental model of protocols, tools, and skills — you're ready to go deeper. The next section unpacks the protocol layer in detail: what agent protocols actually look like, how they encode the rules of communication, and why the specific design choices in MCP and A2A are the right ones for the problems they solve.

The backbone is in view. Now let's examine each vertebra.

Understanding Agent Protocols: Contracts for Interoperability

Imagine hiring a contractor to renovate your kitchen. Before any work begins, you agree on a set of terms: how requests will be submitted, what format the invoices take, how change orders are communicated, and what constitutes a completed job. Without this shared contract, every interaction becomes a negotiation from scratch — misunderstandings multiply, work gets duplicated, and nothing integrates cleanly. Agent protocols serve exactly this purpose in software systems. They are the explicit, shared contracts that make reliable communication possible between agents, tools, and external services.

In this section, we will build a solid conceptual foundation for what protocols mean in the agentic context, why they matter so profoundly, and how they translate into concrete code patterns you can use immediately.

What Is a Protocol in the Agentic Context?

A protocol in agentic AI is a formally defined set of rules governing how messages are structured, exchanged, and interpreted between two or more communicating parties. This is broader than simply agreeing on JSON versus XML. A complete protocol encompasses three interlocking concerns:

  • 🧠 Message format: The exact schema of requests and responses — which fields are required, what data types they carry, and what constraints apply.
  • 📚 Handshakes: The initialization sequence that establishes a session, negotiates capabilities, and confirms both parties are operating under compatible assumptions.
  • 🔧 Capability negotiation: The mechanism by which one participant advertises what it can do and another party selects only the operations it needs.

Think of it this way: HTTP is a protocol. It specifies not just that responses contain a body, but that they carry status codes, headers, and follow a specific request-response lifecycle. An agent protocol operates at a higher level of abstraction — it defines the semantic contract on top of a transport like HTTP or WebSockets.

┌─────────────────────────────────────────────────────────┐
│                   PROTOCOL LAYERS                       │
├─────────────────────────────────────────────────────────┤
│  Semantic Layer   │  What the message MEANS             │
│                   │  (capability negotiation, intent)   │
├───────────────────┼─────────────────────────────────────┤
│  Schema Layer     │  How the message is STRUCTURED      │
│                   │  (JSON Schema, TypedDict, Pydantic) │
├───────────────────┼─────────────────────────────────────┤
│  Transport Layer  │  How the message TRAVELS            │
│                   │  (HTTP, WebSocket, stdio, gRPC)     │
└───────────────────┴─────────────────────────────────────┘

A well-designed agent protocol separates these concerns cleanly. An agent should be able to swap out the transport (say, moving from HTTP to a local process pipe) without rewriting its message-handling logic.

🎯 Key Principle: A protocol is not the same as an API. An API describes the surface of a specific service. A protocol describes the language that any conforming service or agent can speak. Protocols enable interoperability across heterogeneous implementations.

Capability Negotiation: The Opening Handshake

One of the most important — and most overlooked — aspects of agent protocols is capability negotiation. Before an agent sends a task request to a tool or another agent, a well-designed protocol provides a way for that tool or agent to announce what it supports.

Consider a weather-checking tool. In one deployment it might support both current conditions and five-day forecasts. In another, stripped-down deployment it might only support current conditions. Without capability negotiation, your orchestrating agent either hard-codes assumptions (brittle) or fails at runtime with a confusing error (worse).

Capability negotiation typically occurs during an initialization handshake, producing a capabilities manifest — a structured document listing supported operations, input schemas, and any version constraints. The calling agent reads this manifest and constructs requests that fit within what the tool actually supports.
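As a rough sketch of that idea, here is what a manifest and the caller-side check might look like. The field names are illustrative and do not follow any particular protocol's wire format.

```python
# Illustrative capabilities manifest; field names are hypothetical
manifest = {
    "tool_name": "weather_tool",
    "protocol_versions": ["1.0"],
    "operations": [
        {"name": "get_current", "input": {"location": "string"}},
        # a stripped-down deployment might omit "get_forecast" entirely
    ],
}

def supports(manifest: dict, operation: str) -> bool:
    """Consult the manifest before constructing a request."""
    return any(op["name"] == operation for op in manifest["operations"])

print(supports(manifest, "get_current"))   # True
print(supports(manifest, "get_forecast"))  # False
```

Checking the manifest up front replaces both failure modes from the weather example: no hard-coded assumption, and no confusing runtime error.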

💡 Mental Model: Think of the capabilities manifest like a restaurant menu. You don't order a dish and hope the kitchen can make it. You read the menu first. The handshake is walking in and being handed that menu.

Synchronous vs. Asynchronous Protocol Patterns

Not all agent workflows move at the same tempo. A critical design decision in any protocol is whether communication is synchronous (the caller waits for a response before proceeding) or asynchronous (the caller fires a request and continues, collecting the result later).

Synchronous Patterns

In a synchronous exchange, the agent sends a request and blocks — execution pauses until a response arrives or a timeout fires. This is the model most developers are comfortable with from traditional HTTP APIs. It is simple to reason about, simple to debug, and appropriate when:

  • 🎯 The response is needed immediately to determine the next step
  • 🎯 Latency is low and predictable
  • 🎯 The tool operation is fast (sub-second database lookups, validation calls)

Agent                    Tool
  │                        │
  │──── Request ──────────►│
  │      (blocks)          │  ← Agent waits here
  │◄─── Response ──────────│
  │      (continues)       │
  │                        │

Asynchronous Patterns

When tool operations are slow — web scraping, long-running computations, calls to external LLMs, human-in-the-loop reviews — synchronous patterns become liabilities. Asynchronous protocols allow an agent to submit a job, receive a job identifier, and poll or subscribe for the result later. This is essential for:

  • 🔧 Multi-step pipelines where parallel branches can execute concurrently
  • 🔧 Operations with high variance in completion time
  • 🔧 Agent-to-agent communication where the responding agent itself needs to call other tools

Agent                    Tool                    Callback
  │                        │                        │
  │──── Submit Job ────────►│                        │
  │◄─── Job ID: abc123 ─────│                        │
  │      (continues other work)                      │
  │                        │                        │
  │──── Poll: abc123 ──────►│                        │
  │◄─── Status: pending ────│                        │
  │                        │── Result ready ────────►│
  │◄─────────────────────────────── Webhook ─────────│
  │      (processes result)                          │

⚠️ Common Mistake: Treating asynchronous patterns as simply "synchronous with a delay." Async protocols require explicit handling of partial failures, timeout policies, idempotency keys for retries, and result expiry. Bolting async behavior onto a synchronous design leads to fragile polling loops and silent data loss.
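A sketch of the submit-then-poll pattern with those safeguards in place: an explicit deadline instead of an unbounded loop, and an idempotency key so a retried submit can be deduplicated. `tool.submit` and `tool.status` are hypothetical client methods, not a real SDK.

```python
import time
import uuid

def submit_and_poll(tool, payload, timeout_s=30.0, interval_s=0.5):
    """Submit a job, then poll with an explicit deadline (illustrative).

    The idempotency key lets the server deduplicate a retried submit;
    the deadline prevents the fragile "poll forever" loop.
    """
    job_id = tool.submit(payload, idempotency_key=str(uuid.uuid4()))
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = tool.status(job_id)
        if status["state"] == "done":
            return status["result"]
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "job failed"))
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} exceeded {timeout_s}s")
```

Note that failure is handled as a first-class state rather than inferred from a missing result, which is where naive polling loops tend to lose data silently.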

💡 Real-World Example: The Model Context Protocol (MCP) supports both synchronous tool calls (for fast, stateless lookups) and streaming responses (for LLM completions that trickle out token by token). Choosing the right pattern for each tool type dramatically simplifies the consuming agent's logic.

How Protocols Decouple Agent Logic from Implementation Details

One of the most powerful benefits of a well-defined protocol is decoupling — the agent's reasoning logic becomes independent of the specific tool or service implementation behind the protocol interface.

Without protocols, an agent that calls a weather service looks like this: it knows the specific HTTP endpoint, the exact parameter names that service uses, its quirky error codes, and its idiosyncratic date format. Change the weather provider, and you rewrite agent logic.

With a protocol, the agent only knows: "I need to call a service that conforms to the WeatherTool protocol. I will send a GetCurrentWeather request with a location field. I will receive a response with temperature, conditions, and a timestamp." Any service implementing that protocol is a valid plug-in replacement.

                        ┌─────────────────────┐
                        │     Orchestrating    │
                        │        Agent         │
                        └──────────┬──────────┘
                                   │
                         Protocol Interface
                         (GetWeather schema)
                                   │
              ┌────────────────────┼─────────────────────┐
              │                    │                      │
    ┌──────────▼──────┐  ┌──────────▼──────┐  ┌──────────▼──────┐
    │  OpenWeather    │  │  WeatherAPI.com │  │  Mock/Test Stub │
    │  Adapter        │  │  Adapter        │  │                 │
    └─────────────────┘  └─────────────────┘  └─────────────────┘

This is the adapter pattern applied to agentic systems. The agent never reaches through the protocol boundary. Testing becomes straightforward because a mock stub that speaks the same protocol is a valid drop-in during development.
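In Python, that contract can be expressed as a structural type. Here is a minimal sketch with illustrative names: any class with a matching `get_current` method satisfies the `WeatherTool` protocol, including a test stub.

```python
from typing import Protocol

class WeatherTool(Protocol):
    """The protocol contract: any conforming implementation is swappable."""
    def get_current(self, location: str) -> dict: ...

class MockWeatherStub:
    """A drop-in test double that speaks the same protocol; no network."""
    def get_current(self, location: str) -> dict:
        return {"temperature": 21.0, "conditions": "Clear"}

def describe_weather(tool: WeatherTool, location: str) -> str:
    # Agent logic depends only on the contract, never the implementation
    data = tool.get_current(location)
    return f"{data['temperature']}°C, {data['conditions']}"

print(describe_weather(MockWeatherStub(), "Berlin"))  # prints: 21.0°C, Clear
```

A real OpenWeather or WeatherAPI.com adapter would implement the same method and slot in without touching `describe_weather`.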

🎯 Key Principle: Decoupling through protocols is what transforms a tightly coupled agent script into a composable, maintainable system. Every dependency that crosses a protocol boundary is a dependency you can swap, test, and evolve independently.

A Minimal Protocol-Compliant Exchange: Annotated Code

Let's ground these concepts in working code. The following example implements a minimal protocol-compliant request/response exchange between an orchestrating agent and a tool service. We will use Python with Pydantic for schema enforcement — a common and highly practical combination.

Step 1: Define the Protocol Schema

## protocol_schema.py
## Define the message schemas that both agent and tool must conform to.
## This file is the "contract" — both sides import from it.

from pydantic import BaseModel, Field
from typing import Literal, Optional
from datetime import datetime

## ── Request side ──────────────────────────────────────────────

class WeatherRequest(BaseModel):
    """Schema for requesting current weather data."""
    # Protocol version — enables graceful deprecation later
    protocol_version: Literal["1.0"] = "1.0"
    
    # A unique ID the caller generates; used to correlate async responses
    request_id: str = Field(..., description="Caller-generated UUID")
    
    location: str = Field(..., min_length=2, description="City name or lat,lon")
    units: Literal["celsius", "fahrenheit"] = "celsius"


## ── Response side ─────────────────────────────────────────────

class WeatherResponse(BaseModel):
    """Schema for a successful weather tool response."""
    protocol_version: Literal["1.0"] = "1.0"
    
    # Echo the request_id so the caller can match response to request
    request_id: str
    
    temperature: float
    conditions: str
    timestamp: datetime


class ErrorResponse(BaseModel):
    """Schema for protocol-level errors."""
    protocol_version: Literal["1.0"] = "1.0"
    request_id: str
    error_code: str   # e.g., "LOCATION_NOT_FOUND", "RATE_LIMITED"
    message: str

Notice that both the request and response embed a protocol_version field. This small convention pays dividends: when you release version 1.1, you can route requests to different handlers based on declared version, and both old and new clients continue working during the transition.
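One way that version-based routing might look, as a sketch (the handler functions are hypothetical):

```python
def handle_v1(request: dict) -> str:
    return "handled by 1.0 pipeline"

def handle_v1_1(request: dict) -> str:
    return "handled by 1.1 pipeline"

HANDLERS = {"1.0": handle_v1, "1.1": handle_v1_1}

def route(request: dict) -> str:
    """Dispatch a request to the handler for its declared version."""
    version = request.get("protocol_version", "1.0")
    if version not in HANDLERS:
        raise ValueError(f"Unsupported protocol version: {version}")
    return HANDLERS[version](request)

print(route({"protocol_version": "1.0"}))  # handled by 1.0 pipeline
```

Old clients keep declaring "1.0" and keep working; new clients opt into "1.1" when they are ready, which is exactly the graceful transition the field enables.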

Step 2: Implement the Tool Service (Protocol Responder)

## weather_tool.py
## The tool service — a FastAPI endpoint that enforces the protocol schema.

from fastapi import FastAPI, HTTPException
from protocol_schema import WeatherRequest, WeatherResponse, ErrorResponse
from datetime import datetime, timezone

app = FastAPI()

@app.get("/capabilities")
async def get_capabilities():
    """Capabilities manifest — the handshake endpoint.
    
    Any conforming agent calls this first to discover what
    operations and protocol versions this tool supports.
    """
    return {
        "tool_name": "weather_tool",
        "protocol_versions": ["1.0"],
        "operations": [
            {
                "name": "get_weather",
                "description": "Retrieve current weather for a location",
                "request_schema": WeatherRequest.model_json_schema(),
                "response_schema": WeatherResponse.model_json_schema()
            }
        ]
    }

@app.post("/get_weather", response_model=WeatherResponse)
async def get_weather(request: WeatherRequest):
    """
    Pydantic validates the incoming request against WeatherRequest schema
    *before* this function body runs. Invalid requests are rejected
    automatically with a 422 error — the protocol boundary holds.
    """
    # Simulate a lookup (replace with real API call in production)
    if request.location.lower() == "unknown":
        # Return a structured protocol error, not a raw HTTP 500
        raise HTTPException(
            status_code=404,
            detail=ErrorResponse(
                request_id=request.request_id,
                error_code="LOCATION_NOT_FOUND",
                message=f"No weather data available for '{request.location}'"
            ).model_dump()
        )

    return WeatherResponse(
        request_id=request.request_id,   # echo back for correlation
        temperature=22.5 if request.units == "celsius" else 72.5,
        conditions="Partly cloudy",
        timestamp=datetime.now(timezone.utc)
    )

Step 3: The Agent Consuming the Protocol

## agent.py
## The orchestrating agent — it speaks the protocol, not the implementation.

import httpx
import uuid
from protocol_schema import WeatherRequest, WeatherResponse, ErrorResponse

class WeatherAwareAgent:
    def __init__(self, tool_base_url: str):
        self.tool_base_url = tool_base_url
        self.client = httpx.Client()
        self._negotiated_capabilities = None

    def initialize(self):
        """Perform the capability handshake before any task requests."""
        response = self.client.get(f"{self.tool_base_url}/capabilities")
        response.raise_for_status()
        self._negotiated_capabilities = response.json()
        print(f"Handshake complete. Supported ops: "
              f"{[op['name'] for op in self._negotiated_capabilities['operations']]}")

    def get_weather(self, location: str) -> str:
        """Ask the weather tool for current conditions, using the protocol schema."""
        # Build a protocol-compliant request
        request = WeatherRequest(
            request_id=str(uuid.uuid4()),  # unique ID for correlation
            location=location,
            units="celsius"
        )

        response = self.client.post(
            f"{self.tool_base_url}/get_weather",
            # .model_dump() serializes to a dict; Pydantic ensures it's valid
            json=request.model_dump()
        )

        if response.status_code == 200:
            # Parse and VALIDATE the response against our expected schema
            weather = WeatherResponse.model_validate(response.json())
            return (f"{weather.temperature}°C, {weather.conditions} "
                    f"(as of {weather.timestamp.strftime('%H:%M UTC')})")
        else:
            error = ErrorResponse.model_validate(response.json().get('detail', {}))
            return f"Tool error [{error.error_code}]: {error.message}"

## Usage
if __name__ == "__main__":
    agent = WeatherAwareAgent("http://localhost:8000")
    agent.initialize()             # handshake
    print(agent.get_weather("Berlin"))
    print(agent.get_weather("unknown"))  # demonstrates error protocol

Several design choices here are worth highlighting. The agent validates the response schema just as rigorously as the tool validates the request schema — WeatherResponse.model_validate(response.json()) will raise a ValidationError if the tool returns malformed data. This is bidirectional protocol enforcement. The request_id flows through both request and response, enabling the agent to correlate results in asynchronous variants of this same pattern.

💡 Pro Tip: Always validate both incoming and outgoing messages at protocol boundaries. Developers commonly validate what comes in and trust that what goes out is correct. Runtime schema violations from outbound messages are among the hardest bugs to diagnose in multi-agent systems.

The Role of Schemas and Validation

Schemas are the formal language in which protocol rules are written. A schema answers questions like: Is temperature a required field? Can units be "kelvin"? What happens if request_id is missing?

At runtime, validation is the enforcement mechanism that checks actual messages against their schemas. There are two critical moments when validation earns its keep:

  • 🔒 At ingress (when a message arrives): Reject malformed requests before they corrupt internal state or cause confusing downstream errors.
  • 🔒 At egress (before a message is sent): Catch bugs where your own code constructs an invalid message, failing fast with a clear error rather than sending garbage that confuses the receiver.

Tools like Pydantic (Python), Zod (TypeScript), and JSON Schema validators provide this enforcement. In agentic systems, schema validation is not optional polish — it is the mechanism that keeps the protocol honest.
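The egress check can be as cheap as constructing the typed message object right before serialization; if construction fails, nothing leaves the process. Below is a minimal stdlib sketch of the idea, a hand-rolled stand-in for the Pydantic models used elsewhere in this lesson (the WeatherRequest fields mirror the earlier examples; send is an illustrative helper):

```python
from dataclasses import dataclass

@dataclass
class WeatherRequest:
    """Outbound message type: constructing it IS the egress validation."""
    request_id: str
    location: str
    units: str = "celsius"

    def __post_init__(self):
        # Enforce the protocol rules before the message can ever be sent
        if not self.request_id:
            raise ValueError("request_id is required")
        if self.units not in ("celsius", "fahrenheit"):
            raise ValueError(f"unsupported units: {self.units!r}")

def send(request: WeatherRequest) -> dict:
    """Accepts only an already-validated message object, never a raw dict."""
    return {"request_id": request.request_id,
            "location": request.location,
            "units": request.units}

# Invalid units fail at the boundary, long before any receiver sees them
try:
    WeatherRequest(request_id="r-1", location="Berlin", units="kelvin")
except ValueError as e:
    print(f"egress rejected: {e}")
```

Because send only accepts the typed object, there is no code path that serializes an unvalidated dict, which is exactly the failure mode the egress check exists to prevent.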

🤔 Did you know? JSON Schema is transport-agnostic. The same schema document that validates a REST API payload can also validate a message sent over WebSockets, a file written to disk, or a message published to a queue. This is why schema-first protocol design scales naturally across different transports.

Message arrives at agent
        │
        ▼
┌───────────────┐    Fails    ┌──────────────────────────┐
│ Schema        │────────────►│ Reject with structured   │
│ Validation    │             │ error (log + notify)     │
└───────┬───────┘             └──────────────────────────┘
        │ Passes
        ▼
┌───────────────┐
│ Agent Logic   │  ← Only valid, typed data reaches here
└───────────────┘

⚠️ Common Mistake: Using loose schema types (plain dict or Any) at protocol boundaries to "handle anything." This creates a false sense of flexibility. In practice it means errors surface deep inside agent logic as KeyError or AttributeError — far from the boundary where the violation actually occurred, making debugging exponentially harder.

Protocols as Living Documents

A final, often under-appreciated dimension of agent protocols is that they evolve. Systems grow, requirements change, and the protocol contract must accommodate this without breaking existing integrations.

The protocol_version field in our code examples is the seed of a versioning strategy. Common approaches include:

📋 Quick Reference Card: Protocol Versioning Strategies

  • 🔒 Semantic versioning (1.0, 1.1, 2.0). Best for: most agent protocols. Watch out for: major version breaks need migration paths.
  • 📚 Additive-only evolution. Best for: long-lived protocols. Watch out for: you can never remove fields, only add optional ones.
  • 🎯 Capability flags. Best for: feature toggles. Watch out for: combinatorial explosion of flag combinations.
  • 🧠 Negotiated version in handshake. Best for: heterogeneous deployments. Watch out for: both sides must implement negotiation logic.

💡 Remember: The goal of versioning is not to prevent change — it is to make change survivable. When an agent and a tool negotiate version during the handshake, both sides can evolve independently as long as they share at least one common protocol version.
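Negotiation itself can be a set intersection: each side advertises the versions it speaks, and the pair settles on the highest version they share. A hypothetical sketch (negotiate_version is an illustrative name, not a standard API):

```python
def negotiate_version(agent_versions: list[str], tool_versions: list[str]) -> str:
    """Pick the highest protocol version both sides support.
    Versions compare as (major, minor) tuples, so "1.10" ranks above "1.2"."""
    common = set(agent_versions) & set(tool_versions)
    if not common:
        # No overlap: fail fast at handshake time, not mid-task
        raise RuntimeError("no common protocol version, handshake fails")
    return max(common, key=lambda v: tuple(int(p) for p in v.split(".")))

# Agent speaks 1.0 through 1.2, tool speaks 1.1 through 2.0: settle on 1.2
print(negotiate_version(["1.0", "1.1", "1.2"], ["1.1", "1.2", "2.0"]))  # 1.2
```

Note the tuple comparison: naive string comparison would rank "1.2" above "1.10", which is the classic version-sorting bug.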

With this conceptual and practical foundation in place, you are equipped to recognize well-designed protocols when you encounter them, critique implementations that skip important concerns like capability negotiation or bidirectional validation, and write protocol-compliant integrations that decouple your agent logic from implementation details. The next section builds directly on this foundation by exploring the tools and skills that protocols make interoperable.

Tools and Skills: Giving Agents the Ability to Act

An agent without tools is like a brilliant consultant locked in a soundproof room — full of reasoning capability but unable to reach out and change anything. Tools and skills are the mechanisms that bridge the gap between an agent's internal deliberation and real-world action. They transform language model outputs from text completions into API calls, database writes, browser interactions, and orchestrated workflows. Understanding the distinction between these two concepts — and how they compose — is foundational to building agentic systems that are both powerful and predictable.

What Is a Tool?

In agentic AI, a tool is a discrete, callable unit of capability with a well-defined input/output contract. Think of it as a typed function that an agent can invoke at runtime — one that the agent didn't write and doesn't fully control, but can consult and use within the boundaries of its defined interface.

A tool has three essential components:

🔧 A schema — the machine-readable description of what the tool accepts (parameters, types, required vs. optional fields) and what it returns.

🔧 A handler — the actual implementation that executes the capability (makes the HTTP request, queries the database, runs the computation).

🔧 A descriptor — metadata that the agent uses to decide when and why to use the tool, typically including a human-readable name, a natural language description, and sometimes usage examples.

The schema is the contract. It's what allows an agent to call a tool correctly without understanding its internal implementation. This separation of interface from implementation is not a new idea — it's the same principle behind APIs, function signatures, and microservices. What's new in the agentic context is that the caller is a language model selecting tools dynamically based on natural language reasoning, not a programmer making a deliberate function call.

🎯 Key Principle: A tool's schema is a promise — the agent trusts that if it provides valid inputs, it will receive a predictable output. Breaking that contract at runtime is one of the most common sources of agent failures.

What Is a Skill?

If tools are atomic capabilities, skills are composed capabilities. A skill is a higher-order, named, reusable pattern that bundles one or more tools (and potentially sub-agents or other skills) into a coherent unit of work that can be referenced by name.

Consider a concrete example. A web_search function is a tool — it takes a query string and returns a list of results. But a research_topic skill might:

  1. Invoke web_search to find relevant pages
  2. Invoke a fetch_page_content tool to retrieve article bodies
  3. Invoke a summarize_text tool to condense each article
  4. Invoke a synthesize_findings tool (or sub-agent) to produce a unified summary

The skill wraps that multi-step process into a single reusable capability. An orchestrating agent can say "use the research_topic skill" without knowing which tools are involved. This abstraction is powerful: it enables progressive skill loading, where agents start with a lean set of capabilities and acquire more specialized skills as the task demands them.
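In code, a skill can be as thin as a named function that sequences tool calls, so the orchestrator only ever sees the skill's name. Here is a sketch with stub handlers standing in for the four tools above (all function bodies are illustrative placeholders for real implementations):

```python
# Stub tools: in a real system these would be registry lookups, not locals
def web_search(query: str) -> list[str]:
    return [f"https://example.com/{query.replace(' ', '-')}-{i}" for i in range(2)]

def fetch_page_content(url: str) -> str:
    return f"Article text from {url}"

def summarize_text(text: str) -> str:
    return text[:40]

def synthesize_findings(summaries: list[str]) -> str:
    return f"Synthesis of {len(summaries)} sources"

def research_topic(query: str) -> str:
    """Skill: composes four atomic tools into one named capability."""
    urls = web_search(query)                        # Step 1: find pages
    pages = [fetch_page_content(u) for u in urls]   # Step 2: retrieve bodies
    summaries = [summarize_text(p) for p in pages]  # Step 3: condense each
    return synthesize_findings(summaries)           # Step 4: unify

print(research_topic("agent protocols"))  # Synthesis of 2 sources
```

The calling agent invokes research_topic by name; swapping the search backend or the summarizer changes nothing above the skill boundary.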

Tool vs. Skill — Conceptual Hierarchy

┌─────────────────────────────────────────────────┐
│                     SKILL                        │
│            "research_topic"                      │
│  ┌──────────────────────────────────────────┐   │
│  │  Step 1: web_search (tool)               │   │
│  │  Step 2: fetch_page_content (tool)       │   │
│  │  Step 3: summarize_text (tool)           │   │
│  │  Step 4: synthesize_findings (tool)      │   │
│  └──────────────────────────────────────────┘   │
└─────────────────────────────────────────────────┘

         ▲                          ▲
         │                          │
     [TOOL:                    [SKILL:
      atomic,                   composed,
      stateless or stateful,    reusable,
      single call]              multi-step]

💡 Mental Model: Think of tools as verbs and skills as phrases. "Search" is a verb (tool). "Research a topic and summarize findings" is a phrase (skill). Agents communicate in phrases, but computers execute verbs.

How Agents Discover and Select Tools at Runtime

One of the most practically important aspects of tool design is discoverability — how does an agent know which tools are available, and how does it decide which one to use for a given subtask?

Modern agentic systems solve this with tool registries: centralized or distributed catalogs that store tool descriptors. At runtime, an agent can query the registry to retrieve available tools, often filtering by category, capability tags, or permissions. The registry returns descriptors, and the agent — typically a language model — reads those descriptors to reason about which tool best matches the current need.

Tool Discovery Flow

  Agent                  Registry               Tool Handler
    │                       │                        │
    │── list_tools(tags) ──>│                        │
    │<── [descriptors] ─────│                        │
    │                       │                        │
    │ (LLM selects tool     │                        │
    │  based on descriptors)│                        │
    │                       │                        │
    │── call_tool(name, params) ────────────────────>│
    │<── result ─────────────────────────────────────│

The quality of tool descriptions is surprisingly critical. Because a language model selects tools based on natural language reasoning, a vague or misleading description will cause the agent to skip the right tool or misuse it. A description like "does stuff with files" will perform far worse than "reads a file from disk and returns its UTF-8 contents as a string; use when you need to access local file content".

⚠️ Common Mistake 1: Writing tool descriptions for human developers rather than for the agent's reasoning process. The agent reads descriptions at inference time and uses them to make selection decisions. Write descriptions that answer: What does this tool do? When should it be used? What does it NOT do?

Defining a Tool: Code, Schema, and Registration

Let's ground these concepts in code. The following example shows a complete tool definition in Python, following a pattern compatible with frameworks like LangChain, OpenAI function calling, or custom MCP-style implementations.

import json
from typing import Any

## ── Step 1: Define the JSON Schema for the tool ──────────────────────────────
## This schema is what the agent sees. It describes accepted inputs and
## is used to validate calls before the handler executes.

WEB_SEARCH_SCHEMA = {
    "name": "web_search",
    "description": (
        "Searches the web for up-to-date information on a given query. "
        "Use this tool when you need current facts, recent events, or "
        "information not available in your training data. "
        "Returns a list of result objects with title, url, and snippet."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query string"
            },
            "max_results": {
                "type": "integer",
                "description": "Maximum number of results to return (default: 5)",
                "default": 5
            }
        },
        "required": ["query"]
    }
}

## ── Step 2: Implement the handler function ────────────────────────────────────
## The handler is the actual implementation. It receives validated parameters
## and returns a result conforming to the expected output shape.

def web_search_handler(query: str, max_results: int = 5) -> list[dict[str, str]]:
    """
    Handler for the web_search tool.
    In production this would call a real search API (e.g., Brave, Serper).
    """
    # Simulate a search response for illustration
    return [
        {
            "title": f"Result {i+1} for: {query}",
            "url": f"https://example.com/result-{i+1}",
            "snippet": f"Relevant content about '{query}' from source {i+1}."
        }
        for i in range(max_results)
    ]

## ── Step 3: Register the tool in a simple registry ───────────────────────────
## A registry maps tool names to their schemas and handlers.
## At runtime, the agent queries this registry to discover available tools.

class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, dict] = {}

    def register(self, schema: dict, handler: callable) -> None:
        """Register a tool with its schema and handler."""
        name = schema["name"]
        self._tools[name] = {"schema": schema, "handler": handler}
        print(f"[Registry] Tool registered: {name}")

    def list_schemas(self) -> list[dict]:
        """Return all schemas — this is what gets sent to the LLM."""
        return [entry["schema"] for entry in self._tools.values()]

    def call(self, name: str, params: dict) -> Any:
        """Execute a tool by name with provided parameters."""
        if name not in self._tools:
            raise ValueError(f"Unknown tool: {name}")
        handler = self._tools[name]["handler"]
        return handler(**params)


## ── Usage ─────────────────────────────────────────────────────────────────────
registry = ToolRegistry()
registry.register(WEB_SEARCH_SCHEMA, web_search_handler)

## The agent would receive this list and reason over it
available_tools = registry.list_schemas()
print(available_tools[0]["description"])

## The agent selects web_search and constructs a call
result = registry.call("web_search", {"query": "agentic AI frameworks 2024", "max_results": 3})
print(result)

This pattern separates three concerns that should always remain distinct: what the tool does (handler), how it's described to the agent (schema), and how it's discovered (registry). When these concerns are mixed — for example, when the tool name and behavior are tightly coupled in a single monolithic class — systems become brittle and hard to evolve.

💡 Pro Tip: Version your tool schemas. As agent systems mature, tool interfaces change. An agent trained or prompted on web_search_v1 schema should not silently receive web_search_v2 behavior. Treat tool schemas like API versions.
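One lightweight way to honor this is to key the registry on name plus version, so a caller pinned to v1 keeps v1 semantics even after v2 ships. A sketch (VersionedToolRegistry is an illustrative class, not a framework API):

```python
from typing import Any, Callable

class VersionedToolRegistry:
    """Registry keyed on (name, version): old callers never see new behavior."""

    def __init__(self):
        self._tools: dict[tuple[str, str], Callable] = {}

    def register(self, name: str, version: str, handler: Callable) -> None:
        self._tools[(name, version)] = handler

    def call(self, name: str, version: str, **params) -> Any:
        if (name, version) not in self._tools:
            raise ValueError(f"no tool '{name}' at version {version}")
        return self._tools[(name, version)](**params)

registry = VersionedToolRegistry()
registry.register("web_search", "1", lambda query: [f"v1 hit for {query}"])
registry.register("web_search", "2", lambda query, lang="en": [f"v2 {lang} hit for {query}"])

# A v1-prompted agent keeps getting v1 semantics, even after v2 exists
print(registry.call("web_search", "1", query="mcp"))  # ['v1 hit for mcp']
```

Removing a version then becomes an explicit, observable event (the call raises) rather than a silent behavior change.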

Stateless vs. Stateful Tools

Not all tools are equal in terms of their side effects, and this distinction has significant implications for how agents should use them.

A stateless tool produces outputs that depend only on its inputs and have no lasting effect on any external system. The same input will always produce the same output (or at least, produce an equivalent output — web search results may vary, but the act of searching changes nothing). Examples include:

  • 🔍 Web search
  • 🧮 Mathematical computation
  • 📄 Text parsing or format conversion
  • 🌡️ Unit conversion

A stateful tool reads from or writes to some persistent or shared state. Calling it changes the world. Examples include:

  • 💾 Memory write (adds to a vector store or key-value store)
  • 📧 Send email
  • 🗄️ Database insert or update
  • 📁 File write
  • 🔧 Execute code in a persistent environment

The implications of this distinction are non-trivial:

Stateless vs. Stateful Tool Behavior

  STATELESS TOOL
  ┌───────────────────────────────────────────────┐
  │  Input ──> [Tool Handler] ──> Output           │
  │                                                │
  │  • Retryable without risk                     │
  │  • Easy to test and mock                      │
  │  • Safe to call in parallel                   │
  │  • No cleanup required on failure             │
  └───────────────────────────────────────────────┘

  STATEFUL TOOL
  ┌───────────────────────────────────────────────┐
  │  Input ──> [Tool Handler] ──> Output           │
  │                    │                           │
  │                    ▼                           │
  │           [External State Changed]             │
  │                                                │
  │  • Retrying may cause duplicate side effects  │
  │  • Requires idempotency design                │
  │  • May need rollback on failure               │
  │  • Audit logging is critical                  │
  └───────────────────────────────────────────────┘

⚠️ Common Mistake 2: Treating stateful tools the same as stateless ones in retry logic. If an agent retries a failed send_email tool call because it didn't receive a confirmation, the recipient may receive duplicate emails. Always design stateful tools to be idempotent where possible, or include explicit deduplication keys.

The following example illustrates a stateful memory tool designed with idempotency in mind:

import hashlib
import time
from dataclasses import dataclass, field

## ── A simple in-memory store to represent an agent's working memory ───────────

@dataclass
class MemoryStore:
    entries: dict[str, dict] = field(default_factory=dict)

    def write(self, content: str, idempotency_key: str | None = None) -> str:
        """
        Write content to memory. If idempotency_key is provided and an entry
        with that key already exists, the write is skipped (no duplicate).
        Returns the key used for storage.
        """
        # Generate a key from content hash if none provided
        key = idempotency_key or hashlib.sha256(content.encode()).hexdigest()[:16]

        if key in self.entries:
            print(f"[Memory] Skipping duplicate write for key: {key}")
            return key  # Idempotent: same key, no-op

        self.entries[key] = {
            "content": content,
            "timestamp": time.time()
        }
        print(f"[Memory] Wrote entry with key: {key}")
        return key

    def read(self, key: str) -> str | None:
        entry = self.entries.get(key)
        return entry["content"] if entry else None


## ── Tool schema for the stateful memory_write tool ────────────────────────────

MEMORY_WRITE_SCHEMA = {
    "name": "memory_write",
    "description": (
        "Writes a piece of information to the agent's persistent memory. "
        "Use this to store facts, decisions, or context that should be "
        "available in future turns. Returns a key that can be used to "
        "retrieve the memory later. This tool MODIFIES STATE — avoid "
        "calling it multiple times with identical content."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "content": {
                "type": "string",
                "description": "The information to store in memory"
            },
            "idempotency_key": {
                "type": "string",
                "description": "Optional key to prevent duplicate writes on retry"
            }
        },
        "required": ["content"]
    }
}

## The description explicitly warns the agent about the stateful nature —
## this is a deliberate design choice to influence agent behavior.

memory = MemoryStore()

def memory_write_handler(content: str, idempotency_key: str | None = None) -> dict:
    key = memory.write(content, idempotency_key)
    return {"key": key, "status": "written"}


## Demonstrating idempotency
first_write = memory_write_handler("The user prefers dark mode", "pref-display-mode")
second_write = memory_write_handler("The user prefers dark mode", "pref-display-mode")
## Second call is a no-op — safe to retry

🤔 Did you know? Some agent frameworks differentiate between read tools (which access state but don't modify it) and write tools (which modify state). This three-way split — stateless, read-stateful, write-stateful — maps directly to the CQRS (Command Query Responsibility Segregation) pattern from backend architecture. The same principles that make distributed systems robust also apply to agent tool design.
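The three-way split becomes operational when each tool carries a side-effect class and retry policy is derived from that class rather than hand-written per tool. A hypothetical sketch (EffectClass and max_retries are illustrative names, not part of any framework):

```python
from enum import Enum

class EffectClass(Enum):
    STATELESS = "stateless"        # pure query: retry freely
    READ_STATEFUL = "read"         # reads shared state: retry freely
    WRITE_STATEFUL = "write"       # mutates state: retry only with a dedup key

def max_retries(effect: EffectClass, has_idempotency_key: bool = False) -> int:
    """Derive retry policy from the tool's side-effect class."""
    if effect in (EffectClass.STATELESS, EffectClass.READ_STATEFUL):
        return 3
    # Never blind-retry a write: duplicates are worse than failures
    return 3 if has_idempotency_key else 0

print(max_retries(EffectClass.STATELESS))                                  # 3
print(max_retries(EffectClass.WRITE_STATEFUL))                             # 0
print(max_retries(EffectClass.WRITE_STATEFUL, has_idempotency_key=True))   # 3
```

Centralizing the policy this way means adding a new write tool automatically inherits safe retry behavior, instead of depending on each author remembering the rule.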

Connecting Tools and Skills: The Bigger Picture

At this point, it's worth stepping back and seeing how tools and skills interact in a running agent system. Tools are the atomic units. Skills compose them. Protocols (covered in section 2) govern how agents discover skills and invoke tools across service boundaries.

Full Capability Stack

  ┌───────────────────────────────────────────────────────┐
  │                  ORCHESTRATING AGENT                  │
  │           (reasons, plans, selects skills)            │
  └───────────────────┬───────────────────────────────────┘
                      │ selects
  ┌───────────────────▼───────────────────────────────────┐
  │                    SKILL LAYER                        │
  │  research_topic  |  draft_email  |  analyze_data      │
  │  (named, reusable, multi-step capabilities)           │
  └────────┬──────────────────────────────┬───────────────┘
           │ invokes                      │ invokes
  ┌────────▼──────────┐        ┌──────────▼──────────────┐
  │     TOOL LAYER    │        │       TOOL LAYER        │
  │  web_search       │        │  memory_write           │
  │  fetch_content    │        │  send_email             │
  │  summarize_text   │        │  format_template        │
  └───────────────────┘        └─────────────────────────┘
           │                              │
  ┌────────▼──────────────────────────────▼───────────────┐
  │              EXTERNAL SYSTEMS & APIS                  │
  │   Search API | Vector DB | SMTP | File System | etc.  │
  └───────────────────────────────────────────────────────┘

This layered architecture is what allows agentic systems to scale. A new tool can be added to the registry without touching the skill layer. A new skill can be composed from existing tools without touching the agent's core reasoning loop. The agent's reasoning loop can be upgraded without changing tools or skills.

📋 Quick Reference Card: Tool vs. Skill

  • 📏 Granularity: a tool is atomic, a single function; a skill is composed, multi-step.
  • 🔄 Reusability: a tool is called directly; a skill is referenced by name and reused across agents.
  • 📋 Definition: a tool is schema + handler; a skill is a workflow pattern plus tool references.
  • 🔒 State: a tool is stateless or stateful; a skill is often stateful across steps.
  • 👁️ Agent visibility: a tool exposes its full schema; a skill is exposed as a named capability.
  • 🎯 Selection: a tool is picked by the LLM from the registry; a skill is selected by capability name.

Implications for System Design

Understanding tools and skills as first-class architectural concepts — not just implementation details — changes how you approach building agentic systems.

Wrong thinking: Tools are just helper functions I'll add as needed.

Correct thinking: Tools are the interface between agent reasoning and the real world. Their schemas, descriptions, and stateful properties are architectural decisions that affect reliability, safety, and agent performance.

💡 Real-World Example: GitHub Copilot Workspace uses a skill-like pattern where high-level tasks ("fix this bug", "add this feature") are decomposed into tool invocations (read file, write file, run tests, search codebase). The skill layer encodes knowledge about how developers approach tasks; the tool layer encodes what operations are available. This separation is what allows the system to handle diverse task types without retraining the underlying model.

🧠 Mnemonic: Schema + Handler + Descriptor = SHD = "Should Handle, Describe" — every tool should handle a specific job, described clearly enough for an agent to reason about it.

As you move into the subsequent sections of this lesson, you'll see these concepts applied in realistic pipelines — tools wired to protocols like MCP, skills registered in A2A-compatible catalogs, and progressive loading patterns that give agents exactly the capabilities they need, when they need them. The foundation you've built here — the distinction between tools and skills, the importance of schemas and descriptions, and the implications of statefulness — is the lens through which all of that material becomes coherent.

Practical Patterns: Wiring Protocols, Tools & Skills Together

Understanding protocols, tools, and skills in the abstract is useful — but the real insight comes when you see them work together in a live pipeline. This section walks through realistic implementation patterns that connect all three layers: an agent receives a task, consults its registry of available tools, resolves which skill applies, and executes a protocol-compliant call that returns a structured result. By the end, you'll have runnable code patterns you can adapt to your own agentic systems.

The End-to-End Picture: What Happens When an Agent Gets a Task?

Before writing a single line of code, it helps to trace the full lifecycle of a task through an agentic pipeline. Imagine a user sends the message: "Summarize the latest support tickets and email the report to the team." What actually happens next?

User Request
     │
     ▼
┌─────────────────────────────────────────────────────┐
│                  AGENT ORCHESTRATOR                 │
│                                                     │
│  1. Parse intent  →  2. Select skill                │
│                            │                        │
│                    ┌───────▼────────┐               │
│                    │  Skill: Report │               │
│                    │  Generator     │               │
│                    └───────┬────────┘               │
│                            │                        │
│              3. Resolve required tools              │
│          ┌─────────────────┼─────────────────┐      │
│          ▼                 ▼                 ▼      │
│   [ticket_reader]   [summarizer_llm]  [email_sender]│
│          │                 │                 │      │
│  4. Execute via protocol-compliant calls            │
│          │                 │                 │      │
│          └─────────────────┴─────────────────┘      │
│                            │                        │
│              5. Normalize & return result           │
└─────────────────────────────────────────────────────┘
     │
     ▼
 Structured Response to User

Each step in this flow maps to a concrete engineering decision. The orchestrator parses intent (often via an LLM), then looks up which skill covers that intent. The skill declares its tool dependencies, the orchestrator resolves those from a tool registry, and each tool is invoked via a protocol-compliant call — meaning the request and response shapes are defined ahead of time by a schema. The results are normalized and composed into a final output.

This layered design is the key architectural insight: skills are composites, tools are atomic, and protocols are the contracts that make everything interoperable.

Building a Tool Registry

A tool registry is the mechanism by which an agent discovers what it can do. Rather than hard-coding tool invocations into agent logic, you register tools with metadata — their name, description, input schema, and callable — so the agent can discover and invoke them dynamically at runtime.

Here's a clean, practical implementation:

import json
from typing import Any, Callable, Dict, Optional
from dataclasses import dataclass, field

@dataclass
class ToolDefinition:
    """Metadata + callable for a single agent tool."""
    name: str
    description: str
    input_schema: Dict[str, Any]   # JSON Schema describing expected inputs
    output_schema: Dict[str, Any]  # JSON Schema describing expected outputs
    handler: Callable              # The actual function to execute
    tags: list[str] = field(default_factory=list)  # e.g. ["read", "tickets"]


class ToolRegistry:
    """Central registry for discovering and invoking agent tools."""

    def __init__(self):
        self._tools: Dict[str, ToolDefinition] = {}

    def register(self, tool: ToolDefinition) -> None:
        """Register a tool under its name."""
        if tool.name in self._tools:
            raise ValueError(f"Tool '{tool.name}' is already registered.")
        self._tools[tool.name] = tool
        print(f"[Registry] Registered tool: {tool.name}")

    def discover(self, tag: Optional[str] = None) -> list[ToolDefinition]:
        """Return all tools, optionally filtered by tag."""
        tools = list(self._tools.values())
        if tag:
            tools = [t for t in tools if tag in t.tags]
        return tools

    def invoke(self, tool_name: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """
        Invoke a registered tool with the given inputs.
        Returns a protocol-shaped response: {ok, result, error}.
        """
        if tool_name not in self._tools:
            return {"ok": False, "result": None,
                    "error": f"Unknown tool: '{tool_name}'"}

        tool = self._tools[tool_name]
        try:
            result = tool.handler(**inputs)
            return {"ok": True, "result": result, "error": None}
        except TypeError as e:
            return {"ok": False, "result": None,
                    "error": f"Input mismatch for '{tool_name}': {e}"}
        except Exception as e:
            return {"ok": False, "result": None,
                    "error": f"Tool '{tool_name}' failed: {e}"}


## --- Wire up a sample tool ---

def fetch_support_tickets(since_days: int = 7) -> list[dict]:
    """Stub: return recent support tickets."""
    return [
        {"id": "T-101", "title": "Login broken", "severity": "high"},
        {"id": "T-102", "title": "Slow dashboard", "severity": "medium"},
    ]


registry = ToolRegistry()
registry.register(ToolDefinition(
    name="ticket_reader",
    description="Fetches recent support tickets from the helpdesk system.",
    input_schema={
        "type": "object",
        "properties": {"since_days": {"type": "integer", "default": 7}},
    },
    output_schema={
        "type": "array",
        "items": {"type": "object"}
    },
    handler=fetch_support_tickets,
    tags=["read", "tickets"]
))

## Invoke dynamically — the agent doesn't need to know the function signature
response = registry.invoke("ticket_reader", {"since_days": 3})
print(json.dumps(response, indent=2))
## Output: {"ok": true, "result": [{...}, {...}], "error": null}

Notice a few deliberate choices here. First, every invocation returns a protocol-shaped envelope, {ok, result, error}, so the agent loop always has a consistent structure to handle, regardless of which tool ran. Second, the discover() method enables agents to query the registry by tag, which is how an orchestrator can ask "show me all tools tagged 'tickets'" when resolving a skill's dependencies. Third, errors are caught and wrapped rather than raised — an agent loop must never crash because one tool misbehaved.

💡 Pro Tip: Storing input_schema and output_schema as JSON Schema objects isn't just documentation. You can use a library like jsonschema to validate inputs before calling the handler, giving you an automatic guard against malformed LLM-generated arguments.
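To make that guard concrete without pulling in a dependency, here is a stdlib-only sketch of the idea — it handles only the type and required-field checks this lesson's schemas use, where the real jsonschema library covers the full specification (guard_inputs is a name introduced here for illustration):

```python
def guard_inputs(inputs: dict, schema: dict) -> list[str]:
    """Tiny stand-in for jsonschema: check declared property types and
    required fields. Returns a list of error messages (empty = valid)."""
    type_map = {"integer": int, "string": str, "array": list, "object": dict}
    errors = []
    for name in schema.get("required", []):
        if name not in inputs:
            errors.append(f"Missing required property '{name}'")
    for name, spec in schema.get("properties", {}).items():
        expected = type_map.get(spec.get("type"))
        if name in inputs and expected and not isinstance(inputs[name], expected):
            errors.append(
                f"'{name}' must be {spec['type']}, "
                f"got {type(inputs[name]).__name__}"
            )
    return errors
```

Called just before registry.invoke, this turns a malformed LLM-generated argument into a structured validation error instead of a TypeError deep inside the handler.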

Handling Responses and Errors in the Agent Loop

A robust agent loop is the engine that drives an agentic system forward. It repeatedly reads the current state, selects an action (tool invocation), handles the response, and decides whether to continue or return. The tricky part is that tools fail — networks drop, APIs rate-limit, schemas mismatch — and your loop must handle all of this gracefully without halting the whole workflow.

from typing import Optional

MAX_ITERATIONS = 10  # Safety valve: prevent infinite loops

def agent_loop(
    task: str,
    registry: ToolRegistry,
    skill_plan: list[dict],  # e.g. [{"tool": "ticket_reader", "inputs": {...}}, ...]
    trace_log: Optional[list] = None
) -> dict:
    """
    Execute a sequence of tool calls defined by a skill plan.
    Returns a summary dict with results and any errors encountered.
    """
    if trace_log is None:
        trace_log = []

    results = {}
    errors = []

    for step_index, step in enumerate(skill_plan):
        if step_index >= MAX_ITERATIONS:
            errors.append("Max iterations reached — aborting loop.")
            break

        tool_name = step.get("tool")
        inputs = step.get("inputs", {})

        # --- Tracing: log the outgoing call ---
        trace_entry = {
            "step": step_index,
            "tool": tool_name,
            "inputs": inputs,
            "response": None
        }

        response = registry.invoke(tool_name, inputs)
        trace_entry["response"] = response
        trace_log.append(trace_entry)

        if not response["ok"]:
            error_msg = response["error"]
            print(f"[AgentLoop] Step {step_index} failed: {error_msg}")

            # Strategy: skip non-critical steps, abort on critical ones
            if step.get("critical", False):
                errors.append(f"Critical tool '{tool_name}' failed: {error_msg}")
                break  # Cannot continue without this result
            else:
                errors.append(f"Non-critical tool '{tool_name}' skipped: {error_msg}")
                continue  # Try the next step anyway

        # Store result under the tool name for downstream steps to reference
        results[tool_name] = response["result"]
        print(f"[AgentLoop] Step {step_index} OK: {tool_name}")

    return {
        "task": task,
        "results": results,
        "errors": errors,
        "trace": trace_log
    }


## --- Run it ---
plan = [
    {"tool": "ticket_reader", "inputs": {"since_days": 7}, "critical": True},
    {"tool": "nonexistent_tool", "inputs": {}, "critical": False},  # Will fail gracefully
]

trace = []
outcome = agent_loop("Summarize recent tickets", registry, plan, trace)
print(f"Results: {outcome['results']}")
print(f"Errors: {outcome['errors']}")
print(f"Trace steps logged: {len(outcome['trace'])}")

The critical flag on each step is a simple but powerful pattern. It lets the skill author declare "if the ticket reader fails, stop everything" versus "if the email footer tool fails, just continue." This avoids the common mistake of treating all tool failures as equally fatal — or equally ignorable.

⚠️ Common Mistake: Never let your agent loop run without a maximum iteration guard. An LLM planner in a dynamic loop can generate new steps based on previous results, and without a cap, a circular dependency or a malfunctioning tool can spin the loop indefinitely. Always set MAX_ITERATIONS.
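In a dynamic loop the plan isn't fixed up front — a planner proposes each next step from accumulated state, which is exactly where an unbounded loop can hide. A minimal sketch of the guarded shape (plan_next and invoke are hypothetical callables standing in for an LLM planner and a tool registry):

```python
def dynamic_agent_loop(plan_next, invoke, max_iterations: int = 10) -> dict:
    """Planner-driven loop: plan_next(state) returns the next step dict or None.

    The cap bounds the loop even if the planner keeps proposing steps forever —
    the loop aborts cleanly instead of spinning.
    """
    state: dict = {}
    for iteration in range(max_iterations):
        step = plan_next(state)
        if step is None:  # Planner signals completion
            return {"state": state, "halted": False}
        state[f"step_{iteration}"] = invoke(step["tool"], step.get("inputs", {}))
    return {"state": state, "halted": True}  # Cap reached — aborted, not crashed
```

The halted flag lets the caller distinguish "finished" from "cut off," which matters when deciding whether partial results are trustworthy.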

Composing Tools Into a Reusable Skill

A skill is a named, reusable capability that wraps one or more tools behind a clean interface. Where a tool is atomic — it does exactly one thing — a skill is compositional. It defines which tools are needed, validates the inputs it receives, orchestrates the tool calls in order, and normalizes the output into a shape that callers can depend on.

Think of a skill like a well-tested function in a software library: you don't need to know how it works internally; you just need to know its contract.

from dataclasses import dataclass
from typing import Any

@dataclass
class SkillResult:
    """Normalized output for any skill execution."""
    skill_name: str
    success: bool
    output: Any
    errors: list[str]
    trace: list[dict]


class SupportReportSkill:
    """
    Skill: Generate a support ticket summary report.
    Required tools: ticket_reader, text_summarizer
    """

    REQUIRED_INPUTS = {"since_days", "max_tickets"}

    def __init__(self, registry: ToolRegistry):
        self.registry = registry
        self.name = "support_report"

    def _validate_inputs(self, inputs: dict) -> list[str]:
        """Return a list of validation error messages (empty = valid)."""
        missing = self.REQUIRED_INPUTS - inputs.keys()
        errors = []
        if missing:
            errors.append(f"Missing required inputs: {missing}")
        if "since_days" in inputs and not isinstance(inputs["since_days"], int):
            errors.append("'since_days' must be an integer.")
        return errors

    def _normalize_output(self, tickets: list, summary: str) -> dict:
        """Produce a consistent output shape regardless of internal variation."""
        return {
            "ticket_count": len(tickets),
            "tickets": tickets,
            "summary": summary,
            "report_version": "1.0"
        }

    def execute(self, inputs: dict) -> SkillResult:
        trace = []
        validation_errors = self._validate_inputs(inputs)

        if validation_errors:
            return SkillResult(
                skill_name=self.name,
                success=False,
                output=None,
                errors=validation_errors,
                trace=trace
            )

        # Step 1: Fetch tickets
        ticket_response = self.registry.invoke(
            "ticket_reader", {"since_days": inputs["since_days"]}
        )
        trace.append({"tool": "ticket_reader", "response": ticket_response})

        if not ticket_response["ok"]:
            return SkillResult(self.name, False, None,
                               [ticket_response["error"]], trace)

        tickets = ticket_response["result"][:inputs["max_tickets"]]

        # Step 2: Summarize (stubbed here; real impl calls an LLM tool)
        ticket_titles = ", ".join(t["title"] for t in tickets)
        summary = f"Recent issues include: {ticket_titles}."

        return SkillResult(
            skill_name=self.name,
            success=True,
            output=self._normalize_output(tickets, summary),
            errors=[],
            trace=trace
        )


## --- Use the skill ---
skill = SupportReportSkill(registry)
result = skill.execute({"since_days": 7, "max_tickets": 5})

if result.success:
    print(f"Report: {result.output['summary']}")
else:
    print(f"Skill failed: {result.errors}")

The _normalize_output method deserves attention. In real pipelines, different tools return results in slightly different shapes — one API returns {"tickets": [...]}, another returns a flat list. Normalization inside the skill is where you absorb that variability and expose a stable contract to whoever called the skill. This is exactly what makes skills reusable: callers never have to care about the tool internals.
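As a concrete sketch of that absorption step, here is a hypothetical coercion helper that accepts both shapes mentioned above and always hands back a plain list:

```python
def coerce_ticket_list(raw: object) -> list:
    """Absorb the two shapes seen in the wild: {"tickets": [...]} or a bare list."""
    if isinstance(raw, dict) and isinstance(raw.get("tickets"), list):
        return raw["tickets"]
    if isinstance(raw, list):
        return raw
    raise TypeError(f"Unrecognized ticket payload shape: {type(raw).__name__}")
```

Placing this kind of coercion inside the skill means a backend swap — say, from a flat-list API to a wrapped-object API — never leaks past the skill boundary.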

💡 Mental Model: Think of a skill as a USB-C adapter. The tools on the inside might have wildly different connector types; the skill presents a single, standard port to the outside world.

🎯 Key Principle: Input validation at the skill boundary is non-negotiable. The inputs to a skill often come from an LLM planner, which can hallucinate field names or types. Catching that early — before any tool is called — saves you from debugging mysterious tool failures downstream.

Tracing and Observability

Agentic workflows are notoriously hard to debug. When something goes wrong — the wrong tool was selected, an input was malformed, a response was misinterpreted — you need a record of every decision and every protocol message that flowed through the system. This is the job of tracing and observability hooks.

A practical tracing approach doesn't require a full observability platform on day one. Start by building a lightweight trace log that captures four things for every tool call: the tool name, the inputs, the response envelope, and a timestamp. The agent loop code above already seeds this pattern with the trace_log list.

For production systems, you extend this by emitting structured log events that an external system (like OpenTelemetry, LangSmith, or a simple ELK stack) can ingest:

import time
import uuid
import logging
import json
from typing import Any, Dict

## Configure structured logging
logging.basicConfig(
    level=logging.INFO,
    format='{"time": "%(asctime)s", "level": "%(levelname)s", "msg": %(message)s}'
)
logger = logging.getLogger("agent.trace")


class TracedToolRegistry(ToolRegistry):
    """
    Extends ToolRegistry with structured trace emission on every invocation.
    Drop-in replacement for ToolRegistry — no agent code changes required.
    """

    def invoke(self, tool_name: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
        call_id = str(uuid.uuid4())[:8]  # Short ID for log correlation
        start_time = time.monotonic()

        # Log the outgoing tool call
        logger.info(json.dumps({
            "event": "tool_call_start",
            "call_id": call_id,
            "tool": tool_name,
            "inputs": inputs
        }))

        response = super().invoke(tool_name, inputs)

        duration_ms = round((time.monotonic() - start_time) * 1000, 2)

        # Log the result
        log_event = {
            "event": "tool_call_end",
            "call_id": call_id,
            "tool": tool_name,
            "ok": response["ok"],
            "duration_ms": duration_ms,
            "error": response.get("error")
        }
        if response["ok"]:
            logger.info(json.dumps(log_event))
        else:
            logger.warning(json.dumps(log_event))

        return response


## Replace registry with traced version — zero changes to agent or skill code
traced_registry = TracedToolRegistry()
traced_registry.register(ToolDefinition(
    name="ticket_reader",
    description="Fetches recent support tickets.",
    input_schema={},
    output_schema={},
    handler=fetch_support_tickets,
    tags=["read", "tickets"]
))

## Every invoke now auto-emits structured logs
traced_registry.invoke("ticket_reader", {"since_days": 5})

This design wraps invocation transparently — TracedToolRegistry subclasses ToolRegistry and overrides its invoke method — so you gain full observability without touching any agent or skill code. The call_id field ties the start and end log entries together, which is essential when tool calls overlap in concurrent workflows.

🤔 Did you know? The correlation ID pattern (using a shared ID to link related log events) comes directly from distributed systems engineering, where it's been standard practice for decades. Agentic AI systems are, in effect, distributed systems where the nodes are LLM calls and tool invocations rather than microservices.
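In Python, contextvars is a natural way to propagate a correlation ID through nested calls without threading it through every function signature — a sketch (the helper names here are ours, not a library API):

```python
import contextvars
import uuid

# One context variable carries the correlation ID through nested calls
correlation_id = contextvars.ContextVar("correlation_id", default=None)

def with_correlation(fn, *args, **kwargs):
    """Run fn under a fresh correlation ID if none is active, reusing any existing one."""
    if correlation_id.get() is None:
        token = correlation_id.set(str(uuid.uuid4())[:8])
        try:
            return fn(*args, **kwargs)
        finally:
            correlation_id.reset(token)  # Clean up after the outermost call
    return fn(*args, **kwargs)

def nested_tool_call() -> str:
    # Any log line emitted here can include correlation_id.get()
    return correlation_id.get()
```

Every log emitter deep in the call stack can then read correlation_id.get() without the agent loop passing it explicitly.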

Putting It All Together

Here's how the three layers — registry, skill, and agent loop — compose into a cohesive pipeline:

Task Input
    │
    ▼
[Agent selects skill: "support_report"]
    │
    ▼
[Skill validates inputs]
    │
    ├── invalid? → return SkillResult(success=False)
    │
    ▼
[Skill resolves tools from TracedToolRegistry]
    │
    ├── "ticket_reader"  →  Protocol call  →  {ok, result, error}
    │                             │
    │                     [Trace log emitted]
    │
    ├── "text_summarizer"  →  Protocol call  →  {ok, result, error}
    │
    ▼
[Skill normalizes output → SkillResult(success=True)]
    │
    ▼
[Agent loop records result, advances plan]
    │
    ▼
Final response returned to user

Every boundary in this diagram is a place where a contract is enforced: the skill's input schema, the tool registry's invocation envelope, and the normalized SkillResult. These contracts are what allow you to swap implementations — replace the stub ticket_reader with a real Jira API client, or replace the trace logger with an OpenTelemetry exporter — without touching the rest of the pipeline.

📋 Quick Reference Card: The Three Layers and Their Responsibilities

Layer | 🔧 What it owns | 📋 Key contract | ⚠️ Failure mode
🗂️ Tool | Single atomic action | Input/output schema | Raw exception, unhandled
🛠️ Skill | Composition + validation | SkillResult envelope | Mis-validation, no normalization
🔁 Agent Loop | Sequencing + error policy | Protocol envelope {ok, result, error} | Infinite loop, swallowed errors

💡 Real-World Example: The pattern described here closely mirrors how systems built on the Model Context Protocol (MCP) work. MCP defines a standard JSON-RPC-based message format (the protocol envelope), tools expose themselves via server descriptors (the registry), and agent frameworks compose those tools into workflows (the skill layer). By learning this pattern now, you're building intuition directly applicable to MCP-based implementations.
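As a rough illustration of that envelope, here is approximately what a JSON-RPC 2.0 tool-call exchange looks like on the wire — the method and field names mirror MCP's tools/call shape, but treat them as illustrative and check the MCP specification for your protocol version:

```python
import json

# A JSON-RPC 2.0 request/response pair, shaped roughly like MCP's tools/call
# exchange. Exact field names are illustrative, not a normative MCP reference.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "ticket_reader", "arguments": {"since_days": 3}},
}
response = {
    "jsonrpc": "2.0",
    "id": 1,  # Echoes the request id so the client can correlate the reply
    "result": {"content": [{"type": "text", "text": "[...tickets...]"}]},
}

wire = json.dumps(request)  # What actually travels between client and server
```

The structural kinship to the registry's {ok, result, error} envelope is the point: both are fixed-shape containers that let the receiving side dispatch without knowing which tool ran.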

The patterns in this section — a metadata-driven registry, a graceful agent loop, a validating and normalizing skill, and a drop-in tracing layer — are not theoretical. They represent the engineering vocabulary you'll encounter repeatedly as you explore MCP, A2A, and production-grade agentic frameworks. Each piece is independently useful, but together they form a complete, debuggable, extensible pipeline for agent-powered software.

Common Pitfalls When Working with Agent Protocols and Tools

Building agentic systems that use protocols and tools feels empowering at first. You define a few tools, wire up a protocol, and watch your agent accomplish tasks autonomously. Then, in production, things start going sideways: the agent loops endlessly, silently swallows errors, or makes changes it was never supposed to make. Most of these failures trace back not to the model itself, but to how the surrounding infrastructure was designed. This section catalogs the most common mistakes developers make when building protocol-driven, tool-equipped agents — and more importantly, shows you exactly how to avoid them.

🎯 Key Principle: Protocol and tool quality is a force multiplier. Good design amplifies your agent's effectiveness; poor design amplifies its failure modes.


Pitfall 1: Overly Broad Tool Definitions

One of the most tempting shortcuts in agentic development is creating a single, powerful tool that can "do everything" in a domain. A run_query tool that accepts arbitrary SQL, a file_manager tool that can read, write, delete, and move any path, or a send_request tool that fires off any HTTP call — these feel like elegant abstractions. In practice, they are reliability and safety liabilities.

Overly broad tool definitions create several interlocking problems. First, the language model receiving the tool description has no meaningful guidance about what is appropriate to invoke. When a tool is described as execute_database_command(sql: string), the model has no signal distinguishing a safe SELECT from a destructive DROP TABLE. Second, broad tools make it nearly impossible to apply fine-grained authorization or audit logging. You cannot selectively allow read access without also allowing write access when they share a single entry point.

Wrong thinking: "One powerful tool is simpler to maintain and gives the agent maximum flexibility."

Correct thinking: "Each tool should represent exactly one well-scoped capability, with a name and description that makes its boundaries unmistakable."

Consider this contrast:

## ❌ Overly broad — dangerous and ambiguous
tools = [
    {
        "name": "database_tool",
        "description": "Execute any SQL command against the production database.",
        "parameters": {
            "type": "object",
            "properties": {
                "sql": {"type": "string", "description": "Any SQL to run"}
            },
            "required": ["sql"]
        }
    }
]

## ✅ Scoped tools — safe and intention-clear
tools = [
    {
        "name": "search_orders",
        "description": "Search customer orders by status or date range. Read-only.",
        "parameters": {
            "type": "object",
            "properties": {
                "status": {"type": "string", "enum": ["pending", "shipped", "cancelled"]},
                "since_date": {"type": "string", "format": "date"}
            }
        }
    },
    {
        "name": "cancel_order",
        "description": "Cancel a specific order by ID. Requires order to be in 'pending' status.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"}
            },
            "required": ["order_id"]
        }
    }
]

The scoped versions accomplish the same real-world tasks, but each tool's name, description, and parameter constraints act as a behavioral contract — both for the model choosing the tool and for the developer auditing its use. Notice also that search_orders communicates read-only intent, which can be enforced in the implementation and logged separately from write operations.

⚠️ Common Mistake: Mistake 1 — Describing a tool as "flexible" or "general-purpose" in its description. This signals to the model that anything goes, increasing the probability of unintended invocations.

💡 Pro Tip: Apply the single-responsibility principle to tools just as you would to functions or microservices. If you find yourself adding conditional logic inside a tool handler based on what the model passed in, that's a sign the tool should be split.
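A before/after sketch of that split signal, using a hypothetical in-memory file tool (the dict store stands in for a real filesystem or blob API):

```python
# ❌ One handler branching on a model-controlled "mode" argument
def file_tool(store: dict, mode: str, path: str, content: str = "") -> str:
    if mode == "read":
        return store.get(path, "")
    if mode == "write":
        store[path] = content
        return "written"
    raise ValueError(f"unknown mode: {mode}")

# ✅ Two scoped handlers — read access can be granted and audited independently
def read_file(store: dict, path: str) -> str:
    return store.get(path, "")

def write_file(store: dict, path: str, content: str) -> str:
    store[path] = content
    return "written"
```

The split version lets you register read_file with broad access and gate write_file behind confirmation or stricter authorization — impossible when both live behind one entry point.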


Pitfall 2: Missing or Inconsistent Schema Validation

Protocol exchanges between agents and tools depend entirely on structured data. When that structure goes unvalidated, failures become silent — the agent receives a malformed response, doesn't know it, and proceeds confidently in the wrong direction.

Schema validation is the practice of asserting that every message entering or leaving a tool conforms to an agreed-upon structure before any business logic runs. Many developers skip this, reasoning that the language model "should" produce well-formed arguments. This assumption breaks in surprising ways: models occasionally omit required fields, pass integers where strings are expected, or construct nested objects with the wrong key names.

Agent ──► Tool Call ──► [NO VALIDATION] ──► Business Logic ──► Silent Wrong Result
                                                                         │
                                                              Agent reads bad data
                                                                         │
                                                              Next tool call is
                                                              built on bad state

Contrast this with a validated pipeline:

Agent ──► Tool Call ──► [SCHEMA VALIDATION] ──► Pass: Business Logic ──► Result
                               │
                             Fail: Structured Error Response
                               │
                        Agent receives clear error,
                        can self-correct on next turn

Here's what runtime validation looks like in practice using Python's pydantic library — a natural fit for tool input validation:

from pydantic import BaseModel, Field, ValidationError
from typing import Literal
import json

class CancelOrderInput(BaseModel):
    order_id: str = Field(..., min_length=6, max_length=20, pattern=r'^ORD-\d+$')
    reason: Literal["customer_request", "fraud", "inventory"] = "customer_request"

def cancel_order_tool(raw_arguments: dict) -> dict:
    """
    Tool handler with strict input validation.
    Returns a protocol-compliant response in all cases.
    """
    try:
        # Validate and parse — raises ValidationError if input is malformed
        args = CancelOrderInput(**raw_arguments)
    except ValidationError as e:
        # Return a structured error the agent can interpret
        return {
            "success": False,
            "error_code": "INVALID_INPUT",
            "error_detail": e.errors(),  # Pydantic gives precise field-level errors
            "recoverable": True           # Signal: agent can retry with corrected input
        }

    # Only reaches here if input is valid
    result = perform_cancellation(args.order_id, args.reason)
    return {"success": True, "cancelled_order_id": args.order_id, "result": result}

The recoverable flag in the error response is worth highlighting. It provides the agent with actionable metadata: a True value signals that retrying with corrected input is appropriate, while False might indicate a downstream system failure where retrying would be futile.

⚠️ Common Mistake: Mistake 2 — Logging validation errors internally but returning a generic {"status": "ok"} to the agent anyway. The agent has no idea something went wrong and will continue building on a broken foundation.

🤔 Did you know? Many agentic frameworks route all tool outputs back through the model's context window. A silent validation failure that returns partial data doesn't just affect one step — it contaminates every subsequent reasoning step that builds on that context.


Pitfall 3: Tight Coupling Between Agent Logic and Tool Implementations

Tight coupling occurs when your agent's reasoning logic, prompt templates, or orchestration code directly reference specific implementation details of a tool — its internal function names, the exact shape of its private data structures, or assumptions about where it runs. This pattern feels harmless early in development when you control every component. It becomes a serious liability the moment a tool needs to change.

Imagine your agent prompt includes instructions like: "When calling search_orders, note that the results array will always have a legacy_id field you should use for follow-up calls." If the search_orders tool is ever refactored to remove legacy_id, every agent that was instructed to rely on it breaks silently.

🎯 Key Principle: Agents should depend on a tool's declared interface (its schema and documented return contract), never on implementation details that aren't part of that contract.

The architectural solution is to enforce a tool abstraction layer — a stable interface that insulates agent logic from implementation churn:

┌─────────────────────────────────────────────────────────┐
│                    AGENT LOGIC LAYER                    │
│  (prompts, orchestration, reasoning — references only   │
│   stable tool names and documented response schemas)    │
└───────────────────────┬─────────────────────────────────┘
                        │ calls via stable interface
┌───────────────────────▼─────────────────────────────────┐
│                  TOOL INTERFACE LAYER                   │
│  (canonical schema, versioned contracts, error codes)   │
└───────────────────────┬─────────────────────────────────┘
                        │ delegates to
┌───────────────────────▼─────────────────────────────────┐
│               TOOL IMPLEMENTATION LAYER                 │
│  (actual database calls, API integrations, file I/O —   │
│   free to change without touching layers above)         │
└─────────────────────────────────────────────────────────┘

In practice, this means defining tool response shapes as explicit contracts and testing against them rather than testing against implementation internals. If search_orders promises to always return {"orders": [...], "total_count": int}, that contract is what your agent logic should depend on — and what your integration tests should verify.
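A contract check of that kind can be a plain assertion helper shared by your integration tests — a hypothetical sketch for the search_orders contract above:

```python
def assert_search_orders_contract(response: dict) -> None:
    """Hypothetical integration-test helper: verify the documented contract
    {"orders": [...], "total_count": int} — never implementation internals."""
    assert isinstance(response.get("orders"), list), "contract: 'orders' must be a list"
    assert isinstance(response.get("total_count"), int), "contract: 'total_count' must be an int"
```

Because the helper asserts only the declared shape, a refactor of the implementation layer passes or fails the same test the agent's reasoning would pass or fail — the test and the agent depend on exactly the same contract.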

💡 Real-World Example: Teams using MCP (Model Context Protocol) benefit from this naturally because MCP tool definitions are declared separately from their implementations. However, even with MCP, developers sometimes hardcode assumptions about response field names in their system prompts. When the underlying implementation changes the field name from id to order_id, the protocol schema gets updated, but the system prompt doesn't — and the agent starts misinterpreting responses.


Pitfall 4: Neglecting Error Codes and Fallback Behavior

Agents operating in agentic loops make sequential decisions: each tool result informs the next action. When a tool fails without returning a meaningful, structured error, the agent has no signal to work with. In the absence of signal, language models hallucinate recovery — they invent plausible-sounding next steps that have no grounding in reality, often making the situation worse.

Structured error codes are your primary mechanism for keeping agents on the rails when things go wrong. An error response that simply returns {"error": "something went wrong"} tells the agent almost nothing. Compare this to a rich error response:

## ❌ Unstructured error — agent has no actionable information
def fetch_customer_unstructured(customer_id: str) -> dict:
    try:
        return database.get_customer(customer_id)
    except Exception:
        return {"error": "something went wrong"}  # Agent will hallucinate next steps

## ✅ Structured error — agent knows exactly what happened and what to do
def fetch_customer_structured(customer_id: str) -> dict:
    try:
        customer = database.get_customer(customer_id)
        if customer is None:
            return {
                "success": False,
                "error_code": "CUSTOMER_NOT_FOUND",
                "error_message": f"No customer with ID '{customer_id}' exists.",
                "recoverable": False,       # Don't retry — the record doesn't exist
                "suggested_action": "verify_customer_id"  # Hint for agent reasoning
            }
        return {"success": True, "customer": customer}
    except database.ConnectionError:
        return {
            "success": False,
            "error_code": "SERVICE_UNAVAILABLE",
            "error_message": "Database connection failed. Transient error.",
            "recoverable": True,            # Retry is appropriate
            "retry_after_seconds": 5
        }
    except PermissionError:
        return {
            "success": False,
            "error_code": "PERMISSION_DENIED",
            "error_message": "Caller lacks access to customer records.",
            "recoverable": False,           # Retrying won't help
            "suggested_action": "escalate_to_human"
        }

Notice how each error response includes not just a code but metadata that directly guides agent behavior: recoverable prevents futile retry loops, retry_after_seconds prevents hammering a degraded service, and suggested_action provides a vocabulary the agent's reasoning can act on.
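On the agent side, that metadata enables a retry policy that is neither blind nor absent. A sketch of such a driver (invoke_with_retry is a name introduced here; call stands in for any zero-argument tool invocation returning the envelope above):

```python
import time

def invoke_with_retry(call, max_retries: int = 2) -> dict:
    """Retry only when the structured response says retrying is worthwhile,
    honoring recoverable and retry_after_seconds instead of hammering."""
    response = call()
    for _ in range(max_retries):
        if response.get("success") or not response.get("recoverable"):
            break  # Done, or retrying would be futile
        time.sleep(response.get("retry_after_seconds", 0))
        response = call()
    return response
```

With CUSTOMER_NOT_FOUND (recoverable: False) this returns immediately; with SERVICE_UNAVAILABLE it waits the advertised backoff and tries again.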

⚠️ Common Mistake: Mistake 3 — Catching all exceptions in a single except Exception block and returning a generic error. This collapses multiple distinct failure modes into a single ambiguous signal, making it impossible for the agent to respond appropriately.

🧠 Mnemonic: Think RASH for error response quality: Recoverable flag, Action suggestion, Status code, Human-readable message. Every tool error response should include all four.


Pitfall 5: Versioning Pitfalls in Long-Running Deployments

Agentic systems are rarely deployed once and left alone. Tools get updated, protocols evolve, and the underlying APIs your tools wrap release new versions. In a traditional web application, you deploy new code and users immediately get the new behavior. In a long-running agentic system — one where agents might be executing multi-day workflows or where many agent instances run concurrently — version drift creates insidious bugs.

Version drift occurs when an agent was initialized with one version of a tool's schema or behavior, but the tool has since changed. The agent's cached context, system prompt, or in-flight state no longer matches reality. The result ranges from subtle data misinterpretation to outright failures.

Day 1: Agent initialized with Tool v1.2
       search_orders returns {"items": [...]}
              │
              ▼
Day 3: Tool upgraded to v1.3
       search_orders now returns {"orders": [...]}   ← field name changed!
              │
              ▼
Day 5: Same agent instance calls search_orders
       Expects "items" key — gets "orders" key
       Agent silently reads empty list, concludes no orders exist
       Takes incorrect compensating action

There are three concrete practices that prevent version drift from becoming a silent production bug:

1. Include version metadata in every tool response. This gives any observability tooling — and the agent itself, if instructed to check — a way to detect mismatches:

def search_orders_v2(status: str | None = None) -> dict:
    results = database.query_orders(status=status)
    return {
        "tool_name": "search_orders",
        "tool_version": "2.0.0",        # Explicit version in every response
        "schema_version": "2024-06",    # Schema version separately from logic version
        "orders": results,              # Renamed from 'items' in v1
        "total_count": len(results)
    }

2. Maintain backward-compatible aliases during transition windows. If you rename a field, return both the old and new field names for at least one version cycle:

## During transition: return both old and new field names
return {
    "orders": results,      # New canonical name
    "items": results,       # Deprecated alias — remove in v3
    "_deprecation_notice": "'items' key deprecated; use 'orders'. Removed in v3.0.0"
}

3. Invalidate or reinitialize long-running agents on breaking changes. Define what constitutes a breaking change in your tool contracts (renamed required fields, removed response keys, changed enum values), and implement a mechanism to signal running agents that their context is stale.
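Practices 1 and 3 meet in a simple staleness check: compare the schema version stamped on each response against what the agent cached at initialization. A sketch (the names are illustrative, not a framework API):

```python
EXPECTED_SCHEMAS = {"search_orders": "2024-06"}  # Captured at agent initialization

def context_is_stale(tool_name: str, response: dict) -> bool:
    """True when a response's schema_version no longer matches the version
    the agent was initialized against — a signal to reinitialize."""
    expected = EXPECTED_SCHEMAS.get(tool_name)
    actual = response.get("schema_version")
    return expected is not None and actual is not None and expected != actual
```

Run after every tool call, this turns the Day-5 silent misread in the diagram above into an explicit "context stale — reinitialize" event.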

📋 Quick Reference Card: Tool Change Categories

🔒 Change Type | 📋 Breaking? | 🔧 Migration Strategy
🟢 Adding optional response field | No | Safe to deploy immediately
🟡 Renaming a response field | Yes | Dual-return alias for one version cycle
🟡 Removing a required input param | No (if still accepted) | Mark deprecated, remove after cycle
🔴 Adding a required input param | Yes | Provide default or version-gate
🔴 Changing enum values | Yes | Reinitialize affected agent sessions
🟡 Changing error code strings | Sometimes | Treat as breaking if agents branch on them

⚠️ Common Mistake: Mistake 4 — Treating tool versioning as a deployment concern rather than an agent contract concern. Updating a tool's behavior without updating its declared schema version means any agent relying on that tool has no way to detect that its assumptions are now invalid.

💡 Pro Tip: Adopt semantic versioning for tool schemas the same way you would for a public API. Patch versions (1.0.1) for bug fixes that don't change behavior, minor versions (1.1.0) for backward-compatible additions, and major versions (2.0.0) for any breaking changes. Make the version part of the tool's registered name in your protocol if your framework supports it: search_orders_v2 versus search_orders_v1.
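Under that convention, deciding whether an upgrade is breaking reduces to comparing version components — a small sketch:

```python
def upgrade_kind(old: str, new: str) -> str:
    """Classify a semver bump: 'major' is breaking by convention."""
    o, n = ([int(part) for part in v.split(".")] for v in (old, new))
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    return "patch"
```

A deployment gate can then refuse to hot-swap a tool whose upgrade_kind is "major" while agent sessions that depend on it are still running.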


Bringing It Together: A Pitfall-Resistant Tool Design Checklist

These five pitfalls are interconnected. An overly broad tool that lacks schema validation and returns unstructured errors, deployed without version tracking, is a compounding failure waiting to happen. Conversely, getting each of these right creates a mutually reinforcing system where agents can trust their tools, tools can trust their inputs, and operators can trust the whole pipeline.

🧠 Mental Model: Think of your tools as the immune system of your agentic pipeline. A single weak tool is an entry point for cascading failures across every agent that depends on it. Defense in depth — scoped definitions, validated schemas, structured errors, loose coupling, and versioned contracts — is what keeps the system healthy under real-world conditions.

Before shipping any tool into a production agentic system, run it against this checklist:

  • 🎯 Scope: Does this tool do exactly one thing? Are its name and description unambiguous about what that thing is?
  • 🔒 Validation: Does every input get validated against a declared schema before reaching business logic? Does every validation failure return a structured, recoverable error?
  • 🔧 Coupling: Does any agent prompt or orchestration logic reference implementation details not in the official tool schema?
  • 📚 Error handling: Does every failure mode return a distinct error code, a recoverable flag, and an actionable suggestion?
  • 🔄 Versioning: Is the tool's schema version declared? Is there a plan for communicating breaking changes to running agent sessions?

Tools that pass this checklist aren't just safer — they're dramatically easier to debug, extend, and hand off to teammates. The discipline you apply to your tool contracts today is what determines whether your agentic systems remain manageable as they scale.
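The checklist can even be enforced mechanically. Here's a minimal, hypothetical lint pass — the required field names mirror the five checklist items, though real frameworks will have their own tool-definition shapes:

```python
# Hypothetical checklist linter for tool definitions. Field names
# (input_schema, error_codes, schema_version) are illustrative.

def lint_tool_definition(tool: dict) -> list[str]:
    """Return a list of checklist violations (empty list = passes)."""
    problems = []
    if not tool.get("name") or " " in tool.get("name", ""):
        problems.append("Scope: missing or ambiguous tool name")
    if not tool.get("description"):
        problems.append("Scope: missing description")
    if "input_schema" not in tool:
        problems.append("Validation: no declared input schema")
    if "error_codes" not in tool:
        problems.append("Error handling: no declared error contract")
    if "schema_version" not in tool:
        problems.append("Versioning: no schema version declared")
    return problems
```

Wired into CI, a check like this turns the checklist from tribal knowledge into a gate that every new tool must pass before deployment.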

Key Takeaways & Preparing for MCP, A2A, Tool Design & Skill Loading

You've traveled a significant distance in this lesson. What started as abstract ideas — protocols, tools, skills — should now feel like concrete architectural building blocks you can reason about, design around, and debate with your team. This final section crystallizes everything you've learned, gives you a practical reference to carry into your next project, and points you toward the specialized child lessons where each concept gets its own deep treatment.

Let's close the loop properly.


Revisiting the Three-Layer Model

The single most important mental model from this lesson is the three-layer architecture that underpins every well-designed agentic system. Each layer has a distinct responsibility, and conflating them is the root cause of most of the pitfalls covered in the previous section.

┌─────────────────────────────────────────────────────┐
│               SKILL LAYER                           │
│  Reusable, composable expertise bundles             │
│  (ResearchSkill, SummarizationSkill, CodingSkill)   │
├─────────────────────────────────────────────────────┤
│               TOOL LAYER                            │
│  Discrete, stateless actions with typed I/O         │
│  (search_web, write_file, query_database)           │
├─────────────────────────────────────────────────────┤
│              PROTOCOL LAYER                         │
│  Message schemas, contracts, routing rules          │
│  (MCP, A2A, custom JSON-RPC schemas)                │
└─────────────────────────────────────────────────────┘

🧠 Mental Model: Think of this like a modern web application. The protocol layer is your HTTP and REST conventions — the how of communication. The tool layer is your API endpoints — the what you can do. The skill layer is your business logic services — the why and when something gets done.

Each layer communicates downward but never reaches across. A skill calls tools; a tool sends protocol messages; a protocol message carries tool invocations. Nothing jumps layers.

🎯 Key Principle: The three layers exist to give each concern its own boundary. Protocols change when communication standards evolve. Tools change when capabilities expand. Skills change when domain knowledge grows. If you mix them, every change cascades unpredictably.


The Principle of Separation of Concerns in Practice

Separation of concerns isn't just a software engineering cliché here — it's a survival strategy for agentic systems that will inevitably grow, be handed to other developers, and need to swap out components.

Consider what happens when you violate it. Imagine a skill that hardcodes a specific protocol message format to invoke a tool:

## ❌ Wrong: Skill is tightly coupled to protocol format
class ResearchSkill:
    def __init__(self, transport):
        self.transport = transport  # raw transport injected straight into the skill

    def run(self, query: str) -> str:
        # Skill is manually constructing a raw protocol message
        raw_message = {
            "jsonrpc": "2.0",
            "method": "tools/call",
            "params": {
                "name": "search_web",
                "arguments": {"query": query, "max_results": 5}
            },
            "id": "req-001"
        }
        response = self.transport.send(raw_message)
        return response["result"]["content"][0]["text"]

This skill now knows too much. It knows the JSON-RPC version, the method name convention, the response shape, and the transport mechanism. If any of those change — say you migrate to MCP or swap transports — you have to rewrite the skill.

Here's the separated version:

## ✅ Correct: Each layer owns its concerns
from typing import Any

## PROTOCOL LAYER: Owns message construction and transport
class ToolProtocolClient:
    def __init__(self, transport):
        self.transport = transport
        self._id_counter = 0

    def _next_id(self) -> str:
        self._id_counter += 1
        return f"req-{self._id_counter:03d}"

    def call_tool(self, tool_name: str, arguments: dict[str, Any]) -> str:
        """Constructs and sends a protocol-compliant tool call."""
        message = {
            "jsonrpc": "2.0",
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
            "id": self._next_id()
        }
        response = self.transport.send(message)
        # Protocol layer also owns response parsing
        return response["result"]["content"][0]["text"]

## TOOL LAYER: Owns tool definition and validation
class WebSearchTool:
    name = "search_web"
    description = "Search the web for current information on a topic."
    
    def invoke(self, client: ToolProtocolClient, query: str, max_results: int = 5) -> str:
        """Validates inputs and delegates to the protocol client."""
        if not query.strip():
            raise ValueError("Query cannot be empty")
        return client.call_tool(self.name, {"query": query, "max_results": max_results})

## SKILL LAYER: Owns orchestration logic and domain expertise
class ResearchSkill:
    def __init__(self, search_tool: WebSearchTool, protocol_client: ToolProtocolClient):
        self.search_tool = search_tool
        self.client = protocol_client
    
    def run(self, query: str) -> str:
        """Orchestrates tools without knowing anything about protocol internals."""
        results = self.search_tool.invoke(self.client, query)
        # Skill applies domain judgment: summarize, filter, re-query if needed
        return self._synthesize(results, query)
    
    def _synthesize(self, raw_results: str, original_query: str) -> str:
        # Domain logic lives here, not protocol logic
        return f"Research findings for '{original_query}':\n{raw_results}"

Now the skill has no idea what protocol format is being used. You can swap the ToolProtocolClient for an MCP-native client, a mock for testing, or a future A2A-aware client — and ResearchSkill never changes.

💡 Pro Tip: When reviewing your own agentic code, ask: "If I changed the transport protocol tomorrow, how many files would I touch?" More than one or two and your layers are leaking.
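One concrete payoff of clean layering is testability: swapping the transport is a one-line change. The sketch below is self-contained — FakeTransport and MinimalProtocolClient are illustrative stand-ins, not part of any real library:

```python
# Sketch: with layers separated, a test double replaces the transport
# without touching any tool or skill code.

class FakeTransport:
    """Test double: returns a canned protocol response instead of hitting a server."""
    def send(self, message: dict) -> dict:
        tool = message["params"]["name"]
        return {"result": {"content": [{"text": f"stubbed result for {tool}"}]}}

class MinimalProtocolClient:
    def __init__(self, transport):
        self.transport = transport  # injected, so tests can substitute FakeTransport

    def call_tool(self, tool_name: str, arguments: dict) -> str:
        message = {"jsonrpc": "2.0", "method": "tools/call",
                   "params": {"name": tool_name, "arguments": arguments}, "id": "test-1"}
        return self.transport.send(message)["result"]["content"][0]["text"]

client = MinimalProtocolClient(FakeTransport())
print(client.call_tool("search_web", {"query": "mcp"}))  # prints "stubbed result for search_web"
```

The only thing that changed between production and test is the constructor argument — exactly the property the "how many files would I touch?" question is probing for.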


Quick-Reference Checklist: Tool Definitions and Protocol Messages

Before you ship any tool or protocol integration, use this checklist. It distills the standards discussed throughout this lesson into a fast verification routine.

📋 Quick Reference Card: Well-Designed Tool Definition

Requirement Why It Matters
Unique, descriptive name (verb_noun pattern) Prevents collision; LLMs use names for selection
Plain-English description (what + when to use) Guides model tool selection at inference time
Typed input schema (JSON Schema or Pydantic) Enables validation before execution
Typed output schema Downstream skills can parse outputs reliably
Error contract (what errors are returned, when) Callers can handle failures gracefully
Idempotency declaration (safe to retry?) Critical for fault-tolerant pipelines
No side-effect surprises (documents any mutations) Agents need to know what is destructive

📋 Quick Reference Card: Well-Formed Protocol Message

Requirement Why It Matters
Schema version field Enables backward compatibility
Unique correlation ID Links requests to responses in async flows
Explicit method/action field Router knows what to invoke
Validated payload (not raw strings) Prevents injection and parse failures
Timeout or TTL metadata Prevents indefinite hangs
Authentication context (where applicable) Enforces least-privilege at message level
Error envelope format Consistent failure reporting across services

⚠️ Critical Point: The most commonly skipped items are the error contract and the idempotency declaration. Developers define happy-path tool schemas perfectly and then leave failure modes as implicit tribal knowledge. An agent operating at 3 AM with no human in the loop will encounter failures. Document them.
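A structured error contract the agent can act on at 3 AM might look like this — the envelope shape and field names are a sketch, assuming the recoverable-flag pattern described in the checklist:

```python
from dataclasses import dataclass, asdict

# Sketch of a structured error envelope: a distinct code, a recoverable
# flag, and an actionable suggestion. Names are illustrative.

@dataclass
class ToolError:
    code: str            # machine-readable, e.g. "RATE_LIMITED"
    message: str         # human/model-readable explanation
    recoverable: bool    # can the agent retry or re-plan?
    suggestion: str      # what the agent should try next

def rate_limited_error() -> dict:
    return {"error": asdict(ToolError(
        code="RATE_LIMITED",
        message="Upstream API returned 429",
        recoverable=True,
        suggestion="Wait 30 seconds and retry with the same arguments",
    ))}
```

An agent receiving this envelope can branch on code, check recoverable before retrying, and feed suggestion back into its planning loop — none of which is possible with a bare exception string.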


What You Now Understand That You Didn't Before

Let's be explicit about the conceptual shift this lesson has produced. The table below contrasts common pre-lesson mental models with the more precise understanding you should now hold.

Before This Lesson After This Lesson
🔴 "Tools are just functions I give the LLM" 🟢 Tools are typed, validated, documented contracts with error envelopes and idempotency declarations
🔴 "Protocols are just API formats" 🟢 Protocols are the communication contracts that govern how agents, tools, and systems discover and invoke each other reliably
🔴 "Skills are just prompts or chains" 🟢 Skills are reusable, composable expertise bundles that orchestrate tools without owning protocol details
🔴 "I'll add structure when the project gets bigger" 🟢 Layer separation is cheapest at the start; retrofitting it into a tangled codebase is a significant rewrite
🔴 "Agent failures are usually model failures" 🟢 Most production agent failures are protocol mismatches, missing error handling, or tool schema ambiguity

🤔 Did you know? Experience reports from production LLM agent deployments consistently suggest that tool definition quality — specifically description clarity and schema precision — has more impact on task success rates than model size. A well-described tool on a smaller model often outperforms a poorly described tool on a larger one.


Teaser: MCP — Standardizing Tool and Context Communication

The Model Context Protocol (MCP) is the child lesson that picks up exactly where this lesson leaves off on the protocol side. MCP provides a standardized, open specification for how AI models communicate with tools, data sources, and context providers. Everything you've learned about protocol message structure, tool schemas, and context passing maps directly onto MCP's architecture.

In the MCP lesson, you'll encounter:

  • 🔧 Tool registration and discovery — how a host exposes tools to a model using MCP's standard tools/list and tools/call methods
  • 📚 Resource handling — how MCP structures access to files, databases, and APIs as typed resources with URIs
  • 🎯 Prompt templates — how MCP formalizes reusable prompt patterns as first-class protocol objects
  • 🔒 Capability negotiation — how clients and servers declare what they support during handshake, preventing silent feature mismatches
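As a preview, a tools/list exchange has roughly this JSON-RPC shape — simplified from the MCP specification, with fields trimmed for brevity:

```python
# Preview (simplified): the JSON-RPC shapes MCP uses for tool discovery.
# Based on the MCP specification's tools/list method; fields trimmed.

list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

list_response = {
    "jsonrpc": "2.0",
    "id": 1,  # correlation ID matches the request
    "result": {
        "tools": [{
            "name": "search_web",
            "description": "Search the web for current information on a topic.",
            "inputSchema": {  # JSON Schema — exactly the typed-input checklist item
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        }]
    },
}
```

Notice how each row of the tool checklist — name, description, typed input schema — appears as a concrete field in the MCP response.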

💡 Mental Model for MCP: If this lesson taught you what a well-designed protocol message should look like, the MCP lesson teaches you the specific vocabulary and structure of the most widely adopted protocol for model-tool communication. You're learning the grammar; MCP is a specific language written in that grammar.

The tool checklist you just reviewed? Every item on it maps to a specific field or requirement in the MCP specification. You're not starting from scratch in that lesson — you're naming what you already understand.


Teaser: A2A — Governing Agent-to-Agent Interaction

The Agent-to-Agent (A2A) protocol lesson addresses a different but equally important frontier: what happens when the entity on the other end of a protocol message isn't a tool or a database, but another agent.

This introduces challenges that single-agent tool use never faces:

  • 🧠 Capability advertisement — how agents declare what tasks they can accept, under what constraints
  • 📚 Task delegation and tracking — how a calling agent monitors the progress of a sub-agent working on a long-running task
  • 🎯 Trust and authorization — when agent A delegates to agent B, what permissions transfer, and how that's verified
  • 🔧 Failure propagation — when a sub-agent fails mid-task, how the parent agent receives structured failure information it can reason about

Everything in this lesson about protocol message schemas, error envelopes, and separation of concerns applies directly to A2A. The conceptual leap in that child lesson is recognizing that the "tool" on the other end has its own goals, its own memory, and its own failure modes — and your protocol must account for that.

⚠️ Critical Point: The most common mistake developers make when first building multi-agent systems is treating agent-to-agent calls like tool calls. They're not. A tool call is stateless and completes (or fails) in a single exchange. An agent is stateful, may ask clarifying questions, may partially complete work, and may spawn its own sub-agents. The A2A lesson gives you the protocol primitives to handle all of this correctly.


Teaser: Tool Design & Skill Loading

The two remaining child lessons zoom into the tool and skill layers respectively.

Tool Design goes far beyond schema definitions. You'll learn:

  • How to design tools that are genuinely useful to LLMs at inference time (not just technically correct)
  • Versioning strategies for tools that must evolve without breaking existing agents
  • Testing patterns for tools in isolation before they're ever touched by a live model
  • Composability patterns: when to build one flexible tool vs. several focused tools

Skill Loading addresses a challenge that only emerges at scale: agents that need access to hundreds of potential skills can't hold all of them in context simultaneously. Progressive skill loading — loading skills on demand based on task requirements — is a critical performance and cost optimization. You'll learn:

  • How to design skill registries and lazy-loading mechanisms
  • When to pre-load vs. dynamically discover skills
  • How skill metadata (tags, capability declarations) enables intelligent routing

## Preview: What progressive skill loading looks like in practice
class SkillRegistry:
    """Loads skills on demand rather than all at startup."""
    
    def __init__(self):
        self._registry: dict[str, dict] = {}  # name -> {"class": ..., "tags": ...}
        self._loaded: dict[str, object] = {}
    
    def register(self, name: str, skill_class: type, tags: list[str]) -> None:
        """Register a skill by name without instantiating it yet."""
        self._registry[name] = {"class": skill_class, "tags": tags}
    
    def get(self, name: str) -> object:
        """Lazily instantiate and cache a skill only when first needed."""
        if name not in self._loaded:
            if name not in self._registry:
                raise KeyError(f"No skill registered under '{name}'")
            # Skill is instantiated here, not at startup
            # (sketch assumes zero-argument constructors; real skills would
            # receive their dependencies via a factory or injector)
            self._loaded[name] = self._registry[name]["class"]()
        return self._loaded[name]
    
    def find_by_tag(self, tag: str) -> list[str]:
        """Discover relevant skills dynamically without loading them."""
        return [
            name for name, meta in self._registry.items()
            if tag in meta["tags"]
        ]

## Usage: Agent only loads what it needs for the current task
registry = SkillRegistry()
registry.register("research", ResearchSkill, tags=["information", "web", "search"])
registry.register("coding", CodingSkill, tags=["code", "programming", "debug"])

## For a research task, only the research skill gets instantiated
relevant = registry.find_by_tag("search")  # ["research"]
active_skill = registry.get("research")     # Instantiated here, on demand

This snippet previews the pattern you'll implement fully in the Skill Loading child lesson. Notice that CodingSkill is never instantiated during a research task — its class definition is registered but its object never constructed until needed.


Your Next Steps

Here's how to move from this lesson into the child lessons with maximum momentum:

Step 1 — Audit an existing project (or design a new one). Before jumping into any child lesson, take 30 minutes to map any agentic code you currently own (or plan to write) against the three-layer model. Where are the boundaries blurry? Where is protocol logic leaking into skills? That audit will make the child lessons feel immediately applicable rather than theoretical.

Step 2 — Read the MCP specification overview first. The MCP lesson is the natural next stop after this one. The protocol foundation you've built here makes the MCP spec readable in a way it wouldn't have been before. When you see tools/call in the spec, you'll recognize the protocol layer. When you see resource definitions, you'll recognize the tool schema pattern.

Step 3 — Design a tool before the Tool Design lesson. A powerful learning technique: attempt to design one real tool definition using only the checklist from this lesson. What gaps do you hit? What questions surface? Bring those into the Tool Design lesson. You'll learn more from having concrete questions than from arriving with a blank slate.

🎯 Key Principle: The architecture decisions you make in the child lessons — which protocol to adopt, how to structure tool schemas, when to bundle capabilities into skills vs. expose them as discrete tools — all flow from the conceptual separation you've internalized in this lesson. The child lessons are specializations of a pattern you already understand.


Final Summary

🧠 The three-layer model (protocol → tool → skill) is your primary architectural compass. Every design decision in agentic systems can be evaluated against it.

📚 Separation of concerns is not optional at scale. Layers that leak into each other create fragile systems that are expensive to change and hard to test.

🔧 Tool definitions are contracts, not just function signatures. They must include descriptions that guide model selection, typed schemas, error contracts, and idempotency declarations.

🎯 Protocol messages must include correlation IDs, schema versions, validated payloads, and error envelopes. These aren't bureaucratic overhead — they're the infrastructure that makes failure recoverable.

🔒 MCP standardizes how tools and context are communicated between models and external systems. A2A governs how agents communicate with each other. Both build on the protocol fundamentals in this lesson.

⚠️ The most important thing to carry forward: The mistakes that break production agentic systems almost never happen at the model level. They happen at the protocol boundary — missing error envelopes, ambiguous tool descriptions, skills that reach across layers. The discipline you've built in this lesson is what prevents those failures.

You are now ready for MCP, A2A, Tool Design, and Skill Loading. The foundations are solid. Build on them.