akashnotes — Structured Learning for Engineers

The LLM has decided to call your function. Now what? This guide covers the complete flow: parsing the tool call, executing your Python function, and returning the result back to the LLM.

Coming from Software Engineering? The tool execution loop (LLM requests → parse → execute → return) is just an event-driven message bus pattern. If you've built command handlers, message queues, or event sourcing systems, you'll recognize this: receive a message (tool call), dispatch to handler (function), return result.

The Tool Calling Flow

Step 1: Detect Tool Calls

The model can't run code — when it needs live data it returns a structured request (like an RPC call your service must fulfill): run get_weather with city="Tokyo". Your app runs the real function and hands the result back.

When the LLM wants to call a function, it returns a special response:

# script_id: day_030_tool_execution_handling_part1/tool_call_flow
from openai import OpenAI, pydantic_function_tool
from pydantic import BaseModel, Field
from typing import Any
import json

client = OpenAI()

# Define tool schema as a Pydantic model
class GetWeather(BaseModel):
    """Get current weather for a city."""
    city: str = Field(description="City name")
    unit: str = Field(description="Temperature unit: celsius or fahrenheit")

tools = [pydantic_function_tool(GetWeather)]

# pydantic_function_tool names the tool after the class, so the model
# calls it "GetWeather" — that's the key you dispatch on.
# (You can also constrain a field to fixed choices with an enum — shown later.)

# Make the API call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

message = response.choices[0].message

# Check if LLM wants to call a tool
if message.tool_calls:
    print("LLM wants to call tools!")
    for tool_call in message.tool_calls:
        print(f"  Function: {tool_call.function.name}")
        print(f"  Arguments: {tool_call.function.arguments}")
else:
    print("No tool calls, regular response:", message.content)

Step 2: Parse the Tool Call

Extract the function name and arguments:

# script_id: day_030_tool_execution_handling_part1/tool_call_flow
def parse_tool_call(tool_call) -> dict:
    """
    Parse a tool call from the LLM response.

    Returns:
        dict with 'id', 'name', and 'arguments'
    """
    return {
        "id": tool_call.id,
        "name": tool_call.function.name,
        "arguments": json.loads(tool_call.function.arguments)
    }

# Usage
if message.tool_calls:
    for tool_call in message.tool_calls:
        parsed = parse_tool_call(tool_call)
        print(f"Call ID: {parsed['id']}")
        print(f"Function: {parsed['name']}")
        print(f"Arguments: {parsed['arguments']}")

Step 3: Execute the Function

Map function names to actual Python functions and execute:

# script_id: day_030_tool_execution_handling_part1/tool_call_flow
# Define your actual functions
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Get weather for a city (mock implementation)."""
    # In real code, call a weather API
    weather_data = {
        "Tokyo": {"temp": 22, "condition": "sunny"},
        "London": {"temp": 15, "condition": "cloudy"},
        "New York": {"temp": 18, "condition": "rainy"}
    }

    data = weather_data.get(city, {"temp": 20, "condition": "unknown"})

    if unit == "fahrenheit":
        data["temp"] = data["temp"] * 9/5 + 32

    data["unit"] = unit
    data["city"] = city
    return data

# Function registry — keyed on the schema CLASS name the model sends
FUNCTIONS = {
    "GetWeather": get_weather,
}

def execute_function(name: str, arguments: dict) -> Any:
    """Execute a function by name with given arguments."""

    if name not in FUNCTIONS:
        raise ValueError(f"Unknown function: {name}")

    func = FUNCTIONS[name]
    return func(**arguments)

# Usage
result = execute_function("GetWeather", {"city": "Tokyo", "unit": "celsius"})
print(result)  # {"temp": 22, "condition": "sunny", "unit": "celsius", "city": "Tokyo"}

Step 4: Return Results to the LLM

Send the function result back so the LLM can formulate a response:

# script_id: day_030_tool_execution_handling_part1/tool_call_flow
def complete_tool_call(client, messages: list, tools: list) -> str:
    """
    Complete the full tool calling cycle.

    Returns the final text response from the LLM.
    """

    # Step 1: Initial request
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )

    message = response.choices[0].message

    # Step 2: Check for tool calls
    if not message.tool_calls:
        return message.content

    # Step 3: Add assistant message to history
    # append the model's message object as-is — it carries the tool_calls the API needs to match results to
    messages.append(message)

    # Step 4: Execute each tool call and add results
    for tool_call in message.tool_calls:
        parsed = parse_tool_call(tool_call)

        try:
            # Execute the function
            result = execute_function(parsed["name"], parsed["arguments"])
            result_str = json.dumps(result)
        except Exception as e:
            result_str = json.dumps({"error": str(e)})

        # Add tool result to messages
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result_str
        })

    # Step 5: Get final response from LLM
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )

    return final_response.choices[0].message.content

# Usage
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
answer = complete_tool_call(client, messages, tools)
print(answer)  # "The current weather in Tokyo is 22°C and sunny!"

Complete Example: Multi-Tool Agent

Same parse → dispatch → return loop as Steps 1-4, now wrapped in a class that loops until the model stops asking for tools. The schema classes get a Schema suffix here only to keep them distinct from the impl functions; the dispatch key is still the class name.

# script_id: day_030_tool_execution_handling_part1/multi_tool_agent
from openai import OpenAI, pydantic_function_tool
from pydantic import BaseModel, Field
import json
from datetime import datetime

client = OpenAI()

# Define tool schemas as Pydantic models
class GetWeatherSchema(BaseModel):
    """Get current weather for a city."""
    city: str = Field(description="City name")

class GetTimeSchema(BaseModel):
    """Get current time in a timezone."""
    timezone: str = Field(description="Timezone name")

class CalculateSchema(BaseModel):
    """Evaluate a mathematical expression."""
    expression: str = Field(description="Math expression")

# Function implementations
def get_weather(city: str) -> dict:
    """Get weather for a city."""
    return {"city": city, "temp": 22, "condition": "sunny"}

def get_time(timezone: str = "UTC") -> dict:
    """Get current time in a timezone."""
    return {"timezone": timezone, "time": datetime.now().isoformat()}

def calculate(expression: str) -> dict:
    """Evaluate a math expression safely (no eval!)."""
    import ast, operator
    try:
        def safe_eval(node):
            if isinstance(node, ast.Constant): return node.value
            elif isinstance(node, ast.BinOp):
                ops = {ast.Add: operator.add, ast.Sub: operator.sub,
                       ast.Mult: operator.mul, ast.Div: operator.truediv}
                return ops[type(node.op)](safe_eval(node.left), safe_eval(node.right))
            elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
                return -safe_eval(node.operand)
            raise ValueError("Unsupported expression")
        result = safe_eval(ast.parse(expression, mode='eval').body)
        return {"expression": expression, "result": result}
    except Exception as e:
        return {"error": str(e)}

# Map schema class names to function implementations
TOOLS = {
    "GetWeatherSchema": get_weather,
    "GetTimeSchema": get_time,
    "CalculateSchema": calculate,
}

# Generate tool schemas from Pydantic models
TOOL_SCHEMAS = [
    pydantic_function_tool(GetWeatherSchema),
    pydantic_function_tool(GetTimeSchema),
    pydantic_function_tool(CalculateSchema),
]

class ToolAgent:
    """Agent that can use multiple tools."""

    def __init__(self):
        self.client = OpenAI()
        self.messages = []

    def chat(self, user_message: str) -> str:
        """Process a user message, potentially using tools."""

        self.messages.append({"role": "user", "content": user_message})

        # Keep processing until we get a final response
        while True:
            response = self.client.chat.completions.create(
                model="gpt-4o",
                messages=self.messages,
                tools=TOOL_SCHEMAS
            )

            message = response.choices[0].message

            # No tool calls - we have our answer
            if not message.tool_calls:
                self.messages.append(message)
                return message.content

            # Process tool calls
            self.messages.append(message)

            for tool_call in message.tool_calls:
                name = tool_call.function.name
                args = json.loads(tool_call.function.arguments)

                print(f"  [Calling {name}({args})]")

                # Execute tool
                if name in TOOLS:
                    result = TOOLS[name](**args)
                else:
                    result = {"error": f"Unknown tool: {name}"}

                # Add result to messages
                self.messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result)
                })

# Usage
agent = ToolAgent()

print(agent.chat("What's the weather in Paris?"))
print()
print(agent.chat("What's 15% of 230?"))
print()
print(agent.chat("What time is it in Tokyo timezone?"))

Checkpoint

Run complete_tool_call (or the multi_tool_agent) on a weather question and confirm: the parsed tool call's arguments come back as a real dict, execute_function dispatches to the right Python function, and the result is appended to the message list with the matching tool_call_id before the model speaks again. If the second model turn errors, check that every tool call you received got a corresponding tool-result message back — providers reject a follow-up that leaves a tool call unanswered.

Summary

Quick Reference

Step	Code	Notes
Detect calls	`message.tool_calls`	`None`/empty means the model answered in text
Parse arguments	`json.loads(tool_call.function.arguments)`	Arguments arrive as a JSON string
Dispatch	`execute_function(tool_call.function.name, args)`	Map name → your Python function
Return result	`{"role": "tool", "tool_call_id": tc.id, "content": json.dumps(result)}`	The `tool_call_id` must match
Continue	re-call `client.chat.completions.create(model="gpt-4o", messages=...)`	Append the tool message first
Finish	model returns text with no `tool_calls`	That's your final answer

Exercises

Add a tool. Register a convert_currency(amount, from_currency, to_currency) tool and confirm the agent calls it for a relevant question.
Unknown-tool handling. Ask for something no tool covers and verify the agent degrades gracefully instead of crashing.
Inspect the loop. Log each tool_call the model requests and the result you return — trace one full request end to end.
Bad arguments. Make the model call a tool with a missing or invalid argument and have your executor return a structured error the model can recover from.

Solutions (approaches)

Add the function plus its Pydantic/tool spec, then add a branch in execute_function. Ask "Convert 100 USD to EUR" and check the dispatched name.
In execute_function, return {"error": f"unknown tool: {name}"} for unrecognized names instead of raising; the model can then apologize gracefully.
Add print()/logging inside the loop for parsed["name"], parsed["arguments"], and the returned result — one full round shows request → execute → result → final text.
Wrap the call in try/except and return {"error": str(e)} as the tool content; the model reads the error and can retry or ask for the missing field.

What's Next?

You can now run a single round of tool calls. Next up: Tool Execution Handling Part 2 — parallel tool calls, timeouts, error categorization, and smart retries for production.

Parsing Tool Calls, Executing Functions & Returning Results