The LLM has decided to call your function. Now what? This guide covers the complete flow: parsing the tool call, executing your Python function, and returning the result back to the LLM.
Coming from Software Engineering? The tool execution loop (LLM requests → parse → execute → return) is essentially a command-dispatch pattern. If you've built command handlers, message queues, or event sourcing systems, you'll recognize it: receive a message (the tool call), dispatch it to a handler (your function), and return the result.
The Tool Calling Flow
Step 1: Detect Tool Calls
The model can't run code — when it needs live data it returns a structured request (like an RPC call your service must fulfill): run get_weather with city="Tokyo". Your app runs the real function and hands the result back.
When the LLM wants to call a function, the assistant message comes back with a populated tool_calls list instead of (or alongside) plain text:
# script_id: day_030_tool_execution_handling_part1/tool_call_flow
from openai import OpenAI, pydantic_function_tool
from pydantic import BaseModel, Field
from typing import Any
import json

client = OpenAI()

# Define tool schema as a Pydantic model
class GetWeather(BaseModel):
    """Get current weather for a city."""
    city: str = Field(description="City name")
    unit: str = Field(description="Temperature unit: celsius or fahrenheit")

tools = [pydantic_function_tool(GetWeather)]
# pydantic_function_tool names the tool after the class, so the model
# calls it "GetWeather"; that's the key you dispatch on.
# (You can also constrain a field to fixed choices with an enum, shown later.)

# Make the API call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)
message = response.choices[0].message

# Check if LLM wants to call a tool
if message.tool_calls:
    print("LLM wants to call tools!")
    for tool_call in message.tool_calls:
        print(f"  Function: {tool_call.function.name}")
        print(f"  Arguments: {tool_call.function.arguments}")
else:
    print("No tool calls, regular response:", message.content)
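To look at the shape of a tool-call response without spending an API call, the sketch below builds a stand-in message with types.SimpleNamespace. The attribute names mirror the ones the detection code above reads; the id value and the describe_message helper are made up for illustration and are not part of the SDK.

```python
from types import SimpleNamespace

# Stand-in for response.choices[0].message when the model requests a tool.
# The id string is invented; real ids are assigned by the API.
fake_message = SimpleNamespace(
    content=None,
    tool_calls=[
        SimpleNamespace(
            id="call_demo_1",
            type="function",
            function=SimpleNamespace(
                name="GetWeather",
                # Arguments arrive as a JSON *string*, not a dict.
                arguments='{"city": "Tokyo", "unit": "celsius"}',
            ),
        )
    ],
)


def describe_message(message) -> str:
    """Summarize a message using the same check as the detection branch."""
    if message.tool_calls:
        names = [tc.function.name for tc in message.tool_calls]
        return f"tool calls: {', '.join(names)}"
    return f"text: {message.content}"


print(describe_message(fake_message))
```

A stand-in like this is also handy in unit tests, since the parsing and dispatch code only reads attributes and never needs a live client.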
Step 2: Parse the Tool Call
Extract the function name and arguments:
# script_id: day_030_tool_execution_handling_part1/tool_call_flow
def parse_tool_call(tool_call) -> dict:
    """
    Parse a tool call from the LLM response.

    Returns:
        dict with 'id', 'name', and 'arguments'
    """
    return {
        "id": tool_call.id,
        "name": tool_call.function.name,
        "arguments": json.loads(tool_call.function.arguments)
    }

# Usage
if message.tool_calls:
    for tool_call in message.tool_calls:
        parsed = parse_tool_call(tool_call)
        print(f"Call ID: {parsed['id']}")
        print(f"Function: {parsed['name']}")
        print(f"Arguments: {parsed['arguments']}")
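The arguments string is model-generated, so it can occasionally be malformed or not a JSON object at all. A minimal sketch of a defensive parse follows; safe_parse_arguments is a hypothetical helper name, and returning an (arguments, error) pair is just one possible convention.

```python
import json
from typing import Optional, Tuple


def safe_parse_arguments(raw: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (arguments, None) on success or (None, error_message) on failure."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"arguments are not valid JSON: {exc.msg}"
    if not isinstance(parsed, dict):
        # Keyword dispatch with func(**arguments) needs a JSON object.
        return None, "arguments must be a JSON object"
    return parsed, None


print(safe_parse_arguments('{"city": "Tokyo"}'))
print(safe_parse_arguments('{"city": "Tokyo"'))   # truncated JSON
print(safe_parse_arguments('["Tokyo"]'))          # valid JSON, wrong shape
```

The error string can be sent back as the tool result so the model has a chance to correct its own call.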
Step 3: Execute the Function
Map function names to actual Python functions and execute:
# script_id: day_030_tool_execution_handling_part1/tool_call_flow
# Define your actual functions
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Get weather for a city (mock implementation)."""
    # In real code, call a weather API
    weather_data = {
        "Tokyo": {"temp": 22, "condition": "sunny"},
        "London": {"temp": 15, "condition": "cloudy"},
        "New York": {"temp": 18, "condition": "rainy"}
    }
    data = weather_data.get(city, {"temp": 20, "condition": "unknown"})
    if unit == "fahrenheit":
        data["temp"] = data["temp"] * 9/5 + 32
    data["unit"] = unit
    data["city"] = city
    return data

# Function registry, keyed on the schema CLASS name the model sends
FUNCTIONS = {
    "GetWeather": get_weather,
}

def execute_function(name: str, arguments: dict) -> Any:
    """Execute a function by name with given arguments."""
    if name not in FUNCTIONS:
        raise ValueError(f"Unknown function: {name}")
    func = FUNCTIONS[name]
    return func(**arguments)

# Usage
result = execute_function("GetWeather", {"city": "Tokyo", "unit": "celsius"})
print(result)  # {'temp': 22, 'condition': 'sunny', 'unit': 'celsius', 'city': 'Tokyo'}
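Calling func(**arguments) raises a TypeError when the model omits a required argument or invents an extra one. One way to turn that into data the model can read is to bind the arguments against the function's signature first. This is a sketch using the standard library's inspect module; call_with_checked_args is a hypothetical helper, and get_weather here is a trimmed stand-in for the mock above.

```python
import inspect


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Trimmed stand-in for the mock implementation."""
    return {"city": city, "unit": unit, "temp": 22}


def call_with_checked_args(func, arguments: dict) -> dict:
    """Bind arguments to func's signature first; report mismatches as data."""
    try:
        bound = inspect.signature(func).bind(**arguments)
    except TypeError as exc:
        # Missing or unexpected argument names end up here.
        return {"error": f"bad arguments for {func.__name__}: {exc}"}
    return func(*bound.args, **bound.kwargs)


print(call_with_checked_args(get_weather, {"city": "Tokyo"}))
print(call_with_checked_args(get_weather, {"town": "Tokyo"}))
```

This only checks argument names, not types or allowed values; validating against the Pydantic schema would cover those as well.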
Step 4: Return Results to the LLM
Send the function result back so the LLM can formulate a response:
# script_id: day_030_tool_execution_handling_part1/tool_call_flow
def complete_tool_call(client, messages: list, tools: list) -> str:
    """
    Complete the full tool calling cycle.

    Returns the final text response from the LLM.
    """
    # Step 1: Initial request
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )
    message = response.choices[0].message

    # Step 2: Check for tool calls
    if not message.tool_calls:
        return message.content

    # Step 3: Add assistant message to history
    # Append the model's message object as-is: it carries the tool_calls
    # the API needs to match results to.
    messages.append(message)

    # Step 4: Execute each tool call and add results
    for tool_call in message.tool_calls:
        try:
            # Parse inside the try so malformed JSON arguments are reported too
            parsed = parse_tool_call(tool_call)
            result = execute_function(parsed["name"], parsed["arguments"])
            result_str = json.dumps(result)
        except Exception as e:
            result_str = json.dumps({"error": str(e)})

        # Add tool result to messages
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result_str
        })

    # Step 5: Get final response from LLM
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )
    return final_response.choices[0].message.content

# Usage
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
answer = complete_tool_call(client, messages, tools)
print(answer)  # e.g. "The current weather in Tokyo is 22°C and sunny!"
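The tool message's content is sent as a string, which is why the result goes through json.dumps. Real tool results often contain values the json module cannot encode by default, such as datetime objects. The sketch below shows one possible fallback; to_tool_content is a hypothetical helper, and default=str is a deliberately blunt choice that simply stringifies anything unknown.

```python
import json
from datetime import datetime, timezone


def to_tool_content(result) -> str:
    """Turn a tool result into a string suitable for a tool message's content."""
    if isinstance(result, str):
        return result  # already a string; avoid double-encoding
    # default=str is the fallback for values json can't encode (datetime, Decimal, ...)
    return json.dumps(result, default=str)


stamp = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(to_tool_content({"city": "Tokyo", "checked_at": stamp}))
print(to_tool_content("plain text result"))
```

If the model needs a specific date format, converting explicitly (for example with isoformat()) before serializing is more predictable than relying on str().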
Complete Example: Multi-Tool Agent
This is the same parse → dispatch → return loop as Steps 1-4, now wrapped in a class that keeps looping until the model stops asking for tools. The schema classes get a Schema suffix here only to keep them distinct from the implementation functions; the dispatch key is still the class name.
# script_id: day_030_tool_execution_handling_part1/multi_tool_agent
from openai import OpenAI, pydantic_function_tool
from pydantic import BaseModel, Field
import ast
import json
import operator
from datetime import datetime

client = OpenAI()

# Define tool schemas as Pydantic models
class GetWeatherSchema(BaseModel):
    """Get current weather for a city."""
    city: str = Field(description="City name")

class GetTimeSchema(BaseModel):
    """Get current time in a timezone."""
    timezone: str = Field(description="Timezone name")

class CalculateSchema(BaseModel):
    """Evaluate a mathematical expression."""
    expression: str = Field(description="Math expression")

# Function implementations
def get_weather(city: str) -> dict:
    """Get weather for a city."""
    return {"city": city, "temp": 22, "condition": "sunny"}

def get_time(timezone: str = "UTC") -> dict:
    """Get current time in a timezone."""
    return {"timezone": timezone, "time": datetime.now().isoformat()}

def calculate(expression: str) -> dict:
    """Evaluate a math expression safely (no eval!)."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def safe_eval(node):
        # Accept only numeric literals, so strings can't sneak into the arithmetic
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        elif isinstance(node, ast.BinOp) and type(node.op) in ops:
            return ops[type(node.op)](safe_eval(node.left), safe_eval(node.right))
        elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -safe_eval(node.operand)
        raise ValueError("Unsupported expression")

    try:
        result = safe_eval(ast.parse(expression, mode='eval').body)
        return {"expression": expression, "result": result}
    except Exception as e:
        return {"error": str(e)}

# Map schema class names to function implementations
TOOLS = {
    "GetWeatherSchema": get_weather,
    "GetTimeSchema": get_time,
    "CalculateSchema": calculate,
}

# Generate tool schemas from Pydantic models
TOOL_SCHEMAS = [
    pydantic_function_tool(GetWeatherSchema),
    pydantic_function_tool(GetTimeSchema),
    pydantic_function_tool(CalculateSchema),
]
class ToolAgent:
    """Agent that can use multiple tools."""

    def __init__(self):
        self.client = OpenAI()
        self.messages = []

    def chat(self, user_message: str) -> str:
        """Process a user message, potentially using tools."""
        self.messages.append({"role": "user", "content": user_message})

        # Keep processing until we get a final response
        while True:
            response = self.client.chat.completions.create(
                model="gpt-4o",
                messages=self.messages,
                tools=TOOL_SCHEMAS
            )
            message = response.choices[0].message

            # No tool calls - we have our answer
            if not message.tool_calls:
                self.messages.append(message)
                return message.content

            # Process tool calls
            self.messages.append(message)
            for tool_call in message.tool_calls:
                name = tool_call.function.name
                args = json.loads(tool_call.function.arguments)
                print(f"  [Calling {name}({args})]")

                # Execute tool
                if name in TOOLS:
                    result = TOOLS[name](**args)
                else:
                    result = {"error": f"Unknown tool: {name}"}

                # Add result to messages
                self.messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result)
                })

# Usage
agent = ToolAgent()
print(agent.chat("What's the weather in Paris?"))
print()
print(agent.chat("What's 15% of 230?"))
print()
print(agent.chat("What time is it in Tokyo timezone?"))
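The agent's loop ends only when the model stops requesting tools, so a model that keeps asking would loop indefinitely. A common safeguard is a cap on the number of rounds. The sketch below shows the idea with a scripted stand-in for the API so it runs offline; run_loop, fake_model, and max_rounds are hypothetical names, and the messages are simplified plain dicts rather than the SDK's objects.

```python
import json


def run_loop(call_model, tools: dict, messages: list, max_rounds: int = 5) -> str:
    """Tool loop that gives up after max_rounds model calls."""
    for _ in range(max_rounds):
        message = call_model(messages)
        messages.append(message)
        if not message.get("tool_calls"):
            return message["content"]
        for call in message["tool_calls"]:
            func = tools.get(call["name"])
            if func is None:
                result = {"error": f"Unknown tool: {call['name']}"}
            else:
                result = func(**json.loads(call["arguments"]))
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": json.dumps(result)})
    raise RuntimeError(f"no final answer after {max_rounds} rounds")


def fake_model(messages: list) -> dict:
    """Scripted stand-in for the API: ask for a tool once, then answer."""
    if any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": "It is sunny in Paris.",
                "tool_calls": None}
    return {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_demo_1", "name": "GetWeatherSchema",
         "arguments": '{"city": "Paris"}'}]}


tools = {"GetWeatherSchema": lambda city: {"city": city, "condition": "sunny"}}
history = [{"role": "user", "content": "What's the weather in Paris?"}]
print(run_loop(fake_model, tools, history))
```

Raising on the cap is one choice; returning a fallback message to the user is another, depending on how the caller wants to handle a stuck conversation.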
Checkpoint
Run complete_tool_call (or the multi_tool_agent) on a weather question and confirm: the parsed tool call's arguments come back as a real dict, execute_function dispatches to the right Python function, and the result is appended to the message list with the matching tool_call_id before the model speaks again. If the second model turn errors, check that every tool call you received got a corresponding tool-result message back — providers reject a follow-up that leaves a tool call unanswered.
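The "every tool call gets an answer" check can be automated. This is a minimal sketch that assumes the history has been converted to plain dicts first (for SDK message objects, something like message.model_dump() would be one way to do that); unanswered_tool_calls is a hypothetical helper and the ids are invented.

```python
def unanswered_tool_calls(messages: list) -> list:
    """Return ids of requested tool calls that have no matching tool message."""
    requested = []
    answered = set()
    for msg in messages:
        if msg.get("role") == "assistant":
            for call in msg.get("tool_calls") or []:
                requested.append(call["id"])
        elif msg.get("role") == "tool":
            answered.add(msg["tool_call_id"])
    return [call_id for call_id in requested if call_id not in answered]


history = [
    {"role": "user", "content": "Weather in Tokyo and London?"},
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "call_a"}, {"id": "call_b"}]},
    {"role": "tool", "tool_call_id": "call_a", "content": '{"temp": 22}'},
]
print(unanswered_tool_calls(history))  # ['call_b']
```

Running a check like this just before the follow-up request makes a missing tool result show up as a clear local error rather than an API rejection.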
Summary
Quick Reference
| Step | Code | Notes |
|---|---|---|
| Detect calls | message.tool_calls | None/empty means the model answered in text |
| Parse arguments | json.loads(tool_call.function.arguments) | Arguments arrive as a JSON string |
| Dispatch | execute_function(tool_call.function.name, args) | Map name → your Python function |
| Return result | {"role": "tool", "tool_call_id": tc.id, "content": json.dumps(result)} | The tool_call_id must match |
| Continue | re-call client.chat.completions.create(model="gpt-4o", messages=...) | Append the tool message first |
| Finish | model returns text with no tool_calls | That's your final answer |
Exercises
- Add a tool. Register a convert_currency(amount, from_currency, to_currency) tool and confirm the agent calls it for a relevant question.
- Unknown-tool handling. Ask for something no tool covers and verify the agent degrades gracefully instead of crashing.
- Inspect the loop. Log each tool_call the model requests and the result you return, then trace one full request end to end.
- Bad arguments. Make the model call a tool with a missing or invalid argument and have your executor return a structured error the model can recover from.
Solutions (approaches)
- Add the function plus its Pydantic schema, register it in the dispatch table (FUNCTIONS or TOOLS) under the schema's class name, and add the schema to the tools list. Ask "Convert 100 USD to EUR" and check the dispatched name.
- In execute_function, return {"error": f"unknown tool: {name}"} for unrecognized names instead of raising; the model can then apologize gracefully.
- Add print()/logging inside the loop for parsed["name"], parsed["arguments"], and the returned result; one full round shows request → execute → result → final text.
- Wrap the call in try/except and return {"error": str(e)} as the tool content; the model reads the error and can retry or ask for the missing field.
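For the first exercise, the implementation side might look like the sketch below. The exchange rates are made-up placeholders, and the ConvertCurrency key assumes a schema class of that name, which you would still need to define and pass through pydantic_function_tool as in the examples above.

```python
# Hypothetical fixed rates, for illustration only; real code would call a rates API.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.1, "GBP": 1.25}


def convert_currency(amount: float, from_currency: str, to_currency: str) -> dict:
    """Convert via USD using the placeholder table; unknown codes become an error dict."""
    for code in (from_currency, to_currency):
        if code not in RATES_TO_USD:
            return {"error": f"unsupported currency: {code}"}
    converted = amount * RATES_TO_USD[from_currency] / RATES_TO_USD[to_currency]
    return {"amount": round(converted, 2), "currency": to_currency}


# Register under the name the model will send (the schema class name in this guide).
TOOLS = {"ConvertCurrency": convert_currency}

print(TOOLS["ConvertCurrency"](amount=100, from_currency="USD", to_currency="EUR"))
```

Returning an error dict for unsupported codes, rather than raising, keeps the result in the same structured-error shape used in the fourth solution.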
What's Next?
You can now take a tool call from detection through to the model's final answer. Next up: Tool Execution Handling Part 2, covering parallel tool calls, timeouts, error categorization, and smart retries for production.