Tool use is the bridge between Claude and the rest of your stack. Get the schemas right and you get a reliable agent that books flights, queries databases, and drives a browser. Get them wrong and you get a model that confidently calls get_weather with location: "the place the user mentioned". In 2026, with Sonnet 4.5 and Opus 4.5 supporting parallel tool calls, computer use, and increasingly strict JSON adherence, the gap between "demo" and "production" is almost entirely a function of how well you write your tool definitions.
This post is the practical guide we wish we had when we started building agents on Claude.
Anatomy of a Claude tool definition
Every tool you give Claude has exactly three fields that matter:
```json
{
  "name": "get_weather",
  "description": "Get the current weather and 24h forecast for a specific city. Use this when the user asks about temperature, rain, snow, or weather conditions. Do not use for historical weather (before today) — use get_historical_weather instead.",
  "input_schema": {
    "type": "object",
    "properties": {
      "city": {
        "type": "string",
        "description": "City name in English, e.g. 'San Francisco', 'Tokyo'. Do not include country."
      },
      "units": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "description": "Temperature units. Default to celsius unless the user is in the US."
      }
    },
    "required": ["city"]
  }
}
```
Three things, in priority order:
- description — the single most important variable for selection accuracy. This is what Claude reads to decide whether to call your tool at all, and whether it has the right tool for the user's request. Think of it as the docstring you would write for a junior engineer who has never seen this codebase.
- input_schema — JSON Schema describing the arguments. Use enum aggressively. Use description on every property. Mark required honestly.
- name — keep it lowercase, snake_case, and action-verb-led (get_, create_, search_).
The mistake almost everyone makes on day one is treating description as a label. It is not. It is a mini-prompt. Tell Claude when to use the tool, when not to use it, what the side effects are, and what the input means.
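To make that concrete, here is the same hypothetical tool written both ways (the search_flights name and its fields are ours, invented for illustration). Only the second version gives Claude enough signal to select the tool and fill its arguments correctly:

```python
# A label-style description: tells Claude almost nothing.
bad_tool = {
    "name": "search_flights",
    "description": "Flight search.",
    "input_schema": {
        "type": "object",
        "properties": {
            "origin": {"type": "string"},
            "destination": {"type": "string"},
        },
        "required": ["origin", "destination"],
    },
}

# A mini-prompt description: when to use it, when not to, side effects.
good_tool = {
    "name": "search_flights",
    "description": (
        "Search one-way and round-trip flights between two airports. "
        "Use when the user asks about flight options, times, or prices. "
        "Do not use for hotels or ground transport. Read-only: performs "
        "no booking and has no side effects."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "origin": {
                "type": "string",
                "description": "IATA airport code, e.g. 'SFO'. Ask the user if ambiguous.",
            },
            "destination": {
                "type": "string",
                "description": "IATA airport code, e.g. 'NRT'.",
            },
        },
        "required": ["origin", "destination"],
    },
}
```

Note that the good version also documents each parameter, which is where most argument-filling errors get prevented.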
Parallel tool calls in Sonnet 4.5+
Older Claude releases produced one tool_use block per assistant turn. Sonnet 4.5 and Opus 4.5 can return multiple tool_use blocks in a single response when the calls are independent. This matters because round-trip latency dominates agent UX.
If a user asks "what's the weather in Tokyo and Paris and what's on my calendar tomorrow", a 2025-era integration would do three sequential turns. A 2026 integration sees three tool_use blocks in one response, executes them in parallel on your side, and returns three tool_result blocks in the next user message. One round trip instead of three.
You don't have to do anything special to opt in — just be ready to receive an array of tool calls and dispatch them concurrently.
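A minimal sketch of what "dispatch them concurrently" can look like, using a thread pool (the block shapes mirror the Messages API; execute_tool is a stand-in for your own dispatcher):

```python
import json
from concurrent.futures import ThreadPoolExecutor

def execute_tool(name, args):
    # Stand-in for your real dispatcher; replace with actual integrations.
    return {"tool": name, "args": args}

def run_tool_blocks(tool_blocks):
    """Execute independent tool_use blocks concurrently, pairing each
    result with its tool_use_id in the original order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(execute_tool, b["name"], b["input"])
                   for b in tool_blocks]
        return [
            {"type": "tool_result",
             "tool_use_id": b["id"],
             "content": json.dumps(f.result())}
            for b, f in zip(tool_blocks, futures)
        ]
```

The key invariant: every tool_use block gets exactly one tool_result with a matching tool_use_id, all returned in a single user message.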
A working example: weather + database
Here's a minimal but realistic agent loop using the Claudexia gateway, which speaks the Anthropic Messages API natively:
```python
import anthropic
import json

client = anthropic.Anthropic(
    base_url="https://api.claudexia.tech/v1",
    api_key="cxa-..."
)

tools = [
    {
        "name": "get_weather",
        "description": "Current weather for a city. Use for 'what's the weather' questions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name in English"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    },
    {
        "name": "query_orders",
        "description": "Search the orders database by customer email or order ID. Returns up to 20 most recent orders.",
        "input_schema": {
            "type": "object",
            "properties": {
                "email": {"type": "string", "format": "email"},
                "order_id": {"type": "string", "description": "Order ID like 'ORD-12345'"}
            }
        }
    }
]

def execute_tool(name, args):
    # Stub implementations for the demo; replace with real integrations.
    if name == "get_weather":
        return {"city": args["city"], "temp_c": 14, "condition": "cloudy"}
    if name == "query_orders":
        return {"orders": [{"id": "ORD-12345", "total": 89.50, "status": "shipped"}]}
    return {"error": f"unknown tool: {name}"}

messages = [{"role": "user", "content": "Weather in Tokyo, and find orders for jane@example.com"}]

while True:
    response = client.messages.create(
        model="claude-sonnet-4.5",
        max_tokens=2048,
        tools=tools,
        messages=messages
    )
    messages.append({"role": "assistant", "content": response.content})

    # Keep looping until Claude stops asking for tools.
    if response.stop_reason != "tool_use":
        break

    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = execute_tool(block.name, block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(result)
            })
    messages.append({"role": "user", "content": tool_results})

print(response.content[-1].text)
```
Notice the loop: while the response says stop_reason == "tool_use", execute every tool block, append all results in a single user message, and call again. This is the entire orchestration pattern.
Error recovery: tell Claude what went wrong
The single biggest reliability win is treating tool errors as data, not exceptions. When your tool fails — bad input, API down, rate limited — return an error inside the tool_result and let Claude react:
```python
tool_results.append({
    "type": "tool_result",
    "tool_use_id": block.id,
    "is_error": True,
    "content": "Database timeout after 5s. Try again with a narrower filter or fewer fields."
})
```
Claude will read that, often retry with adjusted arguments, or escalate to the user with a clear explanation. This pattern alone replaces a huge amount of brittle try/except wrapping you'd otherwise write around the model.
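One way to bake this in, sketched below: wrap your dispatcher so any exception becomes an is_error tool_result instead of crashing the agent loop (safe_execute is our own helper name, not part of the SDK):

```python
import json

def safe_execute(execute_tool, block_id, name, args):
    """Run a tool and always return a well-formed tool_result,
    turning exceptions into is_error results the model can act on."""
    try:
        result = execute_tool(name, args)
        return {"type": "tool_result", "tool_use_id": block_id,
                "content": json.dumps(result)}
    except Exception as exc:
        # Give Claude something actionable, not a bare stack trace.
        return {"type": "tool_result", "tool_use_id": block_id,
                "is_error": True,
                "content": f"{type(exc).__name__}: {exc}. "
                           "Check the arguments and try again, or explain to the user."}
```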
tool_choice: auto, any, tool, none
The tool_choice parameter controls how aggressive Claude is about calling tools:
- {"type": "auto"} — the default. Claude picks whether to call a tool or just respond.
- {"type": "any"} — Claude must call some tool, but picks which one.
- {"type": "tool", "name": "get_weather"} — force a specific tool. Useful for structured extraction: define a record_user_intent tool with the exact schema you want, force the call, and you get guaranteed JSON back.
- {"type": "none"} — disable tool calls for this turn even though tools are defined.
Forced tool_choice is the cleanest way to get reliable structured output from Claude in 2026. Skip the JSON-mode dance — define a tool, force it, parse block.input.
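A sketch of the forced-extraction pattern (record_user_intent and its fields are our own example names; the tool_choice shape follows the Messages API):

```python
# A tool that exists purely to capture structured output.
intent_tool = {
    "name": "record_user_intent",
    "description": "Record the user's intent as structured data. Always call this.",
    "input_schema": {
        "type": "object",
        "properties": {
            "intent": {"type": "string", "enum": ["purchase", "support", "other"]},
            "urgency": {"type": "string", "enum": ["low", "medium", "high"]},
            "summary": {"type": "string", "description": "One-sentence summary."},
        },
        "required": ["intent", "urgency", "summary"],
    },
}

# Force the call; the response's single tool_use block will carry
# schema-conforming input you can read directly.
request = {
    "model": "claude-sonnet-4.5",
    "max_tokens": 512,
    "tools": [intent_tool],
    "tool_choice": {"type": "tool", "name": "record_user_intent"},
    "messages": [{"role": "user", "content": "My order arrived broken, fix this today!"}],
}
# client.messages.create(**request), then parse block.input.
```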
Computer use: when text tools aren't enough
Sonnet 4.5+ supports computer use — the model can take screenshots, move the mouse, type, and click in a virtual desktop. You wire this up by adding a special computer_20250124 tool plus bash and text_editor tools. Claude returns actions like {"action": "screenshot"} or {"action": "click", "coordinate": [512, 384]}, your harness executes them, and you send back the new screenshot.
Use computer use when:
- The system you need to drive has no API (legacy admin panels, vendor portals).
- The task is genuinely visual (verifying a layout, reading a chart in a PDF).
- You need to QA your own product end-to-end.
Don't use it when a real API exists. Computer use is slower, less reliable, and more expensive per task than function calling against a proper backend.
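The harness side can be as simple as a dispatcher over the action dicts. A minimal sketch, assuming the action shapes shown above; take_screenshot and click_at are placeholders for your own VM-control layer:

```python
def take_screenshot():
    # Placeholder: capture the virtual desktop and return base64 PNG data.
    return {"type": "image", "data": "<base64 png>"}

def click_at(x, y):
    # Placeholder: move the mouse and click in the virtual desktop.
    return {"type": "text", "text": f"clicked ({x}, {y})"}

def handle_computer_action(action):
    """Map a model-issued action dict to a harness operation."""
    if action["action"] == "screenshot":
        return take_screenshot()
    if action["action"] == "click":
        x, y = action["coordinate"]
        return click_at(x, y)
    return {"type": "text", "text": f"unsupported action: {action['action']}"}
```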
Anthropic native vs OpenAI-compat shape
Claudexia exposes both endpoints. The shapes differ in obvious ways: Anthropic uses tool_use/tool_result content blocks; OpenAI uses tool_calls with function.name and function.arguments as a JSON string.
If you're building greenfield, use the Anthropic shape — it's structured, the parallel-call ergonomics are better, and you avoid stringly-typed arguments. If you're porting an existing OpenAI-shaped codebase, the compat endpoint at https://api.claudexia.tech/v1/chat/completions lets you swap models without rewriting your tool layer; you can migrate to native later.
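If you do migrate later, the translation between shapes is mechanical. A sketch of one direction, over plain dicts, assuming the standard field layouts of both APIs; note that arguments arrive as a JSON string on the OpenAI side and become a structured object on the Anthropic side:

```python
import json

def openai_call_to_tool_use(call):
    """Convert one OpenAI-style tool call into an Anthropic tool_use block."""
    return {
        "type": "tool_use",
        "id": call["id"],
        "name": call["function"]["name"],
        # OpenAI sends arguments as a JSON string; Anthropic uses an object.
        "input": json.loads(call["function"]["arguments"]),
    }
```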
Debugging tool calls
When an agent goes off the rails, three things give you 90% of the signal:
- Log every tool_use_id alongside the input and result. Mismatched IDs are the most common cause of silent breakage.
- Log stop_reason for every turn. If you see end_turn when you expected tool_use, your description is unclear or tool_choice is wrong.
- Replay the full message array through the API after a failure. Claude is deterministic enough at low temperature that you can usually reproduce the bad call.
A small dashboard that shows the message timeline, tool calls, and results side-by-side pays for itself within a week.
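A sketch of the per-turn trace record feeding such a dashboard, shown over plain dicts (the field names are ours; adapt to your logger of choice):

```python
def trace_turn(response_content, stop_reason, tool_results):
    """Build one log record per turn: stop_reason, every tool call,
    and any tool_use_id that never got a matching result."""
    calls = [
        {"tool_use_id": b["id"], "name": b["name"], "input": b["input"]}
        for b in response_content if b.get("type") == "tool_use"
    ]
    results = {r["tool_use_id"]: r["content"] for r in tool_results}
    return {
        "stop_reason": stop_reason,
        "calls": calls,
        # Mismatched IDs surface here as calls with no paired result.
        "unmatched": [c["tool_use_id"] for c in calls
                      if c["tool_use_id"] not in results],
    }
```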
Bottom line
Reliable Claude agents in 2026 come from two habits, repeated:
- Rich descriptions on every tool and every parameter. Treat the description field as a prompt, not a label. Tell Claude when not to use the tool, what the inputs mean, and what comes back.
- Small, focused tools. Five tools that each do one thing beat one mega-tool with twelve optional parameters every time. Claude picks better, the schemas are simpler, and errors are easier to reason about.
Parallel calls, computer use, and forced tool_choice are powerful — but they amplify whatever's already in your tool design. Spend the hour writing the description. It's the highest-leverage hour in the entire agent stack.
For pricing context on tool-heavy workloads, see our Claude API pricing breakdown.