The problem: LLMs hallucinate JSON
Anyone who has shipped an LLM-powered feature knows the pain. You ask the model for "a JSON object with fields name, email, score" and 99% of the time you get exactly that. The other 1% of the time you get:
- A leading `` ```json `` code fence the parser chokes on.
- A trailing comment like `// note: score estimated`.
- A trailing comma that breaks `JSON.parse`.
- A nested string with an unescaped quote.
- A field renamed to `e_mail` because the model thought it looked nicer.
- Half the response in markdown explaining what the JSON means.
At a million calls a month, that 1% is ten thousand silent production failures. Free-form text generation is fundamentally a poor fit for machine-to-machine contracts.
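A quick way to see why these failures hurt: every one of them is a hard parser error, not a soft degradation. A minimal sketch (the strings are illustrative, not real model output):

```python
import json

FENCE = "`" * 3  # a literal triple-backtick code fence

# Near-JSON strings mirroring the failure modes listed above.
bad_outputs = [
    '{"name": "Ada", "email": "ada@example.com", "score": 9,}',  # trailing comma
    FENCE + 'json\n{"name": "Ada", "score": 9}\n' + FENCE,       # code fence
    '{"name": "Ada", "score": 9} // note: score estimated',      # trailing comment
]

for s in bad_outputs:
    try:
        json.loads(s)
    except json.JSONDecodeError as e:
        print(f"parse failed: {e}")
```

Each string looks close enough to JSON to pass a casual eyeball review, and every one of them raises `json.JSONDecodeError`.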
OpenAI shipped response_format: { type: "json_schema", strict: true } in 2024 and effectively solved this on their stack — the model's decoding is constrained at the token level so the output is grammar-guaranteed valid against the schema. Anthropic took a different route. As of 2026 there is still no json_mode flag on the native Messages API. Instead, Anthropic asks you to lean on the feature they already had: tool use.
This article is the production pattern we recommend for getting 100% valid, schema-conformant JSON out of Claude — covering the native API via tool_use forcing, the OpenAI-compatible response_format shim, Pydantic and Zod schema generation, validation + retry loops, and a comparison with GPT‑4o's native json_schema mode.
All examples point at the Claudexia gateway base URL https://api.claudexia.tech/v1, which exposes both the native Anthropic Messages API and an OpenAI-compatible Chat Completions endpoint. Pricing and model coverage are in our Claude API pricing 2026 post.
Anthropic's answer: forced tool_use
Claude's tool use feature was designed for agentic workflows — letting the model call get_weather or search_db. But the mechanism it uses to emit tool calls is exactly what we want for structured output: the model produces a tool_use block whose input field is always a valid JSON object matching the tool's input_schema.
Anthropic's decoder enforces this. The model is not "asked nicely" to produce JSON; the tool input is the JSON. Combine that with tool_choice: { type: "tool", name: "..." } to force the model to call exactly that one tool, and you have a structured-output API in everything but name.
The recipe:
- Define your output shape as a Pydantic model (Python) or a Zod schema (TypeScript).
- Convert it to a JSON Schema.
- Wrap it as a single tool with `input_schema = <your JSON Schema>`.
- Send the request with `tools=[that_tool]` and `tool_choice={"type": "tool", "name": "<your tool name>"}`.
- Read `response.content[0].input` — that's your validated object.
Python: Pydantic → JSON Schema → tool_use
import anthropic
from pydantic import BaseModel, Field
from typing import Literal

class InvoiceLineItem(BaseModel):
    description: str
    quantity: int = Field(ge=1)
    unit_price_cents: int = Field(ge=0)

class Invoice(BaseModel):
    invoice_number: str
    issued_on: str = Field(description="ISO-8601 date, e.g. 2026-04-03")
    currency: Literal["USD", "EUR", "RUB", "GBP"]
    vendor_name: str
    line_items: list[InvoiceLineItem]
    total_cents: int

client = anthropic.Anthropic(
    base_url="https://api.claudexia.tech/v1",
    api_key="sk-cxa-...",
)

extract_tool = {
    "name": "record_invoice",
    "description": "Record a parsed invoice into the accounting system.",
    "input_schema": Invoice.model_json_schema(),
}

resp = client.messages.create(
    model="claude-sonnet-4.5",
    max_tokens=2048,
    tools=[extract_tool],
    tool_choice={"type": "tool", "name": "record_invoice"},
    messages=[
        {"role": "user", "content": f"Extract the invoice:\n\n{raw_invoice_text}"}
    ],
)

raw = resp.content[0].input            # already a dict, already valid against the schema
invoice = Invoice.model_validate(raw)  # belt-and-suspenders Pydantic check
The resp.content[0].input value comes back as a Python dict that already parses cleanly and already satisfies the JSON Schema you sent. The extra Invoice.model_validate(raw) call is defensive — it gives you Pydantic's type coercion (e.g., string "42" → int 42) and triggers your custom validators.
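That coercion is easy to see in isolation. A minimal sketch, independent of any API call (Pydantic v2's default "lax" validation mode):

```python
from pydantic import BaseModel

class Score(BaseModel):
    value: int

# Lax mode coerces the numeric string "42" to the int 42.
parsed = Score.model_validate({"value": "42"})
assert parsed.value == 42
```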
TypeScript: Zod → JSON Schema → tool_use
import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const SupportTicket = z.object({
  category: z.enum(["billing", "bug", "feature_request", "abuse", "other"]),
  urgency: z.enum(["low", "normal", "high", "critical"]),
  summary: z.string().max(280),
  suggested_owner_team: z.enum(["payments", "platform", "growth", "trust_safety"]),
  contains_pii: z.boolean(),
});

const client = new Anthropic({
  baseURL: "https://api.claudexia.tech/v1",
  apiKey: process.env.CLAUDEXIA_KEY!,
});

const resp = await client.messages.create({
  model: "claude-sonnet-4.5",
  max_tokens: 1024,
  tools: [{
    name: "classify_ticket",
    description: "Classify an inbound customer support ticket.",
    input_schema: zodToJsonSchema(SupportTicket, { target: "openAi" }) as any,
  }],
  tool_choice: { type: "tool", name: "classify_ticket" },
  messages: [{ role: "user", content: ticketBody }],
});

// Type guard so TypeScript narrows the content union to a tool_use block.
const block = resp.content.find(
  (b): b is Anthropic.ToolUseBlock => b.type === "tool_use",
);
const ticket = SupportTicket.parse(block?.input);
Note target: "openAi" on zodToJsonSchema: it produces a JSON Schema variant Claude (and OpenAI) accept without complaint. The default Zod target emits $ref patterns the Anthropic API will reject.
OpenAI-compatible: response_format on the Claudexia gateway
If you have an existing codebase built against the OpenAI SDK, Claudexia's gateway accepts the OpenAI Chat Completions shape and translates response_format to forced tool_use under the hood. You don't have to rewrite anything.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.claudexia.tech/v1",
    api_key="sk-cxa-...",
)

resp = client.chat.completions.create(
    model="claude-sonnet-4.5",
    messages=[{"role": "user", "content": resume_text}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Resume",
            "strict": True,
            "schema": Resume.model_json_schema(),
        },
    },
)

resume = Resume.model_validate_json(resp.choices[0].message.content)
This is the path of least resistance if you're migrating off OpenAI. Same SDK, same response_format semantics, Claude-quality output.
Three concrete examples
1. Extract an invoice
Input: a noisy PDF-extracted text of a vendor invoice. Output: the Invoice model from the Python snippet above. Forced tool_use means you never have to write a regex to scrape line items again — the model returns a typed list and your downstream code consumes invoice.line_items[0].unit_price_cents directly.
2. Classify a support ticket
Use the SupportTicket Zod schema. The category and urgency enums constrain the model so you can route on the result without a defensive .toLowerCase() everywhere. contains_pii is a boolean you can plumb straight into a redaction pipeline.
3. Parse a resume
class WorkExperience(BaseModel):
    company: str
    title: str
    started_on: str = Field(description="YYYY-MM")
    ended_on: str | None = Field(description="YYYY-MM, or null if current")
    highlights: list[str]

class Resume(BaseModel):
    full_name: str
    email: str | None
    phone: str | None
    years_of_experience: int = Field(ge=0, le=60)
    skills: list[str]
    experience: list[WorkExperience]
Drop in the same tool_use plumbing and you have a structured-resume API.
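The plumbing is identical to the invoice example. A sketch with the Resume trimmed to two fields for brevity (`record_resume` is a tool name chosen here for illustration, not anything fixed by the API):

```python
from pydantic import BaseModel

class Resume(BaseModel):  # trimmed; the full model is defined above
    full_name: str
    skills: list[str]

resume_tool = {
    "name": "record_resume",  # hypothetical tool name
    "description": "Record a parsed resume into the candidate database.",
    "input_schema": Resume.model_json_schema(),
}

request_kwargs = dict(
    model="claude-sonnet-4.5",
    max_tokens=2048,
    tools=[resume_tool],
    tool_choice={"type": "tool", "name": "record_resume"},
)
# client.messages.create(**request_kwargs, messages=[...])  # as in the invoice example
```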
Schema design tips
After shipping a few dozen of these in production, a handful of patterns matter more than the rest.
- Use `enum` aggressively. Every field whose value belongs to a finite set should be an enum. The model is dramatically more accurate at picking from a list than at free-form generation.
- Descriptions are not decoration. Claude reads `description` strings on every field. "ISO-8601 date" and "phone number in E.164 format" change the output. Treat them like prompts.
- Keep nested depth shallow. Two levels deep is the sweet spot. At three or four levels you start seeing the model lose track of which `{` it's inside, even with forced tool use.
- Prefer string enums over booleans for tri-state. `status: "approved" | "rejected" | "needs_review"` reasons better than two separate booleans.
- Mark optional fields explicitly. `Optional[str]` in Pydantic / `.optional()` in Zod. The model will leave them out cleanly instead of inventing `"unknown"`.
- Don't ask for free-text inside structured fields. A `summary: str` field at the end is fine. A `summary` field followed by more structured fields tends to bleed prose into the structured ones.
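Several of these tips combined in one model — a hypothetical moderation schema, not one of the examples above:

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, Field

class Status(str, Enum):  # tri-state string enum instead of two booleans
    approved = "approved"
    rejected = "rejected"
    needs_review = "needs_review"

class ModerationResult(BaseModel):
    status: Status = Field(description="Overall outcome of the review")
    reason_code: Optional[str] = Field(
        default=None,
        description="Short machine-readable code, e.g. 'spam'; omit if approved",
    )
    # Free text goes last, after all the structured fields.
    summary: str = Field(description="One-sentence rationale for the decision")
```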
Validation and retry loop
Even with forced tool_use, your business validators (e.g., "total_cents must equal sum of line items") can still fail. The pattern is straightforward:
from pydantic import ValidationError

def call_with_retry(messages, max_attempts=3):
    for attempt in range(max_attempts):
        resp = client.messages.create(
            model="claude-sonnet-4.5",
            max_tokens=2048,
            tools=[extract_tool],
            tool_choice={"type": "tool", "name": "record_invoice"},
            messages=messages,
        )
        raw = resp.content[0].input
        try:
            return Invoice.model_validate(raw)
        except ValidationError as e:
            messages.append({"role": "assistant", "content": resp.content})
            messages.append({
                "role": "user",
                "content": f"That output failed validation: {e}. Please call the tool again with corrected values.",
            })
    raise RuntimeError("Schema validation failed after retries")
A single retry usually fixes anything, and the second attempt sees the original model output plus the validation error in context — Claude will correct itself reliably.
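The "total_cents must equal the sum of line items" check can live directly on the model, so the retry loop picks it up through the same `ValidationError` path. A sketch with the Invoice trimmed to the relevant fields:

```python
from pydantic import BaseModel, Field, ValidationError, model_validator

class LineItem(BaseModel):
    description: str
    quantity: int = Field(ge=1)
    unit_price_cents: int = Field(ge=0)

class InvoiceTotals(BaseModel):  # trimmed Invoice, cross-field check only
    line_items: list[LineItem]
    total_cents: int

    @model_validator(mode="after")
    def total_matches_line_items(self):
        expected = sum(li.quantity * li.unit_price_cents for li in self.line_items)
        if self.total_cents != expected:
            raise ValueError(
                f"total_cents is {self.total_cents} but line items sum to {expected}"
            )
        return self
```

The error message raised here is exactly what gets fed back to the model on retry, so make it specific enough to act on.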
GPT‑4o json_schema vs Claude forced tool_use
Both approaches deliver schema-conformant JSON in production. Practical differences:
- Guarantee surface. OpenAI's `strict: true` is enforced by constrained decoding at the token level — the model literally cannot emit invalid JSON. Anthropic's forced `tool_use` is enforced by a validator on the tool input; in our experience the difference is invisible at the API surface (both return valid JSON every time), but the failure mode if Claude does fail is "the call errors out" rather than "the JSON is malformed".
- Schema feature support. OpenAI's strict mode disallows `oneOf`, `not`, recursive refs, and a few other JSON Schema features. Claude's `tool_use` accepts a broader subset, including some `$ref` patterns (when emitted with the right Zod target) and richer `pattern` strings.
- Description usage. Claude weighs field descriptions more heavily than GPT‑4o does. If your schema has rich `description` fields, expect Claude to take more advantage of them.
- Latency. Forced `tool_use` adds a small fixed overhead vs free-form generation but is on par with GPT‑4o's `json_schema` mode in our benchmarks.
- Migration cost. Through the Claudexia gateway's OpenAI-compatible endpoint, you change one line — the `model` parameter — and `response_format` keeps working.
Wrap-up
Claude does not need a json_mode flag, because tool use already gives you something stronger: a typed contract enforced by the API. Define your schema once in Pydantic or Zod, force a single tool, and you have a structured-output endpoint that is as reliable as anything in the industry. The OpenAI-compatible shim on the Claudexia gateway means you can drop Claude into an existing response_format codebase without changing a line of business logic. Combine that with a retry loop on your custom validators and you have a JSON pipeline that simply does not break.