Pricing note: the worked examples below use legacy Anthropic list prices ($3 / $15 per 1M for Sonnet, etc.) for illustrative ratios. Claudexia's actual current rates are flat — Opus & GPT $0.50 / $0.50, Sonnet & Haiku $0.33 / $0.33 per 1M. Real bills are 5–30× lower than the dollar amounts shown.
"How much will Claude cost us?" is the first question every engineering lead asks before greenlighting a feature, and the last question every finance team asks after the first invoice arrives. The gap between those two moments is almost always a missing spreadsheet. This post is that spreadsheet — turned into worked examples you can copy, adapt, and defend in a budget review. We will price four common workloads end-to-end: a support chatbot, a coding agent, a RAG knowledge base, and a batch classification job. Then we will rank the cost levers by ROI so you know which knob to turn first.
Token math primer
Before any pricing makes sense, you need a stable mental model of a token. A useful rule of thumb is that 1 token ≈ 4 characters of English text, or roughly 0.75 words. A 500-word email is therefore around 670 tokens. A typical 8K-context system prompt with tool definitions and few-shot examples lands between 6,000 and 8,000 input tokens. A long Markdown report of 2,000 words runs about 2,700 output tokens. Code is denser: 1,000 lines of TypeScript is usually 4,000 to 6,000 tokens depending on identifier length and whitespace.
The reason this matters is that Claude is billed per million tokens, split between input and output, and output costs roughly 5× input at every tier. A workload that looks cheap on input alone can be five times more expensive once you account for the response.
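The rules of thumb above are easy to encode. A minimal sketch for back-of-envelope budgeting (heuristics only; a real tokenizer gives exact counts):

```python
# Back-of-envelope token estimates from the rules of thumb above.
# Heuristics for budgeting only -- real counts come from the tokenizer.

def est_tokens_from_chars(chars: int) -> int:
    """~4 characters of English per token."""
    return round(chars / 4)

def est_tokens_from_words(words: int) -> int:
    """~0.75 words per token, so tokens = words / 0.75."""
    return round(words / 0.75)

print(est_tokens_from_words(500))    # 500-word email: roughly 670 tokens
print(est_tokens_from_words(2_000))  # 2,000-word report: roughly 2,700 tokens
```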
2026 prices per 1M tokens (illustrative)
The rates below are the legacy Anthropic list prices that every worked example in this post models against. As the note at the top explains, Claudexia's actual flat rates are far lower; either way, requests route through https://api.claudexia.tech/v1 unchanged.
| Model | Input ($/1M tok) | Output ($/1M tok) | Cached input ($/1M tok) |
|---|---|---|---|
| Haiku 4.5 | $0.25 | $1.25 | $0.025 |
| Sonnet 4.6 | $3.00 | $15.00 | $0.30 |
| Opus 4.7 | $15.00 | $75.00 | $1.50 |
Two things to internalise: cached input is 10× cheaper than fresh input, and the Batch API gives a 50% discount on both input and output for jobs that can wait up to 24 hours to complete. Those two mechanics drive most of the cost optimisations below.
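Those rates and mechanics fit in a few lines. A sketch of a cost helper encoding the legacy list rates the worked examples use (Sonnet $3/$15, Haiku $0.25/$1.25, Opus $15/$75 per 1M; cached input at one-tenth of fresh, matching the $0.30 Sonnet cached rate used below — the Haiku and Opus cached figures are my extrapolation of that 10× ratio):

```python
# Illustrative legacy list prices ($ per 1M tokens) used by the worked
# examples in this post -- modelling rates, not Claudexia's actual flat rates.
PRICES = {
    "haiku":  {"input": 0.25,  "output": 1.25,  "cached": 0.025},
    "sonnet": {"input": 3.00,  "output": 15.00, "cached": 0.30},
    "opus":   {"input": 15.00, "output": 75.00, "cached": 1.50},
}

def cost(model: str, input_tok: int, output_tok: int,
         cached_tok: int = 0, batch: bool = False) -> float:
    """Dollar cost of a workload slice; batch=True applies the 50% discount."""
    p = PRICES[model]
    dollars = (input_tok * p["input"]
               + output_tok * p["output"]
               + cached_tok * p["cached"]) / 1_000_000
    return dollars * 0.5 if batch else dollars
```

Every worked example below is one or two calls to this function with different token counts.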
Worked example 1: support chatbot
Workload: 10,000 conversations per month. Average 8,000 input tokens per turn (system prompt + tool defs + retrieved context + user history) and 800 output tokens per response. Single-turn for simplicity.
Run on Sonnet 4.6, without caching:
- Input: 10,000 × 8,000 = 80M tokens × $3.00 = $240
- Output: 10,000 × 800 = 8M tokens × $15.00 = $120
- Monthly total: $360
Now turn on prompt caching for the stable 6,000-token system prompt and tool definitions, leaving 2,000 tokens of dynamic context per call:
- Cached input: 10,000 × 6,000 = 60M tokens × $0.30 = $18
- Fresh input: 10,000 × 2,000 = 20M tokens × $3.00 = $60
- Output: unchanged at $120
- Monthly total: $198 — a 45% reduction.
Cost per conversation drops from $0.036 to $0.020. At 100,000 conversations per month, that delta is $1,620 saved every month from a single config flag.
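The chatbot math above, reproduced as a sketch you can adapt (rates are the legacy Sonnet list prices: $3/1M fresh input, $0.30/1M cached input, $15/1M output):

```python
# Worked example 1: 10,000 single-turn conversations per month.
CONVOS = 10_000
IN_RATE, CACHED_RATE, OUT_RATE = 3.00, 0.30, 15.00  # $ per 1M tokens

no_cache = (CONVOS * 8_000 * IN_RATE + CONVOS * 800 * OUT_RATE) / 1e6
with_cache = (CONVOS * 6_000 * CACHED_RATE   # stable system prompt + tools
              + CONVOS * 2_000 * IN_RATE     # dynamic context
              + CONVOS * 800 * OUT_RATE) / 1e6

print(no_cache, with_cache)                  # 360.0 198.0
print(f"{1 - with_cache / no_cache:.0%}")    # 45% reduction
```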
Worked example 2: coding agent
Workload: 200 pull requests per day, each running an agent loop with roughly 30,000 input tokens (repo context, diffs, tool call history) and 5,000 output tokens (proposed patches, explanations, test code). Run on Sonnet 4.6.
Daily math:
- Input: 200 × 30,000 = 6M tokens × $3.00 = $18
- Output: 200 × 5,000 = 1M tokens × $15.00 = $15
- Daily total: $33 → roughly $990/month
Coding agents are usually the place where Opus earns its price tag. If 20% of PRs trigger an Opus escalation for hard reasoning, the blended cost rises:
- 80% of PRs stay on Sonnet: 0.8 × $33 = $26.40/day
- 20% escalate to Opus: 200 × 0.2 = 40 PRs × (30K × $15/1M input + 5K × $75/1M output) = 40 × $0.825 = $33.00/day
- Blended daily total: ~$59.40 → roughly $1,780/month
Caching the 20K-token "stable" portion of repo context (tool defs, conventions, package manifests) halves the input bill again. Most teams land around $1,100–$1,300/month after caching.
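The blended escalation math, sketched with the same legacy rates (Sonnet $3/$15, Opus $15/$75 per 1M, before caching):

```python
# Worked example 2: 200 PRs/day, 20% escalated from Sonnet to Opus.
PRS = 200
IN_TOK, OUT_TOK = 30_000, 5_000  # tokens per PR

def per_pr(in_rate: float, out_rate: float) -> float:
    """Dollar cost of one PR at the given $/1M input and output rates."""
    return (IN_TOK * in_rate + OUT_TOK * out_rate) / 1e6

sonnet_day = 0.8 * PRS * per_pr(3.00, 15.00)   # $26.40/day
opus_day = 0.2 * PRS * per_pr(15.00, 75.00)    # $33.00/day
print(sonnet_day + opus_day)                   # ~$59.40/day
print(30 * (sonnet_day + opus_day))            # ~$1,782/month
```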
Worked example 3: RAG over a 10M-token knowledge base
Workload: 1,000 queries per day. Each query retrieves the top-K chunks totalling ~5,000 tokens, plus a 1,500-token system prompt, and generates a 600-token answer on Sonnet 4.6.
Daily math:
- Input: 1,000 × 6,500 = 6.5M tokens × $3.00 = $19.50
- Output: 1,000 × 600 = 0.6M tokens × $15.00 = $9.00
- Daily total: $28.50 → roughly $855/month
The interesting question is whether to put the whole 10M-token KB in context with caching instead of running a retriever. Writing the KB into the cache costs 10M × $3.00/1M = $30 once, but every query still pays the cached-read rate on the full context: 10M × $0.30/1M = $3.00 per query, versus ~$0.03 per query for the retrieval pipeline. Both architectures scale linearly with query volume, so that 100× gap never closes; for a KB this size, retrieval-then-generate wins at any traffic level. Full-context with caching only becomes a real architectural option once the corpus shrinks to a few hundred thousand tokens, where cached reads cost fractions of a cent per query, and it is most attractive when the retriever's recall quality is poor.
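A per-query comparison of the two architectures, assuming the legacy Sonnet rates ($3/1M fresh input, $0.30/1M cached input, $15/1M output):

```python
# RAG retrieval pipeline vs. whole-KB-in-context with caching.
KB_TOK = 10_000_000  # knowledge base size in tokens

retrieval = (6_500 * 3.00 + 600 * 15.00) / 1e6  # retrieved chunks + prompt + answer
full_ctx = (KB_TOK * 0.30 + 600 * 15.00) / 1e6  # cached read of the whole KB + answer

print(retrieval)  # $0.0285 per query
print(full_ctx)   # $3.009 per query
```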
Worked example 4: batch classification
Workload: 1,000,000 records to classify. Each record has 500 input tokens and produces a 50-token JSON label. Run on Haiku via the Batch API.
Without batching:
- Input: 1M × 500 = 500M tokens × $0.25 = $125
- Output: 1M × 50 = 50M tokens × $1.25 = $62.50
- Total: $187.50
With Batch API (50% discount):
- Input: 500M × $0.125 = $62.50
- Output: 50M × $0.625 = $31.25
- Total: $93.75 — exactly half.
For periodic backfills, content moderation sweeps, or overnight tagging jobs, the Batch API is the single biggest no-code lever you have.
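The batch math in code, assuming the legacy Haiku rates ($0.25/$1.25 per 1M) and the flat 50% Batch API discount:

```python
# Worked example 4: 1M records, 500 input / 50 output tokens each.
RECORDS = 1_000_000
sync_cost = (RECORDS * 500 * 0.25 + RECORDS * 50 * 1.25) / 1e6
batch_cost = sync_cost * 0.5   # Batch API: 50% off input and output
print(sync_cost, batch_cost)   # 187.5 93.75
```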
Cost levers ranked by ROI
After running these four scenarios with dozens of teams on Claudexia, the lever order is consistent:
- Cap output tokens. Output costs 5× input. A `max_tokens` ceiling and a "be concise" instruction is the highest-ROI change you can ship, often 30–50% savings with no quality loss.
- Cache the stable system prompt. Tool defs, role instructions, and few-shot examples rarely change. Caching them cuts input by 90% on the cached portion.
- Move async work to the Batch API. A flat 50% off input and output for any job that can tolerate a 24-hour SLA.
- Route to Haiku first. Use Haiku for classification, routing, extraction, and "is this even worth escalating?" gates. Reserve Sonnet for reasoning and Opus for the hard 5%.
- Compress retrieved context. Cut your top-K, summarise long chunks, and drop redundant chat history. Every 1,000 input tokens saved per call compounds across millions of calls.
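Levers 1 and 2 are literally request parameters. A sketch of the Messages API payload (the model id and prompt text are placeholders; the `cache_control` block follows Anthropic's prompt-caching format):

```python
# Hypothetical request showing the two highest-ROI levers as parameters.
STABLE_SYSTEM_PROMPT = "You are a concise support agent. ..."  # rarely changes

request = {
    "model": "claude-sonnet-4-6",   # placeholder model id
    "max_tokens": 600,              # lever 1: hard ceiling on output tokens
    "system": [{
        "type": "text",
        "text": STABLE_SYSTEM_PROMPT,
        # lever 2: mark the stable prefix as cacheable
        "cache_control": {"type": "ephemeral"},
    }],
    "messages": [{"role": "user", "content": "Where is my order?"}],
}
```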
Cost per call and cost per active user
Two tables to keep on your roadmap doc.
Cost per call (Sonnet 4.6, 600 output tokens, with roughly 45% of input cached for an effective input rate of ~$1.80/1M):
| Input tokens | Cost per call |
|---|---|
| 2,000 | $0.013 |
| 5,000 | $0.018 |
| 10,000 | $0.027 |
| 20,000 | $0.045 |
Cost per monthly active user (the cached chatbot from worked example 1, at $0.020 per conversation):
| Calls/user/month | Cost per MAU |
|---|---|
| 5 | $0.10 |
| 20 | $0.40 |
| 100 | $2.00 |
| 500 | $10.00 |
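The MAU table derives from a single per-call number, so it is easy to regenerate for your own metrics (a sketch using the $0.020 per conversation from worked example 1):

```python
PER_CALL = 0.020  # cached Sonnet chatbot, from worked example 1

# Monthly cost per active user at different usage intensities.
mau_cost = {calls: round(calls * PER_CALL, 2) for calls in (5, 20, 100, 500)}
print(mau_cost)  # {5: 0.1, 20: 0.4, 100: 2.0, 500: 10.0}
```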
These two tables let you draw a straight line from product metrics (MAU, calls per user) to a defensible cost line in your forecast.
How Claudexia fits
Claudexia bills at the flat rates in the note at the top, below the legacy list prices modelled in this post, so the spreadsheet you build here is an upper bound. The dashboard tracks actual usage by model, by API key, and by conversation, so the moment a feature ships you can see whether the real numbers match your forecast. When they don't, the levers above are usually the answer. Point your SDK at https://api.claudexia.tech/v1 and the same code you wrote against Anthropic keeps working, including caching headers and the Batch API.