Best Practices
Follow these recommendations to optimize cost, performance, and reliability when using the Claudexia API.
Choose the right model
Match the model to the task complexity:
| Task | Recommended model |
|---|---|
| Complex reasoning, research | claude-opus-4.5 |
| General coding, everyday tasks | claude-sonnet-4.5 |
| Quick completions, classification | claude-haiku-4.5 |
Using a smaller model for simple tasks significantly reduces cost and latency.
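As a rough illustration, the table above can be encoded as a lookup so that callers name the kind of task rather than a model. The category names, the fallback choice, and the `pick_model` helper are hypothetical; only the model identifiers come from the table.

```python
# Hypothetical task categories mapped to the models from the table above.
MODEL_BY_TASK = {
    "complex_reasoning": "claude-opus-4.5",
    "research": "claude-opus-4.5",
    "coding": "claude-sonnet-4.5",
    "everyday": "claude-sonnet-4.5",
    "quick_completion": "claude-haiku-4.5",
    "classification": "claude-haiku-4.5",
}

# Assumption: fall back to the mid-tier model when the category is unknown.
DEFAULT_MODEL = "claude-sonnet-4.5"


def pick_model(task: str) -> str:
    """Return the recommended model for a task category, or the fallback."""
    return MODEL_BY_TASK.get(task, DEFAULT_MODEL)


print(pick_model("classification"))  # claude-haiku-4.5
```

Keeping the mapping in one place makes it easy to move a task to a cheaper model later without touching call sites.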
Use prompt caching
If you send the same system prompt across multiple requests, use prompt caching to reduce input token costs.
Include cache_control in your system message block:
{
  "system": [
    {
      "type": "text",
      "text": "You are a helpful coding assistant...",
      "cache_control": { "type": "ephemeral" }
    }
  ]
}
Set appropriate max_tokens
Set max_tokens to the minimum needed for your use case:
- Short answers: 256–512
- Code generation: 2048–4096
- Long-form content: 4096–8192
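max_tokens caps the length of the generated output, so a lower value bounds the worst-case cost and latency of each request. A response that reaches the cap is cut off, so leave enough headroom for the longest answer you expect.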
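One way to apply these ranges consistently is to keep the budgets in a single table and build request bodies from it. The sketch below is illustrative only: the use-case names and the `build_request` helper are hypothetical, and it uses the upper bound of each range listed above as the budget.

```python
import json

# Upper bounds of the ranges listed above; the use-case names are hypothetical.
MAX_TOKENS_BY_USE_CASE = {
    "short_answer": 512,
    "code_generation": 4096,
    "long_form": 8192,
}


def build_request(prompt: str, use_case: str, model: str = "claude-sonnet-4.5") -> dict:
    """Build a request body whose max_tokens matches the given use case."""
    if use_case not in MAX_TOKENS_BY_USE_CASE:
        raise ValueError(f"unknown use case: {use_case!r}")
    return {
        "model": model,
        "max_tokens": MAX_TOKENS_BY_USE_CASE[use_case],
        "messages": [{"role": "user", "content": prompt}],
    }


body = build_request("Summarize this ticket in one sentence.", "short_answer")
print(json.dumps(body, indent=2))
```

Rejecting unknown use cases, rather than defaulting to a large budget, keeps an accidental typo from silently raising cost.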
Use streaming for real-time responses
Enable streaming to receive tokens as they are generated, improving perceived latency:
curl https://api.claudexia.tech/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: sk_cdx_YOUR_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4.5",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
Use separate keys per environment
Create dedicated keys for each environment to isolate usage and simplify debugging:
- Production — main application key with strict rate limits
- Development — relaxed limits for testing
- CI/CD — dedicated key for automated tests
This makes it easy to track costs, rotate keys, and revoke access per environment.
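A minimal sketch of how an application might pick the right key at startup, assuming each key is stored in its own environment variable. The variable names and both helper functions are hypothetical; the header names are the ones used in the curl example on this page.

```python
import os

# Hypothetical variable names; use whatever naming your deployment already follows.
KEY_ENV_VARS = {
    "production": "CLAUDEXIA_API_KEY_PROD",
    "development": "CLAUDEXIA_API_KEY_DEV",
    "ci": "CLAUDEXIA_API_KEY_CI",
}


def load_api_key(environment: str) -> str:
    """Return the API key for one environment, failing loudly if it is missing."""
    try:
        var_name = KEY_ENV_VARS[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    return key


def auth_headers(environment: str) -> dict:
    """Build the request headers for one environment."""
    return {
        "Content-Type": "application/json",
        "x-api-key": load_api_key(environment),
        "anthropic-version": "2023-06-01",
    }
```

Failing when a key is missing, instead of falling back to another environment's key, keeps test traffic from being billed against production.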
Monitor usage and set alerts
Use the Analytics dashboard to track:
- Token consumption per key and model
- Cost trends over time
- Rate limit hits
- Unexpected usage spikes
Set up low-balance alerts in Settings to get notified before your balance runs out.