Best Practices

Follow these recommendations to optimize cost, performance, and reliability when using the Claudexia API.

Choose the right model

Match the model to the task complexity:

| Task | Recommended model |
|---|---|
| Complex reasoning, research | `claude-opus-4.5` |
| General coding, everyday tasks | `claude-sonnet-4.5` |
| Quick completions, classification | `claude-haiku-4.5` |

Smaller models cost less per token and typically respond faster, so using them for simple tasks reduces both cost and latency.
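
One way to apply this table in code is a small lookup that routes each request by task type. This is a minimal sketch. Only the model IDs come from the table. The task category names and the `pick_model` helper are hypothetical, and unknown tasks fall back to the general-purpose model.

```python
# Hypothetical routing table mirroring the recommendations above.
# The task category keys are illustrative names, not API values.
MODEL_BY_TASK = {
    "reasoning": "claude-opus-4.5",
    "research": "claude-opus-4.5",
    "coding": "claude-sonnet-4.5",
    "general": "claude-sonnet-4.5",
    "completion": "claude-haiku-4.5",
    "classification": "claude-haiku-4.5",
}

DEFAULT_MODEL = "claude-sonnet-4.5"


def pick_model(task: str) -> str:
    """Return the recommended model ID for a task category.

    Matching is case-insensitive; unknown categories get DEFAULT_MODEL.
    """
    return MODEL_BY_TASK.get(task.strip().lower(), DEFAULT_MODEL)


if __name__ == "__main__":
    print(pick_model("classification"))  # claude-haiku-4.5
    print(pick_model("Research"))        # claude-opus-4.5
    # Not in the table, so the default model is returned:
    print(pick_model("translation"))     # claude-sonnet-4.5
```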

Use prompt caching

If you send the same system prompt across multiple requests, use prompt caching to reduce input token costs.

Add a `cache_control` field to the system prompt block you want to cache:

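```json
{
  "model": "claude-sonnet-4.5",
  "max_tokens": 1024,
  "system": [
    {
      "type": "text",
      "text": "You are a helpful coding assistant...",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}
```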
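
The sketch below builds the same request in Python. It also adds a helper that reads the cache counters from the response, so you can confirm that caching is taking effect. It assumes Claudexia returns Anthropic-style usage fields (`cache_creation_input_tokens`, `cache_read_input_tokens`). The helper names `build_cached_request` and `cache_stats` are hypothetical. A non-zero `read` count on the second and later requests indicates that the cached system prompt was reused.

```python
import json


def build_cached_request(system_prompt: str, user_message: str,
                         model: str = "claude-sonnet-4.5",
                         max_tokens: int = 1024) -> dict:
    """Hypothetical helper: a Messages request body with a cached system prompt."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Marks this block as the end of the prefix to cache.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }


def cache_stats(response: dict) -> dict:
    """Extract cache-related token counts from a parsed response.

    Assumes Anthropic-style usage fields; missing or null values count as 0.
    """
    usage = response.get("usage", {})
    return {
        "written": usage.get("cache_creation_input_tokens", 0) or 0,
        "read": usage.get("cache_read_input_tokens", 0) or 0,
        "uncached": usage.get("input_tokens", 0) or 0,
    }


if __name__ == "__main__":
    body = build_cached_request("You are a helpful coding assistant...", "Hello!")
    print(json.dumps(body, indent=2))
```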

Set appropriate max_tokens

Set max_tokens to the smallest value that comfortably fits the expected output for your use case. Typical ranges, in tokens:

- Short answers: 256–512
- Code generation: 2048–4096
- Long-form content: 4096–8192

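max_tokens is a ceiling, not a target. A lower value caps the worst-case cost and latency of each request. However, a response that reaches the limit is cut off and returns a stop_reason of "max_tokens".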
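
If you choose limits from these ranges, check each response for truncation so that a too-low limit does not go unnoticed. This is a minimal sketch. It assumes Claudexia returns the Anthropic-style `stop_reason` field. The use-case keys and the helpers `max_tokens_for` and `was_truncated` are hypothetical.

```python
# Hypothetical upper bounds taken from the ranges above (in tokens).
MAX_TOKENS_BY_USE_CASE = {
    "short_answer": 512,
    "code": 4096,
    "long_form": 8192,
}


def max_tokens_for(use_case: str) -> int:
    """Return the upper end of the suggested range for a use case."""
    return MAX_TOKENS_BY_USE_CASE[use_case]


def was_truncated(response: dict) -> bool:
    """True if generation stopped because it hit the max_tokens limit."""
    return response.get("stop_reason") == "max_tokens"


if __name__ == "__main__":
    print(max_tokens_for("code"))                        # 4096
    print(was_truncated({"stop_reason": "end_turn"}))    # False
    print(was_truncated({"stop_reason": "max_tokens"}))  # True
```

When `was_truncated` returns `True`, raise the limit for that use case rather than silently using the partial output.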

Use streaming for real-time responses

Enable streaming to receive tokens as they are generated, improving perceived latency:

```bash
# -N (--no-buffer) makes curl print each event as soon as it arrives
curl -N https://api.claudexia.tech/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: sk_cdx_YOUR_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4.5",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
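
With `"stream": true`, the response arrives as server-sent events rather than a single JSON body. The Python sketch below pulls the generated text out of that stream. It assumes Claudexia forwards Anthropic's event format, in which text arrives in `content_block_delta` events carrying `text_delta` deltas. `iter_text_deltas` is a hypothetical helper, and the sample stream is hand-written for illustration rather than captured from the API. In a real client, you would feed it the response lines as they are read from the HTTP connection.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from a Messages API server-sent event stream.

    Assumes Anthropic-style events: an "event:" line followed by a "data:"
    line holding JSON, with text in content_block_delta events.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        payload = json.loads(line[len("data:"):].strip())
        if payload.get("type") != "content_block_delta":
            continue  # ignore ping, message_start, message_stop, etc.
        delta = payload.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")


if __name__ == "__main__":
    # Hand-written sample stream for illustration only.
    sample = [
        "event: content_block_delta",
        'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}',
        "",
        "event: content_block_delta",
        'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": " there!"}}',
        "",
        "event: message_stop",
        'data: {"type": "message_stop"}',
    ]
    print("".join(iter_text_deltas(sample)))  # Hello there!
```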

Use separate keys per environment

Create dedicated keys for each environment to isolate usage and simplify debugging:

- Production: main application key with strict rate limits
- Development: relaxed limits for testing
- CI/CD: dedicated key for automated tests

This makes it easy to track costs, rotate keys, and revoke access per environment.
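
In application code, a common pattern is to read the key for the current environment from an environment variable, so that each deployment only ever sees its own key. This is a minimal sketch. The `APP_ENV` switch, the variable names, and the `api_key_for_env` helper are all hypothetical conventions, not Claudexia requirements.

```python
import os
from typing import Optional

# Hypothetical variable names: one key per environment.
KEY_VARS = {
    "production": "CLAUDEXIA_API_KEY_PROD",
    "development": "CLAUDEXIA_API_KEY_DEV",
    "ci": "CLAUDEXIA_API_KEY_CI",
}


def api_key_for_env(env: Optional[str] = None) -> str:
    """Return the API key for the given (or current) environment.

    Fails loudly instead of falling back to another environment's key,
    so a misconfigured CI job never spends against the production key.
    """
    env = (env or os.environ.get("APP_ENV", "development")).lower()
    if env not in KEY_VARS:
        raise ValueError(f"unknown environment: {env!r}")
    key = os.environ.get(KEY_VARS[env])
    if not key:
        raise RuntimeError(f"{KEY_VARS[env]} is not set")
    return key
```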

Monitor usage and set alerts

Use the Analytics dashboard to track:

- Token consumption per key and model
- Cost trends over time
- Rate limit hits
- Unexpected usage spikes

Set up low-balance alerts in Settings to get notified before your balance runs out.
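
The dashboard is the authoritative record. If you also want an in-process view, for example to log per-model consumption from a single service, you can accumulate the usage block returned with each response. This is a minimal sketch. It assumes Anthropic-style `model` and `usage` fields in Claudexia responses. `UsageTally` is a hypothetical helper, and its counts cover only requests made by this process.

```python
from collections import defaultdict


class UsageTally:
    """Hypothetical client-side counter of input/output tokens per model."""

    def __init__(self) -> None:
        # Each new model gets its own zeroed counter on first use.
        self.totals = defaultdict(lambda: {"input": 0, "output": 0})

    def record(self, response: dict) -> None:
        """Add one parsed Messages API response to the running totals."""
        usage = response.get("usage", {})
        entry = self.totals[response.get("model", "unknown")]
        entry["input"] += usage.get("input_tokens", 0) or 0
        entry["output"] += usage.get("output_tokens", 0) or 0

    def report(self) -> dict:
        """Return a plain-dict snapshot of totals keyed by model."""
        return {model: dict(counts) for model, counts in self.totals.items()}
```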

API Key Configuration