Best Practices

Follow these recommendations to optimize cost, performance, and reliability when using the Claudexia API.

Choose the right model

Match the model to the task complexity:

| Task | Recommended model |
| --- | --- |
| Complex reasoning, research | claude-opus-4.5 |
| General coding, everyday tasks | claude-sonnet-4.5 |
| Quick completions, classification | claude-haiku-4.5 |

Using a smaller model for simple tasks significantly reduces cost and latency.
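One way to apply the table above in application code is a small lookup that routes each request to a model by task category. This is a minimal sketch: the model names come from the table, while the category keys, the fallback choice, and the `pick_model` helper are hypothetical names introduced for illustration.

```python
# Model names are taken from the table above.
# The category keys are hypothetical labels, not part of the API.
MODEL_BY_TASK = {
    "complex_reasoning": "claude-opus-4.5",
    "research": "claude-opus-4.5",
    "general_coding": "claude-sonnet-4.5",
    "everyday": "claude-sonnet-4.5",
    "quick_completion": "claude-haiku-4.5",
    "classification": "claude-haiku-4.5",
}

# Assumption: fall back to the mid-tier model for uncategorized tasks.
DEFAULT_MODEL = "claude-sonnet-4.5"


def pick_model(task: str) -> str:
    """Return the recommended model for a task category."""
    return MODEL_BY_TASK.get(task, DEFAULT_MODEL)


print(pick_model("classification"))
print(pick_model("research"))
```

Keeping the mapping in one place makes it easy to downgrade a task category to a smaller model later without touching call sites.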

Use prompt caching

If you send the same system prompt across multiple requests, use prompt caching to reduce input token costs.

Include cache_control in your system message block:

```json
{
  "system": [
    {
      "type": "text",
      "text": "You are a helpful coding assistant...",
      "cache_control": { "type": "ephemeral" }
    }
  ]
}
```
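To show where the cached system block sits in a full request, the sketch below builds two request bodies that share one system prompt. The `build_cached_request` helper is a hypothetical name; the other fields (`model`, `max_tokens`, `messages`) mirror the request shape used elsewhere on this page. It only constructs the payloads and does not send them.

```python
import json


def build_cached_request(system_prompt: str, user_message: str,
                         model: str = "claude-sonnet-4.5",
                         max_tokens: int = 1024) -> dict:
    """Build a request body whose system prompt is marked for caching."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }


SYSTEM_PROMPT = "You are a helpful coding assistant..."

first = build_cached_request(SYSTEM_PROMPT, "Explain list comprehensions.")
second = build_cached_request(SYSTEM_PROMPT, "Explain generators.")

# The system block is identical across both requests; only the messages differ.
print(first["system"] == second["system"])
print(json.dumps(first, indent=2))
```

Defining the system prompt once as a constant, rather than formatting it per request, helps keep it unchanged between calls, which is the situation prompt caching is meant for.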

Set appropriate max_tokens

Set max_tokens to the minimum needed for your use case:

  • Short answers: 256–512
  • Code generation: 2048–4096
  • Long-form content: 4096–8192

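Because max_tokens caps the length of the output, a lower value bounds the worst-case cost and latency of each request. If the value is too low, longer responses are cut off.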
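The ranges above can be kept in one table so that each call site asks for a budget by use case instead of hard-coding a number. This is a sketch under assumptions: the use-case keys and both helpers are hypothetical, and the truncation check assumes the response carries a `stop_reason` field set to `"max_tokens"` when the cap is hit, as in the Anthropic Messages format. Verify that against the responses you actually receive.

```python
# (low, high) budgets copied from the list above; keys are hypothetical labels.
MAX_TOKENS_RANGES = {
    "short_answer": (256, 512),
    "code_generation": (2048, 4096),
    "long_form": (4096, 8192),
}


def pick_max_tokens(use_case: str, generous: bool = False) -> int:
    """Return the low end of the range, or the high end if generous is True."""
    low, high = MAX_TOKENS_RANGES[use_case]
    return high if generous else low


def was_truncated(response: dict) -> bool:
    """Report whether a response stopped because it hit the token cap.

    Assumption: the response includes stop_reason == "max_tokens" in that case.
    """
    return response.get("stop_reason") == "max_tokens"


print(pick_max_tokens("short_answer"))
print(pick_max_tokens("code_generation", generous=True))
```

A reasonable pattern is to start at the low end and retry with `generous=True` only when `was_truncated` reports that the response was cut off.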

Use streaming for real-time responses

Enable streaming to receive tokens as they are generated. Total generation time is unchanged, but users see the first output sooner, which improves perceived latency:

```bash
curl https://api.claudexia.tech/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: sk_cdx_YOUR_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4.5",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
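On the client side, a streamed response has to be reassembled from individual events. The sketch below assumes the stream uses server-sent events in the Anthropic Messages streaming format (`content_block_delta` events carrying a `text_delta`), which the `anthropic-version` header suggests but this page does not confirm. The `iter_text_deltas` helper and the sample lines are illustrative, not captured output.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from server-sent-event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if not isinstance(event, dict):
            continue
        # Assumed event shape: {"type": "content_block_delta",
        #                       "delta": {"type": "text_delta", "text": "..."}}
        if event.get("type") == "content_block_delta":
            delta = event.get("delta") or {}
            if delta.get("type") == "text_delta":
                yield delta.get("text", "")


# Hand-written sample in the assumed format, not real API output.
sample = [
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, '
    '"delta": {"type": "text_delta", "text": "Hel"}}',
    '',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, '
    '"delta": {"type": "text_delta", "text": "lo!"}}',
    '',
    'event: message_stop',
    'data: {"type": "message_stop"}',
]

print("".join(iter_text_deltas(sample)))  # prints "Hello!"
```

In a real client, the same generator can be fed the line iterator of an HTTP response, and each fragment can be written to the UI as soon as it is yielded.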

Use separate keys per environment

Create dedicated keys for each environment to isolate usage and simplify debugging:

  • Production — main application key with strict rate limits
  • Development — relaxed limits for testing
  • CI/CD — dedicated key for automated tests

This makes it easy to track costs, rotate keys, and revoke access per environment.
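In application code, one way to keep the keys apart is to read each one from its own environment variable and fail loudly when it is missing. This is a minimal sketch: the variable names and the `load_api_key` helper are hypothetical and should match however your deployment actually stores secrets.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names, one per environment listed above.
KEY_ENV_VARS = {
    "production": "CLAUDEXIA_API_KEY_PROD",
    "development": "CLAUDEXIA_API_KEY_DEV",
    "ci": "CLAUDEXIA_API_KEY_CI",
}


def load_api_key(environment: str,
                 env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key for an environment, or raise if it is not configured."""
    source = os.environ if env is None else env
    if environment not in KEY_ENV_VARS:
        raise ValueError(f"unknown environment: {environment!r}")
    var_name = KEY_ENV_VARS[environment]
    key = source.get(var_name)
    if not key:
        # Never fall back to another environment's key.
        raise RuntimeError(f"{var_name} is not set")
    return key


# Usage with an explicit mapping, so the example runs without real secrets.
fake_env = {"CLAUDEXIA_API_KEY_CI": "sk_cdx_EXAMPLE"}
print(load_api_key("ci", fake_env))
```

Raising instead of falling back keeps a misconfigured CI job from silently spending against the production key.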

Monitor usage and set alerts

Use the Analytics dashboard to track:

  • Token consumption per key and model
  • Cost trends over time
  • Rate limit hits
  • Unexpected usage spikes

Set up low-balance alerts in Settings to get notified before your balance runs out.