Every team that scales Claude usage past a handful of engineers eventually hits the same fork in the road: do we put a gateway in front of the Anthropic API, and if so, do we run it ourselves or pay someone else to? The default open-source answer in 2026 is LiteLLM — a popular Python proxy that fronts 100+ model providers behind an OpenAI-compatible API. The default managed answer is Claudexia. This post is a sober comparison of the two, with real numbers for total cost of ownership, the operational work you are signing up for, and the security and compliance edges where each option wins.
What LiteLLM actually is
LiteLLM started as a thin Python SDK that normalised the calling
conventions of OpenAI, Anthropic, Cohere, and friends behind a single
litellm.completion() function. It grew into a full proxy server with a
PostgreSQL-backed control plane: virtual API keys, per-key budgets, rate
limits, team and user hierarchies, audit logs, an admin UI, and webhook
callbacks. You deploy it as a container, point it at one or more upstream
provider keys, and your application code talks to your proxy using the
OpenAI SDK with a custom base_url.
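To make that concrete, the SDK-era usage looked roughly like this (a minimal sketch; the model identifier is borrowed from the examples later in this post, so check LiteLLM's docs for the exact names it accepts):

import litellm

# One function, many providers: the "anthropic/" prefix routes the call to Anthropic.
resp = litellm.completion(
    model="anthropic/claude-sonnet-4.6",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)  # responses come back OpenAI-shaped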
It is a genuinely good piece of software. It is also a piece of software that you now own — patches, incidents, capacity planning, and all.
The real TCO of self-hosting LiteLLM
The sticker price of LiteLLM is zero. The actual cost of running it as a production-grade gateway for a team of, say, 20 engineers and a handful of internal apps looks more like this on a monthly basis:
- Compute. Two small instances behind a load balancer, plus a staging copy. Call it $80–$150/month on any major cloud once you add the LB itself.
- PostgreSQL. A managed Postgres for the control plane (keys, spend, audit logs). $50–$120/month for a small HA tier with backups.
- Redis. Required for accurate distributed rate limiting across more than one proxy replica. $20–$50/month.
- Observability. Logs, metrics, and traces — Datadog, Grafana Cloud, or your existing stack. Realistically $50–$200/month of incremental spend once you actually instrument it.
- Egress and TLS. Certificates are free; egress to Anthropic is not, and at scale it shows up.
That is roughly $200–$500/month in pure infrastructure before a human has touched it. The infra is the cheap part.
The expensive part is people. A production gateway needs:
- An on-call rotation that owns the proxy when Anthropic has a regional incident, when your Postgres fails over at 03:00, or when a runaway agent burns a virtual key's budget in ten minutes.
- Security patching of the LiteLLM container, its base image, and its Python dependencies — the proxy sits on the path of every prompt and every response, so CVEs there are not optional.
- Periodic key rotation of upstream Anthropic keys, virtual keys for internal teams, and the proxy's own admin credentials.
- Upgrade work. LiteLLM ships frequently and occasionally introduces schema migrations or config breaks; somebody has to read changelogs and test upgrades in staging.
If you cost a senior engineer at $150/hour fully loaded and assume four hours a week of steady-state ownership plus the occasional incident, that is roughly $2,000–$3,000/month of human time on top of the infra (four hours a week at $150/hour is about $2,600/month before a single incident). The honest TCO of "free" LiteLLM at modest scale is closer to $2,500–$3,500/month than to zero.
The Claudexia managed alternative
Claudexia is the same shape of product — virtual keys, budgets, rate
limits, audit logs, OpenAI-compatible and Anthropic-native endpoints —
delivered as a managed service. There is no proxy to deploy, no Postgres
to back up, no Redis to size, and no on-call to staff. You sign up, mint
a key, point your SDK at https://api.claudexia.tech/v1, and you are
done. Pricing is pay-per-token at Anthropic's direct rates, with no
monthly minimum and no seat fees.
The buy-vs-build math at modest scale is therefore not really about infrastructure. It is about whether the margin Claudexia takes on tokens is larger or smaller than the $2,500–$3,500/month you would otherwise spend running LiteLLM yourself. For most teams under a few hundred million tokens a month, managed wins comfortably.
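If you want to gut-check that claim against your own volume, the break-even arithmetic fits in a few lines. The margin figure below is purely illustrative (Claudexia bills at Anthropic's direct rates, so plug in whatever effective premium you want to assume for any managed gateway):

# Back-of-envelope break-even for buy vs. build.
self_host_cost = 3000    # $/month, fully-loaded LiteLLM ownership (midpoint from above)
margin_per_mtok = 0.50   # $ per million tokens; illustrative assumption, not a quote

breakeven_mtok = self_host_cost / margin_per_mtok
print(f"Self-hosting breaks even above ~{breakeven_mtok:,.0f}M tokens/month")
# With these numbers: ~6,000M tokens/month, i.e. 6B, well past "a few hundred million"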
Feature parity at a glance
| Capability | Self-hosted LiteLLM | Claudexia (managed) |
|---|---|---|
| OpenAI-compatible /v1/chat/completions | Yes | Yes |
| Anthropic-native /v1/messages | Yes | Yes |
| Virtual API keys per team/user | Yes | Yes |
| Per-key budgets and spend caps | Yes | Yes |
| Rate limits (RPM / TPM) | Yes | Yes |
| Audit logs of every call | Yes (in your DB) | Yes (in dashboard) |
| Prompt caching pass-through | Yes | Yes |
| Streaming | Yes | Yes |
| Bring-your-own provider key | Yes (any provider) | No (Claude only) |
| Custom Python routing logic | Yes | No |
| You operate the database | Yes | No |
| You patch the container | Yes | No |
| You staff the on-call | Yes | No |
The functional surface is close to identical for Claude workloads. The divergence is in who runs it and how many providers it fronts.
Security: who actually sees the prompts
This is the question that should drive the decision more than TCO.
With self-hosted LiteLLM, prompts and completions traverse:
- Your application.
- Your LiteLLM proxy (in your VPC).
- Anthropic's API.
Three parties, two of which you control. Logs, if any, live in your infrastructure and inherit your existing data retention and access controls.
With Claudexia, prompts and completions traverse:
- Your application.
- Claudexia's gateway.
- Anthropic's API.
Three parties, one of which is us. We do not retain prompt or completion content beyond what is needed to bill and to debug a specific request, and we do not use it for training. But you are trusting our word and our SOC 2 controls instead of your own. For most teams that is fine; for some it is not.
Compliance: when self-host is genuinely required
There are scenarios where self-hosting is not a preference, it is a requirement:
- HIPAA workloads where you cannot sign a BAA with the gateway vendor. If your only BAA-covered Claude paths are Anthropic directly or Claude on AWS Bedrock, the gateway has to live inside your HIPAA boundary.
- SOC 2 / ISO 27001 boundary rigidity. If your audit scope says no third-party processors touch customer data, a managed gateway is a new subprocessor you would have to disclose, vendor-assess, and defend.
- Data residency mandates. EU-only or US-only routing where you cannot let traffic leave a specific region.
- Air-gapped or VPC-only environments. Self-evident.
If any of those apply, run LiteLLM. The TCO conversation is moot.
Code: what setup actually looks like
Self-hosted LiteLLM, minimum viable production:
# 1. Postgres + Redis already provisioned, then:
docker run -d --name litellm \
  -e DATABASE_URL=postgres://... \
  -e REDIS_URL=redis://... \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -e LITELLM_MASTER_KEY=sk-master-... \
  -v $(pwd)/config.yaml:/app/config.yaml \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-stable \
  --config /app/config.yaml
Plus a config.yaml declaring your model list, plus Terraform for the
load balancer, plus alerting, plus backups, plus an upgrade runbook.
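For reference, a minimal config.yaml is only a few lines, roughly this shape per LiteLLM's documented model_list format (verify field names against the version you deploy):

model_list:
  - model_name: claude-sonnet-4.6            # the name your clients will request
    litellm_params:
      model: anthropic/claude-sonnet-4.6     # the upstream provider/model
      api_key: os.environ/ANTHROPIC_API_KEY  # resolved from the container's env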
Then your application:
from openai import OpenAI

# The stock OpenAI SDK, pointed at your proxy; the virtual key carries
# this team's budget and rate limits.
client = OpenAI(
    api_key="sk-virtual-key-for-this-team",
    base_url="https://litellm.internal.example.com/v1",
)
resp = client.chat.completions.create(
    model="claude-sonnet-4.6",
    messages=[{"role": "user", "content": "hello"}],
)
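The virtual key in that snippet has to come from somewhere: you mint it against the proxy's master key via LiteLLM's key-management API. Roughly like this (parameter names may vary by version, and the budget and limit values are illustrative):

curl -s -X POST https://litellm.internal.example.com/key/generate \
  -H "Authorization: Bearer sk-master-..." \
  -H "Content-Type: application/json" \
  -d '{"team_id": "search", "max_budget": 200, "rpm_limit": 60}'
# Returns {"key": "sk-...", ...}; hand that key to the team, never the master key.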
Claudexia, same workload:
from openai import OpenAI

client = OpenAI(
    api_key="cdx-...",
    base_url="https://api.claudexia.tech/v1",
)
resp = client.chat.completions.create(
    model="claude-sonnet-4.6",
    messages=[{"role": "user", "content": "hello"}],
)
The application code is the same shape on purpose — that is the whole point of the OpenAI-compatible standard. The difference is that the second snippet is the entire setup.
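The symmetry extends to streaming, which both columns of the table claim. Reusing the client from either snippet above, a sketch:

# Works unchanged against either base_url.
stream = client.chat.completions.create(
    model="claude-sonnet-4.6",
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)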
When LiteLLM still wins
We are not going to pretend Claudexia is the right answer for every team. LiteLLM is the better choice when:
- You need to bring your own keys for many providers — OpenAI, Anthropic, Bedrock, Vertex, Azure OpenAI, Cohere, Mistral, Together, Groq, your own vLLM deployment — behind a single endpoint. Claudexia is Claude-focused; LiteLLM is the multi-provider Swiss army knife.
- You want custom Python routing logic in the proxy itself: pre-call hooks that rewrite prompts, post-call hooks that redact PII, fallback chains across providers, or guardrails plugged into the request path (a sketch of the hook shape follows this list).
- You have a compliance boundary that disallows a managed subprocessor, as discussed above.
- You already operate a mature platform team for which one more service is rounding error.
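To make the hook point concrete, LiteLLM's custom-callback mechanism lets you register a Python class from config.yaml; a pre-call hook has roughly this shape (treat it as a sketch and check the signatures against the version you run):

# custom_hooks.py, referenced from config.yaml, e.g. callbacks: custom_hooks.proxy_handler_instance
from litellm.integrations.custom_logger import CustomLogger

class RedactingHook(CustomLogger):
    # Runs inside the proxy before the request is forwarded upstream.
    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        # Illustrative only: strip a hypothetical internal marker from
        # every message before it leaves your VPC for Anthropic.
        for msg in data.get("messages", []):
            if isinstance(msg.get("content"), str):
                msg["content"] = msg["content"].replace("[INTERNAL]", "")
        return data

proxy_handler_instance = RedactingHook()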
Bottom line
If you are a Claude-first team under a few hundred million tokens a month, the buy decision is straightforward: managed gateways like Claudexia are cheaper than the fully-loaded cost of running LiteLLM yourself, and they free your engineers to ship product instead of patching a proxy. If you are multi-provider, need custom routing in the request path, or sit inside a compliance boundary that forbids a third-party gateway, self-host LiteLLM and budget honestly for what that actually costs.
Either way, do not skip the gateway entirely. Calling Anthropic directly from twenty different services with a shared root key is the worst of both worlds.