TROUBLESHOOTING

Claude API Errors: Fix 529 Overloaded, 500, 429 & 400

Fix every Claude API error — 529 Overloaded, 500, 429, 400, 401, 413, timeouts — with cause-to-fix steps and retry code in Python and TypeScript.

Your integration worked fine in testing. Then traffic arrives, and the logs fill with 529 Overloaded and the occasional 500. Some of these errors are yours to fix. Some are Anthropic's, and your only job is to retry intelligently. The trick is knowing which is which — because retrying a 400 forever just burns money and wakes you up at 3 AM for nothing. This guide sorts every Claude API error into the right bucket and gives you working code.

What does the 529 Overloaded error mean?

A 529 means Anthropic's servers are temporarily saturated and can't take your request right now. It's not about your key, your quota, or your prompt — it's about aggregate demand on the platform at that exact moment. Per Anthropic's docs, 529 is the "Overloaded" status, and the correct response is to back off and try again shortly.

This is one of the most common errors people hit on the raw Anthropic API, and it tends to spike during peak hours and right after a new model ships. Because it's transient and server-side, retrying is the right move, but retrying immediately and aggressively makes the congestion worse for everyone, including you.

Here's the shape of the response:

{
  "type": "error",
  "error": {
    "type": "overloaded_error",
    "message": "Overloaded"
  }
}

Cause → fix for 529:

  • Cause: platform-wide capacity pressure, often at peak times or post-launch.
  • Fix: exponential backoff with jitter (code below), cap your concurrency, and shift non-urgent work to off-peak windows or the Batch API.
  • Fix at the architecture level: route through a gateway that fails over across providers and holds separate capacity, so a single overloaded backend doesn't become your outage.

One honest note: a 529 usually arrives without a useful retry-after header, so unlike 429, nothing tells you when the overload will clear. Your backoff schedule is the main lever you have. Don't skip the jitter: without it, clients that failed together retry on the same beat and re-create the spike.
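Capping concurrency, one of the fixes listed above, can be sketched with a semaphore. This is a minimal illustration under stated assumptions, not a pattern taken from Anthropic's docs: `run_capped` and `limit` are hypothetical names, and each job is assumed to be a zero-argument async callable that would wrap your real API call.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")

async def run_capped(
    jobs: Iterable[Callable[[], Awaitable[T]]],
    limit: int = 4,
) -> List[T]:
    """Run async jobs with at most `limit` in flight at once (hypothetical helper)."""
    sem = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with sem:  # waits here while `limit` jobs are already running
            return await job()

    # gather returns results in the same order as the input jobs
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Lowering `limit` during an overload spike reduces how many requests can fail at once, which also reduces how many retries you send back into the congestion.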

How do I fix a Claude API 500 error?

A 500 Internal Server Error means something broke on Anthropic's side while processing an otherwise valid request. You didn't do anything wrong, and the same request will usually succeed on retry. Treat it like 529: back off and try again.

The difference is intent. A 500 is an unexpected fault; a 529 is expected load-shedding. From your code's perspective they're handled identically — both are retryable server-side errors. What you should not do is parse the error body for clues and change your request. There's nothing to change.

{
  "type": "error",
  "error": {
    "type": "api_error",
    "message": "Internal server error"
  }
}

If 500s persist across many retries over several minutes, that's no longer transient. Check the Anthropic status page and your own request payload for anything malformed enough to trip a server-side edge case (extremely large tool schemas, for instance). In most cases, though, a single retry clears it.

What about 503 Service Unavailable?

A 503 means the service is temporarily down or in maintenance — another transient, retryable condition. Same playbook as 500 and 529: exponential backoff, respect any retry-after if present, give up gracefully after a sane number of attempts.

Lump these three together in your head: 529, 500, 503 = retry. They differ in cause (overload, internal fault, unavailability) but not in your response. Your retry layer should catch all three with the same logic.

Why am I getting a 429 rate limit error?

A 429 Too Many Requests means you've exceeded your requests-per-minute, tokens-per-minute, or daily token budget. Unlike the 5xx family, this one is about your usage, and the API tells you how long to wait via the retry-after header.

{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "Number of requests has exceeded your per-minute limit."
  }
}

The fix is still backoff, but with a twist: read retry-after and honor it instead of guessing. Beyond that, rate limits deserve their own playbook — tiers, TPM vs RPM, queuing, model routing — which we cover in depth in the Claude API rate limits guide. The short version: route simple calls to Haiku, cache repeated prompts, and consider pooled limits if you're hitting the ceiling constantly.
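Reading retry-after safely takes a little care, because HTTP allows the header to be either a number of seconds or an HTTP date. The sketch below handles both forms defensively; `parse_retry_after` is a hypothetical helper name, and the date branch is an assumption about what a proxy or gateway in the path might send rather than something the source documents.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return seconds to wait, or None if the header is missing or unparseable."""
    if not value:
        return None
    value = value.strip()
    try:
        return max(0.0, float(value))  # delta-seconds form, e.g. "12"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

Returning None for anything unparseable lets the caller fall back to its own backoff schedule instead of crashing inside the error handler.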

What causes a 400 Bad Request error?

A 400 means the request itself is malformed — the API rejected it before doing any work, and retrying the identical request will fail every single time. This is a code bug, not a transient hiccup. Stop, read the message, fix the payload.

The most common causes, in rough order of frequency:

| Cause | What it looks like | Fix |
|---|---|---|
| Missing max_tokens | max_tokens: Field required | Always include max_tokens; it's mandatory on the Messages API |
| Bad JSON | invalid_request_error on a request that "looks fine" | Validate JSON; watch trailing commas and unescaped quotes |
| Wrong role order | messages: roles must alternate | Messages must alternate user / assistant, starting with user |
| Empty messages array | messages: at least one message required | Send at least one message |
| max_tokens too high | max_tokens exceeds model maximum | Keep it under the model's output ceiling |
| System prompt in messages | role system rejected | Pass the system prompt via the top-level system field, not a message |

Here's a 400 from forgetting max_tokens:

{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "max_tokens: Field required"
  }
}

And the correct request in curl, with everything in place:

curl https://api.claudexia.tech/v1/messages \
  -H "x-api-key: $CLAUDEXIA_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4.5",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, Claude."}
    ]
  }'

When you get a 400, log the full error.message — it's specific and usually points straight at the broken field. Don't bury it.
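Several of the causes in the table above can be caught locally before the request is sent. The following is a rough sketch of such a pre-flight check, not an official validator: `preflight_errors` is a hypothetical helper, it covers only a few of the listed causes, and the API's own error message remains the source of truth.

```python
from typing import Any, Dict, List

def preflight_errors(payload: Dict[str, Any]) -> List[str]:
    """Return problems likely to produce a 400 (hypothetical local check)."""
    problems: List[str] = []
    max_tokens = payload.get("max_tokens")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(max_tokens, int) or isinstance(max_tokens, bool) or max_tokens < 1:
        problems.append("max_tokens: required positive integer")
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages: at least one message required")
        return problems
    for i, msg in enumerate(messages):
        role = msg.get("role") if isinstance(msg, dict) else None
        if role == "system":
            problems.append(f"messages[{i}]: pass the system prompt via top-level 'system'")
        elif role not in ("user", "assistant"):
            problems.append(f"messages[{i}]: role must be 'user' or 'assistant'")
    first = messages[0]
    if isinstance(first, dict) and first.get("role") == "assistant":
        problems.append("messages[0]: conversation should start with a user message")
    return problems
```

Running this in tests or just before the call turns a class of 400s into failures you see during development rather than in production logs.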

Why does my API key return 401 or 403?

A 401 Unauthorized means your key is missing, malformed, or invalid. A 403 Forbidden means the key is valid but isn't allowed to do what you asked. Neither is retryable — fix the credentials or permissions and move on.

Walk this checklist for 401:

  • Is the key actually being sent? Check the x-api-key header (or Authorization: Bearer on the OpenAI-compatible endpoint).
  • Did an environment variable fail to load, sending an empty string?
  • Are you mixing keys and base URLs? An Anthropic sk-ant-… key won't authenticate against a different base URL, and a Claudexia sk_cdx_… key targets https://api.claudexia.tech.
  • Trailing whitespace or newline pasted into the key? It happens more than you'd think.

For 403, the key works but lacks access — for example, requesting a model your account or plan doesn't have, or hitting an endpoint outside your permissions. Read the message; it names the restriction.

{
  "type": "error",
  "error": {
    "type": "authentication_error",
    "message": "invalid x-api-key"
  }
}
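Most items on the 401 checklist can be checked mechanically at startup. Here is a minimal sketch, assuming the key prefixes mentioned above (`sk-ant-` and `sk_cdx_`); `describe_key` and `key_problems` are hypothetical helper names, not part of any SDK.

```python
from typing import List, Optional

def describe_key(key: Optional[str]) -> str:
    """Safe-to-log summary: length and a short prefix, never the full value."""
    if not key:
        return "<missing or empty>"
    return f"len={len(key)} prefix={key[:6]!r}"

def key_problems(key: Optional[str], expected_prefix: str) -> List[str]:
    """List likely causes of a 401 for this key (hypothetical check)."""
    if not key:
        return ["key is missing or empty (did the environment variable load?)"]
    problems: List[str] = []
    if key != key.strip():
        problems.append("key has leading/trailing whitespace or a newline")
    if not key.strip().startswith(expected_prefix):
        problems.append(f"key does not start with {expected_prefix!r}; check key/base-URL pairing")
    return problems
```

Call `key_problems` with whatever environment variable holds your key and the prefix that matches your base URL, and log `describe_key(...)` alongside any problems it reports.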

What is the 413 Request Too Large error?

A 413 means your request payload is bigger than the API will accept, measured in bytes. The usual causes are oversized images or documents, or a message history that has grown without bound. It's not retryable as-is; you have to shrink the request.

Common triggers and fixes:

  • Context overflow: pushing past the model's input token limit is a close cousin of this error, though it typically surfaces as a 400 invalid_request_error rather than a 413. Either way the fix is the same: trim history, summarize old turns, or chunk the work. Our long-context strategy post covers tactics for keeping payloads lean.
  • Huge images or PDFs: downscale images and split large documents before sending.
  • Runaway message history: in a chat loop, prune or summarize earlier messages instead of appending forever.

If you're regularly bumping into size limits, that's a design signal — move to retrieval (send only relevant chunks) rather than dumping everything into every call.
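Pruning a runaway message history can be sketched as a sliding window over the most recent turns. This is an illustration under loud assumptions: `prune_history` is a hypothetical helper, each message's content is assumed to be a plain string, and character count stands in for a real token count.

```python
from typing import Dict, List

Message = Dict[str, str]

def prune_history(messages: List[Message], max_chars: int) -> List[Message]:
    """Keep the most recent messages whose combined content fits in max_chars.

    Character count is a rough stand-in for tokens (assumption). The newest
    message is always kept, and the result is trimmed to start with a user turn.
    """
    kept: List[Message] = []
    used = 0
    for msg in reversed(messages):  # walk newest to oldest
        size = len(msg["content"])
        if kept and used + size > max_chars:
            break
        kept.append(msg)
        used += size
    kept.reverse()
    # Drop leading assistant turns so the conversation starts with the user.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

A summarization step for the dropped turns would preserve more context than plain truncation, at the cost of an extra model call.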

How do I handle timeouts and connection errors?

Timeouts (408), dropped connections, and read timeouts are transient network problems, not API rejections. They're retryable, and they're especially common on long generations where the response takes a while to stream back.

Two things matter here. First, set a sane client timeout — too short and you'll abort valid long responses; too long and a stuck connection hangs your worker. Second, prefer streaming for long outputs: it starts returning tokens immediately, so you're far less likely to trip an idle timeout waiting for a complete response.

import anthropic

# Generous timeout for long generations; stream to avoid idle timeouts
client = anthropic.Anthropic(timeout=120.0)

with client.messages.stream(
    model="claude-sonnet-4.5",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Write a long report."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Connection errors should fall into the same retry bucket as 529/500/503. The implementation below catches them.

The complete status code reference

Keep this table next to your error handler. The "Action" column is the whole game — retry the transient stuff, fix the rest.

| Code | Meaning | Retryable? | Action |
|---|---|---|---|
| 400 | Bad request (malformed payload) | No | Fix the request: max_tokens, JSON, role order |
| 401 | Authentication failed | No | Check the API key and headers |
| 403 | Forbidden (no permission) | No | Check model/endpoint access |
| 408 | Request timeout | Yes | Retry; raise client timeout; stream |
| 413 | Request too large | No | Shrink context, images, history |
| 429 | Rate limit exceeded | Yes | Backoff + honor retry-after |
| 500 | Internal server error | Yes | Backoff and retry |
| 503 | Service unavailable | Yes | Backoff and retry |
| 529 | Overloaded | Yes | Backoff with jitter; reduce concurrency |

Exponential backoff with jitter: the production retry

This is the one piece of code every Claude integration needs. It retries the transient errors (429, 500, 503, 529, timeouts, connection drops), honors retry-after when present, adds jitter so clients don't synchronize, and — critically — does not retry the errors that are your bug (400, 401, 403, 413).

Python:

import anthropic
import time
import random

RETRYABLE_STATUS = {408, 429, 500, 503, 529}

def call_with_retry(
    client: anthropic.Anthropic,
    *,
    max_retries: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    **kwargs,
) -> anthropic.types.Message:
    # The SDK also retries on its own (max_retries=2 by default). Construct the
    # client with max_retries=0 if this helper should be the only retry layer.
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except (anthropic.APIConnectionError, anthropic.APITimeoutError):
            # Network failure or timeout: always worth another attempt.
            if attempt == max_retries - 1:
                raise
            _sleep(base_delay, max_delay, attempt, None)
        except anthropic.APIStatusError as e:
            # Don't retry client errors you caused; fix the request instead.
            if e.status_code not in RETRYABLE_STATUS or attempt == max_retries - 1:
                raise
            _sleep(base_delay, max_delay, attempt, e.response.headers.get("retry-after"))
    raise RuntimeError("unreachable")

def _sleep(base: float, cap: float, attempt: int, retry_after) -> None:
    if retry_after:
        try:
            # Honor the server's hint as a floor, then add a little jitter on
            # top so clients that were told the same wait don't retry in sync.
            time.sleep(float(retry_after) + random.uniform(0, base))
            return
        except ValueError:
            pass  # not a plain number of seconds; fall back to backoff
    # Full jitter: pick a random point in [0, min(cap, base * 2^attempt)].
    time.sleep(random.uniform(0, min(cap, base * (2 ** attempt))))

TypeScript:

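import Anthropic from "@anthropic-ai/sdk";

const RETRYABLE = new Set([408, 429, 500, 503, 529]);
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Read retry-after whether the SDK version exposes headers as a Headers
// object or as a plain record. Returns milliseconds, or undefined.
function retryAfterMs(err: unknown): number | undefined {
  const headers = (err as { headers?: any })?.headers;
  const raw =
    typeof headers?.get === "function" ? headers.get("retry-after") : headers?.["retry-after"];
  const seconds = parseFloat(raw ?? "");
  return Number.isFinite(seconds) ? seconds * 1000 : undefined;
}

export async function callWithRetry(
  client: Anthropic,
  params: Anthropic.MessageCreateParamsNonStreaming,
  { maxRetries = 6, baseDelay = 1000, maxDelay = 60000 } = {},
): Promise<Anthropic.Message> {
  // The SDK retries on its own too (maxRetries: 2 by default); construct the
  // client with maxRetries: 0 if this helper should be the only retry layer.
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.messages.create(params);
    } catch (err) {
      const last = attempt === maxRetries - 1;

      if (err instanceof Anthropic.APIConnectionError) {
        if (last) throw err;
      } else if (err instanceof Anthropic.APIError && err.status) {
        // 400/401/403/413 are your bug; surface them immediately.
        if (!RETRYABLE.has(err.status) || last) throw err;
      } else {
        throw err;
      }

      // Honor retry-after as a floor with a little jitter on top;
      // otherwise use full jitter over the capped exponential window.
      const hinted = retryAfterMs(err);
      const delay =
        hinted !== undefined
          ? hinted + Math.random() * baseDelay
          : Math.random() * Math.min(maxDelay, baseDelay * 2 ** attempt);
      await sleep(delay);
    }
  }
  throw new Error("unreachable");
}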

A few design choices worth understanding. Full jitter (a random value in [0, delay]) beats fixed exponential delays at preventing synchronized retries — the AWS architecture blog made this case years ago and it still holds. The max_delay cap stops your backoff from ballooning to absurd waits. And separating retryable from non-retryable status codes is what keeps a 400 from spinning forever.
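To make the schedule concrete, here is the arithmetic for the defaults used above (base delay 1 s, cap 60 s, six attempts). The function name `backoff_caps` is hypothetical, and the totals cover backoff sleeps only; they exclude request time and any retry-after waits.

```python
from typing import List

def backoff_caps(base: float = 1.0, cap: float = 60.0, attempts: int = 6) -> List[float]:
    """Upper bound of the jitter window before each retry."""
    # attempts - 1 sleeps: no sleep follows the final attempt
    return [min(cap, base * 2 ** n) for n in range(attempts - 1)]

caps = backoff_caps()
print(caps)           # [1.0, 2.0, 4.0, 8.0, 16.0]
print(sum(caps))      # worst-case total wait: 31.0 seconds
print(sum(caps) / 2)  # expected total under full jitter: 15.5 seconds
```

With these defaults the cap never comes into play; it only starts to matter from the seventh sleep onward, where the uncapped window would be 64 seconds.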

How does a gateway cut down 529 errors?

A gateway sits between your app and Claude, and the good ones run provider routing with automatic fallback — so when one backend returns 529 Overloaded, the request is retried against alternate capacity instead of failing. You get fewer overload errors without writing a multi-provider failover layer yourself.

Claudexia works this way. It's a gateway to Claude — Opus, Sonnet, Haiku — with no Anthropic account and no VPN required. Because it maintains separate, pooled capacity and routes around congested backends, transient 529s that would otherwise hit your users get absorbed upstream. You still run your own backoff (always do), but the gateway shrinks how often it fires.
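The routing-with-fallback idea can be sketched generically. This is an illustration of the concept only, not Claudexia's implementation: `Overloaded` and `call_with_fallback` are hypothetical names, and each backend is assumed to be a zero-argument callable that raises `Overloaded` when it would have returned a 529.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

class Overloaded(Exception):
    """Hypothetical stand-in for a backend answering 529 Overloaded."""

def call_with_fallback(backends: Sequence[Callable[[], T]]) -> T:
    """Try each backend in order; move on only when one reports overload."""
    last_error: Optional[Exception] = None
    for backend in backends:
        try:
            return backend()
        except Overloaded as exc:
            last_error = exc  # this backend is saturated; try the next one
    if last_error is None:
        raise ValueError("no backends configured")
    raise last_error  # every backend was overloaded
```

Only overload triggers the fallback here; any other exception propagates immediately, which matches the rule that non-retryable errors should surface rather than be masked.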

Switching is a one-line change — same SDKs, same request shapes:

import anthropic

# Same Anthropic SDK, pointed at the gateway.
client = anthropic.Anthropic(
    api_key="sk_cdx_your_key",
    base_url="https://api.claudexia.tech",
)

response = client.messages.create(
    model="claude-sonnet-4.5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: "sk_cdx_your_key",
  baseURL: "https://api.claudexia.tech",
});

There's also an OpenAI-compatible endpoint, so tools like Cursor, Claude Code, and OpenCode plug in unchanged. Billing is pay-per-token with no subscription (Sonnet and Haiku from $0.33/1M tokens, Opus from $0.50/1M, tracking Anthropic's rates), and you can pay via SBP (Russia's Faster Payments System, СБП), Russian cards, crypto (USDT/BTC/ETH), or MTS. See pricing and the API reference for details, or accessing Claude from Russia in 2026 for the regional angle.

FAQ

Is the 529 Overloaded error my fault?

No. A 529 reflects load on Anthropic's platform, not a problem with your key, quota, or request. The fix is to retry with exponential backoff and jitter, reduce your concurrency during the spike, and ideally route through a gateway that fails over to separate capacity. It's the most common transient error people hit on the direct API.

Should I retry a 400 Bad Request error?

Never. A 400 means the payload is malformed, so an identical retry fails identically — you'll just waste calls. Read error.message, which usually names the broken field, and fix it. The usual culprits are a missing max_tokens, invalid JSON, or messages that don't alternate user/assistant starting with user.

What's the difference between 429 and 529?

A 429 is your rate limit — you've exceeded RPM, TPM, or a daily budget, and the response carries a retry-after telling you how long to wait. A 529 is Anthropic's overload — aggregate platform demand, usually with no retry-after. Both are retryable, but 429 is solved by slowing down or raising limits, while 529 is solved by backoff and failover. For limits depth, see the rate limits guide.

How many times should I retry before giving up?

For transient errors, 5–6 attempts with capped exponential backoff is a sensible default — enough to ride out a brief spike without hanging a request for minutes. Always cap the per-attempt delay (e.g. 60 seconds) and surface a clear error to the caller once retries are exhausted, so a stuck request doesn't masquerade as success.

Why do I keep getting 401 even though my key looks right?

Usual suspects: an environment variable that loaded as an empty string, trailing whitespace or a newline pasted into the key, the wrong header (x-api-key vs Authorization: Bearer), or a key/base-URL mismatch — an sk-ant-… key won't authenticate against a non-Anthropic base URL, and an sk_cdx_… key must target https://api.claudexia.tech. Log the exact key length and prefix (never the full value) to spot these fast.


Errors are inevitable with any networked API. What separates a flaky integration from a solid one is a retry layer that knows the difference between "wait and try again" and "fix your code." Ship the backoff helper above, keep the status-code table handy, and let a gateway soak up the overloads. Ready to start? The quickstart gets you a key and a working request in a few minutes — and if you get stuck, reach out or ping us on Telegram.