If your Claude-powered product feels sluggish, the problem is almost never total throughput — it is time to first token (TTFT). A non-streaming completion that takes 8 seconds to return 600 tokens feels broken. The exact same 600 tokens, streamed, feels instant because the first words land in under 400 ms and the rest scrolls in like a person typing. Streaming does not make the model faster; it collapses perceived latency from total latency down to TTFT, which is the metric your users actually feel.
This guide covers how Claude streams responses over Server-Sent Events
(SSE), the Anthropic event taxonomy you must handle, and production
patterns in TypeScript (Next.js Edge runtime) and Python (httpx async).
All examples target Claudexia's Anthropic-compatible gateway at
https://api.claudexia.tech/v1, which is a drop-in for api.anthropic.com/v1.
## TTFT vs total latency
When you benchmark a Claude call, record two numbers:
- TTFT — milliseconds until the first content delta arrives.
- Total — milliseconds until `message_stop`.
For a Sonnet call producing ~800 output tokens, TTFT is typically 300–600 ms while total is 4–8 seconds. If you are not streaming, your users wait the full 8 seconds staring at a spinner. If you are streaming, they see text in 400 ms and read along as it generates. The total tokens-per-second (TPS) is identical either way; you are buying perception, not throughput.
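To see the split yourself, time both numbers in one pass. A minimal sketch using the official `anthropic` Python SDK pointed at the gateway above (the key and prompt are placeholders):

```python
import time

import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_KEY",  # placeholder
    base_url="https://api.claudexia.tech/v1",
)

start = time.monotonic()
ttft = None
# The SDK's stream() helper is a context manager; text_stream yields
# only the text deltas, which is all we need for timing.
with client.messages.stream(
    model="claude-sonnet-4.6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain SSE in one paragraph."}],
) as stream:
    for _text in stream.text_stream:
        if ttft is None:
            ttft = time.monotonic() - start  # first content delta
total = time.monotonic() - start  # context manager exits after message_stop
print(f"TTFT {(ttft or total) * 1000:.0f} ms / total {total * 1000:.0f} ms")
```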
## The SSE wire format
Server-Sent Events is a one-way HTTP streaming protocol. The response
has `Content-Type: text/event-stream` and the body is a sequence of
records separated by blank lines. Each record looks like:
```
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
```
Two rules trip up everyone:
- Records are terminated by two newlines (`\n\n`), not one.
- A single logical event may contain multiple `data:` lines, which are joined with `\n` by the consumer before JSON-parsing.
Most SDKs hide this for you, but the moment you parse SSE by hand — inside an Edge Worker, in Go, or for debugging — you must respect it.
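For illustration, a minimal record grouper in Python that respects both rules (a sketch: it consumes any iterable of already-decoded lines and yields `(event, data)` pairs ready for `json.loads`):

```python
def iter_sse_records(lines):
    """Group decoded SSE lines into (event_name, data) records."""
    event, data_lines = None, []
    for line in lines:
        if line == "":
            # A blank line terminates the record (the \n\n rule).
            if data_lines:
                yield event or "message", "\n".join(data_lines)
            event, data_lines = None, []
        elif line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            value = line.split(":", 1)[1]
            # The spec strips exactly one leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
```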
## Anthropic event types
Claude's stream emits a small, well-defined set of events. Handle each one explicitly; do not assume order beyond what the spec guarantees.
- `message_start` — initial message envelope with `id`, `model`, and a zeroed `usage` block. Capture the message id here for logging.
- `content_block_start` — a new content block begins. Blocks can be `text`, `tool_use`, `thinking`, or `redacted_thinking`. Index matters if the model emits multiple blocks.
- `content_block_delta` — incremental payload. For text it carries `text_delta`; for tool calls it carries `input_json_delta` (partial JSON fragments you must concatenate before parsing).
- `content_block_stop` — the block is finished.
- `message_delta` — top-level message updates, most importantly `stop_reason` (`end_turn`, `max_tokens`, `tool_use`, `stop_sequence`) and final `usage` counters.
- `message_stop` — terminal event. Close your reader.
- `ping` — keep-alive sent every ~15 seconds. Ignore the payload but do not close the stream; pings exist precisely so reverse proxies do not kill an idle connection.
- `error` — yes, errors can arrive mid-stream as a regular SSE event (overloaded model, content policy stop, upstream timeout). Your handler must treat error-as-event the same as a thrown error.
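One explicit branch per event keeps this honest. A Python sketch of such a dispatcher (the `state` dict is illustrative; field accesses follow the event shapes described above):

```python
def handle_event(name: str, data: dict, state: dict) -> None:
    if name == "message_start":
        state["message_id"] = data["message"]["id"]  # capture for logging
    elif name == "content_block_start":
        state.setdefault("blocks", {})[data["index"]] = []
    elif name == "content_block_delta":
        # text_delta or input_json_delta; accumulate now, parse later
        state["blocks"][data["index"]].append(data["delta"])
    elif name == "content_block_stop":
        pass  # block data["index"] is now complete
    elif name == "message_delta":
        state["stop_reason"] = data["delta"].get("stop_reason")
        state["usage"] = data.get("usage")  # final counters
    elif name == "message_stop":
        state["done"] = True  # close your reader
    elif name == "ping":
        pass  # keep-alive: ignore the payload, keep the stream open
    elif name == "error":
        # Mid-stream errors arrive as regular events, not transport failures.
        raise RuntimeError(data["error"].get("message", "stream error"))
```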
## TypeScript: Next.js Edge Route Handler
The cleanest pattern for a Next.js app is an Edge Route Handler that
proxies the Claude stream straight to the browser. Edge runtime gives
you native ReadableStream plumbing and no cold-start tax.
```ts
// app/api/chat/route.ts
import Anthropic from "@anthropic-ai/sdk";

export const runtime = "edge";

const client = new Anthropic({
  apiKey: process.env.CLAUDEXIA_API_KEY!,
  baseURL: "https://api.claudexia.tech/v1",
});

export async function POST(req: Request) {
  const { messages } = await req.json();

  const controller = new AbortController();
  req.signal.addEventListener("abort", () => controller.abort());

  const stream = client.messages.stream(
    {
      model: "claude-sonnet-4.6",
      max_tokens: 1024,
      messages,
    },
    { signal: controller.signal },
  );

  const encoder = new TextEncoder();
  const body = new ReadableStream({
    async start(ctrl) {
      try {
        for await (const event of stream) {
          if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
            ctrl.enqueue(encoder.encode(event.delta.text));
          }
        }
      } catch (err) {
        ctrl.enqueue(encoder.encode(`\n[error] ${(err as Error).message}`));
      } finally {
        ctrl.close();
      }
    },
    cancel() {
      controller.abort();
    },
  });

  return new Response(body, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "no-cache, no-transform",
      "X-Accel-Buffering": "no",
    },
  });
}
```
Three details that bite teams in production:
- `X-Accel-Buffering: no` disables nginx response buffering. Without it, your stream gets buffered into a single chunk and TTFT collapses back to the non-streaming case.
- `Cache-Control: no-transform` prevents intermediaries from gzipping and re-chunking the response.
- Wiring `req.signal` into an `AbortController` is what lets you cancel the upstream Claude call when the browser tab closes. Without it you keep paying for tokens nobody will read.
## Python: httpx async streaming
For backend-only services, the official anthropic Python SDK already
streams. When you need lower-level control — a custom proxy, a
broadcast fan-out, or instrumentation — drop to httpx directly.
```python
import json

import httpx

URL = "https://api.claudexia.tech/v1/messages"

async def stream_claude(prompt: str, api_key: str):
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
        "accept": "text/event-stream",
    }
    payload = {
        "model": "claude-sonnet-4.6",
        "max_tokens": 1024,
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
    }
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("POST", URL, headers=headers, json=payload) as resp:
            resp.raise_for_status()
            event_name = None
            async for line in resp.aiter_lines():
                if not line:
                    event_name = None
                    continue
                if line.startswith("event: "):
                    event_name = line[7:].strip()
                elif line.startswith("data: "):
                    data = json.loads(line[6:])
                    if event_name == "content_block_delta":
                        delta = data.get("delta", {})
                        if delta.get("type") == "text_delta":
                            yield delta["text"]
                    elif event_name == "error":
                        raise RuntimeError(data.get("error", {}).get("message", "stream error"))
                    elif event_name == "message_stop":
                        return
```
Notes:
- `timeout=None` on the client is mandatory. The default 5-second read timeout will kill any stream longer than five seconds.
- `aiter_lines()` handles the `\r\n` vs `\n` normalisation for you.
- The blank-line check resets `event_name`, which matters because SSE records are separated by an empty line and the next record may omit the `event:` field (defaulting to `message`).
## Backpressure and cancellation
Claude can produce tokens faster than your downstream consumer can write them — to the browser, to a database, to another service. If you do not respect backpressure, the runtime queues bytes in memory and you get OOMs at scale.
In Node, `ReadableStream` with its default queuing strategy gives you
backpressure for free. In Python, prefer `async for` over collecting
into a list. For broadcast fan-out (one Claude stream → many WebSocket
clients), use a bounded `asyncio.Queue` per client and drop the slow
client, never the upstream.
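A sketch of that fan-out shape (assumptions: `upstream` is any async iterator of text chunks, such as the `stream_claude` generator above, and the queue bound is illustrative):

```python
import asyncio

async def fan_out(upstream, clients: dict[str, asyncio.Queue]) -> None:
    """Broadcast one Claude stream to many consumers."""
    async for chunk in upstream:
        for client_id, queue in list(clients.items()):
            try:
                queue.put_nowait(chunk)
            except asyncio.QueueFull:
                # Bounded queue is full: this consumer is too slow.
                clients.pop(client_id)  # drop the client, never the upstream
    for queue in clients.values():
        try:
            queue.put_nowait(None)  # sentinel: stream finished
        except asyncio.QueueFull:
            pass
```

Each consumer then reads from its own bounded queue — `asyncio.Queue(maxsize=256)`, say — until it receives the `None` sentinel.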
Cancellation is the other half. When the user closes the tab, you must abort the upstream request:
- TypeScript: forward `request.signal` into the SDK's `signal` option.
- Python: wrap the body loop in `try`/`finally` and rely on httpx's context manager to close the connection — or call `await resp.aclose()` explicitly when an upstream client disconnects (see the sketch after this list).
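Here is what the Python side can look like end to end, assuming a FastAPI/Starlette server in front of the `stream_claude` generator above (`is_disconnected()` is Starlette's client-gone check; the route and prompt are placeholders):

```python
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/chat")
async def chat(request: Request):
    agen = stream_claude("hello", api_key="...")  # generator from the httpx example

    async def body():
        try:
            async for text in agen:
                if await request.is_disconnected():
                    break  # stop paying for tokens nobody reads
                yield text
        finally:
            # Closing the generator unwinds its `async with` blocks,
            # which closes the upstream httpx connection.
            await agen.aclose()

    return StreamingResponse(body(), media_type="text/plain")
```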
You pay for every token the model generates, even ones nobody reads. Cancellation is a cost-control feature, not just a UX nicety.
## Common bugs to avoid
- **Forgetting to flush.** Express, Fastify, and any custom Node framework will buffer writes by default. Set `Content-Type: text/event-stream`, `Cache-Control: no-cache`, `X-Accel-Buffering: no`, and call `res.flushHeaders()` immediately.
- **Missing pings → 60-second proxy timeout.** Cloudflare, nginx, and most ALBs will close idle HTTP connections after 60 seconds. Claude's `ping` events keep the connection warm; if you filter them out and your model is thinking for a while before producing text, the proxy cuts you off. Either pass pings through, or emit your own keep-alive comments (`: keepalive\n\n`) every 15 seconds.
- **Concatenating tool-use JSON wrong.** `input_json_delta` chunks are partial JSON fragments. You must accumulate the string and only `JSON.parse` once `content_block_stop` fires for that block (see the sketch after this list).
- **Treating mid-stream errors as transport errors.** An `error` event is a normal SSE record, not an HTTP failure. Your reader will not throw — you have to check the event type and raise yourself.
- **Logging deltas synchronously.** `console.log` on every `text_delta` tanks throughput. Buffer logs and emit on `message_stop`.
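The tool-use rule from the list above, as a small Python sketch (`buffers` is an illustrative module-level dict; `partial_json` is the field that carries each fragment):

```python
import json

buffers: dict[int, list[str]] = {}  # block index -> partial JSON fragments

def on_tool_event(name: str, data: dict) -> None:
    delta = data.get("delta", {})
    if name == "content_block_delta" and delta.get("type") == "input_json_delta":
        buffers.setdefault(data["index"], []).append(delta["partial_json"])
    elif name == "content_block_stop" and data.get("index") in buffers:
        # Only now is the accumulated string guaranteed to be complete JSON.
        tool_input = json.loads("".join(buffers.pop(data["index"])) or "{}")
        print("tool input:", tool_input)
```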
## Bottom line
Streaming is the single highest-leverage UX change you can make to a
Claude integration. Use the SDK where you can, drop to raw SSE when
you must, and remember: handle pings, abort on disconnect, and never
trust the default proxy timeouts. Point your `base_url` at
https://api.claudexia.tech/v1, keep your existing Anthropic SDK code,
and ship.