
Streaming Claude API Responses with SSE in 2026: TypeScript and Python

Server-Sent Events streaming halves perceived latency for Claude API. Here is how to consume the stream correctly in TypeScript and Python via Claudexia.

If your Claude-powered product feels sluggish, the problem is almost never total throughput — it is time to first token (TTFT). A non-streaming completion that takes 8 seconds to return 600 tokens feels broken. The exact same 600 tokens, streamed, feels instant because the first words land in under 400 ms and the rest scrolls in like a person typing. Streaming does not make the model faster; it makes the perceived latency roughly half of total latency, which is the metric your users actually feel.

This guide covers how Claude streams responses over Server-Sent Events (SSE), the Anthropic event taxonomy you must handle, and production patterns in TypeScript (Next.js Edge runtime) and Python (httpx async). All examples target Claudexia's Anthropic-compatible gateway at https://api.claudexia.tech/v1, which is a drop-in for api.anthropic.com/v1.

TTFT vs total latency

When you benchmark a Claude call, record two numbers:

  • TTFT — milliseconds until the first content delta arrives.
  • Total — milliseconds until message_stop.

For a Sonnet call producing ~800 output tokens, TTFT is typically 300–600 ms while total is 4–8 seconds. If you are not streaming, your users wait the full 8 seconds staring at a spinner. If you are streaming, they see text in 400 ms and read along as it generates. The total tokens-per-second (TPS) is identical either way; you are buying perception, not throughput.
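The measurement is easy to get wrong if you time the request instead of the iteration. A minimal sketch of the pattern, using a stand-in async generator in place of a real Claude stream (`fake_stream` and its parameters are illustrative, not an API):

```python
import asyncio
import time

async def fake_stream(n_tokens: int = 800, ttft_s: float = 0.4, tps: float = 150):
    """Stand-in for a Claude stream: first token after ttft_s, then steady output."""
    await asyncio.sleep(ttft_s)
    for _ in range(n_tokens):
        yield "tok "
        await asyncio.sleep(1 / tps)

async def measure(stream):
    """Return (ttft, total) in seconds for any async token stream."""
    start = time.monotonic()
    ttft = None
    async for _ in stream:
        if ttft is None:
            ttft = time.monotonic() - start  # first content delta lands here
    total = time.monotonic() - start          # equivalent of message_stop
    return ttft, total
```

Swap `fake_stream` for your real stream generator and log both numbers per request; the gap between them is exactly what streaming buys you.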

The SSE wire format

Server-Sent Events is a one-way HTTP streaming protocol. The response has Content-Type: text/event-stream and the body is a sequence of records separated by blank lines. Each record looks like:

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

Two rules trip up everyone:

  1. Records are terminated by two newlines (\n\n), not one.
  2. A single logical event may contain multiple data: lines, which are joined with \n by the consumer before JSON-parsing.

Most SDKs hide this for you, but the moment you parse SSE by hand — inside an Edge Worker, in Go, or for debugging — you must respect it.
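A hand-rolled parser that honours both rules might look like the sketch below. It assumes a complete buffer for clarity; a production parser would also carry partial records across network chunks:

```python
import json

def parse_sse(buffer: str):
    """Split an SSE buffer into (event, data) pairs, honouring both rules:
    records end at a blank line, and multiple data: lines within one record
    are joined with \n before JSON-parsing."""
    events = []
    for record in buffer.split("\n\n"):          # rule 1: blank line ends a record
        if not record.strip():
            continue
        event, data_lines = "message", []        # "message" is the SSE default event type
        for line in record.split("\n"):
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].lstrip())
        if data_lines:
            events.append((event, json.loads("\n".join(data_lines))))  # rule 2
    return events
```

Note the `lstrip()` is a simplification: the SSE spec strips exactly one leading space after the colon, which matters only if your payload deliberately starts with whitespace.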

Anthropic event types

Claude's stream emits a small, well-defined set of events. Handle each one explicitly; do not assume order beyond what the spec guarantees.

  • message_start — initial message envelope with id, model, and a zeroed usage block. Capture the message id here for logging.
  • content_block_start — a new content block begins. Blocks can be text, tool_use, thinking, or redacted_thinking. Index matters if the model emits multiple blocks.
  • content_block_delta — incremental payload. For text it carries text_delta; for tool calls it carries input_json_delta (partial JSON fragments you must concatenate before parsing).
  • content_block_stop — the block is finished.
  • message_delta — top-level message updates, most importantly stop_reason (end_turn, max_tokens, tool_use, stop_sequence) and final usage counters.
  • message_stop — terminal event. Close your reader.
  • ping — keep-alive sent every ~15 seconds. Ignore the payload but do not close the stream; pings exist precisely so reverse proxies do not kill an idle connection.
  • error — yes, errors can arrive mid-stream as a regular SSE event (overloaded model, content policy stop, upstream timeout). Your handler must treat error-as-event the same as a thrown error.

TypeScript: Next.js Edge Route Handler

The cleanest pattern for a Next.js app is an Edge Route Handler that proxies the Claude stream straight to the browser. Edge runtime gives you native ReadableStream plumbing and no cold-start tax.

// app/api/chat/route.ts
import Anthropic from "@anthropic-ai/sdk";

export const runtime = "edge";

const client = new Anthropic({
  apiKey: process.env.CLAUDEXIA_API_KEY!,
  baseURL: "https://api.claudexia.tech/v1",
});

export async function POST(req: Request) {
  const { messages } = await req.json();
  const controller = new AbortController();
  req.signal.addEventListener("abort", () => controller.abort());

  const stream = await client.messages.stream(
    {
      model: "claude-sonnet-4.6",
      max_tokens: 1024,
      messages,
    },
    { signal: controller.signal },
  );

  const encoder = new TextEncoder();
  const body = new ReadableStream({
    async start(ctrl) {
      try {
        for await (const event of stream) {
          if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
            ctrl.enqueue(encoder.encode(event.delta.text));
          }
        }
      } catch (err) {
        ctrl.enqueue(encoder.encode(`\n[error] ${(err as Error).message}`));
      } finally {
        ctrl.close();
      }
    },
    cancel() {
      controller.abort();
    },
  });

  return new Response(body, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "no-cache, no-transform",
      "X-Accel-Buffering": "no",
    },
  });
}

Three details that bite teams in production:

  • X-Accel-Buffering: no disables nginx response buffering. Without it, your stream gets buffered into a single chunk and TTFT collapses back to the non-streaming case.
  • Cache-Control: no-transform prevents intermediaries from gzipping and re-chunking the response.
  • Wiring req.signal into an AbortController is what lets you cancel the upstream Claude call when the browser tab closes. Without it you keep paying for tokens nobody will read.

Python: httpx async streaming

For backend-only services, the official anthropic Python SDK already streams. When you need lower-level control — a custom proxy, a broadcast fan-out, or instrumentation — drop to httpx directly.

import json
import httpx

URL = "https://api.claudexia.tech/v1/messages"

async def stream_claude(prompt: str, api_key: str):
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
        "accept": "text/event-stream",
    }
    payload = {
        "model": "claude-sonnet-4.6",
        "max_tokens": 1024,
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
    }

    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("POST", URL, headers=headers, json=payload) as resp:
            resp.raise_for_status()
            event_name = None
            async for line in resp.aiter_lines():
                if not line:
                    event_name = None
                    continue
                if line.startswith("event: "):
                    event_name = line[7:].strip()
                elif line.startswith("data: "):
                    data = json.loads(line[6:])
                    if event_name == "content_block_delta":
                        delta = data.get("delta", {})
                        if delta.get("type") == "text_delta":
                            yield delta["text"]
                    elif event_name == "error":
                        raise RuntimeError(data.get("error", {}).get("message", "stream error"))
                    elif event_name == "message_stop":
                        return

Notes:

  • timeout=None on the client is mandatory. The default 5-second read timeout will kill any stream longer than five seconds.
  • aiter_lines() handles the \r\n vs \n normalisation for you.
  • The blank-line check resets event_name, which matters because SSE records are separated by an empty line and the next record may omit the event: field (defaulting to message).

Backpressure and cancellation

Claude can produce tokens faster than your downstream consumer can write them — to the browser, to a database, to another service. If you do not respect backpressure, the runtime queues bytes in memory and you get OOMs at scale.

In Node, a ReadableStream gives you backpressure for free via its queuing strategy — a count-based high-water mark of 1 by default, or a ByteLengthQueuingStrategy if you supply one — so the stream stops pulling when the consumer falls behind. In Python, prefer async for over collecting into a list. For broadcast fan-out (one Claude stream → many WebSocket clients), use a bounded asyncio.Queue per client and drop the slow client, never the upstream.
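The fan-out rule can be sketched like this — `fan_out`, the queue sizes, and the `None` end-of-stream sentinel are illustrative choices, not a library API:

```python
import asyncio

async def fan_out(upstream, clients: list):
    """Broadcast one upstream token stream to many consumers. Each client owns a
    bounded asyncio.Queue; a full queue means that client is too slow and gets
    dropped, while the upstream read loop never blocks."""
    async for chunk in upstream:
        for q in list(clients):          # iterate a copy so we can remove safely
            try:
                q.put_nowait(chunk)      # never await a slow client
            except asyncio.QueueFull:
                clients.remove(q)        # drop the slow client, keep the stream
    for q in clients:
        q.put_nowait(None)               # sentinel: stream finished
```

The key property is that `put_nowait` makes the upstream loop's progress independent of the slowest consumer, which is the whole point of the bounded queue.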

Cancellation is the other half. When the user closes the tab, you must abort the upstream request:

  • TypeScript: forward request.signal into the SDK's signal option.
  • Python: wrap the body loop in try/finally and rely on httpx's context manager to close the connection — or call await resp.aclose() explicitly when an upstream client disconnects.

You pay for every token the model generates, even ones nobody reads. Cancellation is a cost-control feature, not just a UX nicety.

Common bugs to avoid

  • Forgetting to flush. Express, Fastify, and any custom Node framework will buffer writes by default. Set Content-Type: text/event-stream, Cache-Control: no-cache, X-Accel-Buffering: no, and call res.flushHeaders() immediately.
  • Missing pings → 60-second proxy timeout. Cloudflare, nginx, and most ALBs will close idle HTTP connections after 60 seconds. Claude's ping events keep the connection warm; if you filter them out and your model is thinking for a while before producing text, the proxy cuts you off. Either pass pings through, or emit your own keep-alive comments (: keepalive\n\n) every 15 seconds.
  • Concatenating tool-use JSON wrong. input_json_delta chunks are partial JSON fragments. You must accumulate the string and only JSON.parse once content_block_stop fires for that block.
  • Treating mid-stream errors as transport errors. An error event is a normal SSE record, not an HTTP failure. Your reader will not throw — you have to check the event type and raise yourself.
  • Logging deltas synchronously. console.log on every text_delta tanks throughput. Buffer logs and emit on message_stop.
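The tool-use accumulation rule above deserves a concrete sketch. `accumulate_tool_input` is a hypothetical helper, but the `partial_json` field name matches the Anthropic event shape:

```python
import json

def accumulate_tool_input(events):
    """Assemble tool_use inputs from partial input_json_delta fragments,
    parsing only once content_block_stop arrives for that block index."""
    partial: dict = {}   # block index -> accumulated JSON string
    inputs: dict = {}    # block index -> parsed input dict
    for event in events:
        if (event["type"] == "content_block_delta"
                and event["delta"]["type"] == "input_json_delta"):
            idx = event["index"]
            partial[idx] = partial.get(idx, "") + event["delta"]["partial_json"]
        elif event["type"] == "content_block_stop" and event["index"] in partial:
            inputs[event["index"]] = json.loads(partial.pop(event["index"]))
    return inputs
```

Parsing any earlier than content_block_stop will throw on essentially every call, because the fragments split mid-string and mid-object.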

Bottom line

Streaming is the single highest-leverage UX change you can make to a Claude integration. Use the SDK where you can, drop to raw SSE when you must, and remember: handle pings, abort on disconnect, and never trust the default proxy timeouts. Point your base_url at https://api.claudexia.tech/v1, keep your existing Anthropic SDK code, and ship.