
Agentic Patterns with Claude in 2026: ReAct, Plan-Execute, Reflexion

ReAct, Plan-and-Execute, Reflexion, and Tree-of-Thoughts — which agent pattern actually works for Claude Sonnet and Opus in production. With code.

By 2026, "build an agent" is no longer a research project. The Anthropic SDK ships first-class tool use, every serious team has at least one production loop running on Claude Sonnet or Opus, and the question has shifted from can we to which pattern. The honest answer: most teams over-engineer this. They reach for Tree-of-Thoughts when a 30-line ReAct loop would have shipped on Tuesday.

This post walks through the four patterns that matter — ReAct, Plan-and-Execute, Reflexion, and Tree-of-Thoughts — with runnable code against https://api.claudexia.tech/v1, notes on when to escalate from Sonnet to Opus mid-loop, and the observability practices that keep these things debuggable in production.

The landscape in 2026

The agent pattern zoo has consolidated. Five years of papers and frameworks settled into a small set of loops that genuinely move the needle:

  • ReAct — interleave reasoning and tool calls. The default.
  • Plan-and-Execute — generate a full plan up front, then execute steps.
  • Reflexion — let the model critique its own output and retry.
  • Tree-of-Thoughts — explore multiple reasoning branches in parallel and vote.

Everything else (CoT, ReWOO, LATS, AutoGPT-style infinite loops) either collapsed into one of these or quietly died. Claude Sonnet 4.6 is strong enough that elaborate scaffolding hurts more than it helps. The base model already plans, already self-corrects, already knows when to call a tool. Your job is to give it the right loop, not to build a brain.

ReAct: the default sweet spot

ReAct is the loop you should reach for first. Think, call a tool, observe the result, think again, repeat until done. Anthropic's tool use API is built for exactly this rhythm — you don't even need a framework.

from anthropic import Anthropic

client = Anthropic(base_url="https://api.claudexia.tech/v1")

tools = [
    {
        "name": "search_docs",
        "description": "Search internal documentation",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "run_sql",
        "description": "Run a read-only SQL query",
        "input_schema": {
            "type": "object",
            "properties": {"sql": {"type": "string"}},
            "required": ["sql"],
        },
    },
]

def dispatch_tool(name: str, tool_input: dict) -> str:
    # Placeholder: route to your real search_docs / run_sql implementations.
    raise NotImplementedError(f"no handler wired for {name!r}")

def react_loop(user_msg: str, max_turns: int = 10):
    messages = [{"role": "user", "content": user_msg}]
    for turn in range(max_turns):
        resp = client.messages.create(
            model="claude-sonnet-4.6",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})

        if resp.stop_reason == "end_turn":
            return resp

        tool_results = []
        for block in resp.content:
            if block.type == "tool_use":
                result = dispatch_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
        messages.append({"role": "user", "content": tool_results})

    raise RuntimeError("max turns exceeded")

That's the entire pattern. No graph, no state machine, no orchestrator. The conversation history is the state. Sonnet handles five-to-fifteen-step workflows on this loop without breaking a sweat.
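
Usage is a single call. The query below is a hypothetical example; the final answer is whatever text blocks the model ends on:

resp = react_loop("Which table stores billing events, and how many rows does it have?")
print("".join(b.text for b in resp.content if b.type == "text"))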

When ReAct works:

  • The next action depends on the previous observation (most real workflows).
  • Tools are cheap and fast (sub-second latency each).
  • You don't know up front how many steps it will take.
  • You want streaming output to the user.

When ReAct hurts:

  • Steps are independent and could run in parallel.
  • Each tool call costs real money or wall time.
  • You need a plan reviewed by a human before execution.

Plan-and-Execute: when steps are independent

Plan-and-Execute splits the problem in two. First call: produce a numbered plan. Second pass: execute each step, often in parallel, often with a cheaper model. The planner is Opus or Sonnet; the executors can be Haiku.

import json

def plan_and_execute(goal: str):
    # Assumes two small helpers: extract_json pulls the JSON object out of
    # the model's reply, topological_sort orders steps by their depends_on.
    plan_resp = client.messages.create(
        model="claude-opus-4.7",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"Produce a JSON plan to accomplish: {goal}\n"
                       f"Format: {{\"steps\": [{{\"id\": 1, \"action\": \"...\", "
                       f"\"depends_on\": []}}]}}",
        }],
    )
    plan = json.loads(extract_json(plan_resp.content[0].text))

    results = {}
    for step in topological_sort(plan["steps"]):
        deps = {d: results[d] for d in step["depends_on"]}
        results[step["id"]] = execute_step(step["action"], deps)
    return results

def execute_step(action: str, context: dict) -> str:
    resp = client.messages.create(
        model="claude-haiku-4.5",
        max_tokens=1024,
        tools=tools,
        messages=[{
            "role": "user",
            "content": f"Context: {json.dumps(context)}\nDo: {action}",
        }],
    )
    # Haiku may return tool_use blocks alongside text; keep only the text.
    return "".join(b.text for b in resp.content if b.type == "text")

The economic argument is real. A plan from Opus costs maybe $0.05; ten Haiku executions cost pennies. Compared to running the entire workflow on Opus end-to-end, you're looking at 5–10x cost reduction with comparable quality on well-decomposed tasks.

The catch: if step 3 reveals that step 2's assumption was wrong, a static plan can't recover. Pure Plan-and-Execute is brittle on workflows where the world bites back. The fix is replanning: after each step, ask the planner whether the remaining plan still makes sense. That hybrid — sometimes called Plan-Execute-Replan — is what actually ships.
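
A minimal version of that replanning check, as a sketch on top of the loop above (it reuses the extract_json helper assumed earlier):

def replan_if_needed(goal: str, plan: dict, results: dict) -> dict:
    # Ask the planner to confirm or revise the remaining steps.
    resp = client.messages.create(
        model="claude-opus-4.7",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"Goal: {goal}\nResults so far: {json.dumps(results)}\n"
                       f"Remaining plan: {json.dumps(plan)}\n"
                       f"Reply KEEP if the plan still holds, otherwise reply "
                       f"with a revised JSON plan in the same format.",
        }],
    )
    text = resp.content[0].text
    return plan if text.strip().startswith("KEEP") else json.loads(extract_json(text))

Call it between steps in the execution loop; on easy workflows the planner answers KEEP and the overhead is one short Opus call per step.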

Use Plan-and-Execute when:

  • The work decomposes cleanly into independent steps.
  • Steps can run in parallel and you want the speedup.
  • A human or audit trail needs to see the plan before execution.
  • Most steps are simple enough for Haiku.

Reflexion: quality at 3x the tokens

Reflexion adds a self-critique pass. The agent attempts the task, then a second prompt asks it to evaluate its own answer against the original goal, identify failures, and retry with that critique in context. It works — and it's expensive.

def format_history(history: list[dict]) -> str:
    # Render prior attempts and critiques for the retry prompt.
    return "\n\n".join(
        f"Attempt:\n{h['answer']}\nCritique:\n{h['critique']}" for h in history
    )

def reflexion(task: str, max_attempts: int = 3):
    history = []
    for attempt in range(max_attempts):
        attempt_resp = client.messages.create(
            model="claude-sonnet-4.6",
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": f"Task: {task}\n"
                           f"Previous attempts and critiques:\n{format_history(history)}",
            }],
        )
        answer = attempt_resp.content[0].text

        critique_resp = client.messages.create(
            model="claude-opus-4.7",
            max_tokens=2048,
            messages=[{
                "role": "user",
                "content": f"Task: {task}\nProposed answer:\n{answer}\n\n"
                           f"Is this correct and complete? If not, identify "
                           f"specific failures. Reply with PASS or detailed critique.",
            }],
        )
        critique = critique_resp.content[0].text

        if critique.strip().startswith("PASS"):
            return answer
        history.append({"answer": answer, "critique": critique})
    return answer

The pattern roughly triples token cost — every successful task pays for an attempt and a critique, every retry adds another round trip. On open-ended generation tasks (writing, code, analysis), Reflexion measurably improves quality. On clear-cut tool-use tasks, it adds latency for no gain.

Save Reflexion for safety-critical workflows. Generating a customer-facing legal summary? Worth the 3x. Filing a Jira ticket? Use ReAct.

Tree-of-Thoughts: rarely worth it in 2026

Tree-of-Thoughts (ToT) explores multiple reasoning branches in parallel, scores them, and picks a winner. In 2023 with GPT-4 and Claude 2, this added real signal on hard reasoning. In 2026 with Sonnet 4.6 and Opus 4.7, the base model's single-shot reasoning is good enough that the parallel-branch overhead almost never pays off.

I still see ToT in two niches:

  • Adversarial search (game playing, puzzle solving) where you genuinely need lookahead.
  • Generative design where you want diverse candidates, not the single best answer.

For business workflows, RAG, and tool use? Skip it. The compute spent on three parallel reasoning trees is better spent on Reflexion or just upgrading to Opus.
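
If you do land in one of those niches, the core mechanic is small. Here's a sketch of the single-level case, which is really best-of-n sampling plus a judge; full ToT expands each surviving branch recursively:

def best_of_n(task: str, branches: int = 3) -> str:
    # Sample independent candidates at high temperature, then have a
    # stronger model vote for the best one.
    candidates = []
    for _ in range(branches):
        resp = client.messages.create(
            model="claude-sonnet-4.6",
            max_tokens=2048,
            temperature=1.0,
            messages=[{"role": "user", "content": task}],
        )
        candidates.append(resp.content[0].text)

    numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    vote = client.messages.create(
        model="claude-opus-4.7",
        max_tokens=8,
        messages=[{
            "role": "user",
            "content": f"Task: {task}\n\nCandidates:\n{numbered}\n\n"
                       f"Reply with only the index of the best candidate.",
        }],
    )
    return candidates[int(vote.content[0].text.strip())]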

Escalating from Sonnet to Opus mid-loop

A pattern that quietly became standard in 2025: start the loop on Sonnet, and escalate to Opus only when Sonnet signals uncertainty. You detect this either by parsing the model's own hedging language ("I'm not sure", "this is ambiguous") or by checking whether tool calls are going in circles.

def adaptive_react(user_msg: str):
    model = "claude-sonnet-4.6"
    messages = [{"role": "user", "content": user_msg}]
    consecutive_tool_failures = 0

    for turn in range(15):
        resp = client.messages.create(model=model, max_tokens=4096, tools=tools, messages=messages)
        messages.append({"role": "assistant", "content": resp.content})

        if resp.stop_reason == "end_turn":
            return resp

        if detect_uncertainty(resp.content) or consecutive_tool_failures >= 2:
            model = "claude-opus-4.7"

        # ... handle tool_use blocks, update consecutive_tool_failures ...
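
detect_uncertainty is the interesting part, and a crude version goes a long way. A sketch that just scans the text blocks for hedging language (the phrase list is illustrative, not exhaustive):

HEDGES = ("i'm not sure", "i am not sure", "this is ambiguous", "it's unclear")

def detect_uncertainty(content) -> bool:
    # Concatenate the response's text blocks and scan for hedging phrases.
    text = " ".join(b.text for b in content if b.type == "text").lower()
    return any(h in text for h in HEDGES)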

The cost math works out. Most turns run on cheap Sonnet; you only pay Opus prices on the hard ones. For mixed-difficulty workloads, this routinely cuts spend by 60–70% versus running everything on Opus, with no measurable quality loss on the easy turns.

Observability: log everything

The single biggest mistake I see in production agent systems is opaque logging. Teams log "agent ran for 8 turns and finished" and then have no idea why a regression appeared.

The minimum viable observability for an agent loop (a minimal helper follows the list):

  • Every tool_use block: tool name, input arguments, timestamp, turn number.
  • Every tool_result block: tool_use_id, output (truncated if huge), latency, success/failure.
  • Every model response: model name, input tokens, output tokens, stop_reason.
  • The full message history at the end of the loop, persisted for at least 30 days.
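
As a JSON-lines helper, that list might look like this; the field names are illustrative, not a standard:

import json, time

def log_event(kind: str, turn: int, **fields):
    # One JSON object per line; pipe into whatever log store you already run.
    record = {"ts": time.time(), "kind": kind, "turn": turn, **fields}
    print(json.dumps(record), flush=True)

# Inside the loop (block, ms are whatever your loop already has in scope):
# log_event("tool_use", turn, tool=block.name, input=block.input)
# log_event("tool_result", turn, tool_use_id=block.id, latency_ms=ms, ok=True)
# log_event("model_response", turn, model=model, stop_reason=resp.stop_reason)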

OpenTelemetry has reasonable conventions for this now (spans for each turn, attributes for tool names and token counts). Whatever you use, the test is: can someone six weeks from now figure out why the agent took the action it took on Tuesday at 3pm? If not, you're flying blind.
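
In OpenTelemetry terms, that's one span per turn. A sketch using the standard Python API; the gen_ai.* names follow the incubating GenAI semantic conventions, while agent.stop_reason is my own attribute:

from opentelemetry import trace

tracer = trace.get_tracer("agent-loop")

def traced_turn(model: str, messages: list):
    # Wrap a single model call in a span, with token counts as attributes.
    with tracer.start_as_current_span("agent.turn") as span:
        resp = client.messages.create(
            model=model, max_tokens=4096, tools=tools, messages=messages
        )
        span.set_attribute("gen_ai.request.model", model)
        span.set_attribute("gen_ai.usage.input_tokens", resp.usage.input_tokens)
        span.set_attribute("gen_ai.usage.output_tokens", resp.usage.output_tokens)
        span.set_attribute("agent.stop_reason", resp.stop_reason)
        return resp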

A small but underrated practice: log the prompt that produced each response, not just the response. When prompts drift across deploys, you want to be able to diff them.

Bottom line

After two years of running these patterns in production:

  • Default to ReAct. It's the simplest loop that handles 80% of real work. Build it first.
  • Reach for Plan-and-Execute when steps are independent and parallelizable, or when a human needs to review the plan.
  • Add Reflexion only on safety-critical or quality-critical outputs. The 3x token cost is real; pay it deliberately.
  • Skip Tree-of-Thoughts unless you're doing adversarial search or generative design. Modern Claude is too capable for the overhead to pay off on normal workloads.
  • Escalate Sonnet → Opus mid-loop for cost efficiency. Most turns don't need Opus.
  • Log every tool_use and tool_result. Future you will thank present you.

The loops are simple. The discipline is in not building more loop than you need.