Why Anthropic 529 Errors Skip Your Fallback Models (And How to Fix It)

When Anthropic's API returns a 529 "overloaded" error, you'd expect OpenClaw's model fallback system to kick in and try your secondary models. Instead, users are seeing errors bubble up directly — even when they have perfectly good fallbacks configured.

A recent GitHub issue (#28502) uncovered exactly why this happens, and the root cause reveals something important about how OpenClaw's retry architecture actually works.

The Two-Loop Architecture

OpenClaw has two distinct retry mechanisms working together:

Inner fallback loop (runWithModelFallback) — cycles through your configured models (primary → secondary → tertiary) when one fails
Outer retry loop (agent-runner-execution) — catches transient errors and retries the entire fallback chain after a delay

The problem? The outer loop recognizes HTTP 529 as a transient error, but the inner loop does not treat it as a reason to fall back to the next model.
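To make the mismatch concrete, here is a minimal sketch of the two classifiers. Only the name TRANSIENT_HTTP_ERROR_CODES and the fact that it contains 529 come from the issue. The other status codes, the reason strings, and the resolveFailoverReason helper are assumptions made for illustration, not OpenClaw's actual code.

```typescript
// Hypothetical sketch: only TRANSIENT_HTTP_ERROR_CODES and its 529 entry are
// taken from the issue. Every other value here is assumed.
type FailoverReason = "rate_limit" | "timeout";

// Outer loop: statuses treated as transient and retried after a delay.
const TRANSIENT_HTTP_ERROR_CODES: ReadonlySet<number> = new Set([502, 503, 529]);

// Inner loop: map a status to a failover reason, or return null when the
// error should be rethrown instead of advancing to the next model.
function resolveFailoverReason(status: number): FailoverReason | null {
  if (status === 429) return "rate_limit"; // assumed mapping
  if (status === 408) return "timeout"; // assumed mapping
  return null; // 529 lands here before the fix
}

// The outer loop will retry a 529, but the inner loop will not fall back on it.
const outerRetries529 = TRANSIENT_HTTP_ERROR_CODES.has(529);
const innerFallsBackOn529 = resolveFailoverReason(529) !== null;
```

With these definitions, outerRetries529 is true and innerFallsBackOn529 is false, which is exactly the gap the issue describes.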

What Actually Happens

When Anthropic returns 529:

The inner fallback loop sees an error it doesn't recognize as "fallback-worthy"
It throws the error up to the outer loop instead of trying your secondary models
The outer loop catches it (529 is in TRANSIENT_HTTP_ERROR_CODES), waits, then retries
The retry starts the whole primary→fallback chain again
If Anthropic is still overloaded, you've burned two attempts on Claude without ever trying your fallback
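The sequence above can be reproduced with a small simulation. This is a simplified, synchronous model of the two loops, not the real implementation: the actual loops are asynchronous and wait between retries, and every name except runWithModelFallback (including the placeholder model IDs and the assumption that 429 triggers fallback) is hypothetical.

```typescript
// Simplified stand-in for the two loops. Names other than
// runWithModelFallback are hypothetical; the retry delay is omitted.
type CallModel = (model: string) => string;

function httpError(status: number): Error & { status: number } {
  const err = new Error(`HTTP ${status}`) as Error & { status: number };
  err.status = status;
  return err;
}

function statusOf(err: unknown): number | undefined {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status === "number" ? status : undefined;
}

// Before the fix: 529 is not treated as a reason to fail over.
// (That 429 is fallback-worthy is an assumption for this sketch.)
function isFailoverWorthy(err: unknown): boolean {
  return statusOf(err) === 429;
}

// Inner loop: advance to the next model only on fallback-worthy errors.
function runWithModelFallback(models: string[], call: CallModel): string {
  let lastError: unknown = new Error("no models configured");
  for (const model of models) {
    try {
      return call(model);
    } catch (err) {
      if (!isFailoverWorthy(err)) throw err; // a 529 exits the chain here
      lastError = err;
    }
  }
  throw lastError;
}

// Outer loop: retry the whole chain when the error is transient.
function runWithRetry(models: string[], call: CallModel, maxAttempts = 2): string {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return runWithModelFallback(models, call);
    } catch (err) {
      if (statusOf(err) !== 529) throw err;
      lastError = err;
    }
  }
  throw lastError;
}

// Scenario: the primary is overloaded, the fallback is healthy.
const attempted: string[] = [];
let failed = false;
try {
  runWithRetry(["anthropic/primary", "openai/fallback"], (model) => {
    attempted.push(model);
    if (model.split("/")[0] === "anthropic") throw httpError(529);
    return `reply from ${model}`;
  });
} catch {
  failed = true;
}
```

After this runs, attempted holds "anthropic/primary" twice and failed is true. The healthy fallback was never called.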

The outer loop's comment even explains the reasoning: "transient errors typically affect the whole provider, so falling back to an alternate model first would not help."

But this assumption breaks down when:

Your fallback is a different provider (OpenAI, local Ollama)
You're using proxy providers where 529 might be endpoint-specific
You'd rather get any response than wait and retry the same failing model

The Fix

The proposed solution is small: add a 529 check to resolveFailoverReasonFromError in failover-error.ts:

```typescript
if (status === 529) {
  return "timeout";
}
```

Because the status now maps to a failover reason, the inner fallback loop tries your secondary models before the outer loop ever kicks in. Fallbacks get attempted first, and if all models return 529, the outer retry loop still provides a second chance after a delay.
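Here is a rough sketch of how the flow changes once 529 maps to a failover reason. Only resolveFailoverReasonFromError, runWithModelFallback, and the 529 to "timeout" mapping come from the issue. The rest is a simplified, synchronous stand-in, and it assumes the inner loop surfaces the last error once the chain is exhausted, which may not match the real code.

```typescript
// Sketch of the behavior after the fix. Helper names, model IDs, and the
// "rethrow the last error" behavior are assumptions; the delay is omitted.
type CallModel = (model: string) => string;

function httpError(status: number): Error & { status: number } {
  const err = new Error(`HTTP ${status}`) as Error & { status: number };
  err.status = status;
  return err;
}

function statusOf(err: unknown): number | undefined {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status === "number" ? status : undefined;
}

function resolveFailoverReasonFromError(err: unknown): "timeout" | null {
  if (statusOf(err) === 529) {
    return "timeout"; // the proposed fix
  }
  return null;
}

function runWithModelFallback(models: string[], call: CallModel): string {
  let lastError: unknown = new Error("no models configured");
  for (const model of models) {
    try {
      return call(model);
    } catch (err) {
      if (resolveFailoverReasonFromError(err) === null) throw err;
      lastError = err; // fallback-worthy: move on to the next model
    }
  }
  throw lastError; // assumed: surface the last error when the chain is exhausted
}

function runWithRetry(models: string[], call: CallModel, maxAttempts = 2): string {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return runWithModelFallback(models, call);
    } catch (err) {
      if (statusOf(err) !== 529) throw err;
      lastError = err; // transient: retry the whole chain
    }
  }
  throw lastError;
}

const models = ["anthropic/primary", "openai/fallback"];

// Scenario 1: only the Anthropic model is overloaded.
const firstRun: string[] = [];
const reply = runWithRetry(models, (model) => {
  firstRun.push(model);
  if (model.split("/")[0] === "anthropic") throw httpError(529);
  return `reply from ${model}`;
});

// Scenario 2: every model returns 529, so the outer loop retries the chain.
const secondRun: string[] = [];
let exhausted = false;
try {
  runWithRetry(models, (model) => {
    secondRun.push(model);
    throw httpError(529);
  });
} catch {
  exhausted = true;
}
```

In the first scenario the fallback answers on the first pass (firstRun is primary, then fallback). In the second, the chain is walked twice (four calls in total) before the error surfaces, so the outer retry still does its job.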

What You Can Do Now

Until this is merged:

Order your fallbacks strategically — put different providers first in your fallback chain
Monitor your fallback usage — if you're seeing 529 errors with untouched fallbacks, this is why
Consider provider diversity — mixing Claude, GPT, and local models gives you resilience against single-provider outages
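If you generate your configuration programmatically, the ordering advice can be expressed as a small helper. This is a hypothetical sketch, not part of OpenClaw: it assumes model IDs follow a "provider/model" format, and the IDs shown are placeholders rather than real model names.

```typescript
// Hypothetical helper: assumes "provider/model" IDs. Not an OpenClaw API.
function providerOf(modelId: string): string {
  const slash = modelId.indexOf("/");
  return slash === -1 ? modelId : modelId.slice(0, slash);
}

// Stable reorder: fallbacks on a different provider than the primary come
// first; fallbacks on the same provider keep their relative order at the end.
function orderFallbacks(primary: string, fallbacks: string[]): string[] {
  const primaryProvider = providerOf(primary);
  const other = fallbacks.filter((m) => providerOf(m) !== primaryProvider);
  const same = fallbacks.filter((m) => providerOf(m) === primaryProvider);
  return [...other, ...same];
}

const ordered = orderFallbacks("anthropic/model-a", [
  "anthropic/model-b",
  "openai/model-c",
  "ollama/model-d",
]);
// ordered: ["openai/model-c", "ollama/model-d", "anthropic/model-b"]
```

Keeping same-provider fallbacks at the end rather than dropping them preserves coverage for failures that affect a single model rather than the whole provider.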

This is a great example of how understanding OpenClaw's internals helps you build more robust agent configurations. The retry architecture is sophisticated, but knowing where the gaps are lets you work around them.

Track the fix: GitHub #28502
