Claude loses its >99% uptime in Q1 2026

[−] palcu 49d ago

Hey folks, I'm Alex from the reliability engineering team at Anthropic. We've just posted the retrospective for this incident:

> On March 26–27, 2026, customers experienced elevated error rates when using Claude Opus 4.6 and Claude Sonnet 4.6. The issue was caused by a networking performance degradation within our cloud infrastructure that disrupted communication between components of our serving stack. We resolved the incident by migrating the affected workloads to healthy infrastructure, restoring normal service by 9:30 AM PT on March 27.

https://status.claude.com/incidents/b9802k1zb5l2

[−] halJordan 49d ago

Is it really an answer to say "network disruption" with a bunch of $10 words? Certainly it doesn't belong here of all places.

[−] nerdsniper 49d ago

It’s definitely an answer! Maybe just not a “retrospective”?

[−] cedws 49d ago

Are you able to share if there's a general trend behind the outages? Do you often hit capacity, or do you budget to have headroom?

[−] palcu 49d ago

Yes, the general trend is the unprecedented growth that we've seen. Typically one would have some time in advance to re-engineer the systems to support the increase in traffic and users. But we're dealing with very compressed timelines, and while most of the time we're able to fix the issues beforehand, sometimes we have to fix them in production. Sorry for that.

[−] yread 49d ago

At this point you can stop worrying about downtime-free deployments so the devops becomes easier

[−] michaelcampbell 50d ago

> Our uptime has a '9' in it! -- Anthropic

[−] adgjlsfhk1 49d ago

Github this month is very close to having 0 9s reliability. (unless they want to argue that 89% has a 9 in it)

[−] littlestymaar 49d ago

I'm not sure I've had a day without Github hiccups this month, so that feels right.

[−] marcosdumay 49d ago

The comment you are replying to is carefully written in a way that allows 23.19%

[−] claw-el 49d ago

There is always 88.9% or 88.89%

[−] ACCount37 49d ago

By now, I'm nearly certain that they'd be down to 0 9s of uptime if they counted it conservatively.

[−] leosanchez 49d ago

Or as the British would say "9 innit ?"

[−] bwb 50d ago

We had a ton of traffic coming in to check them: https://downforeveryoneorjustme.com/anthropic

Not one of the usual ones that has service problems :)

[−] timpera 50d ago

https://status.claude.com/

[−] verdverm 50d ago

You can access Claude models with Google Cloud reliability via VertexAI. The caveat is that you cannot use your subscription, per-token pricing only.

I personally prefer per-token, it makes you more thoughtful about your setup and usage, instead of spray and pray.

You can also access the notable open weight models with VertexAI, only need to change the model id string.

[−] steveBK123 50d ago

Remember when putting your entire life & business into the cloud was good because they were all offering 5 9s of uptime?

Very few cases these days.. feels like we are lucky to get 2 9s anymore.

[−] dehrmann 49d ago

I wonder how much is due to supply constraints, how much is standard growing pains, and if over-reliance on AI was the cause for any outages.

[−] yomismoaqui 49d ago

Maybe they are gunning for 5 nines (9.9999%)

[−] rambojohnson 49d ago

It's pretty damn good, and it's seen a real exodus of conscientious users; the QuitGPT movement alone hit 1.5 million participants, with Claude skyrocketing to #1 on the App Store virtually overnight. No surprise the servers are getting hammered.

time to give your devops guy his job back.

[−] sgbeal 49d ago

The ironic thing about outages such as this one and Github's recent spate of outages is that if those vendors' sales pitches are to be believed, the vendors could just ask their LLMs to program reliable replacements overnight (okay, maybe a weekend).

[−] Trufa 50d ago

I honestly feel like it's a more honest status measure than many status pages I know.

[−] scuff3d 49d ago

Probably vibe-coded their infrastructure

[−] seneca 50d ago

They seem to be a victim of their own success. Their response times are quite bad, and it's widely believed they are doing something to degrade service quality (quantizing?) in order to stretch resources. They just announced that they're cutting their usage limits down during peak hours as well.

They're at serious risk of losing their lead with this sort of performance.

[−] aubanel 49d ago

I wouldn't be too harsh, scaling x10 YoY is a bit hard on the infra!

[−] littlestymaar 49d ago

If you don't pay attention, 99% may sound high, but it means up to about 21.6 hours of downtime over the quarter.

Anthropic has had more than that.

Yikes.
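
For anyone who wants to check the arithmetic: the downtime budget implied by an availability target is just (1 - availability) times the length of the period. Here is a minimal sketch, assuming Q1 2026 is counted as 90 calendar days (31 + 28 + 31) and that availability is measured as a plain fraction of wall-clock time; the function name `downtime_budget_hours` is made up for illustration, and real status pages may count partial degradations differently.

```python
def downtime_budget_hours(availability: float, period_days: float) -> float:
    """Maximum downtime, in hours, that still meets `availability` over the period.

    `availability` is a fraction in [0, 1], e.g. 0.99 for "two nines".
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_days * 24.0


# Assumption: Q1 2026 = January (31) + February (28, not a leap year) + March (31).
Q1_2026_DAYS = 31 + 28 + 31

if __name__ == "__main__":
    for label, availability in [
        ("99% (two nines)", 0.99),
        ("99.9% (three nines)", 0.999),
        ("99.999% (five nines)", 0.99999),
    ]:
        hours = downtime_budget_hours(availability, Q1_2026_DAYS)
        print(f"{label}: {hours:.4f} hours ({hours * 60:.2f} minutes)")
```

Under those assumptions, 99% over the quarter works out to roughly 21.6 hours, while five nines leaves a little over a minute.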

[−] claudiug 49d ago

MAKE NO MISTAKES! DO NOT HALLUCINATE! FIX IT!

[−] 3yr-i-frew-up 49d ago

Victim of success.

They are the best.

ChatGPT is walmart.

Gemini is kroger.

Claude is... idk your local grocer that is always amazing and costs more?

[−] rvz 50d ago

This is not an outage, Claude just gets lazier on Fridays.

Sometimes Claude wants more lunch breaks, takes a half day and leaves the desk early just like any human would. (since AI boosters like comparing LLMs to humans all the time) /s

Claude loses its >99% uptime in Q1 2026 (bsky.app)

87 comments