Claude loses its >99% uptime in Q1 2026 (bsky.app)

by timpera 87 comments 92 points


[−] palcu 49d ago
Hey folks, I'm Alex from the reliability engineering team at Anthropic. We've just posted the retrospective for this incident:

> On March 26–27, 2026, customers experienced elevated error rates when using Claude Opus 4.6 and Claude Sonnet 4.6. The issue was caused by a networking performance degradation within our cloud infrastructure that disrupted communication between components of our serving stack. We resolved the incident by migrating the affected workloads to healthy infrastructure, restoring normal service by 9:30 AM PT on March 27.

https://status.claude.com/incidents/b9802k1zb5l2

[−] halJordan 49d ago
Is it really an answer to say "network disruption" with a bunch of $10 words? Certainly it doesn't belong here of all places.
[−] nerdsniper 49d ago
It’s definitely an answer! Maybe just not a “retrospective”?
[−] cedws 49d ago
Are you able to share if there's a general trend behind the outages? Do you often hit capacity, or do you budget to have headroom?
[−] palcu 49d ago
Yes, the general trend is the unprecedented growth that we've seen. Typically one would have some time in advance to re-engineer the systems to support the increase in traffic and users. But we're dealing with very compressed timelines, and while most of the time we're able to fix the issues beforehand, sometimes we have to do them in production. Sorry for that.
[−] yread 50d ago
At this point you can stop worrying about downtime-free deployments so the devops becomes easier
[−] michaelcampbell 50d ago

> Our uptime has a '9' in it! -- Anthropic

[−] adgjlsfhk1 50d ago
Github this month is very close to having 0 9s reliability. (unless they want to argue that 89% has a 9 in it)
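
For readers who want to check the "zero nines" arithmetic, here is a minimal sketch. It assumes the usual convention that the number of nines is the floor of -log10 of the downtime fraction; the function name `count_nines` is hypothetical, and the percentage is passed as a string so the decimal arithmetic stays exact.

```python
import math
from decimal import Decimal


def count_nines(uptime_percent: str) -> int:
    """Return the number of whole 'nines' in an uptime percentage.

    Assumes the common convention: 90% is one nine, 99% is two nines,
    99.999% is five nines. Anything below 90% counts as zero nines.
    """
    uptime = Decimal(uptime_percent)
    if not (Decimal(0) <= uptime < Decimal(100)):
        raise ValueError("uptime must be at least 0 and below 100")
    # Fraction of time the service is down, e.g. 0.01 for 99% uptime.
    downtime_fraction = (Decimal(100) - uptime) / Decimal(100)
    # Decimal.log10 is exact for powers of ten, so 0.01 gives exactly -2.
    return math.floor(-downtime_fraction.log10())


for pct in ("89", "99", "99.999"):
    print(pct, count_nines(pct))
```

Under that convention 89% does come out as zero nines, however many 9 digits appear in the number itself.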
[−] littlestymaar 50d ago
I'm not sure I've had a day without Github hiccups this month, so that feels right.
[−] marcosdumay 50d ago
The comment you are replying to is carefully written in a way that allows 23.19%
[−] claw-el 49d ago
There is always 88.9% or 88.89%
[−] ACCount37 50d ago
By now, I'm nearly certain that they'd be down to 0 9s of uptime if they counted it conservatively.
[−] leosanchez 50d ago
Or as the British would say "9 innit ?"
[−] bwb 50d ago
We had a ton of traffic coming in to check them: https://downforeveryoneorjustme.com/anthropic

Not one of the usual ones that has service problems :)

[−] timpera 50d ago
[−] verdverm 50d ago
You can access Claude models with Google Cloud reliability via VertexAI. The caveat is that you cannot use your subscription, per-token pricing only.

I personally prefer per-token, it makes you more thoughtful about your setup and usage, instead of spray and pray.

You can also access the notable open weight models with VertexAI, only need to change the model id string.

[−] steveBK123 50d ago
Remember when putting your entire life & business into the cloud was good because they were all offering 5 9s of uptime?

Very few cases these days.. feels like we are lucky to get 2 9s anymore.

[−] dehrmann 50d ago
I wonder how much is due to supply constraints, how much is standard growing pains, and if over-reliance on AI was the cause for any outages.
[−] yomismoaqui 49d ago
Maybe they are gunning for 5 nines (9.9999%)
[−] rambojohnson 49d ago
It's pretty damn good, and it's seen a real exodus of conscientious users; the QuitGPT movement alone hit 1.5 million participants, with Claude skyrocketing to #1 on the App Store virtually overnight. No surprise the servers are getting hammered.

time to give your devops guy his job back.

[−] sgbeal 49d ago
The ironic thing about outages such as this one and Github's recent spate of outages is that, if those vendors' sales pitches are to be believed, the vendors could just ask their LLMs to program reliable replacements overnight (okay, maybe a weekend).
[−] Trufa 50d ago
I honestly feel like it's a more honest status measure than many status pages I know.
[−] scuff3d 50d ago
Probably vibe-coded their infrastructure
[−] seneca 50d ago
They seem to be a victim of their own success. Their response times are quite bad, and it's widely believed they are doing something to degrade service quality (quantizing?) in order to stretch resources. They just announced that they're cutting their usage limits down during peak hours as well.

They're at serious risk of losing their lead with this sort of performance.

[−] aubanel 50d ago
I wouldn't be too harsh, scaling x10 YoY is a bit hard on the infra!
[−] littlestymaar 50d ago
If you don't pay attention 99% may sound high, but it means up to roughly 21.6 hours of downtime over the quarter (1% of a 90-day quarter).

Anthropic has had more than that.

Yikes.
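
The downtime budget behind a percentage like this is simple arithmetic, sketched below. The function name `downtime_budget_hours` is hypothetical, and the sketch assumes Q1 2026 is measured in whole calendar days (January 1 through March 31), ignoring time zones and any exclusions a real SLA might apply.

```python
from datetime import date


def downtime_budget_hours(uptime_percent: float, start: date, end: date) -> float:
    """Hours of downtime allowed between start (inclusive) and end (exclusive)."""
    total_hours = (end - start).days * 24
    return total_hours * (100.0 - uptime_percent) / 100.0


# Q1 2026: 31 + 28 + 31 = 90 days, i.e. 2160 hours.
q1_budget = downtime_budget_hours(99.0, date(2026, 1, 1), date(2026, 4, 1))
print(f"{q1_budget:.1f} hours")
```

By the same arithmetic, a 99.999% target over the same quarter would leave roughly 78 seconds of budget.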

[−] claudiug 50d ago
MAKE NO MISTAKES! DO NOT HALLUCINATE! FIX IT!
[−] 3yr-i-frew-up 50d ago
Victim of success.

They are the best.

ChatGPT is walmart.

Gemini is kroger.

Claude is... idk your local grocer that is always amazing and costs more?

[−] rvz 50d ago
This is not an outage, Claude just gets lazier on Fridays.

Sometimes Claude wants more lunch breaks, takes a half day and leaves the desk early just like any human would. (since AI boosters like comparing LLMs to humans all the time) /s
