GitHub appears to be struggling with measly three nines availability (theregister.com)

by richtr 238 comments 474 points

[−] 827a 54d ago
I don't want to give too much credit to GitHub, because their uptime is truly horrendous and they need to fix it. But I've felt it's a little unfair to judge the uptime of company platforms like this by saying "if any feature at all is down, it's all down" and then translating that into 9s for the platform.

I never use GitHub Copilot; it does go down a lot, if their status page is to be believed. I don't really care when it goes down, because it going down doesn't bring down the rest of GitHub. I care about GitHub's uptime ignoring Copilot. Everyone's slice of what they care about is a little different, so the only correct way to speak on GitHub's uptime is to be precise, and probably to focus on the core stuff that tons of people care about and that's been struggling lately: core Git operations, website functionality, API access, Actions, etc.

[−] dijksterhuis 54d ago

> I've felt it's a little unfair to judge the uptime of company platforms like this by saying "if any feature at all is down, it's all down" and then translating that into 9s for the platform.

This is definitely true.

At the same time, none of the individual services has hit 3x9 uptime in the last 90 days [0], which is their Enterprise SLA [1] ...

> "Uptime" is the percentage of total possible minutes the applicable GitHub service was available in a given calendar quarter. GitHub commits to maintain at least 99.9% Uptime for the applicable GitHub service.

[0]: https://mrshu.github.io/github-statuses/

[1]: https://github.com/customer-terms/github-online-services-sla

(may have edited to add links and stuff, can't remember, one of those days)
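To put that quarterly 99.9% figure in concrete terms, here is a quick sketch of the arithmetic implied by the definition quoted above (my own illustration; the 91-day quarter is an assumption, not anything from GitHub's terms):

```python
def allowed_downtime_minutes(sla: float, days_in_quarter: int = 91) -> float:
    """Minutes of downtime a calendar quarter can absorb before breaching the SLA.

    Uses the definition quoted above: uptime = available minutes / total
    possible minutes in the quarter.
    """
    total_minutes = days_in_quarter * 24 * 60  # 131,040 minutes in a 91-day quarter
    return total_minutes * (1 - sla)

# 99.9% leaves roughly 131 minutes (about 2.2 hours) of slack per quarter.
print(round(allowed_downtime_minutes(0.999), 1))  # 131.0
```

So a single two-hour incident can eat nearly an entire quarter's budget on its own.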

[−] windward 54d ago
So what happens for those enterprise customers now? Is there a meaningful fallout when these services fail to meet their SLAs?
[−] dijksterhuis 54d ago

> If GitHub does not meet the SLA, Customer will be entitled to service credit to Customer's account ("Service Credits") based on the calculation below ("Service Credits Calculation").

The linked document in my previous comment has more detail.

[−] Lalabadie 54d ago
It's worth adding that big (BIG!) business clients will usually negotiate the terms for going below the SLA threshold. The goal is less to be compensated if it happens, and more to incentivize the provider to never let it happen.
[−] drob518 54d ago
Right. Basically, they give you a coupon to lower your cost of future consumption. So, you have to keep consuming the service. If you just leave, you get no rebate. Obviously, very large customers get special deals.
[−] lucideer 54d ago
You're right that labelling any outage as "GitHub is down" is an overgeneralisation, & we should focus on bottlenecks that impact teams in a time-sensitive manner, but that isn't the case here. Their most stable service (the API) has only two 9s (99.69%).

They're not even struggling to get their average to three 9s, they're struggling to get ANY service to three 9s. They're struggling to get many services to two 9s.

Copilot may be the least stable at one 9, but the services I would consider most critical (Git & Actions) are also at one 9.

[−] ARandomerDude 54d ago
I love multiple 9s as much as the next guy but that's only 27 hours per year of downtime. For a mostly free (for me) service, I'm thankful.
[−] wavemode 54d ago
Most people complaining about uptime aren't free users or open-source developers. It's people whose companies are enterprise GitHub customers. It's a real problem and affects productivity.
[−] sefrost 54d ago
GitHub going down during office hours in a large enterprise has knock-on effects for hours as well, especially if you are in a monorepo.
[−] siren2026 53d ago
The issue is also that those 27 hours don't happen all at once; they happen in small chunks of a couple of minutes, which means it happens almost every day and causes a ton of downstream build and retry issues. The resulting disruption is probably two orders of magnitude higher, at least.
[−] skeeter2020 54d ago
I'm happy to report that my one-person sysops has successfully hit nine-fives for the 20th year in a row!
[−] malfist 54d ago
If there's only one 9 in the availability figure, they've got a minimum downtime of 87.6 hours per year (at best, 98.99999999999999999%).
[−] gymbeaux 53d ago
Those 27 hours only seem to happen during the workday when I’m trying to push branches, run CI pipelines or otherwise use GitHub (I don’t use Copilot). Whatever the yearly figure, it’s been a pain in the ass these last few months and it’s unacceptable, free or no (my company pays for GitHub).
[−] lucideer 54d ago
Honestly, you're right - 2̶7̵ 87+ (correction from sibling) hours per year is absolutely fine & normal for me & anything I want to run. I personally think it should be fine for everybody.

On the other hand, the baseline minimal GitHub Enterprise plan with no extra features (no Copilot, GHAS, etc.) runs a medium-sized company $1m+ per annum, not including pay-per-use extras like CI minutes. As an individual I'm not the target audience for that invoice, but I can envisage whoever is wanting a couple of 9s to go with it. As a treat.

[−] maccard 54d ago
87 hours a year is 1.5 hours a week. If that 1.5 hour window is when you need to use it it matters a hell of a lot more than if it’s 4am on a Sunday.
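The figures being traded in this subthread all fall out of one formula. A quick sketch (assuming a flat 365-day year, so the numbers are approximate):

```python
HOURS_PER_YEAR = 365 * 24  # 8760

def downtime_hours_per_year(availability: float) -> float:
    """Hours of downtime per year implied by an availability fraction."""
    return HOURS_PER_YEAR * (1 - availability)

for label, avail in [("one 9 (90%)", 0.90),
                     ("two 9s (99%)", 0.99),
                     ("three 9s (99.9%)", 0.999),
                     ("five 9s (99.999%)", 0.99999)]:
    per_year = downtime_hours_per_year(avail)
    print(f"{label:18s} {per_year:8.2f} h/year ({per_year / 52:.2f} h/week)")
```

Two nines works out to 87.6 hours a year, or about 1.7 hours a week, which is where the numbers above come from; three nines drops that to 8.76 hours a year.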
[−] toast0 54d ago
Nine nines is too hard; my target is eight eights.
[−] calvinmorrison 54d ago
ONLY TWO NINES! Meanwhile vital government services here have a whopping 25% availability.
[−] lucideer 54d ago
Two things can be bad.
[−] bigfishrunning 54d ago
Lemme guess, those government services are run by the lowest bidder?
[−] gymbeaux 53d ago
Which services have 25% uptime?
[−] calvinmorrison 53d ago
The DMV.
[−] shimman 54d ago
This company is part of the portfolio of a $trillion+ transnational corporation. The idea that we can't judge them, when they clearly have more resources than 99% of other companies on this planet, doesn't hold up to any scrutiny.

Why defend a company that clearly doesn't care about its customers and sees them as a money spigot to suck dry?

[−] thinkingtoilet 54d ago
The OP clearly never says we can't judge them. He was speaking to how the uptime is measured. I'm not saying I agree or disagree with the OP, but at least address the argument he's making.
[−] collingreen 53d ago
You had an opportunity to use three or more nines in your comparison to other companies and you just left it on the table and went with two nines. ;p
[−] saxonww 54d ago
There's a completely reasonable comment by jamiemallers on this thread which is marked as 'dead' even after vouching. Not sure what's going on there.
[−] zahlman 54d ago
Presumably what's going on is https://news.ycombinator.com/item?id=47340079 . It's been quite an issue lately.
[−] masfuerte 54d ago
Take a look at his comment history.
[−] foobiekr 54d ago
It doesn't help that almost all of the big tech companies talking about 5 9s are lying about it; "Does it respond to the API at all, even with errors? It's up!" and so on. If you spend a lot of time analyzing browser traces you see errors and failures constantly from everyone, even huge companies that brag a lot about their prowess. But it's "up" even if a shard is completely down.

The five nines tech people usually are talking about is a fiction; the only place where the measure is really real is in networking, specifically service provider networking, otherwise it's often just various ways of cleverly slicing the data to keep the status screen green. A dead giveaway is a gander at the SLAs and all the ways the SLAs are basically worthless for almost everyone in the space.

See also all of the "1 hour response time" SLAs from open source wrapper companies. Yes, within one hour they will create a case and give you a case ID. But that's not how they describe it.

[−] sumtechguy 54d ago
That's the rub.

Once you dig into the details, what does it mean to have 5 9s? Some systems have a huge surface area of calls and views. If the main web page is down but the entire backend API is still responding fine, is that "down"? Well, sorta. Or if one miscellaneous API that some users only call during onboarding is down, does that count? Well, technically yes.

It depends on your users and what path they use and what is the general path.

Then add in response times to those down items. Those are usually made up too.

[−] jamiemallers 54d ago
[dead]
[−] embedding-shape 54d ago
From GitHub's CTO in 2025, when they announced they were moving everything to Azure instead of letting GitHub's infrastructure remain independent:

> For us, availability is job #1, and this migration ensures GitHub remains the fast, reliable platform developers depend on

That went about as well as everyone thought back then.

Does anyone else remember back in ~2014-2015, when half the community was screaming at GitHub to "please be faster at adding more features"? I wish we could get back to platforms (or OSes, for that matter) focusing on reliability and stability. Seems those days are long gone.

[−] __alexs 54d ago
GitHub have not really got much better at adding new features either though :(
[−] phyzome 54d ago
I don't know, it's nice that they finally broke native browser in-page search. That's a great feature for people who hate finding things.
[−] omnimus 54d ago
I work on lots of smaller client projects, usually named after the hostname. I absolutely don't understand how at some point GitHub search got so "great" it became unable to find my own repo by its name.

We have since switched to self hosted Forgejo instance. Unsurprisingly the search works.

[−] cozzyd 54d ago
Makes you actually read the code!
[−] BenjiWiebe 54d ago
Native browser in-page search is working for me, on Firefox. Is this a browser-specific change or is it a staged rollout coming my way soon?
[−] embedding-shape 54d ago
This was before Actions and a whole lot of other non-Git-related stuff. There were years (maybe even a decade?) where GitHub was essentially unchanged besides fixes and small incremental improvements. Long time ago :)
[−] williamdclt 54d ago
They definitely have. GitHub evolved a lot faster after the Microsoft acquisition; I remember being mildly impressed after it had been stagnant for years. (This is not an opinion on whether it was evolving in the right direction or whether it was a good trade-off.)
[−] carlmr 54d ago
They added the service unavailable feature.
[−] lobsterthief 53d ago
They’ve really improved Projects! If your team is entirely technical, this can replace Jira.
[−] braiamp 54d ago

> I wish we could get back to platforms (or OSes for that matter) focusing in reliability and stability

That sentiment is only valid if you only use the big players. Both of those have medium-sized and smaller competitors that have shown (for decades) that they are extremely boring, and therefore stable.

[−] awestroke 54d ago
Perhaps when they switch over fully to Azure they'll forget to disable IPv6 access. One can dream
[−] zahlman 54d ago
That's about when I joined, and all I really remember thinking was that it was cool that I could now share my repo publicly without having to try and run a server from a residential IP.
[−] comboy 54d ago
I think stability and reliability have vastly improved over the last years in general (not necessarily talking about gh specifically)

It's just that everybody is using 100 tools and dependencies which themselves depend on 50 others to be working.
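That compounding effect is easy to quantify: if a workflow needs all of its services up at the same time, and failures are independent (an assumption; correlated outages make it messier), the availabilities multiply. A minimal sketch:

```python
def chain_availability(per_service: float, n_services: int) -> float:
    """Availability of a workflow that requires all n services up at once,
    assuming independent failures: the per-service availabilities multiply."""
    return per_service ** n_services

# 20 serial dependencies, each individually at three nines:
print(f"{chain_availability(0.999, 20):.4f}")  # ~0.9802: the chain is down to two nines
```

Push it further and a hundred "three nines" dependencies in series behave like a single "one nine" service (0.999^100 is about 0.905), which matches the everyday experience being described here.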

[−] jedberg 54d ago
GitHub is in a tough spot. From what I've heard, they've been ordered to move everything to Azure from their long-standing datacenters. That is bound to cause issues. Then on top of that, they are (supposedly) using AI coders for infra changes, which will also add issues.

And then on top of all that, their traffic is probably skyrocketing like mad because of everyone else using AI coders. Look at popular projects -- a few minutes after an issue is filed they have sometimes 10+ patches submitted. All generating PRs and forks and all the things.

That can't be easy on their servers.

I do not envy their reliability team (but having been through this myself, if you're reading this GitHub team, feel free to reach out!).

[−] Alifatisk 54d ago
Has anyone checked out the status page? It's actually way worse than I thought; I believe this is the first time I've witnessed a status page with truly horrible results.

https://mrshu.github.io/github-statuses

[−] cedws 54d ago
While GitHub obsess over shoving AI into everything, the rest of the platform is genuinely crumbling and its security flaws are being abused to cause massive damage. Last week Aqua Security was breached and a few repositories it owns were infected. The threat actors abused widespread use of mutable references in GitHub Actions, which the community has been screaming about for years, to infect potentially thousands of CI runs. They also abused an issue GitHub has acknowledged but refused to fix that allows smuggling malicious Action references into workflows that look harmless.

GHA can’t even be called Swiss cheese anymore, it’s so much worse than that. Major overhauls are needed. The best we’ve got is Immutable Releases which are opt in on a per-repository basis.

[−] mikeve 54d ago
Just to add a little bit of nuance, not because I'm trying to defend GitHub (they definitely need to up their reliability): the 90% uptime figure represents every single service that GitHub offers being online 90% of the time. You don't need every single service to be online in order to use GitHub. For example, I don't use Copilot myself, and it's seen 96.47% uptime, the worst of the services which are tracked.
[−] swisniewski 54d ago
To be honest, I’m not surprised that GitHub has been having issues.

If you have ever operated GitHub Enterprise Server, it’s a nightmare.

It doesn’t support active-active. It only supports passive standbys. Minor version upgrades can’t be done without downtime, and don’t support rollbacks. If you deploy an update, and it has a bug, the only thing you can do is restore from backup leading to data loss.

This is the software they sell to their highest margin customers, and it fails even basic sniff tests of availability.

Data loss for source code is a really big deal.

Downtime for source control is a really big deal.

Anyone that would release such a product with a straight face, clearly doesn’t care deeply about availability.

So, the fact that their managed product is also having constant outages isn’t surprising.

I think the problem is that they just don’t care.

[−] neonihil 54d ago
Nothing unexpected. Microsoft has a remarkable talent for turning good products into useless ones. Skype is another good showcase of such talent.
[−] pscanf 54d ago
I only use GitHub (and actions) for personal open-source projects, so I can't really complain because I'm getting everything for free¹. But even for those projects I recently had to (partially) switch actions to a paid solution² because GitHub's runners were randomly getting stuck for no discernible reason.

¹ Glossing over the "what they're getting in return" part.

² https://www.warpbuild.com/

[−] 1970-01-01 54d ago
IPv6 ignorance is the canary. There's plenty of architecture ignorance below the surface. The real question is why aren't they failing annual security audits?

https://docs.github.com/en/enterprise-cloud@latest/organizat...

[−] dijit 54d ago
I'm surprised it's even as high as three nines; at one point in 2025 it was below 90%, not even a single nine. [0] (Which, to be fair, includes Copilot, the worst of the availabilities.)

People on lobsters a month ago were congratulating Github on achieving a single nine of uptime.[1]

I make jokes about putting all our eggs in one basket, along the lines of "nobody got fired for buying X, but there sure are a lot of unemployed people", but I think there's an insidious conversation that always used to erupt:

“Hey, take it easy on them, it’s super hard to do ops at this scale”.

Which lands hard on my ears when the normal argument in favour of centralising everything is that "you can't hope to run things as well as they do, since there are economies of scale".

These two things can't be true simultaneously... this is the evidence.

[0]: https://mrshu.github.io/github-statuses/

[1]: https://lobste.rs/s/00edzp/missing_github_status_page#c_3cxe...

[−] yifanl 54d ago
https://mrshu.github.io/github-statuses/ Even ignoring Copilot, they seem to be barely at 2 nines of uptime for any service component.
[−] DailyGeo 54d ago
The availability expectations gap is interesting from an education standpoint. Students are taught that 99.9% sounds impressive, without contextualizing what that means in practice: roughly 8.8 hours of downtime per year. For a platform that millions of developers depend on as critical infrastructure during work hours, that math hits very differently than it does for a consumer app.
[−] Eikon 54d ago
Recently (after the workflows had worked for months), part of my CI on Actions started failing with [0]:

2026-02-27T10:11:51.1425380Z ##[error]The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
2026-02-27T10:11:56.2331271Z ##[error]The operation was canceled.

I had to disable the workflows.

GitHub support's response has been:

> We recommend reviewing the specific job step this occurs at to identify any areas where you can lessen parallel operations and CPU/memory consumption at one time.

That plus other various issues makes me start to think about alternatives, and it would have never occurred to me one year back.

[0] https://github.com/Barre/ZeroFS/actions/runs/22480743922/job...

[−] dathinab 54d ago
Wait, they still have three nines? It really doesn't feel like that.

But then, their status page isn't really trustworthy anymore, and a lot of the temporary issues I've been running into seem to be partial, localized failures: sometimes things slow to the point of unusability, or a main/HEAD that's outdated (by >30 min) gets served temporarily, etc.

So that won't even show up in these statistics.

[−] dzonga 54d ago
When GitHub moved to React instead of server-rendered pages (i.e. ERB/Turbolinks/pjax), that was the beginning of the end.

The pages got slower; rendering became a nightmare.

Then they introduced GitHub Actions (half-baked), again very unreliable.

Then they introduced Copilot, again not very reliable.

It's easy to see why availability has gone down the drain.

Are they still on the Rails monolith? They seem to speak about it less these days.

[−] outside2344 54d ago
I have a little bit of sympathy for GitHub, because if everyone is like me then they are getting 5-6x the demand they were getting last year based on sheer commits alone, not to mention GitHub Copilot usage.
[−] Anon1096 54d ago
Anyone who uses the phrase "measly" in relation to three nines is inadvertently admitting their lack of experience with massive systems. 99.9 and 99.95 are the targets for some of the most common systems you use all day, and are by no means easy to achieve. Even just relying on a couple of regional AWS services will put your ceiling at three nines. It's even more embarrassing when people post that one GH uptime tracker that combines many services into one single number, as if that means anything useful.
[−] jghn 54d ago
Why have five nines when you can have nine fives?
[−] bentobean 54d ago
“Microsoft Tentacle” - Now there’s a name for a new product line.
[−] Andrei_dev 54d ago
Our security scanning runs on GitHub Actions — every PR gets checked before merge. When GitHub goes down, the security gate goes down with it. PRs pile up, devs get impatient, start merging without waiting for checks. That's exactly when bad code gets through. And they keep throwing engineers at Copilot while the stuff that CI/CD actually depends on keeps falling over.
[−] ChrisArchitect 54d ago
OP is the Feb 10th post.

More recently:

Addressing GitHub's recent availability issues

https://github.blog/news-insights/company-news/addressing-gi...

(with a smattering of submissions here the last few weeks but no discussion)

[−] martinald 54d ago
I wonder how much of this is down to the massive amount of new repos and commits (of good or bad quality!) from the coding agents. I believe that the App Store is struggling to keep up with (mostly manual tbf) app reviews now, with sharp increases in review times.

I find it hard to believe that an Azure migration would be that detrimental to performance, especially with no doubt "unlimited credit" to play with?

You can provision Linux machines easily on Azure and... that's all you need? Or is the thinking that without bare metal NVMe mySQL it can't cope (which is a bit of a different problem tbf).

[−] pacman1337 54d ago
The irony no one is talking about: AI makes code quality worse. It was bad enough already, so imagine it now. I am expecting many more services to drop from 3 nines to 1 nine.
[−] _heimdall 54d ago
I'm surprised GitHub got by acting fairly independently inside Microsoft for so long. I'm also surprised GitHub employees expected that to last.

The real problem today, IMO, is that Microsoft waited so long to drop the charade that they then felt like they had to rip the bandaid off. From what I've heard, the transition hasn't gone very smoothly at all, and they've mostly been given tight deadlines with little to no help from Microsoft counterparts.

[−] ajhenrydev 54d ago
I worked on the React team while at GitHub, and you could easily tell which pages rendered with React vs which were still using Turbo. I wish we had taken perf more seriously as a culture there.
[−] b00ty4breakfast 54d ago
Until paying customers start leaving en masse, they will continue to shovel out subpar service.
[−] pluc 54d ago
I'm amazed Microslop let us keep GitHub this long. Probably because they're training AI on it? To have a direct line to developers? I don't see why else they would've bothered with something that was so anti everything they stood for
[−] nubinetwork 53d ago
Has Microsoft fixed teams crashing outlook? It's been broken since Wednesday, and the last thing I heard on Friday was that they were going to wait until Monday... really goes to show how much Microsoft cares these days...
[−] amelius 54d ago
It's time to look for a decentralized Non-Hub alternative.
[−] pilif 54d ago
see also: https://thenewstack.io/github-will-prioritize-migrating-to-a...

A migration like this is such a monumental undertaking that the only sensible way to do it is probably not to do it at all. I fully expect even worse reliability over the next few years before it gets better.

[−] m4tthumphrey 54d ago
GitLab isn't much better right now either unfortunately.
[−] frellus 53d ago
Love the Register, but ... everyone has sort of known this for the past 10+ years if they've relied on GitHub with any sort of velocity.
[−] kylehotchkiss 54d ago
https://status.claude.com _anthropic has entered the room_
[−] ahstilde 53d ago
Yeah, between GitHub and Claude there's an outage 75% of the time: https://www.aakash.io/tech-chase/github-and-claude-are-down-...
[−] graphememes 54d ago
It's wild because they are one of the most used properties on earth and their uptime is actually incredible.
[−] azalemeth 54d ago
Embrace, extend, extinguish. Except the last one isn't quite going to plan...
[−] caconym_ 54d ago
Legitimately worse uptime than my self-hosted services. That's pretty funny.
[−] yurii_l 54d ago
Maybe they need to improve release strategy with Copilot AI Review =)
[−] sammy2255 54d ago
I wonder if they are still running on a single MySQL machine
[−] jtokoph 54d ago
Pretty soon, the only 9 they’re going to have is the 9 8s…
[−] cl0ckt0wer 54d ago
Cheap, fast, and good. I see which two they chose.