A macOS bug that causes TCP networking to stop working after 49.7 days (photon.codes)

by RyanZhuuuu 114 comments 171 points

[−] BitsAndObjects 39d ago
I got tired of the AI writing before I could find out whether they even attempted to contact Apple about this issue. Does anyone know?

Also, massively over-dramatised. Yes, a bug worth finding and knowing about, but it’s not a time bomb - very few users are likely to be affected by this.

Knowing the nature of OS kernels, I’m guessing even just putting a Mac laptop to sleep would be enough to avoid this issue as it would reset the TCP stack - which may be why some people are reporting much longer uptimes without hitting this problem, since (iirc) uptime doesn’t reset on Macs just for a sleep? Only for a full reboot?

Anyway, all in all, yeah hopefully Apple fix this but it’s not something anyone needs to panic about.

[−] bigiain 39d ago

> very few users are likely to be affected by this

I have a reasonably strong suspicion that I experienced this a week or two back, on a MacBook that doesn't go into sleep automatically and quite likely had 50-ish days of uptime.

It had all the symptoms described - tcp connections not working while I could still ping everywhere just fine, and all the other devices on the same network were fine. Switching WiFi networks and plugging in to ethernet didn't help. A reboot "fixed" it.

[−] castillar76 39d ago
Yep, I concur: this explains a bizarre behavior I’ve noted in my Mac laptops for ages now. I have a tendency to just suspend them without rebooting for ages, especially the work one that doesn’t leave my office as frequently. Periodically, I’d come in to find the system bizarrely frozen just as they describe: TCP stack blocked up, but everything else on it behaving normally. (Well, mostly: some apps would block starting and bounce eternally, but I suspect that’s because they’re trying to make a network call while starting up and it’s blocking.) The only fix was a reboot.

It’s not a disaster, but very annoying. At least now I can just schedule a reboot every 30 days at minimum to keep things running.

[−] fingerlocks 38d ago
GP said that suspending without rebooting prevents the issue.

My uptime resets only when forced by an OS upgrade and I have never experienced this issue. This is consistent with the sleep-heals-the-stack theory.

[−] apple4ever 37d ago
I experienced the exact same issue. Now that I know, I know what to do.
[−] BitsAndObjects 39d ago
I would not be surprised if people on HN were more likely to hit this issue than Apple's average users. We're a weird bunch ;)
[−] bigiain 39d ago
Can confirm at least one of us (me) is weird.
[−] delusional 39d ago
Apparently no. They'll be fixing it themselves? It really reads like Claude run amok on the blog.

> We are actively working on a fix that is better than rebooting — a targeted workaround that addresses the frozen tcp_now without requiring a full system restart. Until then, schedule your reboots before the clock runs out.

[−] theshrike79 38d ago
I think I might've hit my head on this a few times with my Mac Mini that's on basically 24/7 and doesn't go to sleep.

Sometimes it just stops networking completely, turning the wifi adapter on/off brings it back just fine. It's also a good time to reboot =)

[−] RyanZhuuuu 39d ago
yes we have reported to Apple and they have filed it in their internal system.
[−] otterley 39d ago
Did you need to make this blog post 20 pages long and have AI write it? Especially in such dramatic style?

Remember the golden rule: if you can't be bothered to write it yourself, why should your audience be bothered to read it ourselves?

[−] Aloisius 39d ago
Might want to update it if you used the blog post explanation because it's incorrect as justinfrankel noted below. From the post:

    tcp_now   = 4,294,960,000  (frozen at pre-overflow value)
The mistake in the blog post is that timer isn't wrapped, even though it notes it should be:

    timer     = 4,294,960,000 + 30,000 = 4,294,990,000 - MAX_INT = 22,704
Therefore:

    TSTMP_GEQ(4294960000, 22704)
    = 4294960000 - 22704
    = 4294937296
    = 4294937296 >= 0 ?  → true! (not false)
This is a bug of course, but it would cause sockets in TIME_WAIT state to be reaped anytime tcp_gc() is called, regardless of whether 2*MSL has passed or not. This only happens though if tcp_now gets stuck after 4,294,937,296 ms from boot.

A bug similar to what the blog described can happen however if tcp_now gets stuck at least 30 seconds before it would have wrapped. Since tcp_now is only updated if there is TCP traffic, this can happen if there is no TCP traffic for at least 30 seconds before it would roll over (MAX_INT ms from boot).

It should be easy to prevent the latter from happening with some TCP traffic, though reaping TIME_WAIT connections early isn't great either.

[−] mhjkl 30d ago
But the TSTMP_GEQ macro casts the difference to int, so any number above the signed integer limit (about 2 billion) becomes negative and the comparison returns false as they said
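
To make the two readings of that comparison concrete, here is a minimal Python sketch. It assumes, as described above, that TSTMP_GEQ casts the 32-bit difference to a signed int; the helper names (`to_u32`, `to_i32`, `tstmp_geq`) are illustrative and are not taken from the XNU source.

```python
U32 = 1 << 32  # modulus of a 32-bit unsigned counter


def to_u32(x):
    """Wrap an integer to 32-bit unsigned, as uint32_t arithmetic does."""
    return x % U32


def to_i32(x):
    """Reinterpret a 32-bit value as a signed int (two's complement)."""
    x = to_u32(x)
    return x - U32 if x >= (1 << 31) else x


def tstmp_geq(a, b):
    """Wraparound-aware a >= b, assuming the difference is cast to signed int."""
    return to_i32(a - b) >= 0


tcp_now = 4_294_960_000            # frozen value quoted from the blog post
timer = to_u32(tcp_now + 30_000)   # the 30 s deadline wraps past 2**32

unsigned_diff = to_u32(tcp_now - timer)  # large positive if left unsigned
signed_diff = to_i32(tcp_now - timer)    # negative once cast to signed int
expired = tstmp_geq(tcp_now, timer)      # with the cast, the timer never expires
```

Under the unsigned reading the difference looks non-negative, which matches the "true" result in the parent comment; under the signed-cast reading it is negative, which matches the blog post's "false".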
[−] e28eta 36d ago
Since the code they show and their comment says uint32_t timer wraps, but then their math doesn’t wrap it, I wonder how they missed this.

It’s also weird that their “smoking gun” example is with active TCP traffic, which (should?) be updating tcp_now and would make them more likely to fall into the “TIME_WAIT is immediately closed” case.

[−] tjohns 39d ago
Does anybody else find these AI-authored blog posts difficult to read? Something about the writing style and structure just feels unnatural, it's hard to put my finger on it.

At the very least, the writing takes way too long to get to a point.

[−] dawnerd 39d ago
Same, AI written anything is really difficult for me to read and pretty exhausting.
[−] gowld 39d ago
AI does a good job of condensing the blog post to 2 paragraphs -- Mac refuses to let the tcp_now clock roll over when it exceeds the max value in its data type.
[−] nslsm 39d ago
Use AI to expand your thoughts into a long-winded post, use AI to compress the long-winded post into something that can be digested by a human.
[−] BitsAndObjects 39d ago
This but Gemini and Email - literally marketed as "write bullet points and Gemini will draft your email", followed by "received a long email? Let Gemini summarise it for you."

The world's most effective _de_compression technology for email - total waste of time and compute when combined, but each product would make sense in isolation if human-generated mail was the majority of email sent/received (except sadly it isn't). We're using AI to spam people, AI to detect spam, AI to write non-spam and AI to summarise non-spam. AI inefficiency at every level and no way back.

[−] bigiain 39d ago
Step 3) Sam Altman profits.
[−] coldtea 39d ago
Can it summarize it down to a non-post?
[−] brianwawok 39d ago
Can it summarize this entire hacker news post out of existence?
[−] mcculley 39d ago

> It will not be caught in development testing — who runs a test for 50 days?

You don't have to run the system for 50 days. You can simulate the environment and tick the clock faster. Many high reliability systems are tested this way.

[−] dezgeg 39d ago
IIRC the initial value for the jiffies time counter in Linux kernel is initialized at boot time to something like five minutes before the wraparound point, precisely to catch this kind of issue.
[−] bobmcnamara 39d ago
WinCE too
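
A minimal sketch of that testing approach: rather than waiting 49.7 days, start a simulated 32-bit millisecond counter shortly before its wrap point and advance it by hand. The `FakeTickClock` class, the `elapsed_ms` helper, and the five-minute head start are illustrative assumptions, not code from any kernel.

```python
U32 = 1 << 32  # a 32-bit counter wraps at this value


class FakeTickClock:
    """Simulated 32-bit millisecond counter that a test advances manually."""

    def __init__(self, ms_before_wrap):
        # Start a fixed distance before the wrap point, similar in spirit
        # to booting with the counter a few minutes short of overflow.
        self.now = (U32 - ms_before_wrap) % U32

    def advance(self, ms):
        """Move the counter forward, wrapping like uint32_t arithmetic."""
        self.now = (self.now + ms) % U32
        return self.now


def elapsed_ms(start, now):
    """Elapsed time between two readings, correct across a single wrap."""
    return (now - start) % U32


clock = FakeTickClock(ms_before_wrap=5 * 60 * 1000)
start = clock.now
clock.advance(10 * 60 * 1000)            # ten simulated minutes, crossing the wrap
wrapped = clock.now < start              # the raw counter value went backwards
elapsed = elapsed_ms(start, clock.now)   # modular subtraction still gives 10 minutes
```

Any code under test that compares raw counter values directly, instead of using modular differences, would misbehave within the first few simulated minutes of such a run.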
[−] otterley 39d ago
Sounds like it affects every open TCP connection, not just OpenClaw. (It's pretty rare for a TCP connection to live that long, though.)
[−] josephcsible 39d ago
Individual TCP connections don't need to live that long. Once a macOS system reaches 49.7 days of uptime, this bug starts affecting all TCP connections.
[−] throw0101d 39d ago

> Once a macOS system reaches 49.7 days of uptime, this bug starts affecting all TCP connections.

Current uptime on my work MacBook (macOS 15.7.4):

    17:14  up 50 days, 22 mins, 16 users, load averages: 2.06 1.95 1.94
Am I supposed to be having issues with TCP connections right now? (I'm not.)

My personal iMac is at 279 days of uptime.

[−] CamperBob2 39d ago
Sure they do. They need to live until torn down.

They almost never do live that long, for whatever reason, but they should.

[−] gpvos 39d ago
Obviously, OpenClaw is now more important than anything else.
[−] justinfrankel 39d ago
have multiple macOS machines with 600-1000+ day uptimes, which do TCP connections every minute or so at a minimum, they are still expiring their TIME_WAIT connections as normal.

these kernel versions:

Darwin Kernel Version 20.6.0: Thu Jul 6 22:12:47 PDT 2023; root:xnu-7195.141.49.702.12~1/RELEASE_ARM64_T8101 arm64

Darwin Kernel Version 17.7.0: Wed Apr 24 21:17:24 PDT 2019; root:xnu-4570.71.45~1/RELEASE_X86_64 x86_64

so... wonder what that's about?

[−] netcoyote 39d ago
This type of problem plagues all sorts of software. Having experienced this type of problem before, for Guild Wars game servers -- which run deterministic game instances that live for long periods of time -- we initialized a per-game-context variable that gets added to Windows GetTickCount() to a value such that the result was either 5 seconds before 0x7fff_ffff ticks, or 5 seconds before 0xffff_ffff ticks, so that any weird time-computation overflow errors would be likely to show up immediately.
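
The offset trick described above can be sketched roughly as follows. `boundary_offset` and `game_ticks` are hypothetical helpers, the tick source is passed in as a plain integer instead of calling the real GetTickCount(), and a 32-bit millisecond tick count is assumed.

```python
U32 = 1 << 32  # 32-bit tick counter modulus


def boundary_offset(current_ticks, boundary, lead_ms=5_000):
    """Offset that makes (current_ticks + offset) land lead_ms before boundary."""
    target = (boundary - lead_ms) % U32
    return (target - current_ticks) % U32


def game_ticks(raw_ticks, offset):
    """Per-context tick value: raw counter plus the chosen offset, wrapped to 32 bits."""
    return (raw_ticks + offset) % U32


raw = 123_456_789  # stand-in for a tick count read when the context is created
off_signed = boundary_offset(raw, 0x7FFF_FFFF)    # 5 s before the signed limit
off_unsigned = boundary_offset(raw, 0xFFFF_FFFF)  # 5 s before the unsigned limit
```

With either offset, the context crosses a boundary about five seconds after it is created, so signed or unsigned overflow mistakes surface almost immediately.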
[−] loloquwowndueo 39d ago
lol reminds me of the windows 95 crash bug after 49.7 days. Have we learned nothing. https://pipiscrew.github.io/posts/why-window/
[−] gghootch 39d ago
What does this have to do with OpenClaw exactly?
[−] JensRantil 39d ago
This reminds me of the Linux kernel scheduler bug that kicked in after 208 days: https://www.claudiokuenzler.com/blog/247/linux-virtual-serve...
[−] beanjuiceII 39d ago
i'm on sequoia M1 laptop with uptime 16:38 up 228 days, 21:03, 1 user, load averages: 6.14 5.93 5.64

guess i'm marked safe!

[−] nottorp 38d ago
Hmm?

    torp@machinename ~ % uptime
    11:43 up 59 days, 1:22, 4 users, load averages: 2.87 2.69 2.70

Sleep is disabled on that machine and it definitely had networking working fine last night.

Mac Mini M2, Sequoia.

Incidentally my laptop says 75 days uptime, but that one does go to sleep.

[−] AndroTux 38d ago
Interesting. I think I can confirm this. Got a Tahoe system with 55 days uptime that's mostly idling:

    % netstat -an | grep TIME_WAIT | wc -l
    850

All other systems with < 49.7 days uptime report low single to double digit numbers.

[−] MatMercer 39d ago
This made me remember some folks that are "I never reboot my MacOS and it's fine!". Yeah probably it is but I'll never trust any computer without periodic reboots lol.
[−] fortran77 39d ago
Nobody keeps their Macs running for more than 49.7 days? We have Windows Servers here (with long-term TCP/IP connections) that are only rebooted every 6 months to apply patches.
[−] ingmarstein 37d ago
Thank you for this post! I think I ran into this when running UniFi OS Server (which uses podman) on macOS 26: https://community.ui.com/questions/TCP-connection-leak/2ab61...
[−] cthalupa 38d ago
I'm pretty certain I've run into this a couple of times now since upgrading to Tahoe last year and had been wondering what the deal was. Had never thought to check the uptime and make note of it, but I basically never shut down my laptop.
[−] dvh 39d ago
Exactly like arduino
[−] apple4ever 37d ago
OH this explains why randomly my iMac would REFUSE to do any connections to anything. I never put together that it was because of uptime!
[−] bawolff 39d ago
Wasn't windows 95 famous for having an issue like this?
[−] daveorzach 39d ago
If you want to see exactly when your machine will hit this, I threw together a fish shell function that calculates the precise timestamp, mostly vibe coded.

calc_tcp_overflow_time.fish: https://gist.github.com/daveorzach/64538f82a89fa24e5d134557c...

monitor_tcp_time_wait.fish: https://gist.github.com/daveorzach/0964a7a67c08c50043ff707cf...
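
For readers who don't use fish, the underlying calculation is short in Python. This is an independent sketch, not a port of the linked gists. It assumes the counter is a 32-bit millisecond count that starts at zero at boot and that you supply the boot time yourself (on macOS, `sysctl -n kern.boottime` reports it). The function names and the example boot time are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

WRAP_MS = 1 << 32  # a 32-bit millisecond counter wraps after 4,294,967,296 ms


def overflow_time(boot_time):
    """Moment at which a 32-bit ms counter started at boot_time would wrap."""
    return boot_time + timedelta(milliseconds=WRAP_MS)


def wrap_period_days():
    """Wrap period expressed in days (roughly 49.7)."""
    return WRAP_MS / (1000 * 60 * 60 * 24)


# Example boot time chosen for illustration only, not real data.
boot = datetime(2025, 1, 1, tzinfo=timezone.utc)
deadline = overflow_time(boot)
```

The wrap period works out to 49 days, 17 hours, 2 minutes and about 47 seconds, which is where the "49.7 days" figure comes from.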

[−] nalekberov 39d ago
I rarely restart my Mac mini, and I have never had such an issue beyond my internet provider suddenly stopping properly working in the middle of the night.
[−] NautilusWave 38d ago
How old is this bug? I can't imagine it exists on iOS or iPadOS; have those kernels really drifted that far apart though?
[−] Philpax 39d ago
Ctrl+F "OpenClaw". No results. Que?
[−] apatheticonion 39d ago
Ignoring the AI article contents.

God I wish Apple offered first party support for Linux on Mac computers.

[−] throw03172019 39d ago
I only have 11 days left until my machine crashes and I lose all of my tabs.
[−] RyanZhuuuu 34d ago
quick update: the problem has been confirmed and resolved in the latest macOS 26.4 release (from Apple)
[−] WesolyKubeczek 39d ago
In case of OpenClaw, this is a feature.
[−] cute_boi 39d ago
too much words and text for simple thing..... probably written by openclaw
[−] revv00 39d ago
Orz! A kindly reminder for rebooting.
[−] jijji 39d ago
I thought Alan Cox fixed all the TCP IP bugs in the early 1990s lol
[−] 486sx33 39d ago
[dead]
[−] awithrow 39d ago
A ticking time bomb? What an overly dramatic way to talk about a bug that requires a reboot. It's not even a hard crash.