I got tired of the AI writing before finding out if they even attempted to contact Apple about this issue? Does anyone know?
Also, massively over-dramatised. Yes, a bug worth finding and knowing about, but it’s not a time bomb - very few users are likely to be affected by this.
Knowing the nature of OS kernels, I’m guessing even just putting a Mac laptop to sleep would be enough to avoid this issue as it would reset the TCP stack - which may be why some people are reporting much longer uptimes without hitting this problem, since (iirc) uptime doesn’t reset on Macs just for a sleep? Only for a full reboot?
Anyway, all in all, yeah hopefully Apple fix this but it’s not something anyone needs to panic about.
> very few users are likely to be affected by this
I have a reasonably strong suspicion that I experienced this a week or two back, on a MacBook that doesn't go into sleep automatically and quite likely had 50-ish days of uptime.
It had all the symptoms described - tcp connections not working while I could still ping everywhere just fine, and all the other devices on the same network were fine. Switching WiFi networks and plugging in to ethernet didn't help. A reboot "fixed" it.
Yep, I concur: this explains a bizarre behavior I’ve noted in my Mac laptops for ages now. I have a tendency to just suspend them without rebooting for ages, especially the work one that doesn’t leave my office as frequently. Periodically, I’d come in to find the system bizarrely frozen just as they describe: TCP stack blocked up, but everything else on it behaving normally. (Well, mostly: some apps would block starting and bounce eternally, but I suspect that’s because they’re trying to make a network call while starting up and it’s blocking.) The only fix was a reboot.
It’s not a disaster, but very annoying. At least now I can just schedule a reboot every 30 days at minimum to keep things running.
Apparently no. They'll be fixing it themselves? It really reads like Claude run amok on the blog.
> We are actively working on a fix that is better than rebooting — a targeted workaround that addresses the frozen tcp_now without requiring a full system restart. Until then, schedule your reboots before the clock runs out.
This is a bug of course, but it would cause sockets in TCP_WAIT state to be reaped anytime tcp_gc() is called, regardless of whether 2*MSL has passed or not. This only happens though if tcp_now gets stuck after 4,294,937,296 ms from boot.
A bug similar to what the blog described can happen however if tcp_now gets stuck at least 30 seconds before it it would have wrapped. Since tcp_now is only updated if there is TCP traffic, this can happen if there is no TCP traffic for at least 30 seconds before before it would roll over (MAX_INT ms from boot).
It's should be easy to prevent the latter from happening with some TCP traffic, though reaping TCP_WAIT connections early isn't great either.
But the TSTMP_GEQ macro casts the difference to int, so any number above the signed integer limit (about 2 billion) becomes negative and the comparison returns false as they said
Since the code they show and their comment says uint32_t timer wraps, but then their math doesn’t wrap it, I wonder how they missed this.
It’s also weird that their “smoking gun” example is with active TCP traffic, which (should?) be updating tcp_now and would make them more likely to fall into the “TCP_WAIT is immediately closed” case.
Does anybody else find these AI-authored blog posts difficult to read? Something about the writing style and structure just feels unnatural, it's hard put my finger on it.
At the very least, the writing takes way too long to get to a point.
AI does a good job of condensing the blog post to 2 paragraphs -- Mac refuses to let the tcp_now clock rollover when it exceeds the max value in its data type.
This but Gemini and Email - literally marketed as "write bullet points and Gemini will draft your email", followed by "received a long email? Let Gemini summarise it for you."
The world's most effective _de_compression technology for email - total waste of time and compute when combined, but each product would make sense in isolation if human-generated mail was the majority of email sent/received (except sadly it isn't). We're using AI to spam people, AI to detect spam, AI to write non-spam and AI to summarise non-spam. AI inefficiency at every level and no way back.
> It will not be caught in development testing — who runs a test for 50 days?
You don't have to run the system for 50 days. You can simulate the environment and tick the clock faster. Many high reliability systems are tested this way.
IIRC the initial value for the jiffies time counter in Linux kernel is initialized at boot time to something like five minutes before the wraparound point, precisely to catch this kind of issues.
Individual TCP connections don't need to live that long. Once a macOS system reaches 49.7 days of uptime, this bug starts affecting all TCP connections.
have multiple macOS machines with 600-1000+ day uptimes, which do TCP connections every minute or so at a minimum, they are still expiring their TIME_WAIT connections as normal.
these kernel versions:
Darwin Kernel Version 20.6.0: Thu Jul 6 22:12:47 PDT 2023; root:xnu-7195.141.49.702.12~1/RELEASE_ARM64_T8101 arm64
Darwin Kernel Version 17.7.0: Wed Apr 24 21:17:24 PDT 2019; root:xnu-4570.71.45~1/RELEASE_X86_64 x86_64
This type of problem plagues all sorts of software. Having experienced this type of problem before, for Guild Wars game servers -- which run deterministic game instances that live for long periods of time -- we initialized a per-game-context variable that gets added to Windows GetTickCount() to a value such that the result was either 5 seconds before 0x7fff_ffff ticks, or 5 seconds before 0xffff_ffff ticks, so that any weird time-computation overflow errors would be likely to show up immediately.
This made me remember some folks that are "I never reboot my MacOS and it's fine!". Yeah probably it is but I'll never trust any computer without periodic reboots lol.
Nobody keeps their Macs running for more than 49.7 days? We have Windows Servers here (with long-term TCP/IP connections) that are only rebooted every 6 months to apply patches.
I'm pretty certain I've run into this a couple of times now since upgrading to Tahoe last year and had been wondering what the deal was. Had never thought to check the uptime and make note of it, but I basically never shut down my laptop.
If you want to see exactly when your machine will hit this, I threw together a fish shell function that calculates the precise timestamp, mostly vibe coded.
I rarely restart my Mac mini, and I have never had such an issue beyond my internet provider suddenly stopping properly working in the middle of the night.
114 comments
Also, massively over-dramatised. Yes, a bug worth finding and knowing about, but it’s not a time bomb - very few users are likely to be affected by this.
Knowing the nature of OS kernels, I’m guessing even just putting a Mac laptop to sleep would be enough to avoid this issue as it would reset the TCP stack - which may be why some people are reporting much longer uptimes without hitting this problem, since (iirc) uptime doesn’t reset on Macs just for a sleep? Only for a full reboot?
Anyway, all in all, yeah hopefully Apple fix this but it’s not something anyone needs to panic about.
> very few users are likely to be affected by this
I have a reasonably strong suspicion that I experienced this a week or two back, on a MacBook that doesn't go into sleep automatically and quite likely had 50-ish days of uptime.
It had all the symptoms described - tcp connections not working while I could still ping everywhere just fine, and all the other devices on the same network were fine. Switching WiFi networks and plugging in to ethernet didn't help. A reboot "fixed" it.
It’s not a disaster, but very annoying. At least now I can just schedule a reboot every 30 days at minimum to keep things running.
My uptime resets only when forced by an OS upgrade and I have never experienced this issue. This is consistent with the sleep-heals-the-stack theory.
> We are actively working on a fix that is better than rebooting — a targeted workaround that addresses the frozen tcp_now without requiring a full system restart. Until then, schedule your reboots before the clock runs out.
Sometimes it just stops networking completely, turning the wifi adapter on/off brings it back just fine. It's also a good time to reboot =)
Remember the golden rule: if you can't be bothered to write it yourself, why should your audience be bothered to read it ourselves?
A bug similar to what the blog described can happen however if tcp_now gets stuck at least 30 seconds before it it would have wrapped. Since tcp_now is only updated if there is TCP traffic, this can happen if there is no TCP traffic for at least 30 seconds before before it would roll over (MAX_INT ms from boot).
It's should be easy to prevent the latter from happening with some TCP traffic, though reaping TCP_WAIT connections early isn't great either.
uint32_t timerwraps, but then their math doesn’t wrap it, I wonder how they missed this.It’s also weird that their “smoking gun” example is with active TCP traffic, which (should?) be updating tcp_now and would make them more likely to fall into the “TCP_WAIT is immediately closed” case.
At the very least, the writing takes way too long to get to a point.
The world's most effective _de_compression technology for email - total waste of time and compute when combined, but each product would make sense in isolation if human-generated mail was the majority of email sent/received (except sadly it isn't). We're using AI to spam people, AI to detect spam, AI to write non-spam and AI to summarise non-spam. AI inefficiency at every level and no way back.
> It will not be caught in development testing — who runs a test for 50 days?
You don't have to run the system for 50 days. You can simulate the environment and tick the clock faster. Many high reliability systems are tested this way.
>
Once a macOS system reaches 49.7 days of uptime, this bug starts affecting all TCP connections.Current
Am I supposed to be having issues with TCP connections right now? (I'm not.)uptimeon my work MacBook (macOS 15.7.4):My personal iMac is at 279 days of uptime.
They almost never do live that long, for whatever reason, but they should.
these kernel versions:
Darwin Kernel Version 20.6.0: Thu Jul 6 22:12:47 PDT 2023; root:xnu-7195.141.49.702.12~1/RELEASE_ARM64_T8101 arm64
Darwin Kernel Version 17.7.0: Wed Apr 24 21:17:24 PDT 2019; root:xnu-4570.71.45~1/RELEASE_X86_64 x86_64
so... wonder what that's about?
guess i'm marked safe!
torp@machinename ~ % uptime 11:43 up 59 days, 1:22, 4 users, load averages: 2.87 2.69 2.70
Sleep is disabled on that machine and it definitely had networking working fine last night.
Mac Mini M2, Sequoia.
Incidentally my laptop says 75 days uptime, but that one does go to sleep.
% netstat -an | grep TIME_WAIT | wc -l
850
All other systems with < 49.7 days uptime report low single to double digit numbers.
calc_tcp_overflow_time.fish: https://gist.github.com/daveorzach/64538f82a89fa24e5d134557c...
monitor_tcp_time_wait.fish: https://gist.github.com/daveorzach/0964a7a67c08c50043ff707cf...
God I wish Apple offered first party support for Linux on Mac computers.