Claude Code Found a Linux Vulnerability Hidden for 23 Years (mtlynch.io)

by eichin 268 comments 433 points

[−] mattbee 41d ago
Pasting a big batch of new code and asking Claude "what have I forgotten? Where are the bugs?" is a very persuasive on-ramp for developers new to AI. It spots threading & distributed system bugs that would have taken hours to uncover before, and where there isn't any other easy tooling.

I bet there are loads of cryptocurrency implementations being pored over right now - actual money is on the table.

[−] merlindru 41d ago
I like biasing it toward the assumption that there is a bug, so it can't just say "no bugs! all good!" without looking into it very hard.

Usually I ask something like this:

"This code has a bug. Can you find it?"

Sometimes I also tell it that "the bug is non-obvious", which I've anecdotally found to have a higher success rate than just asking for a spot check.

[−] majormajor 41d ago
Don't you run into too many false positives along the lines of "ah, this thing you used here is known to be tricky, the issue is..."?

I've seen that when prompting it to look for concurrency issues vs saying something more like "please inspect this rigorously to look for potential issues..."

[−] cmrdporcupine 41d ago
What's more useful is to have it attempt not only to find such bugs but also to prove them with a regression test. In Rust, for concurrency bugs, that means e.g. Shuttle or Loom tests.
[−] majormajor 41d ago
It would be generally good if most code made setting up such tests as easy as possible, but in most corporate codebases this second step is gonna require a huge amount of refactoring or boilerplate crap to get the things interacting in the test env in an accurate, well-controlled way. You can quickly end up fighting to understand "is the bug not actually there, or is the attempt to repro it not working correctly?"

(Which isn't to say don't do it: I think this is a huge benefit you can gain from being able to refactor more quickly. Just to say that you're gonna short-term give yourself a lot more homework to make sure you don't fix things that aren't bugs, or break other things in your quest to make them more provable/testable.)

[−] simulator5g 41d ago
That is an unfortunate case you described, but also, git gud and write tests in the first place so you don't need to refactor things down the road.
[−] merlindru 39d ago
yes, but i can identify those easily. if it flags something that is obviously a non-issue, i can discard it.

...because false positives are the good kind of error. false negatives are what i'm worried about.

i feel massively more sure that something has no big oversights if multiple runs (or even multiple different models) cannot find anything but false positives

[−] Nition 41d ago
Just in case you didn't read the full article, this is how they describe finding the bugs in the Linux kernel as well.

Since it's a large codebase, they get even more specific and hint that the bug is in file A, then try again with a hint that the bug is in file B, and so on.

[−] merlindru 39d ago
very interesting. i think "verbal biasing" and "knowing how to speak" in general are really important with LLMs. they seem to massively affect output. (interestingly, somewhat less with Opus than with GPT-5.4 and Composer 2. Opus seems to intuit a little better. but still important.)

it's like the idea behind the book _The Mom Test_ suddenly got very important for programming

[−] wat10000 41d ago
You just have to be careful, because it will sometimes spot bugs you could never uncover because they're not real. You can really see the pattern matching at work with really twisted code. It tends to look at things like lock-free algorithms and declare them full of bugs regardless of whether they are or not.
[−] dvfjsdhgfv 41d ago

> Pasting a big batch of new code and asking Claude "what have I forgotten? Where are the bugs?"

It's actually the main way I use CC/codex.

[−] justinclift 41d ago

> It spots threading & distributed system bugs that would have taken hours to uncover before, and where there isn't any other easy tooling.

Go has a built-in race detector, which may be useful for this too: https://go.dev/doc/articles/race_detector

Unsure if it's suitable for inclusion in CI, but it seems worth looking into for people using Go.

[−] 9dev 40d ago
I usually do several passes of "review our work. Look for things to clean up, simplify, or refactor." It usually improves the quality quite a lot. Then I rewind the conversation history to before that prompt, but keep the code changes, and submit the same prompt again, until it reaches the point of diminishing returns.
[−] trueno 40d ago
i've gone down this rabbit hole and i dunno, sometimes claude chases a smoking gun that just isn't a smoking gun at all. if you ask him to help find a vulnerability, he's not gonna come back empty-handed even if there's nothing there; he might frame a nice-to-have as a critical problem. in my exp you have to build tests that prove the vulnerabilities in some way. otherwise he's just gonna rabbithole while failing to look at everything.

i've had some remarkable successes with claude and quite a few "well that was a total waste of time" efforts with claude. for the most part i think trying to do uncharted/ambitious work with claude is a huge coinflip. he's great for guardrailed and well-understood outcomes, but i'm a little burnt out on and unexcited by hearing about the gigantic-claude exercises.

[−] slig 41d ago

> "Codex wrote this, can you spot anything weird?"

[−] userbinator 41d ago
Not "hidden", but probably more like "no one bothered to look".

> declares a 1024-byte owner ID, which is an unusually long but legal value for the owner ID.

When I'm designing protocols or writing code with variable-length elements, "what is the valid range of lengths?" is always at the front of my mind.

> it uses a memory buffer that's only 112 bytes. The denial message includes the owner ID, which can be up to 1024 bytes, bringing the total size of the message to 1056 bytes. The kernel writes 1056 bytes into a 112-byte buffer

This is something a lot of static analysers can easily find. Of course, asking an LLM to "inspect all fixed-size buffers" may give you a bunch of hallucinations too, but it could be a good starting point for further inspection.
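To make the "valid range of lengths" point concrete, here is a minimal C sketch of checking a variable-length field against a fixed-size buffer before copying into it. It is not the kernel's NFS code: `encode_denial`, `FIXED_PART_SIZE` and the message layout are hypothetical. Only the 112, 1024 and 1056 byte figures come from the quoted article text; the 32-byte fixed part is simply 1056 - 1024.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sizes from the quoted article. FIXED_PART_SIZE is a hypothetical
 * stand-in for everything in the denial message other than the owner
 * ID (1056 - 1024 = 32 bytes). */
#define REPLY_BUF_SIZE  112
#define OWNER_ID_MAX    1024
#define FIXED_PART_SIZE 32

/* Hypothetical encoder, not the kernel's: writes a zeroed fixed part
 * followed by the owner ID into dst. Returns the number of bytes
 * written, or -1 if the owner ID length is illegal or the whole
 * message would not fit in dst_size bytes. */
long encode_denial(unsigned char *dst, size_t dst_size,
                   const unsigned char *owner, size_t owner_len)
{
    assert(dst != NULL && owner != NULL);

    /* Reject lengths outside the protocol's valid range first. */
    if (owner_len > OWNER_ID_MAX)
        return -1;

    /* Cannot overflow size_t: at most 32 + 1024. */
    size_t total = FIXED_PART_SIZE + owner_len;

    /* The check that catches the case from the article: a legal
     * 1024-byte owner ID makes a 1056-byte message, which must not
     * be written into a 112-byte buffer. */
    if (total > dst_size)
        return -1;

    memset(dst, 0, FIXED_PART_SIZE);                  /* placeholder fields */
    memcpy(dst + FIXED_PART_SIZE, owner, owner_len);  /* variable part */
    return (long)total;
}
```

With a 112-byte destination, a 16-byte owner ID encodes to 48 bytes and an 80-byte one fills the buffer exactly (112 bytes), while the legal maximum of 1024 bytes is rejected instead of being written past the end of the buffer. Returning an error is only one option; sizing the buffer for the worst case is another.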

[−] DGAP 41d ago
I replicated this experiment on several production codebases and got several crits. Lots of dupes, lots of false positives, lots of bugs that weren't actually exploitable, lots of accepted/ known risks. But also, crits!
[−] altern8 41d ago
Every time I read these titles, I wonder if people are for some reason pushing the narrative that Claude is way smarter than it really is, or if I'm using it wrong.

They want me to code AI-first, and the number of hallucinations, weird bugs and inconsistencies that Claude produces is massive.

Lots of code that it pushes would NOT have passed a human/human code review 6 months ago.

[−] dist-epoch 41d ago

> "given enough eyeballs, all bugs are shallow"

Time to update that:

"given 1 million tokens context window, all bugs are shallow"

[−] PeterStuer 41d ago
Those three-letter agencies are going to see their stash of 0-days dwindle so hard.
[−] fguerraz 41d ago
Interestingly, I think 3 or 4 out of the 5 bugs would have been prevented / mitigated quite well using https://github.com/anthraxx/linux-hardened patches...

(io_uring disabled, the kernel would have crashed on the UAF, and exploitation of the heap overflow would have been very unreliable)

[−] summarity 41d ago
Related work from our security lab:

Stream of vulnerabilities discovered using security agents (23 so far this year): https://securitylab.github.com/ai-agents/

Taskflow harness to run (on your own terms): https://github.blog/security/how-to-scan-for-vulnerabilities...

[−] jazz9k 42d ago
This does sound great, but the cost of tokens will prevent most companies from using agents to secure their code.
[−] cesaref 41d ago
I'm interested in the implications for the open source movement, specifically around security. Does anyone know if there has been a study of how well Claude Code works on closed-source (but decompiled) code?
[−] misiek08 41d ago
Do not expect so many more reports. Expect so many more attacks ;)
[−] e12e 41d ago
I wonder about the "video running in the background" during the Q&A of the talk:

https://youtu.be/1sd26pWhfmg?is=XLJX9gg0Zm1BKl_5

Did he write an exploit for the NFS bug that runs via network over USB? Seems to be plugging in a SoC over USB...?

[−] eichin 42d ago
An explanation of the Claude Opus 4.6 Linux kernel security findings as presented by Nicholas Carlini at unpromptedcon.
[−] simplesocieties 40d ago
Supposedly humans have become “100x”™ more productive with these AI tools, but nowhere to be seen are the benefits for the wielders of said tools. Is your salary 100x higher? Are you able to spend more time with your family/friends instead of at the office? Why are we still putting up with these outdated work practices if LLMs have made everybody so much more productive?
[−] skeeter2020 41d ago
And with AI generating vulnerabilities at an accelerated pace this business is only getting bigger. Welcome to the new antivirus!
[−] rixrax 41d ago
I hope performance and bloat are next up for LLMs to try and improve.

Especially on the perf side, I would wager LLMs can go from the meat sacks' "whatever works" to "how do I solve this with the best available algorithm and architecture" (that also follows some best practices).

[−] alsanan2 41d ago
Making it public that AI is capable of finding that kind of vulnerability is a big problem. In this case it's nice that the vulnerability was fixed before publication, but if a cracker had found it, the result would have been extremely different. This kind of news only opens crackers' eyes.
[−] jason1cho 41d ago
This isn't surprising. What is not mentioned is that Claude Code also found a thousand false-positive bugs, which developers spent three months ruling out.
[−] _pdp_ 41d ago
The title is a little misleading.

It was Opus 4.6 (the model). You could discover this with some other coding agent harness.

The other thing that bugs me (and frankly I don't have the time to try it out myself) is that they did not compare whether the same bug would have been found with GPT 5.4 or perhaps even an open source model.

Without that, and for the reasons I posted above, while I am sure this is not the intention, the post reads like an ad for claude code.

[−] cookiengineer 41d ago

> Nicholas has found hundreds more potential bugs in the Linux kernel, but the bottleneck to fixing them is the manual step of humans sorting through all of Claude’s findings

No, the problem is sorting out thousands of false positives from Claude Code's reports. 5 valid reports out of 1000+ is statistically worse than running a fuzzer on the codebase.

Just sayin'

[−] up2isomorphism 42d ago
But on the other hand, Claude might introduce more vulnerabilities than it discovers.
[−] desireco42 41d ago
A developer using Claude Code found this bug. Claude is a tool. It is used by developers. It should not sign commits. Neovim never tried to sign commits with me, and neither did Zed.