Claude wrote a full FreeBSD remote kernel RCE with root shell (github.com)

by ishqdehlvi 105 comments 270 points

[−] magicalhippo 45d ago
Key point is that Claude did not find the bug it exploits. It was given the CVE writeup[1] and was asked to write a program that could exploit the bug.

That said, given how things are I wouldn't be surprised if you could let Claude or similar have a go at the source code of the kernel or core services, armed with some VMs for the try-fail iteration, and get it pumping out CVEs.

If not now, then surely in the not-too-distant future.

[1]: https://www.freebsd.org/security/advisories/FreeBSD-SA-26:08...

[−] fragmede 45d ago

> Credits: Nicholas Carlini using Claude, Anthropic

Claude was used to find the bug in the first place though. That CVE write-up happened because of Claude, so while there are some very talented humans in the loop, Claude is quite involved with the whole process.

[−] magicalhippo 45d ago

> Claude was used to find the bug in the first place though. That CVE write-up happened because of Claude

Do you have a link to that? A rather important piece of context.

Wasn't trying to downplay this submission, by the way; the main point still stands:

But finding a bug and exploiting it are very different things. Exploit development requires understanding OS internals, crafting ROP chains, managing memory layouts, debugging crashes, and adapting when things go wrong. This has long been considered the frontier that only humans can cross.

Each new AI capability is usually met with “AI can do Y, but only humans can do X.” Well, for X = exploit development, that line just moved.

[−] jsnell 44d ago

> Do you have a link to that? A rather important piece of context.

It was a quote from your own link from the initial post?

https://www.freebsd.org/security/advisories/FreeBSD-SA-26:08...

> Credits: Nicholas Carlini using Claude, Anthropic

[−] magicalhippo 44d ago
Oh wow, blind as a bat.

Would have been interesting with a write-up of that, to see just what Claude was used for.

[−] jsnell 44d ago
Obviously no guarantees that it's exactly what was done in this case, but he talked about his general process recently at a conference and more in depth in a podcast:

https://www.youtube.com/watch?v=1sd26pWhfmg

https://securitycryptographywhatever.com/2026/03/25/ai-bug-f...

It pretty much is just "Claude find me an exploitable 0-day" in a loop.

[−] xorgun 44d ago
[dead]
[−] bayindirh 44d ago
Yes, that claim needs a source.
[−] lateforwork 44d ago

> get it pumping out CVEs.

Is that a good thing or bad?

I see that as a very good thing. Because you can now inexpensively find those CVEs and fix them.

Previously, finding CVEs was very expensive. That meant only bad actors had the incentive to look for them, since they were the ones who could profit from the effort. Now that CVEs can be found much more cheaply, people without a profit motive can discover them as well--allowing vulnerabilities to be fixed before bad actors find them.

[−] cogman10 44d ago
It's good and bad.

Not all CVEs are the same; some aren't important. So it really depends on what gets found as a CVE. The bad part is you risk a flood of CVEs that don't matter (or have already been reported).

> That meant only bad actors had the incentive to look for them

Nah. Lots of people look for CVEs. It's good resume fodder. In fact, it's already somewhat of a problem that people will look for and report CVEs on things that don't matter just so they can get the "I found and reported CVE xyz" on their resume.

What this will do is expose some already present flaws in the CVE scoring system. Not all "9"s are created equal. Hopefully that leads to something better and not towards apathy.

[−] evanmoran 44d ago
It also depends on if the CVEs can be fixed by LLMs too. If they can find and fix them, then it's very good.
[−] cogman10 44d ago
Fixing isn't often a problem for CVEs. The hard part is almost always finding the CVE in the first place.

There are some extreme cases that might require extensive code changes, and those would benefit from LLMs. But a lot of the issues are things like off-by-one errors with pointers.
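
To make the off-by-one point concrete, here is a toy sketch in Python. It is not the FreeBSD bug, and `BUF_LEN`, `copy_buggy`, `copy_fixed`, and `adjacent_byte_after` are made-up names for illustration. It models an 8-byte buffer followed by one adjacent byte, and shows that the whole fix is changing `<=` to `<`.

```python
BUF_LEN = 8  # hypothetical buffer size, chosen only for this illustration


def copy_buggy(memory: bytearray, data: bytes) -> None:
    # Bug: "<=" admits one index too many, so memory[BUF_LEN] gets written.
    for i, b in enumerate(data):
        if i <= BUF_LEN:
            memory[i] = b


def copy_fixed(memory: bytearray, data: bytes) -> None:
    # Fix: strict "<" keeps every write inside the first BUF_LEN bytes.
    for i, b in enumerate(data):
        if i < BUF_LEN:
            memory[i] = b


def adjacent_byte_after(copy, data: bytes) -> int:
    # Model "memory" as the buffer plus one neighbouring byte that must stay 0.
    memory = bytearray(BUF_LEN + 1)
    copy(memory, data)
    return memory[BUF_LEN]


if __name__ == "__main__":
    payload = b"A" * 16
    print("buggy copy, adjacent byte:", adjacent_byte_after(copy_buggy, payload))
    print("fixed copy, adjacent byte:", adjacent_byte_after(copy_fixed, payload))
```

The patch is a one-character change, which is the sense in which the fix is usually the easy part once the bug is located.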

[−] wepple 44d ago
Fixing is now the bottleneck.

Most patches are non-trivial and then each project/maintainer has a preferred coding style, and they’re being inundated with PRs already, and don’t take kindly to slop.

LLMs can find the CVE with zero interaction, so it scales trivially.

[−] rtkwe 44d ago
The biggest question is whether you can meaningfully use Claude on defense as well, e.g. can it be trusted to find and fix the source of the exploit while maintaining compatibility? Finding the CVEs helps attackers directly, while it only helps defenders detect potential attacks unless there is a second step where the patch can also be created. If not, you've got a potential tidal wave of CVEs that still have to be addressed by people. Attackers can use CVE-Claude too, so it becomes a bit of an arms race where you have to find people able and willing to spend all the money to have those exploits found (and hopefully fixed).
[−] ogig 44d ago
Setting up fuzzing used to be hard. I haven't tried yet, but my bet is having Claude Code, today, analyze a codebase and suggest where and how to fuzztest it and having it review the crashes and iterate, will produce CVEs.
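
For readers who haven't set up fuzzing, the analyze / crash / iterate loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real harness: `parse_record` is a made-up toy parser with a deliberately planted bug, and `fuzz` is a hypothetical helper that throws random inputs at it and keeps the ones that crash.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy parser: the first byte is a length, followed by that many payload bytes."""
    if not data:
        return b""
    length = data[0]
    # Planted bug: the declared length is trusted without checking len(data).
    return bytes(data[1 + i] for i in range(length))


def fuzz(target, iterations: int = 1000, seed: int = 0) -> list:
    """Feed random byte strings to target and collect the inputs that crash it."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    crashes = []
    for _ in range(iterations):
        size = rng.randrange(1, 17)  # inputs of 1 to 16 bytes
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except IndexError:
            # Out-of-range read: save the input for later triage.
            crashes.append(data)
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_record)
    print(len(found), "crashing inputs; first one:", found[0].hex())
```

In a real setup the target would be compiled code behind a coverage-guided fuzzer, and reviewing the saved crashing inputs is the step being proposed for the model.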
[−] wslh 44d ago
While it's great to clarify, LLMs are actually finding bugs and writing exploits [1][2]. There are more examples, though.

[1] https://news.ycombinator.com/item?id=47589227

[2] https://xbow.com/

[−] Cloudef 44d ago
You can let an agent churn unattended if you have some sort of known goal. Write a test that should not pass and then tell the agent to come up with something that passes the test without changing the test itself.

For this kind of fuzzing, LLMs are not bad.

[−] muskstinks 44d ago
You might want to watch this:

https://www.youtube.com/watch?v=1sd26pWhfmg

Claude is already able to find CVEs on expert level.

[−] cryptbe 44d ago

>Key point is that Claude did not find the bug it exploits.

It found the bug, man. You didn't even read the advisory. It was credited to "Nicholas Carlini using Claude, Anthropic".

[−] Foobar8568 44d ago
Look at Xbow which spawned a few "open source" competitors.
[−] themafia 44d ago
They tried. It didn't work that well:

https://red.anthropic.com/2026/zero-days/

[−] rurban 44d ago
Nonsense. Claude did find this CVE and hundreds of similar Linux CVEs, plus it did the complete writeup and the reproducer. The Linux bugs are more worrying. His backlog is hundreds of yet unreported zero days.

He did a talk at unblocked last month.

[−] mentalgear 44d ago
[flagged]
[−] petcat 45d ago

> have a go at the source code of the kernel or core services, armed with some VMs for the try-fail iteration, and get it pumping out CVEs.

FreeBSD kernel is written in C right?

AI bots will trivially find CVEs.

[−] tptacek 44d ago
Calif (Thai Duong's firm) did a writeup on this, which should probably be the link here; it includes the prompts they used:

https://blog.calif.io/p/mad-bugs-claude-wrote-a-full-freebsd

A reminder: this bug was also found by Claude (specifically, by Nicholas Carlini at Anthropic).

[−] panstromek 45d ago
The talk "Black-Hat LLMs" just came out a few days ago:

https://www.youtube.com/watch?v=1sd26pWhfmg

Looks like LLMs are getting good at finding and exploiting these.

[−] ptx 45d ago

> It's worth noting that FreeBSD made this easier than it would be on a modern Linux kernel: FreeBSD 14.x has no KASLR (kernel addresses are fixed and predictable) and no stack canaries for integer arrays (the overflowed buffer is int32_t[]).

What about FreeBSD 15.x then? I didn't see anything in the release notes or the mitigations(7) man page about KASLR. Is it being worked on?

NetBSD apparently has it: https://wiki.netbsd.org/security/kaslr/

[−] stephc_int13 44d ago
The most difficult part is always to find the vulnerability, not to fix it. And most people who are spending their days finding them are heavily incentivized to not disclose.

Automatic discovery can be a huge benefit, even if the transition period is scary.

[−] dnw 44d ago
[−] fragmede 45d ago
[−] decidu0us9034 44d ago
I could see that being an incremental time save (perhaps not worth the token spend except for the dev team, not a high-value bug). But nobody finds this kind of bug "by hand" and hasn't for a long time now. Do people here really care about kernel security or testing automation? They're just talking about it because Claude? Everything on HN is people doing unpaid promotional work for Anthropic, just talking about all the promise Claude holds and all the various ways you could be spending more money on Claude. Bored, aimless vibes.
[−] neonstatic 44d ago

> "Claude wrote"

I am hoping that quite soon we will have general acceptance of the fact that "Claude can write code" and we will switch focus to how good / not good that code is.

[−] m132 45d ago
Appreciate the full prompt history
[−] yumiatlead 44d ago
This showcases the immense power and autonomy of agents, which is the root of enterprise fear. It highlights the urgent need for governance and safety.
[−] andrewstuart 44d ago
Errrr the headline makes it sound like a bad thing.

This is what Claude is meant to be able to do.

Preventing it doing so is just security theater.

[−] a96 41d ago
Ah, RPC. The gift that keeps on giving even after 30 years of security fails.
[−] EGreg 44d ago
Does this require SSH to be available?

Is it possible to pwn without SSH listening?

[−] sheepscreek 44d ago
I find it more concerning that this is still considered newsworthy. Frontier LLMs in the hands of anyone willing to learn and determined can be a blessing or curse.
[−] jeremie_strand 44d ago
[dead]
[−] navilai 43d ago
[dead]
[−] Adam_cipher 44d ago
[flagged]
[−] imta71770 44d ago
[dead]
[−] imta71770 44d ago
[dead]
[−] aplomb1026 44d ago
[dead]
[−] bustah 44d ago
[flagged]
[−] volume_tech 44d ago
[flagged]
[−] htx80nerd 44d ago
[flagged]
[−] alcor-z 44d ago
The MADBugs work is solid, but what's sticking with me is the autonomy angle — not just finding a vuln but chaining multiple bugs into a working remote exploit without a human in the loop. FreeBSD kernel security research has always been thinner on the ground than Linux, which makes this feel both more impressive and harder to put in context. What's the actual blast radius here — is this realistically exploitable on anything with default configs, or does it need very specific conditions?
[−] jdurban 44d ago
[flagged]