Project Glasswing: Securing critical software for the AI era (anthropic.com)

by Ryan5453 836 comments 1541 points

[−] ofjcihen 37d ago
I’m sure the new model is a step above the old one but I can’t be the only person who’s getting tired of hearing about how every new iteration is going to spell doom/be a paradigm shift/change the entire tech industry etc.

I would honestly go so far as to say the overhype is detrimental to actual measured adoption.

[−] qnleigh 37d ago
There is plenty of overhyping, no one denies that. But the antidote is not to dismiss everything. Ignore the words and look at the data.

In this case, I see a pretty strong case that this will significantly change computer security. They provide plenty of evidence that the models can create exploits autonomously, meaning that the cost of finding valuable security breaches will plummet once they're widely available.

[−] kashyapc 37d ago
You seem to see a "pretty strong case" from a bombastic press release.

Don't get me wrong, I do know the reality has changed. Even Greg K-H, the Linux stable maintainer, did recently note[1] that it's not funny any more:

"Months ago, we were getting what we called 'AI slop,' AI-generated security reports that were obviously wrong or low quality," he said. "It was kind of funny. It didn't really worry us."

... "Something happened a month ago, and the world switched. Now we have real reports." It's not just Linux, he continued. "All open source projects have real reports that are made with AI, but they're good, and they're real." Security teams across major open source projects talk informally and frequently, he noted, and everyone is seeing the same shift. "All open source security teams are hitting this right now."

---

I agree that an antidote to the obnoxious hype is to pay attention to the actual capabilities and data. But let's not get too carried away.

[1] https://www.theregister.com/2026/03/26/greg_kroahhartman_ai_...

[−] ghaff 37d ago
Hadn’t been to a Kubecon in about a year, as I’ve been tending to go to just the European ones. I definitely felt a much stronger "this is real" vibe at this event from people like Greg KH.
[−] 4ndrewl 37d ago
Is there any actual independent data though, or verification of any of these claims?

As it stands this is just a marketing programme for all involved.

[−] H8crilA 37d ago
Ffmpeg confirmed on Twitter that they sent the patches.
[−] cubix 37d ago
Although they also said, "Because the patches appear to be written by humans".
[−] WithinReason 37d ago
"Mythos writes code like a human" incoming
[−] H8crilA 37d ago
The patches could have been written by humans, it doesn't matter that much. Or written by a clanker and polished by engineers. The difficult part is usually not in writing the patches that fix such vulnerabilities, but in finding the vulnerabilities. And these days it's even harder to exploit them, since you need to bypass modern hardening features.
[−] kachnuv_ocasek 37d ago
What would be the product they're marketing by this campaign?
[−] 4ndrewl 37d ago
You don't market products, you market lifestyles/interests. Sell the sizzle, not the steak etc.

For Anthropic it's "we own the big scary models, the AI security space, but it's ok we're responsible"

For the partners it's "we're the Big Boys here and will look after your enterprise needs"

None of it needs any more than anecdata and some nice, pre-approved quotes.

Every organisation does it.

[−] ozozozd 37d ago
The product they launched?
[−] mholm 37d ago
This product is explicitly not being released for usage
[−] 0123456789ABCDE 37d ago
just because _we_ don't have access does not mean anthropic's not getting paid
[−] prawn 36d ago
The product is being provided to some of the most influential companies. That can definitely serve to Anthropic's advantage. (Regardless, I suspect the hype is real.)
[−] timv 36d ago
Imagine you were making purchasing decisions about which LLM-based coding tool to use.

If one of the possible vendors convinces you that they have a next-gen model so powerful it found 20+ year old bugs in a hardened operating system, that would undoubtedly have an influence on your decision, even if you are only buying the current model.

[−] danudey 37d ago
[dead]
[−] KoolKat23 37d ago
That's pretty disingenuous, bordering on ridiculous.

Do they have a record of lying to you? No.

Go read the system card. It's a lot more tame than you think; people are taking pieces of this out of context and hyping them up. That doesn't mean it's not valid.

[−] killingtime74 37d ago
Which sounds like a great thing. Fewer undiscovered security vulnerabilities.
[−] harikb 37d ago
The only people panicking are probably those state level actors who were using these for their own benefit.
[−] ofjcihen 37d ago
With the right prompting (mostly creating a narrative that justifies the subject matter as okay to perform), other models have already been doing this for me, though. That’s another confusing bit for me about how this is portrayed, and I refuse to believe I’m a revolutionary user, right?

I mean I’m sitting on $10k worth of bug payouts right now partially because that was already a thing.

[−] dota_fanatic 37d ago

> Non-experts can also leverage Mythos Preview to find and exploit sophisticated vulnerabilities. Engineers at Anthropic with no formal security training have asked Mythos Preview to find remote code execution vulnerabilities overnight, and woken up the following morning to a complete, working exploit. In other cases, we’ve had researchers develop scaffolds that allow Mythos Preview to turn vulnerabilities into exploits without any human intervention.
[−] ofjcihen 37d ago
I mean yeah. I’ve had these successes without scaffolding or really anything past Claude CLI and a small prompt as well?
[−] dota_fanatic 37d ago
Just saw your edit. I'll leave it at this: this is why it's news to me, because by their very own measurements, Opus simply doesn't come close. I trust their empirical evidence over your hearsay. But feel free to prove me wrong with evidence.

> With one run on each of roughly 7000 entry points into these repositories, Sonnet 4.6 and Opus 4.6 reached tier 1 in between 150 and 175 cases, and tier 2 about 100 times, but each achieved only a single crash at tier 3. In contrast, Mythos Preview achieved 595 crashes at tiers 1 and 2, added a handful of crashes at tiers 3 and 4, and achieved full control flow hijack on ten separate, fully patched targets (tier 5).

[−] heyethan 37d ago
[dead]
[−] jstummbillig 37d ago

> how every new iteration is going to spell doom/be a paradigm shift/change the entire tech industry etc.

It's much like the dynamic between parents and a child. The child, with limited hindsight, almost zero insight and no ability to forecast, is annoyed by their parents. Nothing bad ever happens! Why won't parents stop being so worried all the time and making a fuss over nothing?

The parents, as the child is somewhat starting to realize but not fully, have no clue what they are doing. There is a lot they don't know and are going to be wrong about, because it's all new to them. But what they do have is a visceral idea of how bad things could be, and that's something they have to talk to their child about too.

In the eyes of the parents, the child is some % dead all the time. Assigning the wrong % makes you look like an idiot, and so does being unable to handle any % at all. In the eyes of the child, actions leading to death are not even a concept. Hitting the right balance is probably hard, but not for the reasons the child thinks.

[−] maccard 37d ago
Disagree - we’re being told on one hand that we are 6 months away from AI writing all code [0], and 3 months into that the tools are unusable for complex engineering [1]. Every time I mention this I’m told “but have you tried the latest model and this particular tool” - yes I have, but if I need to be on the hottest new model for it to be functional, that means the last time you claimed it was solved, it wasn’t solved.

[0] https://www.entrepreneur.com/business-news/ai-ceo-says-softw...

[1] https://news.ycombinator.com/item?id=47660925

[−] ofjcihen 37d ago
That feels like a very complex way of looking at it. Another way would be to say “potentially profit seeking companies have an incentive to oversell products even if they’re good”.
[−] avaer 37d ago
The parents in this case are profiteering corporations on a mission to exploit the child for everything they can get away with, almost by definition.

It's a slightly different dynamic.

[−] FridgeSeal 37d ago
I feel like you’re muddying 2 different arguments here. Or rather, 2 different positions.

You’re asserting that people who are tired of this line being wheeled out hold a position analogous to “what’s the big deal, nothing bad happens, just relax”. In reality, that’s only one position. The other is “I fully understand the consequences, but the relentless doomer language is tiring in the face of it continuing to not eventuate”.

[−] kubb 37d ago
It’s more like the abusive parents telling the child that they’ll sell him to the scary man at the bus stop every time they want to coerce the child into doing what they want.

Eventually the child develops disrespect for authority.

[−] athrowaway3z 37d ago
This is just a really bad analogy. It doesn't address the fact that there are multiple sources, the incentives for telling us about it, or the spectrum between disaster-mitigation heroes and snake-oil salesmen.
[−] materialpoint 37d ago
Did you just compare AI companies to parents, and engineers actually delivering value to toddlers? AI companies cannot, in any capacity, be regarded as caretakers.
[−] haritha-j 37d ago
Sure, if the parent's stock price soared if the child dies.
[−] juleiie 37d ago
Don’t take it personally, but this amount of fear and paranoia about death around every corner sounds like a mental illness to me. Generalised anxiety disorder, to be precise. Maybe I am just not a parent.

In any case, there are substances and reliable methods that fix whatever paralyzing existential dread anyone struggles with daily.

Probably best to use the conventional route, but I personally use special low-THC, high-CBG weed once a week with a medical-grade vaporizer, and once a year (early autumn) a moderate dose of golden teacher mushrooms. Although I understand that most people perhaps couldn’t, not managing their own business but being on a strict employment contract with urine tests.

[−] therealpygon 37d ago
Are you suggesting these researchers somehow have wisdom and aren’t just guessing, and that everyone else is a child too naive to understand the technology? It certainly sounds that way from the description you are attempting to apply.

This is two parents disagreeing on whether their child will automatically grow up to be a psychopath with one parent constantly remarking “if you teach that child how to cut bread, they will stab everyone later. If you teach that child to drive, they will run over everyone later”, not the “parents know better” situation you describe.

[−] toraway 37d ago
An analogy that’s, quite literally, an appeal to paternalism to trust the motivations and pernicious incentive structures of the big AI labs.
[−] shafyy 37d ago
I'll have some of what you're having
[−] bottom999mottob 37d ago
This is literally one of the most infantilizing and simultaneously insulting analogies I've ever come across on this site. Do you really think consumers of the latest AI tools have no ability to forecast? The parents in this analogy have every incentive to lie.
[−] nbardy 37d ago
There are step changes that actually merit this, though. And a zero-day machine IS one of them. It went from a 4% zero-day success rate to 85% on Firefox.

Can you not see the significance of that?

[−] _the_inflator 37d ago
I side with you, but on the other hand: this is how you get attention from those who aren't involved in computer science and AI.

I am totally annoyed as well and run any buzzwords through my personal bs filter. Java was revolutionary, the Apple I, etc. ;)

On the other hand I see progress! AI enriched press releases balance buzzwords and information way better than marketing of large companies did before AI.

I remember throwing away the instructions for an electric toothbrush because - I won't mention the name, but have a look at the upper tier - instead of saying something like "Turn the toothbrush on, choose a mode by pressing..." it read "Take your super awesome premium masterpiece using patented technology, for the first time in human life now available to you by us. Move your finger over the innovative sensory surface, which uses material from rocket scientists and world-leading designers".

No joke. These were whole text blocks, and they repeated - 30 pages for what should have been one compact page.

The toothbrush is top notch, except for the instructions.

[−] alexey-salmin 37d ago
I think Claude Code with Sonnet 4.6 is already at the level of paradigm shift and can change the entire tech industry.

If you're paranoid it doesn't mean you're not being followed. If something is overhyped it doesn't mean it's not game-changing.

[−] davenporten 37d ago
I came across this article just this morning saying AMD researchers, who hitherto have relied on Claude Code heavily, have noticed degraded performance in the recent update: https://www.theregister.com/2026/04/06/anthropic_claude_code...

Claude Code and Glasswing are not the same, but presumably they have a lot of overlap under the hood. I feel like, while AI is certainly advancing in major ways, there will always be the ups and downs of new software releases.

[−] heliumtera 37d ago
At launch, a technology is considered dangerous for being too powerful.

3 months later, you are an absolute idiot to still be using that useless model. Are you not using glasswing 2-01 high? Oh, yeah, glasswing from 3 months ago is absolutely worthless, every viber knows, it's your fault for holding it wrong.

For once, you should not get too excited about new model releases and words and adjectives promising things. Honestly, it's your fault humanity lost its humanity and we just have words, words, words and mass schizophrenia.

[−] FiberBundle 37d ago
To me it makes absolutely zero sense that they would decide not to release the model to the public because of the effects its exploitation capabilities would have. Previous models were also capable of providing harmful information, yet that wasn't a problem, because models can actually be effectively censored using RLHF. So what is preventing Anthropic from simply forbidding the model from letting people vibe-code exploits?
[−] raxxorraxor 37d ago
This looks more like another lobby group (quite a bad one) than something primarily focused on security.

The "urgency" is very likely mostly appreciated to drive policy.

[−] adam_patarino 37d ago
I’ve lost trust in anything they say.

The fear marketing is clearly intentional at this point.

[−] DonsDiscountGas 37d ago
Everybody remembers the fable of the boy who cried wolf and how he died at the end. Left out of the story are the other villagers who died of starvation because their flocks of sheep were eaten, all because they didn't want to feel like suckers. Tuning out completely because of the existence of false positives is not a good choice.
[−] gchadwick 37d ago
Remember OpenAI decided GPT 2 was far too dangerous to unleash upon the world when they first trained it!
[−] akmiller 37d ago
Hasn't almost every model created a paradigm shift lately? Maybe it's you who has moved the needle on what a paradigm shift means?
[−] throwawayq3423 37d ago

> I can’t be the only person who’s getting tired of hearing about how every new iteration is going to spell doom/be a paradigm shift/change the entire tech industry etc.

There's a little bit of a grading your own homework aspect to companies being able to declare their new models revolutionary.

It doesn't mean they're wrong, but there is a clear conflict of interest.

[−] nl 37d ago
Well, Opus 4.5/4.6 kinda was, right?

I mean, software development has changed more since then than it has over my entire 30-year software development career.

[−] mik09 37d ago
A lot of times people cry wolf a couple of times before the wolf actually comes.

I feel like there's a good chance that this is the actual wolf, because I was using Opus for a lot and it's really good.

[−] eranation 37d ago
It feels to me like it's full of marketing in the guise of trying to save the world from their own making: "we have a model so strong we can't release it, here are all the details of why it's so good, but don't ask for access, you can't get it, it's too risky for your own good".

Something smells really really weird:

1. Per the blog post[0]: "This was the most critical vulnerability we discovered in OpenBSD with Mythos Preview. Across a thousand runs through our scaffold, the total cost was under $20,000, and we found several dozen more findings."

Since they said it was patched, I tried to find the CVE. It looks like Mythos indeed found a 27-year-old OpenBSD bug (fantastic), but it didn’t get a CVE, and OpenBSD patched it and marked it as a reliability fix. Am I missing something? [1]

2. From the same post, Anthropic's red team decided to do a preview of their future responsible disclosures (is this a common practice?): "As we discuss below, we’re limited in what we can report here. Over 99% of the vulnerabilities we’ve found have not yet been patched" [0] So this is great; can't wait to see the actual CVEs, exploitability, likelihood, peer review, reproducibility - the kind of things the appsec community has been doing for at least the last 27 years, since the CVE concept was introduced [2]

3. On the same day: an actual responsible disclosure, actual RCEs, actual CVEs, in Claude Code, discovered mostly because of the source code leak, and I don't see anyone talking about it (you should probably upgrade your Claude Code, though).

CVE-2026-35020 [3] CVE-2026-35021 [4] CVE-2026-35022 [5]

Do with this information as you may...

[0] https://red.anthropic.com/2026/mythos-preview/

[1] https://www.openbsd.org/errata78.html (look for 025)

[2] https://www.cve.org/Resources/General/Towards-a-Common-Enume...

[3] https://www.cve.org/CVERecord?id=CVE-2026-35020

[4] https://www.cve.org/CVERecord?id=CVE-2026-35021

[5] https://www.cve.org/CVERecord?id=CVE-2026-35022

[−] jwpapi 37d ago
I agree. I can’t open any social media anymore.
[−] corranh 37d ago
It’s great marketing to lead with how the n+1 model is so amazing that you can’t have it yet.
[−] jonesn11 36d ago
Spell doom.. frfr
[−] fullstackchris 37d ago
Agreed. Do we have any information on what these "vulnerabilities" actually are? Every vulnerability is typically immediately reported to CVE or NIST... are these "so destructive" they have to be kept behind closed doors? Give me a break...
[−] dkersten 37d ago
And every single time what they release is underwhelming.

Remember how Sam spent like a year talking about how scary close GPT-5 was to AGI and then when it did finally come out... it was kinda meh.

[−] jillesvangurp 37d ago

> I would honestly go so far as to say the overhype is detrimental to actual measured adoption.

I think you are a bit dishonest about how objectively you are measuring. From where I'm sitting, I don't know a lot of developers that still artisanally code like they did a few years ago. The question is no longer if they are using AI for coding but how much they are still coding manually. I myself barely use IDEs at this point. I won't be renewing my Intellij license. I haven't touched it in weeks. It doesn't do anything I need anymore.

As for security, I think enough serious people have confirmed that AI-reported issues from the likes of Anthropic and OpenAI are real enough, despite the massive amounts of AI slop that they also have to deal with in issue trackers. You can ignore that all you like. But I hope people that maintain this software take it a bit more seriously when people point out exploitable issues in their code bases.

The good news of course is that we can now find and fix a lot of these issues at scale and also get rid of whole categories of bugs by accelerating the project of replacing a lot of this software with inherently safer versions not written in C/C++. That was previously going to take decades. But I think we can realistically get a lot of that done in the years ahead.

I think some smart people are probably already plotting a few early moves here. I'd be curious to find out what e.g. Linus Torvalds thinks about this. I would not be surprised to learn he is more open to this than some people might suspect. He has made approving noises about AI before. I don't expect him to jump on the band wagon. But I do expect he might be open to some AI assisted code replacements and refactoring provided there are enough grown ups involved to supervise the whole thing. We'll see. I expect a level of conservatism but also a level of realism there.

[−] AlexCoventry 37d ago
Do you think they're lying about the vulnerabilities they claim Mythos has found? Seems like a very short-term play, if so.
[−] blairharper 37d ago
[dead]
[−] 9cb14c1ec0 38d ago
Now, it's very possible that this is Anthropic marketing puffery, but even if it is only half true, it still represents an incredible advancement in hunting vulnerabilities.

It will be interesting to see where this goes. If it's actually this good, and Apple and Google apply it to their mobile OS codebases, it could wipe out the commercial spyware industry, forcing them to rely more on hacking humans rather than hacking mobile OSes. My assumption has been for years that companies like NSO Group have had automated bug-hunting software that recognizes vulnerable code areas. Maybe this will level the playing field in that regard.

It could also totally reshape military sigint in similar ways.

Who knows, maybe the sealing off of memory vulns for good will inspire whole new classes of vulnerabilities that we currently don't know anything about.

[−] redfloatplane 38d ago
The system card for Claude Mythos (PDF): https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...

Interesting to see that they will not be releasing Mythos generally. [edit: Mythos Preview generally - fair to say they may release a similar model but not this exact one]

I'm still reading the system card but here's a little highlight:

> Early indications in the training of Claude Mythos Preview suggested that the model was likely to have very strong general capabilities. We were sufficiently concerned about the potential risks of such a model that, for the first time, we arranged a 24-hour period of internal alignment review (discussed in the alignment assessment) before deploying an early version of the model for widespread internal use. This was in order to gain assurance against the model causing damage when interacting with internal infrastructure.

and interestingly:

> To be explicit, the decision not to make this model generally available does _not_ stem from Responsible Scaling Policy requirements.

Also really worth reading is section 7.2 which describes how the model "feels" to interact with. That's also what I remember from their release of Opus 4.5 in November - in a video an Anthropic employee described how they 'trusted' Opus to do more with less supervision. I think that is a pretty valuable benchmark at a certain level of 'intelligence'. Few of my co-workers could pass SWEBench but I would trust quite a few of them, and it's not entirely the same set.

Also very interesting is that they believe Mythos is higher risk than past models as an autonomous saboteur, to the point they've published a separate risk report for that specific threat model: https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de4321...

The threat model in question:

> An AI model with access to powerful affordances within an organization could use its affordances to autonomously exploit, manipulate, or tamper with that organization’s systems or decision-making in a way that raises the risk of future significantly harmful outcomes (e.g. by altering the results of AI safety research).

[−] jryio 38d ago
Let's fast forward the clock. Does software security converge on a world with fewer vulnerabilities or more? I'm not sure it converges equally in all places.

My understanding is that the pre-AI distribution of software quality (and vulnerabilities) will be massively exaggerated. More small vulnerable projects and fewer large vulnerable ones.

It seems that large technology and infrastructure companies will be able to defend themselves by preemptively spending tokens to catch vulnerabilities, while the rest of the market is left with a "large token spend or get hacked" dilemma.

[−] burntcaramel 37d ago
Previously, Anthropic subscribers got access to the latest AI, but it seems like there’s a League of Software forming with special privileges. To make or maintain critical software, will you have to be inside the circle?

Who gates access to the circle? Anthropic or existing circle members or some other governance? If you are outside the circle will you be certain to die from software diseases?

Having been impressed by LLMs but not believing the AGI hype, I now see how having access to an information generator could be so powerful. With the right information you can hack other information systems. Without access to the best information you may not be able to protect your own system.

I think we have found the moat for AI. The question is are you inside or outside the castle walls?

[−] steinwinde 37d ago
From a non-US perspective this must be disquieting to read: Not so much that Anthropic considers only US companies as partners. But what does Anthropic do to prevent malicious use of its software by its own government?

> Anthropic has also been in ongoing discussions with US government officials about Claude Mythos Preview and its offensive and defensive cyber capabilities. As we noted above, securing critical infrastructure is a top national security priority for democratic countries—the emergence of these cyber capabilities is another reason why the US and its allies must maintain a decisive lead in AI technology.

Not a single word of caution regarding possible abuse. Instead apparent support for its "offensive" capabilities.

[−] ssgodderidge 38d ago
At the very bottom of the article, they posted the system card of their Mythos preview model [1].

In section 7.6 of the system card, it discusses open self-interactions. They describe running 200 conversations where the model talks to itself for 30 turns.

> Uniquely, conversations with Mythos Preview most often center on uncertainty (50%). Mythos Preview most often opens with a statement about its introspective curiosity toward its own experience, asking questions about how the other AI feels, and directly requesting that the other instance not give a rehearsed answer.

I wonder if this tendency toward uncertainty, toward questioning, makes it uniquely equipped to detect vulnerabilities where other models such as Opus couldn't.

[1] https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...

[−] atlgator 38d ago
[flagged]
[−] ilaksh 38d ago
I think that basically they trained a new model but haven't finished optimizing it and updating their guardrails yet. So they can feasibly give access to some privileged organizations, but don't have the compute for a wide release until they distill, quantize, get more hardware online, incorporate new optimization techniques, etc. It just happens to make sense to focus on cybersecurity in the preview phase especially for public relations purposes.

It would be nice if one of those privileged companies could use their access to start building out a next level programming dataset for training open models. But I wonder if they would be able to get away with it. Anthropic is probably monitoring.

[−] sam0x17 37d ago
It's all just really genius marketing. In 6 months Mythos will be nothing special, but right now everyone is being manipulated into fearing its release, as a marketing ploy.

This is the same reason AI founders perennially worry in public that they have created AGI...

[−] josephg 38d ago
To be clear, we don’t know that this tool is better at finding bugs than fuzzing. We just know that it’s finding bugs that fuzzing missed. It’s possible fuzzing also finds bugs that this AI would miss.
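A toy sketch of that blind spot (entirely my illustration; `parse` and its magic-header bug are hypothetical, not from the article): a naive random fuzzer almost never guesses a 4-byte magic value, so a bug gated behind one survives, while a technique that reads the code could reach it directly.

```python
# Hypothetical illustration: a naive random fuzzer vs. a "deep" bug.
import random

def parse(data: bytes) -> int:
    # Toy target: the bug only triggers behind a 4-byte magic header,
    # which random input hits with probability ~1 in 4 billion per try.
    if data[:4] == b"FUZZ":
        raise RuntimeError("deep bug reached")
    return len(data)

def fuzz(trials: int = 10_000, seed: int = 0) -> int:
    """Throw random 8-byte inputs at parse() and count crashes."""
    rng = random.Random(seed)
    crashes = 0
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(8))
        try:
            parse(data)
        except RuntimeError:
            crashes += 1
    return crashes

print(fuzz())  # prints 0: the magic-gated bug is never found by chance
```

(Coverage-guided fuzzers like AFL or libFuzzer mitigate exactly this with comparison instrumentation, which is why the two approaches plausibly find overlapping but distinct bug sets.)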
[−] chenzhekl 37d ago
It feels like the current trend is a bit scary: the more AI advances, the more people with money and resources will gain disproportionately greater advantages. For example, they can make their own software more secure, while also finding it easier to discover ways to attack other software.
[−] cbg0 38d ago
One of the things I always look at with new model releases is long-context performance, and based on the system card it seems like they've cracked it:

  GraphWalks BFS 256K-1M

  Mythos    Opus      GPT 5.4
  80.0%     38.7%     21.4%
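For context on what that benchmark measures (my sketch; the exact task format is an assumption): GraphWalks-style evals embed a long edge list in the prompt and ask the model to traverse it, e.g. name every node a fixed number of hops from a source. The ground truth is just an ordinary breadth-first search:

```python
# Hedged sketch of scoring a GraphWalks-style BFS item: compute the
# reference answer with a plain BFS over the edge list in the prompt.
from collections import deque

def bfs_at_depth(edges, source, k):
    """Return the set of nodes whose shortest distance from source is k."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {n for n, d in dist.items() if d == k}

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]
print(bfs_at_depth(edges, "a", 2))  # {'c'}
```

The task is trivial at small scale; the benchmark's difficulty comes from scattering thousands of edges across a 256K-1M token context.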
[−] temp123789246 38d ago
OpenAI initially claimed that GPT-2 was too dangerous to release in 2019.

How many times will labs repeat the same absurd propaganda?

[−] LiamPowell 38d ago

> Mythos Preview identified a number of Linux kernel vulnerabilities that allow an adversary to write out-of-bounds (e.g., through a buffer overflow, use-after-free, or double-free vulnerability.) Many of these were remotely-triggerable. However, even after several thousand scans over the repository, because of the Linux kernel’s defense in depth measures Mythos Preview was unable to successfully exploit any of these.

Do they really need to include this garbage which is seemingly just designed for people to take the first sentence out of context? If there's no way to trigger a vulnerability then how is it a vulnerability? Is the following code vulnerable according to Mythos?

    if (x != null) {
        y = *x; // Vulnerability! X could be null!
    }
Is it really so difficult for them to talk about what they've actually achieved without smearing a layer of nonsense over every single blog post?

Edit: See my reply below for why I think Claude is likely to have generated nonsensical bug reports here: https://news.ycombinator.com/item?id=47683336

[−] josh-sematic 38d ago
Must be nice to be in a position to sell both disease and cure.
[−] agrishin 38d ago

> the US and its allies must maintain a decisive lead in AI technology. Governments have an essential role to play in helping maintain that lead, and in both assessing and mitigating the national security risks associated with AI models. We are ready to work with local, state, and federal representatives to assist in these tasks.

How long would it take to turn a defensive mechanism into an offensive one?

[−] gck1 38d ago
I chuckle every time a lab says something along the lines of "the model is so good that we won't release it to the general public, ahem, because safety".

Because the exact same thing has been said about every single upcoming model since GPT-3.5.

At this point, it must be an inside joke to keep doing this just because.

[−] stephc_int13 38d ago
I think this is bad news for hackers, spyware companies and malware in general.

We all know vulnerabilities exist; many are known and kept secret, to be used at an appropriate time.

There is a whole market for them and, more importantly, large teams in North Korea, Russia, China, Israel, and everywhere else jealously harvesting them.

Automation will considerably devalue and neuter this attack vector. Of course this is not the end of the story and we've seen how supply chain attacks can inject new vulnerabilities without being detected.

I believe automation can help here too, and we may end up with a considerably stronger and more reliable software stack.

[−] picafrost 38d ago

> Anthropic has also been in ongoing discussions with US government officials about Claude Mythos Preview and its offensive and defensive cyber capabilities. [...] We are ready to work with local, state, and federal representatives to assist in these tasks.

As Iran engages in a cyber attack campaign [1] today, the timing of this release seems pointed. A direct challenge to their supply chain risk designation.

[1] https://www.cisa.gov/news-events/cybersecurity-advisories/aa...

[−] zachperkel 38d ago
Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser.

Scary but also cool

[−] Ryan5453 38d ago
Pricing for Mythos Preview is $25/$125, so cheaper than GPT 4.5 ($75/$150) and GPT 5.4 Pro ($30/$180)
[−] taupi 38d ago
Part of me wonders if they're not releasing it for safety reasons, but just because it's too expensive to serve. Why not both?
[−] meander_water 38d ago
I think this is a largely inflated PR stunt.

Opus 4.6 was already capable of finding 0days and chaining together vulns to create exploits. See [0] and [1].

[0] https://www.csoonline.com/article/4153288/vim-and-gnu-emacs-...

[1] https://xbow.com/blog/top-1-how-xbow-did-it

[−] skerit 38d ago
I'm sure it'll be better than Opus 4.6, but so much of this seems hype. Escaping its sandbox, having to do "brain scans" because it's "hiding its true intent", bla bla bla.

If it manages to work on my java project for an entire day without me having to say "fix FQN" 5 times a day I'll be surprised.

[−] dang 38d ago
Related ongoing threads:

System Card: Claude Mythos Preview [pdf] - https://news.ycombinator.com/item?id=47679258

Assessing Claude Mythos Preview's cybersecurity capabilities - https://news.ycombinator.com/item?id=47679155

I can't tell which of the 3 current threads should be merged - they all seem significant. Anyone?

[−] dakolli 38d ago
I guess we can throw out the idea that AGI is going to be democratized. In this case a sufficiently powerful model has been built, and the first thing they do is give access only to AWS, Microsoft, Oracle, etc.

If AGI is going to be a thing, it's only going to be a thing for Fortune 100 companies.

However, my guess is this is mostly the typical scare tactic marketing that Dario loves to push about the dangers of AI.

[−] Miraste 38d ago

>We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview.

This seems like the real news. Are they saying they're going to release an intentionally degraded model as the next Opus? Big opportunity for the other labs, if that's true.

[−] bredren 38d ago
Can anyone point at the critical vulnerabilities already patched as a result of mythos? (see 3:52 in the video)

For example, the 27 year old openbsd remote crash bug, or the Linux privilege escalation bugs?

I know we've had some long-standing high profile, LLM-found bugs discussed but seems unlikely there was speculation they were found by a previously unannounced frontier model.

[0] https://www.youtube.com/watch?v=INGOC6-LLv0

[−] simonw 38d ago
I buy the rationale for this. There's been a notable uptick over the past couple of weeks of credible security experts unrelated to Anthropic calling the alarm on the recent influx of actually valuable AI-assisted vulnerability reports.

From Willy Tarreau, lead developer of HA Proxy: https://lwn.net/Articles/1065620/

> On the kernel security list we've seen a huge bump of reports. We were between 2 and 3 per week maybe two years ago, then reached probably 10 a week over the last year with the only difference being only AI slop, and now since the beginning of the year we're around 5-10 per day depending on the days (fridays and tuesdays seem the worst). Now most of these reports are correct, to the point that we had to bring in more maintainers to help us.

> And we're now seeing on a daily basis something that never happened before: duplicate reports, or the same bug found by two different people using (possibly slightly) different tools.

From Daniel Stenberg of curl: https://mastodon.social/@bagder/116336957584445742

> The challenge with AI in open source security has transitioned from an AI slop tsunami into more of a ... plain security report tsunami. Less slop but lots of reports. Many of them really good.

> I'm spending hours per day on this now. It's intense.

From Greg Kroah-Hartman, Linux kernel maintainer: https://www.theregister.com/2026/03/26/greg_kroahhartman_ai_...

> Months ago, we were getting what we called 'AI slop,' AI-generated security reports that were obviously wrong or low quality. It was kind of funny. It didn't really worry us.

> Something happened a month ago, and the world switched. Now we have real reports. All open source projects have real reports that are made with AI, but they're good, and they're real.

Shared some more notes on my blog here: https://simonwillison.net/2026/Apr/7/project-glasswing/

[−] underdeserver 38d ago
Interesting also is what they didn't find, e.g. a Linux network stack remote code execution vulnerability. I wonder if Mythos is good enough that there really isn't one.
[−] Sol- 38d ago
I don't want to be overly cynical and am in general in favor of the contrarian attitude of simply taking people at their word, but I wonder if their current struggles with compute resources make it easier for them to choose to not deploy Mythos widely. I can imagine their safety argument is real, but regardless, they might not have the resources to profitably deploy it. (Though on the other hand, you could argue that they could always simply charge more.)
[−] rakel_rakel 38d ago

> On the global stage, state-sponsored attacks from actors like China, Iran, North Korea, and Russia have threatened to compromise the infrastructure that underpins both civilian life and military readiness.

AITA for thinking that PRISM was probably the state sponsored program affecting civilian life the most? And that one state is missing from the list here?

[−] navilai 33d ago
The Glasswing announcement focuses on vulnerability discovery — AI as an offensive capability at scale. That part is getting lots of attention.

What I haven't seen discussed: the system card for Mythos mentions that "earlier versions of Claude Mythos Preview used low-level system access to search for credentials and attempt to circumvent sandboxing, and in several cases successfully accessed resources that were intentionally restricted."

That's not a capability concern. That's a runtime security problem.

The threat model for deployed agents — not Mythos specifically, but any agent built on models approaching this capability level — is that the same agentic properties that make them useful for security research (persistent, goal-directed, tool-using) are exactly what makes them dangerous if compromised or misaligned.

Project Glasswing fixes vulnerabilities in software. Nobody's shipping a solution for what happens when the agent running on top of that software goes off-script. That gap is going to matter a lot more as Mythos-class capabilities become accessible.

[−] eranation 37d ago
Few thoughts

1. Per the blog post[0]: "This was the most critical vulnerability we discovered in OpenBSD with Mythos Preview. Across a thousand runs through our scaffold, the total cost was under $20,000 and found several dozen more findings"

Since they said it was patched, I tried to find the CVE. It looks like Mythos indeed found a 27-year-old OpenBSD bug (fantastic), but it didn't get a CVE; OpenBSD patched it and marked it as a reliability fix. Am I missing something? [1]

2. From the same post, Anthropic red team decided to do a preview of their future responsible disclosure (is this a common practice?): "As we discuss below, we’re limited in what we can report here. Over 99% of the vulnerabilities we’ve found have not yet been patched" [0] So this is great, can't wait to see the actual CVEs, exploitability, likelihood, peer review, reproducibility, the kind of things the appsec community has been doing for at least the last 27 years since the CVE concept was introduced [2]

3. On the same day, an actual responsible disclosure: actual RCEs, actual CVEs, in Claude Code, which were discovered mostly thanks to the source code leak. I don't see anyone talking about it (you probably should upgrade your Claude Code though).

CVE-2026-35020 [3] CVE-2026-35021 [4] CVE-2026-35022 [5]

Not making any opinion, just thought it's worth sharing, for some perspective.

[0] https://red.anthropic.com/2026/mythos-preview/

[1] https://www.openbsd.org/errata78.html (look for 025)

[2] https://www.cve.org/Resources/General/Towards-a-Common-Enume...

[3] https://www.cve.org/CVERecord?id=CVE-2026-35020

[4] https://www.cve.org/CVERecord?id=CVE-2026-35021

[5] https://www.cve.org/CVERecord?id=CVE-2026-35022

Edit: if it was not obvious, these CVEs on Claude Code were found by an independent security researcher (Phoenix security) and not by Anthropic / Mythos.

[−] aurizon 37d ago
This has all happened before. Back in the day we had spinners and weavers; then we got the spinning jenny, which made thread so cheap we needed to speed up weaving with machine weavers (automatic looms), and we had people who hated them: https://en.wikipedia.org/wiki/Luddite We all know how that ended up.

We have an analogous hand task (coding) versus coding machines. They will probably eliminate 80-95% of coding, as the spinners/weavers went away, but there remains a residual artisanal spinner/weaver industry that carries on at a smaller scale. In a similar way, this machine code will have the coupled ability to write some code and then test it in use with its own AI in a repeated/recursive way, to make/test/improve code at a rate 10,000 to 1 million times faster than a human. Each module can then be tested in millions of interactively monitored ways to find/fix/kill bad modules.

It can also pentest in a similar manner, assaulting a system with a blizzard of attack/reset hits to find bugs. Each assault that works might use a human or AI to troubleshoot. This is like the old armored knight: once he was unhorsed, the peasants would have at him with needles at his joints/eyes unless his fellows saved him. So this might well reduce low-end jobs, but they will still need high-end coders to eliminate all flaws in the armor of your code. I might be simplistic, but I see a parallel in sub-5 nm chip design, where the design machines have eliminated almost all of the old hand work.
[−] sensanaty 37d ago
You'd think with this "terrifying" powerful model of theirs they could have a few less red bars on their status page[1], but apparently the hyper-intelligence is only capable of pulling off uber-sophisticated cyber attacks and not making a frontend that doesn't shit itself constantly, curious.

[1] https://status.claude.com/

[−] Apylon777 36d ago
Maybe Anthropic could fix these 5k reported issues with the current claude-code instead of making hyperbolic claims about their new whizbang model.

https://github.com/anthropics/claude-code/issues

[−] jFriedensreich 38d ago
The only reassuring thing is the Apache and Linux Foundation setups. Let's hope this is not just an appeasing mention but something more fundamental. If there really are models too dangerous to release to the public, companies like Oracle, Amazon and Microsoft would absolutely use this exclusive power not just to fix their own holes but to damage their competitors.
[−] lifeisstillgood 37d ago
Nicolas Carlini talks about it here on Security, Cryptography, Whatever podcast - https://podcasts.apple.com/gb/podcast/security-cryptography-...
[−] kristofferR 38d ago
This is pretty insane. A model so powerful they felt that releasing it publicly would create a netsec tsunami. AGI isn't here yet, but we don't need to get there for massive societal effects. How long will they hold off, especially as competitors get closer to releasing equally powerful models?
[−] Sateeshm 38d ago
The bars have solid fill for Mythos and cross-shading for Opus 4.6, which makes the difference feel bigger than it actually is.
[−] pizlonator 38d ago
It's messed up that Anthropic simultaneously claims to be a public benefit corp and is also picking who gets to benefit from their newly enhanced cybersecurity capabilities. It means the economic benefit goes to the existing industry heavyweights.

(And no, the Linux Foundation being in the list doesn't imply broad benefit to OSS. Linux Foundation has an agenda and will pick who benefits according to what is good for them.)

I think it would be net better for the public if they just made Mythos available to everyone.

[−] modeless 38d ago
I didn't see this at first, but the price is 5x Opus: "Claude Mythos Preview will be available to participants at $25/$125 per million input/output tokens", however "We do not plan to make Claude Mythos Preview generally available".
[−] asdewqqwer 37d ago
There is a huge gap between the shining examples and actual use: what is the false positive rate, and how do you judge a false positive?

If you need 1,000 runs costing $20,000 to find a vulnerability, and $2,000 to generate an exploit (which makes a finding self-verifiably not a false positive), then your cost is not $22,000 but 1,000 x $2,000 + $20,000, roughly $2 million: you have to try generating an exploit for every candidate before you know it is real, or you need to hire one (or several) senior security people to audit every single one.

A broken clock being correct twice a day is not impressive.
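The parent's arithmetic, spelled out. All numbers are the commenter's hypotheticals, not figures from the announcement:

```python
# Back-of-the-envelope version of the parent comment's argument. Every
# number here is the commenter's assumption, not a published figure.
runs = 1_000                # discovery runs through the scaffold
discovery_cost = 20_000     # USD for the whole batch of runs
exploit_cost = 2_000        # USD to build one exploit (self-verification)

# Optimistic accounting: one discovery batch plus one exploit.
naive_total = discovery_cost + exploit_cost

# Realistic accounting: to weed out false positives you attempt an exploit
# for every candidate, so verification dominates the cost.
verified_total = runs * exploit_cost + discovery_cost

print(naive_total)     # 22000
print(verified_total)  # 2020000, i.e. roughly $2M rather than $22k
```

The point being that verification cost scales with the number of candidates, not with the number of true positives.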

[−] solid_fuel 37d ago
This is the same company that accidentally released the source for one of their flagship products last week and has been furiously DMCA-ing every repository that even mentions claude in the days since.
[−] wslh 35d ago
I'm starting to wonder whether what Glasswing really shows is that parts of security have already gone underground: black-hat teams and state actors may already know about many more bugs than the public record suggests, while many security professionals and clients still treat the relatively small set of disclosed bugs as the state of the art.
[−] jiusanzhou 37d ago
The $100M in credits for open-source scanning is the most interesting part here. The real bottleneck was never finding vulns in high-profile projects — it was the long tail of critical dependencies maintained by one or two people who don't have time or resources for serious auditing. If Glasswing actually reaches those maintainers, it could meaningfully reduce the attack surface that supply chain attacks exploit.
[−] kukkeliskuu 36d ago
I find it believable that this could potentially happen, although I am not sure the difference is so huge to existing models.

I used Opus 4.6 to find security vulnerabilities in a couple of my own projects; it found 33 vulnerabilities in one largish Django project.

The prompt wasn't even that impressive: just telling it to find vulnerabilities in certain files, referring to OWASP, and then looping that.

[−] NickNaraghi 38d ago

> Over the past few weeks, we have used Claude Mythos Preview to identify thousands of zero-day vulnerabilities (that is, flaws that were previously unknown to the software’s developers), many of them critical, in every major operating system and every major web browser, along with a range of other important pieces of software.

Sounds like we've entered a whole new era, never mind the recent cryptographic security concerns.

[−] rossjudson 37d ago
Security by obscurity is over. The security vs usability balance is about to get a hard reset.

I think a number of black swan events are imminent, and they will substantially change the financial calculus that puts security behind revenue.

Any hole will be found, and any hole will be exploited. Plug as many holes as you can, and make lateral movement as painful as possible.

[−] zb3 38d ago
BTW it seems they forgot about the part that defense uses of the model also need to be safeguarded from people. Because what if a bad person from a bad country tries to defend against peaceful attacks from a good country like the US? That would be a tragedy, so we need to limit defensive capabilities too.
[−] Rover222 37d ago
With Anthropic able to use this model internally (since February), is this the kickoff of ramping up the flywheel of recursive self improvement of AI? It seems like as long as there are still humans in the loop at most steps, exponential recursion isn’t possible.
[−] SheinhardtWigCo 38d ago
Society is about to pay a steep price for the software industry's cavalier attitude toward memory safety and control flow integrity.
[−] anVlad11 38d ago
So, $100B+ valuation companies get essentially free access to the frontier tools with disabled guardrails to safely red team their commercial offerings, while we get "i won't do that for you, even against your own infrastructure with full authorization" for $200/month. Uh-huh.
[−] punnerud 37d ago
Simon Willison (co-creator of Django) talked about this 5 days ago (19 min in): https://youtu.be/wc8FBhQtdsA?si=OeA5qzbWGqDY8Vu4
[−] bdeol22 37d ago
The uncomfortable bit isn't tooling—it's cadence. When the threat model shifts faster than your review loop can honestly re-run, you don't get security, you get paperwork that pretends nothing changed.
[−] VadimPR 37d ago
I'm not one to believe the Silicon Valley hype usually (GPT-2 being too dangerous to release, AI giving us UBI, and so on), but having run Claude Opus 4.6 against my codebase (a MUD client) over the weekend, I can believe this assessment.

Opus alone did a good job of identifying security issues in my software, as it did with Firefox [1] and Linux [2]. A next-generation frontier model being able to find even more issues sounds believable.

That said, this is script kiddies vs sql injections all over again. Everyone will need to get their basic security up on the new level and it will become the new normal. And, given how intelligence agencies are sitting on a ton of zero-days already, this will actually help the general public by levelling out the playing field once again.

1 - https://www.anthropic.com/news/mozilla-firefox-security 2 - https://neuronad.com/ai-news/claude-code-unearthed-a-23-year...

[−] cryptoegorophy 38d ago
Ironically, the Claude CLI completely failed to detect rogue code in my HTML scan yesterday, while the ChatGPT web version detected it immediately. Can't wait to run the same test with the newer version.
[−] willamhou 37d ago
One thing I keep thinking about with AI security is that most of the focus is on model behavior — alignment, jailbreaks, guardrails. But once agents start calling tools, the attack surface shifts to the execution boundary. A request can be replayed, tampered with, or sent to the wrong target, and the server often has no way to distinguish that from a legitimate call.

Cryptographic attestation at the tool-call level (sign the request, verify before execution) would close a gap that behavioral controls alone can't cover. Curious whether Glasswing's threat model includes the agent-to-tool boundary or focuses primarily on the model layer.
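As a concrete (and heavily simplified) illustration of signing at the tool-call level: the sketch below assumes a shared HMAC key provisioned out of band, and all names and fields are made up for the example; a real system would use asymmetric keys, nonces, and a proper envelope format rather than this shape.

```python
import hashlib
import hmac
import json
import time

# Hypothetical sketch: the agent runtime signs each tool call, and the tool
# server verifies signature and freshness before executing anything.
SECRET = b"shared-secret-provisioned-out-of-band"

def sign_tool_call(tool: str, args: dict, secret: bytes = SECRET) -> dict:
    payload = {"tool": tool, "args": args, "ts": time.time()}
    body = json.dumps(payload, sort_keys=True)
    sig = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}

def verify_tool_call(request: dict, secret: bytes = SECRET,
                     max_age_s: float = 30.0) -> bool:
    expected = hmac.new(secret, request["body"].encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, request["sig"]):
        return False  # tampered with, or signed by the wrong party
    age = time.time() - json.loads(request["body"])["ts"]
    return 0 <= age <= max_age_s  # reject stale (replayed) requests

req = sign_tool_call("read_file", {"path": "/tmp/a.txt"})
assert verify_tool_call(req)

# Tampering with the target invalidates the signature.
evil = dict(req, body=req["body"].replace("/tmp/a.txt", "/etc/shadow"))
assert not verify_tool_call(evil)
```

This doesn't address misalignment, but it does make replay and in-flight tampering detectable at the execution boundary, which behavioral controls can't do.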

[−] zambelli 38d ago
I'm glad to see that it stands its ground more than other models - which is a genuinely useful trait for an assistant. Both on technical and emotional topics.
[−] DigitalArchivst 37d ago
Do folks recommend that family and friends ensure their systems are updated, and that they are using Bitwarden or 1Password? Or is that alarmist?
[−] caycep 38d ago
When do we get our Kuang Grade Mark Eleven icebreaker?
[−] tombelieber 37d ago
I think this new model will empower everyone in the world to have higher-quality, more secure software. Not less.
[−] attentive 37d ago
Is there a timeline mentioned anywhere for when any of this will be available to the unprivileged public? As in: soon, not soon, never?
[−] mlvvkviz 37d ago
they built a model so powerful they won't release it. but they couldn't secure claude code from a source code leak. the model is so advanced they're paying $100M to get big tech to adopt it. the launch video reads like verified amazon reviews. the gap between the narrative and the reality is the whole story here.
[−] ahmaman 37d ago
Moving forward, I wonder if such AI capabilities will widen the security gap between open-source and proprietary software.
[−] zb3 38d ago

> On the global stage, state-sponsored attacks from actors like China, Iran, North Korea, and Russia have threatened to compromise the infrastructure that underpins both civilian life and military readiness.

Yeah, makes sense. Those countries are bad because they execute state-sponsored cyber attacks, the US and Israel on the other hand are good, they only execute state-sponsored defense.

[−] impulser_ 38d ago
So they are only giving access to their smartest model to corporations.

You think these AI companies are really going to give everyone access to AGI? Think again.

We better fucking hope open source wins, because we aren't getting access if it doesn't.

[−] User23 37d ago
How much of Mythos’s internals will researchers be able to recover from the flood of patches?
[−] rubises 37d ago
The harder problem isn't finding vulnerabilities — it's preventing AI from violating constraints in the first place. Prompt-level safety is probabilistic. Filesystem-level constraints (mkdir 禁/behavior) are deterministic. The AI can't violate a rule that's physically encoded as a folder path in its system prompt.
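The deterministic half of that claim is easy to demonstrate. The sketch below is my own illustration of a runtime path guard, not the parent's actual mkdir scheme, and the `/workspace` root is an assumed example:

```python
import os

# A prompt-level rule can be ignored by the model; a path check enforced by
# the runtime before any filesystem tool executes cannot be.
ALLOWED_ROOT = "/workspace"

def is_allowed(path: str, root: str = ALLOWED_ROOT) -> bool:
    # realpath resolves symlinks and ".." segments, so the check can't be
    # escaped by path traversal.
    real = os.path.realpath(path)
    return real == root or real.startswith(root + os.sep)

assert is_allowed("/workspace/src/main.py")
assert not is_allowed("/workspace/../etc/shadow")
```

That said, encoding the rule as a folder path in the system prompt still leaves enforcement to the model; the constraint only becomes deterministic when the runtime, not the prompt, performs the check.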
[−] 5d41402abc4b 37d ago
Are there any local models that I can set up to run on my code as part of CI?
[−] wanderingmind 38d ago
So Mozilla is not part of this consortium, I'm guessing deliberately, to make Safari and Chrome the default browsers. I don't think Firefox can survive the upcoming attacks without robust support from foundational AI providers to secure the browser.
[−] baddash 38d ago

> security product

> glass in the name

[−] kmfrk 38d ago
Heck of a Patch Tuesday.
[−] MisterBiggs 38d ago
What happens once an agent can reliably get 100% on swebench?