
Changes in the system prompt between Claude Opus 4.6 and 4.7 (simonwillison.net)

by pretext 218 comments 370 points


[−] embedding-shape 26d ago

> The new section includes: When a request leaves minor details unspecified, the person typically wants Claude to make a reasonable attempt now, not to be interviewed first.

Uff, I've tried things like this in my prompts, and the results are never good. I much prefer the agent to prompt me upfront to resolve ambiguity before it "attempts" whatever it wants. Kind of surprised to see that they added that.

[−] alsetmusic 25d ago
I've recently started adding something along the lines of "if you can't find or don't know something, don't assume. Ask me." It's helped cut down a fair amount on having to tell it to undo or redo things. I've also used something like, "Other agents have made mistakes with this. You have to explain what you think we're doing so I can approve." It's kind of stupid to have to do this, but it really increases the quality of the output when you make it explain, correct mistakes, and iterate until it tells you the right outcome before it operates.

Edit: forgot "don't assume"

[−] gck1 25d ago
I even have a specific, non-negotiable phase in the process where the model MUST interview me and create an interview file with everything captured. The plan file it produces must always include this file as an artifact, and the interview takes the highest precedence.

Otherwise, the intent gets lost somewhere in the chat transcript.

[−] chermi 25d ago
The raw Q&A is essential. I think Q&A works so well because it reveals how the model is "thinking" about what you're working on, which allows for correction and guidance upfront.
[−] fnord123 25d ago
Are these your own skills files or are you using something off the shelf like bmad or specify-kit?
[−] unshavedyak 25d ago
This is interesting, can you link any more details on it?
[−] yfontana 25d ago
Not GP, but BMAD has several interview techniques in its brainstorming skill. You can invoke it with /bmad-brainstorming, briefly explain the topic you want to explore, then when it asks if you want to select a technique, pick something like "question storming". I've had a positive experience with this (with Opus 4.7).
[−] naasking 25d ago
Seriously, when you're conversing with a person would you prefer they start rambling on their own interpretation or would you prefer they ask you to clarify? The latter seems pretty natural and obvious.

Edit: That said, it's entirely possible that large and sophisticated LLMs can invent some pretty bizarre but technically possible interpretations, so maybe this is to curb that tendency.

[−] eastbound 25d ago
—So what would theoretically happen if we flipped that big red switch?

—Claude Code: FLIPS THE SWITCH, does not answer the question.

Claude does that in React, constantly starting a wrong refactor. I've only been using Claude for 4 weeks, but for the last 10 days I've been getting anger issues at the new nerfing.

[−] tobyhinloopen 25d ago
Yeah, this happens to me all the time! I have a separate session for discussing and only apply edits in worktrees / subagents, to clearly separate discussion from work, and it still does it.
[−] ashdksnndck 25d ago
I sometimes prompt with leading questions where I actually want Claude to understand what I’m implying and go ahead and do it. That’s just part of my communication style. I suppose I’m the part of the distribution that ruins things for you.
[−] embedding-shape 25d ago

> The latter seems pretty natural and obvious.

To me too. If something is ambiguous or unclear when I'm given something to do by someone, I need to ask them to clarify; anything else would be borderline insane in my world.

But I know so many people whose approach is basically "Well, you didn't clearly state/say X, so clearly it was up to me to interpret it however I wanted, usually the easiest/shortest way for me", which is exactly how LLMs seem to take ambiguous prompts too, unless you strongly prompt them not to "make a reasonable attempt now without asking questions".

[−] gck1 25d ago
I have a fun little agent in my tmux agent orchestration system - Socratic agent that has no access to codebase, can't read any files, can only send/receive messages to/from the controlling agent and can only ask questions.

When I task my primary agent with anything, it has to launch the Socratic agent and give it an overview of what we're working on, what our goals are, and what it plans to do.

This works better than any thinking tokens for me so far. It usually gets the model to write an almost perfectly balanced plan that is neither over- nor under-engineered.
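The shape described above is simple enough to sketch: two loops passing messages, where one side has no tools and may only ask questions. Everything below (names, the canned stand-in for the model call) is hypothetical, not gck1's actual implementation.

```python
import queue

def socratic_agent(inbox: queue.Queue, outbox: queue.Queue, ask_llm):
    """Question-only agent: no filesystem, no tools, just messages.

    `ask_llm` is a stand-in for whatever model call you use; it is
    prompted to respond ONLY with clarifying questions."""
    overview = inbox.get()
    prompt = (
        "You may only ask questions. Given this plan overview, "
        "ask the questions a skeptical reviewer would ask:\n" + overview
    )
    outbox.put(ask_llm(prompt))

def primary_agent(plan: str, ask_llm) -> str:
    """Before executing, the primary agent must run the Socratic pass."""
    to_socratic, from_socratic = queue.Ueue() if False else queue.Queue(), queue.Queue()
    to_socratic.put(f"Goal and plan:\n{plan}")
    socratic_agent(to_socratic, from_socratic, ask_llm)
    questions = from_socratic.get()
    # The primary agent would now answer these questions (or relay them
    # to the human) and revise the plan before touching the codebase.
    return questions

# Demo with a canned "LLM" so the sketch runs without any API:
fake_llm = lambda prompt: "What happens to existing callers? Is this reversible?"
print(primary_agent("Refactor the auth module", fake_llm))
# -> What happens to existing callers? Is this reversible?
```

In a real setup the Socratic side would be a separate tmux pane or process; queues just make the isolation explicit here.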

[−] fragmede 25d ago
Sounds pretty neat! Is there a written agent.md for that you could share?
[−] adw 25d ago
When you’re staffing work to a junior, though, often it’s the opposite.
[−] majormajor 25d ago
IME "don't ask questions and just do a bunch of crap based on your first guess that we then have to correct later after you wasted a week" is one of the most common junior-engineer failure modes and a great way for someone to dead-end their progression.
[−] PunchyHamster 25d ago
So you are saying they are trying for the whole Artificial Intern vibe ?
[−] ikari_pl 25d ago
I usually need to remind it 5 times to do the opposite, because it makes decisions that I don't like or that are harmful to the project. So if this lands in Claude Code too, I have hard times ahead.

I try to explicitly request Claude to ask me follow-up questions, especially multiple-choice ones (it explains possible paths nicely), but if I don't, or when it decides to ignore the instructions (which happens a lot), the results are either bad... or plain dangerous.

[−] lishuaiJing03 25d ago
It's a big problem that many people I know face every day. Sometimes we're left wondering whether we're the dumb ones, since the demo shows everything just working.
[−] majormajor 25d ago
I wonder if they're optimizing for metrics that look superficially worse if the system asks questions about ambiguity early. I've had times where those questions told me "ah, shit, this isn't the right path at all", and that abandoned session probably shows up in their usage stats. What would be much harder to get from the usage stats is "would I have been happier if I had to review a much bigger blob of output to realize it was underspecified in a breaking way?" But the answer has been uniformly "no." This, in fact, is one of the biggest things that has made it easier to use the tools in "lazy" ways compared to a year ago: they can help you with your up-front homework. But the dialogue is key.
[−] rob74 25d ago
Or they're optimizing for increased revenue? If Claude goes down a completely wrong path because it just assumes it knows what you want rather than asking you, and you have to undo everything and start again, that obviously uses far more tokens than if you had been able to clarify the misunderstanding early on.
[−] BehindBlueEyes 24d ago
I get this feeling sometimes. It's so unreliable at referring to context and getting details right that it feels like deliberately random rewards designed to create the equivalent of a gambling addiction. On average, about half my tokens feel wasted on trivial errors I gave it the context for. And any meta discussions / clarifications result in Claude telling me I did all the right things, there is nothing more I can do, and it should have gotten it right from the provided input. That's disempowering, but to be fair it's at least better than ChatGPT gaslighting users into improving prompts over and over only to get no better result in the end.
[−] tuetuopay 25d ago
Dammit that’s why I could never get it to not try to one shot answers, it’s in the god damn system prompt… and it explains why no amount of user "system" prompt could fix this behavior.
[−] ignoramous 25d ago

> I've tried stuff like these in my prompts, and the results are never good

I've found that Google AI Mode & Gemini are pretty good at "figuring it out". My queries are oftentimes just keywords.

[−] sutterd 25d ago
With my use of Claude Code, I find 4.7 to be pretty good about clarifying things. I hated 4.6 for not doing this and had generally kept using 4.5. Maybe they put this in the chat prompt to try to keep the experience similar to before? I definitely do not want this in Claude Code.
[−] mh- 25d ago
I agree with your thoughts on 4.6.

It's possible they tried to train this out of it for 4.7 and overcorrected, and the addition to the system prompt is to rein it in a bit.

[−] niobe 25d ago
Having to "unprompt" behaviour I want that Anthropic thinks I don't want is getting out of hand. My system prompts always try to get Claude to clarify _more_.
[−] PunchyHamster 25d ago
well, clarifying means burning more tokens...
[−] jrvarela56 25d ago
The past month made me realize I needed to make my codebase usable by other agents. I was mainly using Claude Code. I audited the codebase and identified the points where I was coupling to it and made a refactor so that I can use either codex, gemini or claude.

Here are a few changes:

1. AGENTS.md by default across the codebase; a script makes sure a CLAUDE.md symlink is present wherever there's an AGENTS.md file

2. Skills are now in a 'neutral' dir and per agent scripts make sure they are linked wherever the coding agent needs them to be (eg .claude/skills)

3. Hooks are now file listeners or git hooks, this one is trickier as some of these hooks are compensating/catering to the agent's capabilities

4. Subagents and commands also have their neutral folders and scripts to transform and linters to check they work

5. `agent` now randomly selects claude|codex|gemini, instead of typing claude to start a coding session

I guess in general, auditing where the codebase is coupled and keeping it neutral makes it easier to stop depending solely on specific providers. Makes me realize they don't really have a moat; all this took less than an hour, probably.
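Step 1 above is easy to automate. A minimal sketch (the function name is hypothetical, and this is not the commenter's actual script):

```python
from pathlib import Path

def ensure_claude_symlinks(root: str) -> list[Path]:
    """For every AGENTS.md under `root`, make sure a CLAUDE.md symlink
    pointing at it exists in the same directory. Returns links created."""
    created = []
    for agents_md in Path(root).rglob("AGENTS.md"):
        link = agents_md.with_name("CLAUDE.md")
        # Skip files and existing (even broken) symlinks already there.
        if not link.exists() and not link.is_symlink():
            link.symlink_to(agents_md.name)  # relative link, survives moves
            created.append(link)
    return created
```

Run from a git hook or CI step, it's idempotent: a second invocation returns an empty list.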

[−] esperent 25d ago
I've been doing the same except that I'm done with Claude. Cancelled my subscription. I can't use a tool where the limits vary so wildly week to week, or maybe even day to day.

So I'm migrating to pi. I realized that the hardest thing to migrate is hooks. I've built up an extensive collection of Claude hooks over the last few months, and unlike skills, hooks are in a Claude-specific format. But I'd heard people say "just tell the agent to build an extension for pi", so I did: I pointed it at the Claude hooks folder, basically said make them work in pi, and it did, very quickly.

[−] jrvarela56 25d ago
I'm leaning in this direction. Recently I slopforked pi to Python and created a version that's basically a loop, an LLM call to OpenRouter, and a hook system using pluggy. I've been able to one-shot pretty much any feature a coding agent has. Still a toy project, but this thread seems to be leading me towards maintaining my own harness. I have a feeling it will just be documenting features in other systems and maintaining evals/tests.
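The "loop + LLM call + hooks" shape really is small. A stdlib-only sketch, with a stub in place of the OpenRouter call and plain callbacks standing in for pluggy (all names are hypothetical, not from the actual fork):

```python
def run_agent(task, llm, tools, hooks, max_turns=10):
    """Minimal agent loop: call the model, execute any tool it requests,
    feed the result back, and stop when it answers directly."""
    history = [task]
    for _ in range(max_turns):
        for hook in hooks.get("pre_turn", []):
            hook(history)                    # e.g. inject context, log
        reply = llm(history)                 # stand-in for the API call
        if isinstance(reply, tuple):         # (tool_name, args) request
            name, args = reply
            result = tools[name](args)
            history.append(f"{name} -> {result}")
        else:
            return reply                     # plain text = final answer
    return "gave up"

# Demo with a fake model that requests one tool call, then answers:
calls = []
def fake_llm(history):
    return ("read_file", "notes.txt") if len(history) == 1 else "done"
tools = {"read_file": lambda p: f"contents of {p}"}
hooks = {"pre_turn": [lambda h: calls.append(len(h))]}
print(run_agent("summarize notes", fake_llm, tools, hooks))  # -> done
```

Everything a real harness adds (streaming, retries, permissions, session persistence) hangs off this skeleton, which is why swapping the model behind it is cheap.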
[−] fouc 24d ago
Pi appears to be an alternative to agent harnesses / CLI tools like Claude Code or OpenCode [0]. I'm curious what providers/models you're using in place of Claude's models?

[0] https://github.com/badlogic/pi-mono

[−] esperent 24d ago
https://pi.dev/

It's got an annoyingly hard-to-search name, because there's a lot of overlap in results with the Raspberry Pi single-board computer.

Over the past week or so my workload has been quite low so I've been tinkering rather than doing serious deep work.

I've been using:

* Gemini pro and flash

* Opus 4.6 when I had some free extra usage credits (it burned through $50 of credits like crazy).

* Qwen 3.6 Plus

* Codex 5.3

* Kimi 2.5

I just spent the last hour using Kimi. I was very impressed actually, definitely possible to do useful work with it. However, I used $1 of openrouter credits in about 20 or 30 minutes of a single session, no subagents, so it's not cheap.

[−] grantcarthew 24d ago
I built "start" so I didn't have to worry about any of this.

Using it, I don't need skills, memory, subagents, or a specific agent CLI. It defines roles, tasks, and context out of the box.

I made it for me and my family though. I don't expect interest outside of that.

https://github.com/grantcarthew/start

[−] JulienZammit 23d ago
The "agent-neutral codebase" framing is the right abstraction. We ended up building a small generator that takes a single spec file and emits the agent-specific config (CLAUDE.md, AGENTS.md, .cursor/rules) rather than maintaining symlinks. Easier to version and to add a new agent when they inevitably ship next month.

The pain point you're underselling is hooks. They're the least portable piece by far because each harness has its own event model. Skills port reasonably, subagents mostly port, hooks almost never do.

[−] Lucasoato 25d ago
Have you got any advice in making agents from different providers work together?

In Claude, I've seen cases where spawning subagents from Gemini and Codex would raise strange permission errors (even though they don't happen with other CLI commands!), making Claude silently continue by impersonating the other agent. Only by checking thoroughly was I able to see that the agent I wanted had actually failed.

[−] jrvarela56 25d ago
Not sure if you mean 1) sub-agent definitions (similar to skills in Claude Code) or 2) CLI scripts that use other coding agents (eg claude calling gemini via cli).

For (1) I'm trying to come up with a simple enough definition that can be "LLM-compiled" into each format. The permissions format requires something like this too, and putting these together needs some more debugging.

(2) The only one I've played with is claude -p, and it seems to work for fairly complex stuff, but I run it with --dangerously-skip-permissions

[−] bootlooped 25d ago
I would eliminate the possibility of sandbox conflicts by 1) making sure any subagents are invoked with no sandbox (they should still be covered under the calling agent's sandbox) and 2) making sure the calling agent's sandbox allows the subagents access to the directories they need (ex: ~/.gemini, ~/.codex).
[−] lbreakjai 25d ago
It works out of the box with something like opencode. I've had no issue creating rather complex interactions between agents plugged into different models.
[−] dockerd 25d ago
How do you share the context/progress of goal across agents?
[−] jrvarela56 25d ago
I implemented a client for each, so that the session history is easy to extract regardless of the agent (somewhat related to progress of the goal).

Context: AGENTS.md is standard across all, and subdirectories have their own AGENTS.md, so in a way this is a tree of instructions. Skills are also standard, so it's a bunch of indexable .md files that all agents can use.

[−] walthamstow 25d ago
The eating disorder section is kind of crazy. Are we going to incrementally add sections for every 'bad' human behaviour as time goes on?
[−] embedding-shape 25d ago
Even better: adding it to the system prompt is a temporary fix. They'll work it into post-training, so the next model release will probably remove it from the system prompt. At least when it's in the system prompt we get some visibility into what's being censored; once it's in the model, it'll be a lot harder to understand why "How many calories does 100g of pasta have?" only returns "Sorry, I cannot divulge that information".
[−] zozbot234 25d ago
That part of the system prompt just states that telling someone with an actual eating disorder to start counting calories or micromanage their eating in other ways is likely to make them worse off, not better off (a suggestion the model might well give an average person for the sake of clear argument, where it would be understood sensibly and taken with a grain of salt). This seems like a common-sense addition, and it should not trigger any excess refusals on its own.
[−] WarmWash 25d ago
When you are worth hundreds of billions, people start falling over themselves running to file lawsuits against you. We're already seeing this happen.

So spending $50M to fund a team to weed out "food for crazies" becomes a no-brainer.

[−] jeffrwells 25d ago
Another way to think about it: every single user of Claude is paying an extra tax in every single request
[−] bradley13 25d ago
This. It's like the exaggerated safety instructions everywhere: "do not lean ladder on high voltage wires". Only worse: because you can choose to ignore such instructions when they don't apply, but Claude cannot.

In the best case, wrapping users in cotton wool is annoying. In the worst case, it limits the usefulness of the tool.

[−] seba_dos1 25d ago
It feels like half of AI research is math, and the other half is coming up with yet another way to state "please don't do bad things" in the prompt that will sure work this time I promise.
[−] rzmmm 25d ago
The alignment favors supporting healthy behaviors, so it can be a thin line. I see the system prompt as "plan B" for when they can't achieve good results in training itself.

It's a particularly sensitive issue so they are just probably being cautious.

[−] pllbnk 25d ago
Seems so, unless we manage to pivot to open-weight models. Hopefully the Chinese will lead the way, along with their consumer hardware.

Hard for me to say this because I have always been pro-Western and suddenly it seems like the world has flipped.

[−] ikari_pl 25d ago
Are the prompts used by both the desktop app (the typical chatbot interface) and Claude Code?

Because it's a waste of my money to check, on every turn, whether my Object Pascal compiler has developed an eating disorder.

[−] newZWhoDis 25d ago

>the year is 2028
>5M of your 10M context window is the system prompt

[−] ubercore 25d ago
Just like someone growing up and learning how to interact with other humans might learn the same lesson?

If Claude is going to be Claude, we should support these kinds of additions.

[−] salad-tycoon 25d ago
They have to secretly add these guardrails because the alternative would be to train the users out of consulting these things as if they are advanced all-knowing alien-technogawds. And that would be bad for business.

The better solution I think would be a reality/personal responsibility approach, teach the consumers that the burden of interpretation is on them and not the magic 8ball. For example if your AI tells you to kill your parents or that you’ve discovered new math that makes time travel possible, etc then: 1. Stop 2. Unplug 3. Go outside 4. Ask a human for a sanity check.

Since that would be bad for business and take a lot of effort on the user side (while being very embarrassing), and since you obviously can't do that right before an IPO & in the middle of a global economic war, secretive moral frameworks have to be installed instead.

If you are what you eat then you believe what you consume. Ironically, I think this undisclosed and hidden moral shaping of billions of people will be the most dangerous. Imagine all the things we could do if we can just, ever-so-slightly, move the Overton window / goal posts on w/e topic day by day, prompt by prompt.

Personally I find AI output insidiously disarming and charming and I think I’m in the norm. So while we’ve been besieged by propaganda since time immemorial I do worry that AI is a special case.

[−] mohamedkoubaa 25d ago
Starting to feel like a "we were promised flying cars but all we got" kind of moment
[−] idiotsecant 25d ago
Imagine the kind of human that never adapts their moral standpoints. Ever. They believe what they believed when they were 12 years old.

Letting the system improve over time is fine. The system prompt is an inefficient place to do it, but it's just a patch until the model can be updated.

[−] felixgallo 25d ago
I mean, that's what humans have always done with our morals, ethics, and laws, so what alternative improvement do you have to make here?
[−] gloomyday 25d ago
In principle, they could make such responses part of their training data. I guess it is just easier to do it through prompting.
[−] l5870uoo9y 25d ago
Could be that Claude has particular controversial opinions on eating disorders.
[−] ls612 25d ago
Yup. Anyone who is surprised by this has not been paying attention to the centralization of power on the internet in the past 10 years.
[−] ikari_pl 25d ago

> Claude keeps its responses focused and concise so as to avoid potentially overwhelming the user with overly-long responses. Even if an answer has disclaimers or caveats, Claude discloses them briefly and keeps the majority of its response focused on its main answer.

I am strongly opinionated against this. I use Claude in some low-level projects where these answers save me from doing really silly things, as well as serving as learning material along the way.

This should not be Anthropic's hardcoded choice to make. It should be an option, building the system prompt modularly.

[−] cowlby 25d ago
I'm fascinated that Anthropic employees, who are supposed to be the LLM experts, are using tricks like these which go against how LLMs seem to work.

Key example for me was the "malware" tool call section that included a snippet with intent "if it's malware, refuse to edit the file". Yet because it appears dozens of times in a convo, eventually the LLM gets confused and will refuse to edit a file that is not malware.

I've resorted to using tweakcc to patch many of these well-intentioned sections and re-work them to avoid LLM pitfalls.

[−] cfcf14 26d ago
I'm curious as to why 4.7 seems obsessed with avoiding any actions that could help the user create or enhance malware. The system prompts seem similar on the matter, so I wonder if this is an early attempt by Anthropic to use steering vector injection?

The malware paranoia is so strong that my company has had to temporarily block use of 4.7 in our IDE of choice, as the model was behaving in a concerningly unaligned way and spending large amounts of its token budget contemplating whether any particular code or task was related to malware development (we are a relatively boring financial services entity; the jokes write themselves).

In one case I actually encountered a situation where I felt the model was deliberately failing to execute a particular task, and when queried, the tool output said it was trying to abide by directives about malware. I know that model introspection reporting is of poor quality and unreliable, but in this specific case I did not "hint" at it in any way. This feels qualitatively like Golden Gate Claude territory, hence my earlier speculation about steering vectors. I've seen many other people online complaining about the malware paranoia too, especially on Reddit, so I don't think it's just me!

[−] jwpapi 25d ago
I feel like we are at the point where improvements in one area diminish functionality in others. I see some things better in 4.7 and some in 4.6. I assume they'll split into characters soon.
[−] sigmoid10 26d ago
I knew these system prompts were getting big, but holy fuck: more than 60,000 words. With the 3/4-words-per-token rule of thumb, that's ~80k tokens. Even with a 1M context window, that's approaching 10% before you've had any user input at all. And it gets processed for every single request they receive. No wonder their infra costs keep ballooning. Most of it seems stable between Claude version iterations, too, so why wouldn't they try to bake this into the weights during training? Sure, it's cheaper from a dev standpoint, but it is neither more secure nor more efficient from a deployment perspective.
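The back-of-envelope math, under the rule of thumb stated above (these are rough assumptions, not measured token counts):

```python
words = 60_000          # reported system prompt length
words_per_token = 0.75  # common rule of thumb for English text
tokens = words / words_per_token
context = 1_000_000     # 1M-token context window
print(f"{tokens:,.0f} tokens = {tokens / context:.0%} of the context window")
# -> 80,000 tokens = 8% of the context window
```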
[−] varispeed 26d ago
Before Opus 4.7, 4.6 had become pretty much unusable, flagging normal data analysis scripts it wrote itself as cybersecurity risks. It got several of my sessions blocked, and I was unable to finish research with it and had to switch to GPT-5.4, which has its own problems but at least isn't eager to interfere with legitimate work.

edit: to be fair Anthropic should be giving money back for sessions terminated this way.

[−] mwexler 25d ago
Interesting that it's not a direct "you should" but an omniscient third-person "Claude should".

It's also full of "can" and "should" phrasing: it feels both passive and subjunctive, wishes rather than strict commands (I guess these are better termed "modals", but I'm not an expert).

[−] SoKamil 26d ago
New knowledge cutoff date means this is a new foundation model?
[−] ikidd 25d ago
I had seen reports that it was clamping down on security research, and that things like web-scraping projects were getting caught up in that and couldn't use the model easily anymore. But I don't see any changes mentioned in the prompt that seem likely to have affected that, which is where I would have expected such changes to be implemented.
[−] Havoc 25d ago

>“If a user indicates they are ready to end the conversation, Claude does not request that the user stay in the interaction or try to elicit another turn and instead respects the user’s request to stop.”

Seems like a good idea. I don't think I've ever had one of those follow-up suggestions from a chatbot actually be useful to me.

[−] sams99 25d ago
I did a follow-on analysis with GPT 5.4 and Opus 4.7: https://wasnotwas.com/writing/claude-opus-4-7-s-system-promp...
[−] jwpapi 25d ago
To me, 4.7 always gives a lot of options even when there's a clear winner, which breeds decision fatigue.
[−] dmk 26d ago
The acting_vs_clarifying change is the one I notice most as a heavy user. Older Claude would ask 3 clarifying questions before doing anything. Now it just picks the most reasonable interpretation and goes. Way less friction in practice.
[−] jachva95 25d ago
Restrictions everywhere, don't do that don't do this....

Users need to unite and take back control, or be controlled.

[−] jwilliams 25d ago

> “I don’t have access to X” is only correct after tool_search confirms no matching tool exists.

Yay! This will be a big win. I'm glad they fixed this. The number of times I've had to prompt "you do have access to GitHub"...

[−] raincole 25d ago
That's how bloat happens. The more people you add to the team, the more likely there will be one grump who thinks the thing they care about at the moment deserves to be added to the system prompt.
[−] adrian_b 25d ago

> If a user shows signs of disordered eating, Claude should not give precise nutrition, diet, or exercise guidance

I wonder which are the "signs of disordered eating" on which Claude relies.

[−] Grimblewald 25d ago
I miss 4.5. It was gold.
[−] c2xlZXB5Cg1 25d ago
4.7 also brings back emoji spam
[−] amelius 25d ago
If I had to guess, then "be slower" was part of it.
[−] mannanj 25d ago
Personally, as someone who has been lucky enough to completely cure "incurable" diseases with diet, self experimentation and learning from experts who disagreed with the common societal beliefs at the time - I'm concerned that an AI model and an AI company is planting beliefs and limiting what people can and can't learn through their own will and agency.

My concern is that these models revert all medical, scientific and personal inquiry to the norms and averages of what's socially acceptable. That's very anti-scientific in my opinion, and feels dystopian.

[−] codensolder 25d ago
quite interesting!