1M context is now generally available for Opus 4.6 and Sonnet 4.6 (claude.com)

by meetpateltech 519 comments 1220 points
Read article View on HN

519 comments

[−] jeremychone 63d ago
Interesting, I’ve never needed 1M, or even 250k+ context. I’m usually under 100k per request.

About 80% of my code is AI-generated, with a controlled workflow using dev-chat.md and spec.md. I use Flash for code maps and auto-context, and GPT-4.5 or Opus for coding, all via API with a custom tool.

Gemini Pro and Flash have had 1M context for a long time, but even though I use Flash 3 a lot, and it’s awesome, I’ve never needed more than 200k.

For production coding, I use

- a code map strategy on a big repo. Per file: summary, when_to_use, public_types, public_functions. This is done per file and saved until the file changes. With a concurrency of 32, I can usually code-map a huge repo in minutes. (Typically Flash, cheap, fast, and with very good results)

- Then, auto context, but based on code lensing. Meaning auto context takes some globs that narrow the visibility of what the AI can see, and it uses the code map intersection to ask the AI for the proper files to put in context. (Typically Flash, cheap, relatively fast, and very good)

- Then, use a bigger model, GPT 5.4 or Opus 4.6, to do the work. At this point, context is typically between 30k and 80k max.
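The three stages above could be sketched roughly as follows. This is a hypothetical reconstruction, not the actual tool's code: `CodeMapEntry`, `auto_context`, and the keyword-overlap ranking are all stand-ins (the real flow asks a cheap model like Flash to pick the files).

```python
import fnmatch
from dataclasses import dataclass, field

@dataclass
class CodeMapEntry:
    # One cached record per source file; field names mirror the
    # per-file schema described above (hypothetical reconstruction)
    path: str
    summary: str
    when_to_use: str
    public_types: list = field(default_factory=list)
    public_functions: list = field(default_factory=list)

def auto_context(prompt: str, code_map: list, globs: list) -> list:
    """Stage 2: narrow visibility with globs, then pick files relevant
    to the prompt. A keyword overlap stands in for the cheap-model call."""
    visible = [e for e in code_map
               if any(fnmatch.fnmatch(e.path, g) for g in globs)]
    words = set(prompt.lower().split())
    return [e for e in visible if words & set(e.summary.lower().split())]

code_map = [
    CodeMapEntry("src/model/user.rs", "user data model", "changing user fields"),
    CodeMapEntry("src/web/app.ts", "web ui components", "changing the frontend"),
]
picked = auto_context("change the user data model", code_map, ["src/model/*.rs"])
# only src/model/user.rs survives the glob + relevance narrowing
```

Stage 3 would then send only `picked` (plus the prompt) to the bigger model, which is how the final context stays in the 30k-80k range.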

What I’ve found is that this process is surprisingly effective at getting a high-quality response in one shot. It keeps everything focused on what’s needed for the job.

Higher precision on the input typically leads to higher precision on the output. That’s still true with AI.

For context, 75% of my code is Rust, and the other 25% is TS/CSS for web UI.

Anyway, it’s always interesting to learn about different approaches. I’d love to understand the use case where 1M context is really useful.

[−] daemonk 63d ago
Yeah, this is the simpler and still effective strategy. A lot of people are building sophisticated AST RAG models, but you really just need to ask Claude to build a semantic index for each large-ish piece of code and reuse it when gathering context.

You have to make sure the semantic summary takes up significantly fewer tokens than just reading the code, or it's just a waste of tokens and time.

Then have a skill that uses the git log to lazily refresh the summary cache when needed.

[−] smusamashah 63d ago
It seems like a very good use of LLMs. You should write a blog post detailing your process, with examples, for people who aren't as deep into AI tools. I only use the web UI; a lot of what you're saying is beyond me, but it does sound like a clever strategy.
[−] tontinton 63d ago
Yeah, we all converge on the same workflow. In the AI coding agent I'm working on, I've added an "index" tool that uses tree-sitter to compress a code file into a skeleton the AI can read.
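A minimal sketch of such a skeleton pass, using Python's stdlib `ast` as a stand-in for tree-sitter (which the linked tool actually uses, and which works across languages):

```python
import ast

def skeleton(source: str) -> str:
    """Keep only class names and function signatures, dropping bodies,
    so a model sees the shape of a file at a fraction of the tokens."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"def {node.name}({args}): ...")
    return "\n".join(lines)

sample = (
    "class Cache:\n"
    "    def get(self, key):\n"
    "        return self.store[key]\n"
    "\n"
    "def main():\n"
    "    pass\n"
)
print(skeleton(sample))
```

The output keeps every name and signature but none of the implementation, which is the compression that makes skeletons cheap to put in context.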

Here's the implementation for the interested: https://github.com/tontinton/maki/blob/main/maki-code-index%...

[−] gck1 49d ago
I'm curious, what does your workflow look like? I saw a plan prompt there, but no specs. When you want to change something, implement a new feature etc, do you just prompt requirements, have it write the plan and then have it work on it?
[−] jeremychone 63d ago
Oh, that's great.

I've always wanted to explore how to fit tree-sitter into this workflow. It's great to know that this works well too.

Thanks for sharing the code.

(Here is the AIPack runtime I built, MIT: https://github.com/aipack-ai/aipack), and here is the code for pro@coder (https://github.com/aipack-ai/packs-pro/tree/main/pro/coder) (AIPack is in Rust, and AI Packs are in md / lua)

[−] firemelt 63d ago
Whenever I see a post like this,

I say: well, yeah, but it's too sophisticated to be practical.

[−] jeremychone 63d ago
Fair point, but because I spent a year building and refining my custom tool, this is now the reality for all of my AI requests.

I prompt, press run, and then I get this flow:

- dev setup (dev-chat or plan)
- code-map (incremental: 0s, ~2m for the initial run)
- auto-context (~20s to 40s)
- final AI query (~30s to 2m)

For example, just now, in my Rust code (about 60k LOC), I wanted to change the data model and brainstorm with the AI to find the right design, and here is the auto-context it gave me:

- Reducing 381 context files (1.62 MB)

- Now 5 context files (27.90 KB)

- Reducing 11 knowledge files (30.16 KB)

- Now 3 knowledge files (5.62 KB)

The knowledge files are my "rust10x" best practices, and the context files are the source files.

(edited to fix formatting)

[−] tjoff 62d ago
How do you re-evaluate your approach? I'm asking because the landscape, at least from my lens, was completely different a year ago. So I fear that as the foundation shifts whatever learnings, approaches and mental models I have risk being obsolete and starts to work against me.

The problem of evaluating is hard enough as it is without layers of indirection built on top of it.

[−] adammarples 63d ago
It's not sophisticated at all, he just uses a model to make some documentation before asking another model to work using the documentation
[−] lukeundtrug 63d ago
I built myself an AST based solution for that during the last 6 months roughly. I always wondered whether grep and agent-based discovery will be the end of it and thought it just has to be better with a more deterministic approach.

In the end it's hard to measure but personally I feel that my agent rarely misses any context for a given task, so I'm pretty happy with it.

I used a different approach than tree-sitter because I thought I'd found a nice way to avoid writing language-specific code: I use VSCode as a language backend and wrote some logic to rebuild the AST from VSCode's symbol data and other APIs.

That allows me to just install the correct language extension and thus enable support for that specific language. The extension has to provide symbol information which most do through LSP.

In the end it was way more effort than just using tree-sitter, however, and I'm thinking of doing a slow migration to that approach sooner or later.

Anyways, I created an extension that spins up an mcp server and provides several tools that basically replace the vanilla discovery tools in my workflow.

The approach is similar to yours, I have an overview tool which runs different centrality ranking metrics over the whole codebase to get the most important symbols and presents that as an architectural overview to the LLM.

Then I have a "get-symbol-context" tool which allows the AI to get all the information that the AST holds about a single symbol, including a parameter to include source code which completely replaces grepping and file reading for me.

The tool also specifies which other symbols call the one in question and which others it calls, respectively.
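The centrality-ranking idea behind the overview tool can be sketched with plain degree centrality over the caller/callee graph (a hypothetical simplification; real tools may use fancier metrics like PageRank, but degree already surfaces architectural hubs):

```python
from collections import defaultdict

def rank_symbols(call_edges):
    """Rank symbols by degree centrality over the call graph:
    in-degree (how often a symbol is called) plus out-degree
    (how many symbols it calls)."""
    score = defaultdict(int)
    for caller, callee in call_edges:
        score[caller] += 1   # out-degree for the caller
        score[callee] += 1   # in-degree for the callee
    return sorted(score, key=score.get, reverse=True)

edges = [("main", "load_config"), ("main", "run_server"),
         ("run_server", "handle_request"), ("worker", "handle_request")]
top = rank_symbols(edges)
# main, run_server, and handle_request all score 2 and rank above the rest
```

Presenting the top-ranked symbols (with their summaries) as the "architectural overview" is what keeps the first tool call small while still being representative.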

But yeah, sorry for this already being quite a long comment. If you want to give it a try, I published it on the VSCode marketplace a couple of days ago. It's basically free right now, although I have to admit I'd still like to earn a little money with it at some point.

Right now, the daily usage limit is 2000 tool calls per day, which should be enough for anybody.

Would love to hear what you think :)

<https://marketplace.visualstudio.com/items?itemName=LuGoSoft...>

[−] jeremychone 62d ago
I looked at your solution and extension README, and it's very interesting and well thought out.

The fact that you've been using it for six months and that it performs well says a lot. At the end of the day, that's what counts.

I like your idea of piggybacking on top of the LSP services, and I can imagine that this was quite a bit of work. Doing it as an MCP server makes it usable across different tools.

I also really like the name "Context Master."

In my case, it's much more niche since it's for the tool I built. Though it's open source, the key difference is that the "indexing" is only agentic at this point.

I can see value in mixing the two. LSP integration scares me because of the amount of work involved, and tree-sitter seems like a good path.

In that case, in the code map, for each item, there could be both the LLM response info and some deterministic info, for example, from tree-sitter.

That being said, the current approach works so well that I think I am going to keep using and fine-tuning it for a while, and bring in deterministic context only when or if I need it.

Anyway, what you built looks great, and if it works for you, that's what counts.

[−] lukeundtrug 61d ago
Thanks for taking the time to check it out and for the kind words! I really appreciate it.

I totally get sticking with your current approach. Your workflow sounds very intriguing as well. A combination of both approaches might really be very interesting :) Adding an LLM interpretation layer on top of my graph is also something I'm actively considering.

Thanks for the great discussion, and best of luck with your tool and workflow!

[−] cloverich 63d ago
This is really interesting; I've done very high-level code maps, but mapping the entire project seems wild. It works?

So, small model figures out which files to use based on the code map, and then enriches with snippets, so big model ideally gets preloaded with relevant context / snippets up front?

Where does code map live? Is it one big file?

[−] jeremychone 63d ago
So, I have a pro@coder/.cache/code-map/context-code-map.json.

I also have a .tmpl-code-map.jsonl in the same folder so all of my tasks can add to it, and then it gets merged into context-code-map.json.

I keep mtime, but I also compute a blake3 hash, so if mtime does not match, but it is just a "git restore," I do not redo the code map for that file. So it is very incremental.
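That mtime-plus-hash check could look something like this sketch (stdlib `blake2b` standing in for blake3; the function and cache shape are hypothetical):

```python
import hashlib
import os

def needs_remap(path: str, cache: dict) -> bool:
    """Skip re-mapping a file unless its content actually changed.
    mtime is the fast check; the content hash catches cases like
    'git restore' where mtime moves but the bytes are identical."""
    entry = cache.get(path)
    mtime = os.path.getmtime(path)
    if entry and entry["mtime"] == mtime:
        return False                      # fast path: file untouched
    with open(path, "rb") as f:
        digest = hashlib.blake2b(f.read()).hexdigest()
    if entry and entry["hash"] == digest:
        entry["mtime"] = mtime            # touched, but content unchanged
        return False
    cache[path] = {"mtime": mtime, "hash": digest}
    return True                           # new or genuinely modified file
```

Only files where `needs_remap` returns `True` get sent back through the cheap model, which is what makes the re-mapping incremental.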

Then the trick is, when sending the code map to AI, I serialize it in a nice, simple markdown format.

- path/to/file.rs
  - summary: ...
  - when to use: ...
  - public types: .., .., ..
  - public functions: .., .., ..

- ...

So the AI does not have to interpret JSON, just clean, structured markdown.
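A sketch of that JSON-to-markdown serialization step (field names are hypothetical but mirror the per-file schema above):

```python
def code_map_to_markdown(entries: list) -> str:
    """Render code-map entries in the simple nested-list shape shown
    above, so the model reads clean structure instead of raw JSON."""
    out = []
    for e in entries:
        out.append(f"- {e['path']}")
        out.append(f"  - summary: {e['summary']}")
        out.append(f"  - when to use: {e['when_to_use']}")
        out.append(f"  - public types: {', '.join(e['public_types'])}")
        out.append(f"  - public functions: {', '.join(e['public_functions'])}")
    return "\n".join(out)

md = code_map_to_markdown([{
    "path": "path/to/file.rs",
    "summary": "parses config files",
    "when_to_use": "when loading or validating config",
    "public_types": ["Config", "ConfigError"],
    "public_functions": ["load", "validate"],
}])
```

Keeping the JSON as the storage format and the markdown as the wire format means the cache stays machine-mergeable while the prompt stays model-friendly.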

Funny, I worked on this addition to my tool for a week, planning everything, but even today, I am surprised by how well it works.

I have zero sed/grep in my workflow. Just this.

My prompt is pro@coder/coder-prompt.md, the first part is YAML for the globs, and the second part is my prompt.

There is a TUI, but all input and output are files, and the TUI is just there to run it and see the status.

[−] CuriouslyC 63d ago
1M context is super useful with Gemini, not so much for coding, but for data analysis.
[−] jeremychone 63d ago
Even there, I use AI to augment rows and to build the code that puts the data into JSON or Polars and creates a quick UI to query it.
[−] speakbits 63d ago
I think you've hit on the more important point here: keeping things focused on a sufficiently narrow area gives you better results, rather than necessarily needing more context.
[−] exceptione 63d ago

  > - a code map strategy on a big repo. Per file: summary, when_to_use, public_types, public_functions. This is done per file and saved until the file changes. With a concurrency of 32, I can usually code-map a huge repo in minutes. (Typically Flash, cheap, fast, and with very good results)

Thanks, but why use any AI to generate this? I would say: you document your functions in code, and types are provided by the compiler service, so it should all be deterministically available in seconds instead of minutes, without burning tokens. Am I missing something?
[−] jeremychone 63d ago
Very good point. I had two options:

1) Deterministic

  - Using a tree-sitter/AST-like approach, I could extract types, functions, and perhaps comments, and put them into an index map.

  - Cons:

    - The tricky part of this approach is that what I extract can be pretty large per file, for example, comments.

    - Then, I would probably need an agentic synthesis step for those comments anyway.
2) Agentic

  - Since Flash is dirt cheap, I wanted to experiment and skip #1, and go directly to #2.

  - Because my tool is built for concurrency, when set to 32, it's super fast.

  - The price is relatively low, perhaps $1 or $2 for 50k LOC, and 60 to 90 seconds of wall time for what would be roughly 30 to 45 minutes of sequential AI work.

  - What I get back is relatively consistent by file, size-wise, and it's just one trip per file.

So, this is why I started with #2.

And then, the results in real coding scenarios have been astonishing.

Way above what I expected.

The way those indexes get combined with the user prompt gets the right files 95% of the time, and with surprisingly high quality.

So, I might add deterministic aspects to it, but since I think I will need the agentic step anyway, I have deprioritized it.

[−] rafael-lua 63d ago
Well, out of all the workflows I have seen, this one is rather nice; I might give it a try.

I imagine that if the context were committed and kept up to date with CI, it would work for others to use as well.

However, I'm a little confused on the autocontext/globs narrowing part. Do you, the developer, provide them? Or you feed the full code map to flash + your prompt so it returns the globs based on your prompt?

Also, in general, is your map of a file relatively smaller than the file itself, even for very small files?

[−] LuxBennu 63d ago
Your code map compresses signal on the context side. Same principle applies on the prompt side: prompts that front-load specifics (file, error, expected behavior) resolve in 1-2 turns. Vague ones spiral into 5-6. 1M context doesn't change that — it just gives you more room for the spiral.
[−] Myrmornis 63d ago
This is interesting but don't you worry that you're competing with entire companies (e.g. Anthropic) and thus it's a losing battle? Since you're re-implementing a bunch of stuff they either do in their harness or have decided it was better not to do?
[−] ra7 63d ago
This is fascinating. I feel like this is converging into the concept of a traditional "IDE". So much of your setup reminds me of IDEs indexing, doing static analysis, building ASTs, etc. before a developer starts writing code.
[−] Weryj 62d ago
My approach has been using static analysis to produce a Mermaid diagram of all Classes:Methods and their caller/callees.
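The diagram-generation half of that approach could be sketched like this (a hypothetical helper; Mermaid node ids can't contain colons, so the `Class:Method` names are sanitized into ids and kept as labels):

```python
def to_mermaid(call_edges) -> str:
    """Emit a Mermaid flowchart from (caller, callee) pairs, the kind of
    graph a static-analysis pass over Class:Method call relationships
    yields."""
    def node(sym: str) -> str:
        # sanitize the id, keep the original name as the node label
        return f'{sym.replace(":", "_").replace(".", "_")}["{sym}"]'
    lines = ["graph TD"]
    for caller, callee in call_edges:
        lines.append(f"    {node(caller)} --> {node(callee)}")
    return "\n".join(lines)

diagram = to_mermaid([("App:run", "Parser:parse"),
                      ("Parser:parse", "Lexer:next_token")])
print(diagram)
```

The resulting text block renders directly in any Mermaid-aware viewer, and a model can read the edge list as-is.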
[−] make_it_sure 63d ago
very interested in this approach and many other people are for sure. Please do a blog post.
[−] dimitri-vs 64d ago
The big change here is:

> Standard pricing now applies across the full 1M window for both models, with no long-context premium. Media limits expand to 600 images or PDF pages.

For Claude Code users this is huge - assuming coherence remains strong past 200k tok.

[−] syntaxing 64d ago
It’s interesting because my career went from a higher-level language (Python) to lower-level languages (C++ and C). Opus and the like are amazing at Python, honestly sometimes better than me, though they occasionally make some really stupid architectural decisions. But when it comes to embedded stuff, it’s still like a junior engineer. I'm unsure if that will ever change, but I wonder if it’s just the quality and availability of training data. This is why I find it hard to believe LLMs will replace hardware engineers anytime soon (I was a MechE for a decade).
[−] convenwis 64d ago
Is there a writeup anywhere on what this means for effective context? I think that many of us have found that even when the context window was 100k tokens the actual usable window was smaller than that. As you got closer to 100k performance degraded substantially. I'm assuming that is still true but what does the curve look like?
[−] wewewedxfgdf 64d ago
The weirdest thing about Claude pricing is their 5X pricing plan is 5 times the cost of the previous plan.

Normally buying the bigger plan gives some sort of discount.

At Claude, it's just "5 times more usage 5 times more cost, there you go".

[−] minimaxir 64d ago
Claude Code 2.1.75 no longer distinguishes between base Opus and 1M Opus: it's the same model. Oddly, I'm on Pro, where the change is supposedly only for Max+, but I'm still seeing this to be the case.

EDIT: Don't think Pro has access to it, a typical prompt just hit the context limit.

The removal of extra pricing beyond 200k tokens may be Anthropic's salvo in the agent wars against GPT 5.4's 1M window and extra pricing for that.

[−] Frannky 64d ago
Opus 4.6 is nuts. Everything I throw at it works. Frontend, backend, algorithms—it does not matter.

I start with a PRD, ask for a step-by-step plan, and just execute one step at a time. Sometimes the ideas are dumb, but checking and guiding step by step helps it ship working things in hours.

It was also the first AI I felt, "Damn, this thing is smarter than me."

The other crazy thing is that with today's tech, these things can be made to work at 1k tokens/sec with multiple agents working at the same time, each at that speed.

[−] vessenes 64d ago
This is super exciting. I've been poking at it today, and it definitely changes my workflow -- I feel like a full three or four hour parallel coding session with subagents is now generally fitting into a single master session.

The stats claim Opus at 1M is about like 5.4 at 256k — these needle-in-a-haystack long-context tests don't always track quality of reasoning, sadly — but this is still a significant improvement, and I haven't seen the dramatic falloff in my tests that I did with Q4 '25 models.

p.s. what's up with sonnet 4.5 getting comparatively better as context got longer?

[−] johnwheeler 64d ago
This is incredible. I just blew through $200 last night in a few hours on 1M context. This is like the best news I've heard all year in regards to my business.

What is OpenAIs response to this? Do they even have 1M context window or is it still opaque and "depends on the time of day"

[−] iandanforth 63d ago
I'm very happy about this change. For long sessions with Claude it was always like a punch to the gut when a compaction came along. Codex/GPT-5.4 is better with compactions so I switched to that to avoid the pain of the model suddenly forgetting key aspects of the work and making the same dumb errors all over again. I'm excited to return to Claude as my daily driver!
[−] tariky 63d ago
This is amazing. I have to test it with my reverse-engineering workflow. I don't know how many people use CC for RE, but it is really good at it.

It's also really good for writing SketchUp plugins in Ruby. It one-shots plugins that are in some ways better than commercial ones you can buy online.

CC will change the development landscape so much in the next year. It's exciting and terrifying at the same time.

[−] aragonite 64d ago
Do long sessions also burn through token budgets much faster?

If the chat client is resending the whole conversation each turn, then once you're deep into a session every request already includes tens of thousands of tokens of prior context. So a message at 70k tokens into a conversation is much "heavier" than one at 2k (at least in terms of input tokens). Yes?
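Yes — and the totals add up faster than intuition suggests. A quick sketch of the arithmetic (prompt caching can discount the resent prefix, but the resending itself is how chat APIs work):

```python
def total_input_tokens(turn_sizes):
    """Each turn resends the entire prior conversation as input, so the
    total input tokens grow roughly quadratically with session length."""
    total = history = 0
    for tokens in turn_sizes:
        history += tokens      # new message joins the running history
        total += history       # the whole history is billed as input
    return total

# ten turns of 2k tokens each: 2k + 4k + ... + 20k = 110k input tokens
cost = total_input_tokens([2000] * 10)
```

So a session that "only" produced 20k tokens of conversation has already billed 110k input tokens, and the per-turn cost keeps climbing the deeper you go.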

[−] geminiboy 62d ago
My company's brand guidelines document was 600-ish pages long, and Claude Desktop couldn't handle it.

As soon as I saw the announcement, I tried again and created a working design skill that can produce design artifacts following the brand guidelines.

While these improvements seem incremental, they have a compounding effect on usefulness.

My AI doomsday calculator just got decremented by another 6 months.

[−] Slav_fixflex 63d ago
I've been using Claude Code directly on my production servers to debug complex I/O bottlenecks and database locks. The ability of the latest models to hold the entire project context while suggesting real-time fixes is a game changer for solo founders. It helped me stabilize a security tool I’m building when other agents kept hallucinating.
[−] anshumankmr 63d ago
All while their usage limits are so excessively shitty that I paid them $50 just two days ago because I ran out of usage, and they still blocked me from using it during a critical work week (and did not refund my $50 despite my emails and requests, routing me to a s*ty AI bot). Anyway, I'm using Copilot and OpenCode a lot more these days, which is much better.
[−] pixelpoet 64d ago
Compared to yesterday, my Claude Max subscription burns usage like absolutely crazy (13% of my weekly usage gone from a fresh reset today, with just a handful of prompts on two new C++ projects, no deps), and it has become unbearably slow (as in an hour for a prompt response). GGWP Anthropic, it was great while it lasted, but this isn't worth the hundreds of dollars.
[−] LoganDark 64d ago
Finally, I don't have to constantly reload my Extra Usage balance when I already pay $200/mo for their most expensive plan. I can't believe they even did that. I couldn't use 1M context at all because I already pay $200/mo and it was going to ask me for even more.

Next step should be to allow fast mode to draw from the $200/mo usage balance. Again, I pay $200/mo; I should at least be able to send a single message without being asked to cough up more (one message in fast mode costs a few dollars). One would think $200/mo would give me some measure of access to their more expensive capabilities, but it seems it's bucketed to only the capabilities that are offered even to free users.

[−] bob1029 64d ago
I've been avoiding context beyond 100k tokens in general. The performance is simply terrible. There's no training data for a megabyte of your very particular context.

If you are really interested in deep NIAH tasks, external symbolic recursion and self-similar prompts+tools are a much bigger unlock than more context window. Recursion and (most) tools tend to be fairly deterministic processes.

I generally prohibit tool calling in the first stack frame of complex agents in order to preserve context window for the overall task and human interaction. Most of the nasty token consumption happens in brief, nested conversations that pass summaries back up the call stack.
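The shape of that pattern — nested sub-conversations that pass only summaries back up — can be sketched as a recursion (everything here is a stub: the planning step, the leaf "tool work", and the truncation all stand in for real model and tool calls):

```python
def agent(task: str, depth: int = 0, max_depth: int = 2) -> str:
    """Tool-heavy work happens in nested sub-conversations; each frame
    passes only a short summary back up, so the top-level context —
    the 'first stack frame' — stays small."""
    if depth == max_depth:
        return f"leaf result for {task!r}"         # stand-in for real tool work
    subtasks = [f"{task}/part{i}" for i in range(2)]   # stubbed planning step
    child_summaries = [agent(s, depth + 1, max_depth) for s in subtasks]
    # compress: the parent frame only ever sees one short line per child
    return f"summary of {task!r}: " + "; ".join(s[:24] for s in child_summaries)

top = agent("refactor")
```

However much token churn happens in the leaves, the root conversation only ever accumulates the one-line summaries, which is the point.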

[−] tuo-lei 51d ago
My main frustration with long-context coding sessions isn't just the limit itself; it's that, after the fact, it's hard to tell which turns actually caused the context to bloat or the session to go off track. It's painful enough that I had to build a tool to help myself understand the correlation between context and turns. For now, I have to compact manually.
[−] drcongo 63d ago
Could be pure coincidence, but my Claude Code session last night was an absolute nightmare. It kept forgetting things it had done earlier in the session and why it had done them, messed up a git merge so badly that it lost the CLAUDE.md file along with a lot of other stuff, and then started running commands on the host machine instead of inside the container because it no longer had a CLAUDE.md to tell it not to. Last night was the first time I've ever sworn at it.
[−] margorczynski 64d ago
What about response coherence with longer context? Usually in other models with such big windows I see the quality to rapidly drop as it gets past a certain point.
[−] k__ 63d ago
I've heard the middle of the context is often ignored.

Do long context windows make much sense then or is this just a way of getting people to use more tokens?

[−] PeterStuer 63d ago
The thing that would get me more excited is how far they could push context coherence before the model loses track. I'm hoping 250k.
[−] sporkland 63d ago
Can someone help me with insights about large context models? Are there relationships that pop up at the beginning and end of long context windows that don't transitively follow from intermediate points? Is there value in the training over these longer windows vs using the more basic/closer weight distributions over different sliding windows?
[−] jwilliams 63d ago
I'm fairly sure that your best throughput is single-prompt single-shot runs with Claude (and that means no plan, no swarms, etc) -- just with a high degree of work in parallel.

So for me this is a pretty huge change as the ceiling on a single prompt just jumped considerably. I'm replaying some of my less effective prompts today to see the impact.

[−] chaboud 64d ago
Awesome.... With Sonnet 4.5, I had Cline soft trigger compaction at 400k (it wandered off into the weeds at 500k). But the stability of the 4.6 models is notable. I still think it pays to structure systems to be comprehensible in smaller contexts (smaller files, concise plans), but this is great.

(And, yeah, I'm all Claude Code these days...)

[−] LarsDu88 64d ago
The stuff I built with Opus 4.6 in the past 2.5 weeks:

Full clone of Panel de Pon/Tetris attack with full P2P rollback online multiplayer: https://panel-panic.com

An emulator of the MOS 6502 CPU with visual display of the voltage going into the DIP package of the physical CPU: https://larsdu.github.io/Dippy6502/

I'm impressed as fuck, but a part of me deep down knows that I know fuck all about the 6502 or its assembly language and architecture, and now I'll probably never be motivated to do this project in a way where I would've learned all the things I wanted to learn.

[−] cubefox 63d ago

> Standard pricing now applies across the full 1M window for both models, with no long-context premium.

Does that mean it's likely not a Transformer with quadratic attention, but some other kind of architecture, with linear time complexity in sequence length? That would be pretty interesting.

[−] holoduke 63d ago
I am currently mass-translating millions of records with short descriptions. Somehow, tokens are consumed extremely fast: I have 3 Max memberships, and all 3 of them hit the 5-hour limit in about 5 to 10 minutes. I still don't understand why this is happening.