Interesting, I’ve never needed 1M, or even 250k+ context. I’m usually under 100k per request.
About 80% of my code is AI-generated, with a controlled workflow using dev-chat.md and spec.md. I use Flash for code maps and auto-context, and GPT 5.4 or Opus for coding, all via API with a custom tool.
Gemini Pro and Flash have had 1M context for a long time, but even though I use Flash 3 a lot, and it’s awesome, I’ve never needed more than 200k.
For production coding, I use
- a code map strategy on a big repo. Per file: summary, when_to_use, public_types, public_functions. This is done per file and saved until the file changes. With a concurrency of 32, I can usually code-map a huge repo in minutes. (Typically Flash, cheap, fast, and with very good results)
- Then, auto context, but based on code lensing. Meaning auto context takes some globs that narrow the visibility of what the AI can see, and it uses the code map intersection to ask the AI for the proper files to put in context. (Typically Flash, cheap, relatively fast, and very good)
- Then, use a bigger model, GPT 5.4 or Opus 4.6, to do the work. At this point, context is typically between 30k and 80k max.
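The glob-narrowing ("code lensing") step above can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual tool: the `CodeMapEntry` class, the `lens` function, and the sample paths are all hypothetical, and only the field names come from the list above. The step where the narrowed map plus the user prompt go to a small model to pick files is not shown.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class CodeMapEntry:
    """One code-map record per source file (field names follow the list above)."""
    path: str
    summary: str
    when_to_use: str
    public_types: list[str] = field(default_factory=list)
    public_functions: list[str] = field(default_factory=list)


def lens(code_map: list[CodeMapEntry], globs: list[str]) -> list[CodeMapEntry]:
    """Keep only the entries whose path matches at least one glob."""
    return [e for e in code_map if any(fnmatch(e.path, g) for g in globs)]


# Hypothetical sample data, for illustration only.
code_map = [
    CodeMapEntry("src/model/user.rs", "User data model", "changing user fields", ["User"], ["new"]),
    CodeMapEntry("src/web/routes.rs", "HTTP routes", "adding endpoints", [], ["router"]),
    CodeMapEntry("ui/src/app.ts", "Web UI entry point", "UI bootstrapping", [], ["main"]),
]

# Only these entries would be shown to the small model that selects context files.
visible = lens(code_map, ["src/model/*", "src/web/*"])
print([e.path for e in visible])
```

Note that `fnmatch` lets `*` match across `/`, which is looser than shell globbing; a real tool might want stricter semantics.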
What I’ve found is that this process is surprisingly effective at getting a high-quality response in one shot. It keeps everything focused on what’s needed for the job.
Higher precision on the input typically leads to higher precision on the output. That’s still true with AI.
For context, 75% of my code is Rust, and the other 25% is TS/CSS for web UI.
Anyway, it’s always interesting to learn about different approaches. I’d love to understand the use case where 1M context is really useful.
Yeah this is the simpler and also effective strategy. A lot of people are building sophisticated AST RAG models. But you really just need to ask Claude to generally build a semantic index for each large-ish piece of code and re-use it when getting context.
You have to make sure the semantic summary takes up significantly fewer tokens than just reading the code, or it's just a waste of tokens and time.
Then have a skill that uses git version logs to lazily refresh the summary cache when needed.
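The size requirement above (a summary should cost far fewer tokens than the code it replaces) could be checked with something like this. It is only a sketch: the 4-characters-per-token estimate and the 25% threshold are assumptions, and both function names are made up for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def summary_is_worthwhile(source: str, summary: str, max_ratio: float = 0.25) -> bool:
    """True when the summary costs at most `max_ratio` of the source's tokens."""
    return estimate_tokens(summary) <= max_ratio * estimate_tokens(source)


# A repetitive 2000-character file versus a one-line summary.
source = "fn add(a: i32, b: i32) -> i32 { a + b }\n" * 50
summary = "Arithmetic helpers; exposes add(a, b)."

print(summary_is_worthwhile(source, summary))  # large file: summary pays off
print(summary_is_worthwhile("x", summary))     # tiny file: just read the code
```

For very small files the check fails, which matches the point that indexing them is a waste.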
It seems like a very good use of LLMs. You should write a blog post detailing your process, with examples, for people who are not as into AI tools. I only use the Web UI. A lot of what you are saying is beyond me, but it does sound like a clever strategy.
Yeah, we all converge on the same workflow. In the AI coding agent I'm working on now, I've added an "index" tool that uses tree-sitter to compress a code file and show the AI a skeleton of it.
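To give a feel for what a "skeleton" of a file looks like, here is a small sketch. It does not use tree-sitter: Python's standard-library `ast` module stands in for it, so it only handles Python source, and the `skeleton` function is a hypothetical name rather than anything from the tool mentioned above.

```python
import ast


def skeleton(source: str) -> str:
    """Return class and function signatures only, with bodies dropped."""
    lines = []

    def visit(node: ast.AST, depth: int) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                lines.append("    " * depth + f"class {child.name}: ...")
                visit(child, depth + 1)  # descend to list the methods
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in child.args.args)
                lines.append("    " * depth + f"def {child.name}({args}): ...")

    visit(ast.parse(source), 0)
    return "\n".join(lines)


example = '''
class Cache:
    def get(self, key):
        return self.data.get(key)

def load(path, strict):
    with open(path) as f:
        return f.read()
'''
print(skeleton(example))
```

The output keeps names and parameters and drops every body, which is the compression the comment describes.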
I'm curious, what does your workflow look like? I saw a plan prompt there, but no specs. When you want to change something, implement a new feature etc, do you just prompt requirements, have it write the plan and then have it work on it?
Fair point, but because I spent a year building and refining my custom tool, this is now the reality for all of my AI requests.
I prompt, press run, and then I get this flow:
dev setup (dev-chat or plan)
code-map (incremental: ~0s; ~2m for the initial run)
auto-context (~20s to 40s)
final AI query (~30s to 2m)
For example, just now, in my Rust code (about 60k LOC), I wanted to change the data model and brainstorm with the AI to find the right design, and here is the auto-context it gave me:
- Reducing 381 context files ( 1.62 MB)
- Now 5 context files ( 27.90 KB)
- Reducing 11 knowledge files ( 30.16 KB)
- Now 3 knowledge files ( 5.62 KB)
The knowledge files are my "rust10x" best practices, and the context files are the source files.
How do you re-evaluate your approach? I'm asking because the landscape, at least from my lens, was completely different a year ago. So I fear that, as the foundation shifts, whatever learnings, approaches, and mental models I have risk becoming obsolete and starting to work against me.
The problem of evaluating is hard enough as it is without layers of indirection built on top of it.
I built myself an AST based solution for that during the last 6 months roughly. I always wondered whether grep and agent-based discovery will be the end of it and thought it just has to be better with a more deterministic approach.
In the end it's hard to measure but personally I feel that my agent rarely misses any context for a given task, so I'm pretty happy with it.
I used a different approach than tree-sitter because I thought I found a nice way to get around having to write language-specific code. I basically use VSCode as a language backend and wrote some logic to basically rebuild the AST tree from VSCode's symbol data and other API.
That allows me to just install the correct language extension and thus enable support for that specific language. The extension has to provide symbol information which most do through LSP.
In the end it was way more effort than just using tree-sitter, however, and I'm thinking of doing a slow migration to that approach sooner or later.
Anyways, I created an extension that spins up an mcp server and provides several tools that basically replace the vanilla discovery tools in my workflow.
The approach is similar to yours, I have an overview tool which runs different centrality ranking metrics over the whole codebase to get the most important symbols and presents that as an architectural overview to the LLM.
Then I have a "get-symbol-context" tool which allows the AI to get all the information that the AST holds about a single symbol, including a parameter to include source code which completely replaces grepping and file reading for me.
The tool also specifies which other symbols call the one in question and which others it calls, respectively.
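A toy version of the overview and per-symbol tools described above might look like this. Everything here is an assumption made for illustration: the call edges are invented, plain degree centrality stands in for whatever ranking metrics the extension really uses, and `overview` and `symbol_context` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical call edges (caller -> callee), as an AST or LSP pass might produce.
calls = [
    ("main", "load_config"),
    ("main", "run_server"),
    ("run_server", "handle_request"),
    ("handle_request", "load_config"),
    ("handle_request", "render"),
]

callees = defaultdict(set)
callers = defaultdict(set)
for caller, callee in calls:
    callees[caller].add(callee)
    callers[callee].add(caller)

symbols = set(callees) | set(callers)


def degree_centrality(symbol: str) -> int:
    """Number of distinct symbols that call, or are called by, this symbol."""
    return len(callers[symbol] | callees[symbol])


def overview(top_n: int = 3) -> list[str]:
    """Most connected symbols first; ties broken alphabetically."""
    return sorted(symbols, key=lambda s: (-degree_centrality(s), s))[:top_n]


def symbol_context(symbol: str) -> dict:
    """Who calls this symbol, and what it calls in turn."""
    return {
        "symbol": symbol,
        "called_by": sorted(callers[symbol]),
        "calls": sorted(callees[symbol]),
    }


print(overview())
print(symbol_context("handle_request"))
```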
But yeah, sorry for this being already a quite long comment, if you want to give it a try, I published it on the VSCode marketplace a couple of days ago, and it's basically free right now, although I have to admit that I still want to try to earn a little bit of money with it at some point.
Right now, the daily usage limit is 2000 tool calls per day, which should be enough for anybody.
I looked at your solution and extension README, and it's very interesting and well thought out.
The fact that you've been using it for six months and that it performs well says a lot. At the end of the day, that's what counts.
I like your idea of piggybacking on top of the LSP services, and I can imagine that this was quite a bit of work. Doing it as an MCP server makes it usable across different tools.
I also really like the name "Context Master."
In my case, it's much more niche since it's for the tool I built. Though it's open source, the key difference is that the "indexing" is only agentic at this point.
I can see value in mixing the two. LSP integration scares me because of the amount of work involved, and tree-sitter seems like a good path.
In that case, in the code map, for each item, there could be both the LLM response info and some deterministic info, for example, from tree-sitter.
That being said, the current approach works so well that I think I am going to keep using and fine-tuning it for a while, and bring in deterministic context only when or if I need it.
Anyway, what you built looks great. If it works, that's great.
Thanks for taking the time to check it out and for the kind words! I really appreciate it.
I totally get sticking with your current approach. Your workflow sounds very intriguing as well. A combination of both approaches might really be very interesting :) Adding an LLM interpretation layer on top of my graph is also something I'm actively considering.
Thanks for the great discussion, and best of luck with your tool and workflow!
This is really interesting. I've done very high-level code maps, but mapping the entire project seems wild. It works?
So, small model figures out which files to use based on the code map, and then enriches with snippets, so big model ideally gets preloaded with relevant context / snippets up front?
So, I have a pro@coder/.cache/code-map/context-code-map.json.
I also have a .tmpl-code-map.jsonl in the same folder so all of my tasks can add to it, and then it gets merged into context-code-map.json.
I keep mtime, but I also compute a blake3 hash, so if mtime does not match, but it is just a "git restore," I do not redo the code map for that file. So it is very incremental.
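The mtime-then-hash check could be sketched like this. Assumptions are flagged in the comments: `hashlib.blake2b` from the standard library stands in for blake3 (which needs a third-party package), and the cache record shape and the `needs_remap` name are made up for the example.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Optional


def content_hash(path: Path) -> str:
    # Stand-in: blake2b from the standard library instead of blake3.
    return hashlib.blake2b(path.read_bytes()).hexdigest()


def needs_remap(path: Path, cached: Optional[dict]) -> bool:
    """Decide whether a file's code-map entry must be regenerated."""
    if cached is None:
        return True
    if path.stat().st_mtime_ns == cached["mtime_ns"]:
        return False  # cheap path: timestamp unchanged, skip hashing
    # Timestamp moved (for example after a `git restore`): compare content.
    return content_hash(path) != cached["hash"]


with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "lib.rs"
    f.write_text("pub fn a() {}\n")
    cached = {"mtime_ns": f.stat().st_mtime_ns, "hash": content_hash(f)}

    os.utime(f, ns=(1, 1))           # timestamp changes, content does not
    touched = needs_remap(f, cached)

    f.write_text("pub fn b() {}\n")  # a real edit
    os.utime(f, ns=(2, 2))           # force a distinct timestamp for the demo
    edited = needs_remap(f, cached)

print(touched, edited)
```

Hashing only when the timestamp differs keeps the common case to a single `stat` call per file.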
Then the trick is, when sending the code map to AI, I serialize it in a nice, simple markdown format.
- path/to/file.rs
  - summary: ...
  - when to use: ...
  - public types: .., .., ..
  - public functions: .., .., ..
- ...
So the AI does not have to interpret JSON, just clean, structured markdown.
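Roughly, the JSON-to-markdown serialization could look like the sketch below. The JSON shape is an assumption (the real context-code-map.json layout is not shown in the thread), as are the two-space nesting and the `to_markdown` name.

```python
import json

# Assumed shape of one stored entry; the actual file format may differ.
raw = json.loads("""
[
  {"path": "src/model/user.rs",
   "summary": "User data model",
   "when_to_use": "Changing user fields",
   "public_types": ["User", "UserId"],
   "public_functions": ["new", "rename"]}
]
""")


def to_markdown(entries: list[dict]) -> str:
    """Render code-map entries as a nested markdown list for the prompt."""
    out = []
    for e in entries:
        out.append(f"- {e['path']}")
        out.append(f"  - summary: {e['summary']}")
        out.append(f"  - when to use: {e['when_to_use']}")
        out.append(f"  - public types: {', '.join(e['public_types'])}")
        out.append(f"  - public functions: {', '.join(e['public_functions'])}")
    return "\n".join(out)


print(to_markdown(raw))
```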
Funny, I worked on this addition to my tool for a week, planning everything, but even today, I am surprised by how well it works.
I have zero sed/grep in my workflow. Just this.
My prompt is pro@coder/coder-prompt.md, the first part is YAML for the globs, and the second part is my prompt.
There is a TUI, but all input and output are files, and the TUI is just there to run it and see the status.
I think you've hit on the more successful point here: keeping the work in a sufficiently focused area gives you better results, and you do not necessarily need more context.
> - a code map strategy on a big repo. Per file: summary, when_to_use, public_types, public_functions. This is done per file and saved until the file changes. With a concurrency of 32, I can usually code-map a huge repo in minutes. (Typically Flash, cheap, fast, and with very good results)
Thanks, but why use any AI to generate this? I would say: you document your functions in code, and types are provided by the compiler service, so it should all be deterministically available in seconds instead of minutes, without burning tokens. Am I missing something?
1) Deterministic
- Using a tree-sitter/AST-like approach, I could extract types, functions, and perhaps comments, and put them into an index map.
- Cons:
  - The tricky part of this approach is that what I extract can be pretty large per file, for example, comments.
  - Then, I would probably need an agentic synthesis step for those comments anyway.
2) Agentic
- Since Flash is dirt cheap, I wanted to experiment and skip #1, and go directly to #2.
- Because my tool is built for concurrency, when set to 32, it's super fast.
- The price is relatively low, perhaps $1 or $2 for 50k LOC, and it takes 60 to 90 seconds of wall-clock time, which amounts to about 30 to 45 minutes of cumulative AI work.
- What I get back is relatively consistent by file, size-wise, and it's just one trip per file.
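The bounded-concurrency part of the agentic approach can be sketched with `asyncio`. This is an illustration under assumptions: `summarize` is a placeholder that sleeps instead of calling any model API, and the function and file names are invented.

```python
import asyncio

CONCURRENCY = 32  # the concurrency setting mentioned above


async def summarize(path: str) -> dict:
    """Placeholder for one model call per file; no real API is invoked."""
    await asyncio.sleep(0.01)
    return {"path": path, "summary": f"summary of {path}"}


async def build_code_map(paths: list[str]) -> list[dict]:
    sem = asyncio.Semaphore(CONCURRENCY)
    in_flight = 0
    peak = 0

    async def worker(path: str) -> dict:
        nonlocal in_flight, peak
        async with sem:  # at most CONCURRENCY calls run at once
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await summarize(path)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(p) for p in paths))
    print(f"peak concurrent calls: {peak}")
    return list(results)


code_map = asyncio.run(build_code_map([f"src/file_{i}.rs" for i in range(100)]))
print(len(code_map))
```

With 32 calls in flight, 60 to 90 seconds of wall-clock time corresponds to roughly 32 to 48 minutes of summed model time, which is in line with the 30 to 45 minutes quoted above.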
So, this is why I started with #2.
And then, the results in real coding scenarios have been astonishing.
Way above what I expected.
The way those indexes get combined with the user prompt gets the right files 95% of the time, and with surprisingly high quality.
So, I might add deterministic aspects to it, but since I think I will need the agentic step anyway, I have deprioritized it.
Well, out of all the workflows I have seen, this one is rather nice, might give it a try.
I imagine that if the context were committed and kept up to date by CI, it would work for others to use as well.
However, I'm a little confused about the auto-context/globs narrowing part. Do you, the developer, provide them? Or do you feed the full code map plus your prompt to Flash so that it returns the globs based on your prompt?
Also, in general, is your map of a file relatively smaller than the file itself, even for very small files?
Your code map compresses signal on the context side. Same principle applies on the prompt side: prompts that front-load specifics (file, error, expected behavior) resolve in 1-2 turns. Vague ones spiral into 5-6. 1M context doesn't change that — it just gives you more room for the spiral.
This is interesting but don't you worry that you're competing with entire companies (e.g. Anthropic) and thus it's a losing battle? Since you're re-implementing a bunch of stuff they either do in their harness or have decided it was better not to do?
This is fascinating. I feel like this is converging into the concept of a traditional "IDE". So much of your setup reminds me of IDEs indexing, doing static analysis, building ASTs, etc. before a developer starts writing code.
> Standard pricing now applies across the full 1M window for both models, with no long-context premium. Media limits expand to 600 images or PDF pages.
For Claude Code users this is huge - assuming coherence remains strong past 200k tok.
It’s interesting because my career went from a higher-level language (Python) to lower-level languages (C++ and C). Opus and the like are amazing at Python, honestly sometimes better than me, but they do occasionally make some really stupid architectural decisions. But when it comes to embedded stuff, it’s still like a junior engineer. Unsure if that will ever change, but I wonder if it’s just the quality and availability of training data. This is why I find it hard to believe LLMs will replace hardware engineers anytime soon (I was a MechE for a decade).
Is there a writeup anywhere on what this means for effective context? I think that many of us have found that even when the context window was 100k tokens the actual usable window was smaller than that. As you got closer to 100k performance degraded substantially. I'm assuming that is still true but what does the curve look like?
Claude Code 2.1.75 no longer distinguishes between base Opus and 1M Opus: it's the same model. Oddly, I have Pro, and the change is supposedly only for Max+, but I am still seeing this to be the case.
EDIT: Don't think Pro has access to it, a typical prompt just hit the context limit.
The removal of extra pricing beyond 200k tokens may be Anthropic's salvo in the agent wars against GPT 5.4's 1M window and extra pricing for that.
Opus 4.6 is nuts. Everything I throw at it works. Frontend, backend, algorithms—it does not matter.
I start with a PRD, ask for a step-by-step plan, and just execute on each step at a time. Sometimes ideas are dumb, but checking and guiding step by step helps it ship working things in hours.
It was also the first AI I felt, "Damn, this thing is smarter than me."
The other crazy thing is that with today's tech, these things can be made to work at 1k tokens/sec with multiple agents working at the same time, each at that speed.
This is super exciting. I've been poking at it today, and it definitely changes my workflow -- I feel like a full three or four hour parallel coding session with subagents is now generally fitting into a single master session.
The stats claim Opus at 1M is about like 5.4 at 256k -- these needle long context tests don't always go with quality reasoning ability sadly -- but this is still a significant improvement, and I haven't seen dramatic falloff in my tests, unlike q4 '25 models.
p.s. what's up with sonnet 4.5 getting comparatively better as context got longer?
This is incredible. I just blew through $200 last night in a few hours on 1M context. This is like the best news I've heard all year in regards to my business.
What is OpenAI's response to this? Do they even have a 1M context window, or is it still opaque and "depends on the time of day"?
I'm very happy about this change. For long sessions with Claude it was always like a punch to the gut when a compaction came along. Codex/GPT-5.4 is better with compactions so I switched to that to avoid the pain of the model suddenly forgetting key aspects of the work and making the same dumb errors all over again. I'm excited to return to Claude as my daily driver!
This is amazing. I have to test it with my reverse engineering workflow. I don't know how many people use CC for RE but it is really good at it.
Also, it is really good at writing SketchUp plugins in Ruby. It one-shots plugins that are, in some versions, better than the commercial ones you can buy online.
CC will change the development landscape so much in the next year. It is exciting and terrifying at the same time.
Do long sessions also burn through token budgets much faster?
If the chat client is resending the whole conversation each turn, then once you're deep into a session every request already includes tens of thousands of tokens of prior context. So a message at 70k tokens into a conversation is much "heavier" than one at 2k (at least in terms of input tokens). Yes?
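The arithmetic behind that question can be worked through with a tiny model. It assumes the client resends the full history on every request and ignores any prompt caching a provider may apply; the turn sizes are made up.

```python
def cumulative_input_tokens(turn_sizes: list[int]) -> int:
    """Total input tokens sent when every request resends all prior turns."""
    total = 0
    history = 0
    for size in turn_sizes:
        history += size   # the new turn joins the conversation
        total += history  # the whole conversation is sent as input
    return total


# 35 turns of 2k tokens each ends with 70k tokens of history.
turns = [2_000] * 35
print(cumulative_input_tokens(turns))      # total input across the session
print(cumulative_input_tokens(turns[:1]))  # the very first request alone
```

Under these assumptions the last request alone carries 70k input tokens, and the session total grows roughly with the square of the number of turns.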
I've been using Claude Code directly on my production servers to debug complex I/O bottlenecks and database locks. The ability of the latest models to hold the entire project context while suggesting real-time fixes is a game changer for solo founders. It helped me stabilize a security tool I’m building when other agents kept hallucinating.
All while their usage limits are so excessively shitty that I paid them $50 just two days back because I ran out of usage, and they still blocked me from using it during a critical work week (and did not refund my $50 despite my emails and requests, and routed me to a s*ty AI bot). Anyway, I am using Copilot and OpenCode a lot more these days, which is much better.
Compared to yesterday, my Claude Max subscription burns usage like absolutely crazy (13% of weekly usage from a fresh reset today with just a handful of prompts on two new C++ projects, no deps) and has become unbearably slow (as in 1hr for a prompt response). GGWP Anthropic, it was great while it lasted, but this isn't worth the hundreds of dollars.
Finally, I don't have to constantly reload my Extra Usage balance when I already pay $200/mo for their most expensive plan. I can't believe they even did that. I couldn't use 1M context at all because I already pay $200/mo and it was going to ask me for even more.
Next step should be to allow fast mode to draw from the $200/mo usage balance. Again, I pay $200/mo, I should at least be able to send a single message without being asked to cough up more. (One message in fast mode costs a few dollars each) One would think $200/mo would give me any measure of ability to use their more expensive capabilities but it seems it's bucketed to only the capabilities that are offered to even free users.
I've been avoiding context beyond 100k tokens in general. The performance is simply terrible. There's no training data for a megabyte of your very particular context.
If you are really interested in deep NIAH tasks, external symbolic recursion and self-similar prompts+tools are a much bigger unlock than more context window. Recursion and (most) tools tend to be fairly deterministic processes.
I generally prohibit tool calling in the first stack frame of complex agents in order to preserve context window for the overall task and human interaction. Most of the nasty token consumption happens in brief, nested conversations that pass summaries back up the call stack.
My main frustration with long-context coding sessions isn't just the limit itself, it's that after the fact it's hard to tell which turns actually caused the context to bloat or the session to go off track.
It's painful enough that I have to build a tool to help myself understand the correlation between context and turn data. I have to compact manually now.
Could be pure coincidence, but my Claude Code session last night was an absolute nightmare. It kept forgetting things it had done earlier in the session and why it had done them, messed up a git merge so badly that it lost the CLAUDE.md file along with a lot of other stuff, and then started running commands on the host machine instead of inside the container because it no longer had a CLAUDE.md to tell it not to. Last night was the first time I've ever sworn at it.
What about response coherence with longer context? Usually, in other models with such big windows, I see the quality drop rapidly past a certain point.
Can someone help me with insights about large context models? Are there relationships that pop up at the beginning and end of long context windows that don't transitively follow from intermediate points? Is there value in the training over these longer windows vs using the more basic/closer weight distributions over different sliding windows?
I'm fairly sure that your best throughput is single-prompt single-shot runs with Claude (and that means no plan, no swarms, etc) -- just with a high degree of work in parallel.
So for me this is a pretty huge change as the ceiling on a single prompt just jumped considerably. I'm replaying some of my less effective prompts today to see the impact.
Awesome.... With Sonnet 4.5, I had Cline soft trigger compaction at 400k (it wandered off into the weeds at 500k). But the stability of the 4.6 models is notable. I still think it pays to structure systems to be comprehensible in smaller contexts (smaller files, concise plans), but this is great.
The stuff I built with Opus 4.6 in the past 2.5 weeks:
Full clone of Panel de Pon/Tetris attack with full P2P rollback online multiplayer:
https://panel-panic.com
An emulator of the MOS 6502 CPU with visual display of the voltage going into the DIP package of the physical CPU:
https://larsdu.github.io/Dippy6502/
I'm impressed as fuck, but a part of me deep down knows that I know fuck all about the 6502 or its assembly language and architecture, and now I'll probably never be motivated to do this project in a way that I would've learned all the things I wanted to learn.
> Standard pricing now applies across the full 1M window for both models, with no long-context premium.
Does that mean it's likely not a Transformer with quadratic attention, but some other kind of architecture, with linear time complexity in sequence length? That would be pretty interesting.
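For scale, here is the arithmetic behind the question, assuming standard self-attention whose score computation grows with the square of sequence length. It compares only relative compute between a 200k and a 1M window and says nothing about how any particular model is actually built or priced; `relative_cost` is a made-up helper.

```python
def relative_cost(n_tokens: int, baseline: int, exponent: int) -> float:
    """Cost of processing n_tokens relative to baseline, for cost ~ n ** exponent."""
    return (n_tokens / baseline) ** exponent


quadratic = relative_cost(1_000_000, 200_000, 2)  # attention scores: O(n^2)
linear = relative_cost(1_000_000, 200_000, 1)     # a linear-time alternative

print(quadratic, linear)
```

So a 5x longer window means about 25x the attention-score work under the quadratic assumption, versus 5x if cost were linear.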
I am currently mass translating millions of records with short descriptions. Somehow tokens are consumed extremely fast. I have 3 max memberships. And all 3 of them are hitting the 5 hour limit in about 5 to 10 minutes. Still don't understand why this is happening.
Here's the implementation for the interested: https://github.com/tontinton/maki/blob/main/maki-code-index%...
I've always wanted to explore how to fit tree-sitter into this workflow. It's great to know that this works well too.
Thanks for sharing the code.
(Here is the AIPack runtime I built, MIT: https://github.com/aipack-ai/aipack), and here is the code for pro@coder (https://github.com/aipack-ai/packs-pro/tree/main/pro/coder) (AIPack is in Rust, and AI Packs are in md / lua)
I said, well yeah, but it's too sophisticated to be practical.
Would love to hear what you think :)
<https://marketplace.visualstudio.com/items?itemName=LuGoSoft...>
Where does code map live? Is it one big file?
Normally buying the bigger plan gives some sort of discount.
At Claude, it's just "5 times more usage 5 times more cost, there you go".
As soon as I saw the announcement, I tried again and created a working design skill that can create design artifacts following the brand guidelines.
While these improvements seem incremental, they have a compounding effect on usefulness.
My AI doomsday calculator just got decremented by another 6 months.
Do long context windows make much sense then or is this just a way of getting people to use more tokens?
(And, yeah, I'm all Claude Code these days...)