Would be cool to have a $5-10/month plan that only works off-peak, for people who want to do the occasional side project after work. Right now it's hard to justify anything but Copilot (because it's cheaper, offers the same models, and I'm nowhere near the usage limits).
I suspect that any GPU cycle not spent on inference will just be dedicated to training (which as I understand it can “soak up” essentially unlimited compute at constant value per token), and I’d not expect to see time-based billing until that changes.
Isn't this post an announcement of time-based billing? Just in a kind of indirect way (not billing, rather than billing).
Also, my (extremely naive) understanding is that at the cutting edge, hardware is diverging for training vs inference. That might not be true for Anthropic though.
Would be better if they simply made it free for open source developers. I can barely justify spending time on my hobby projects. If I paid for this, I'd be paying to work for them since they're using our data for training.
This used to be the case, but in the last 36 hours or so, Copilot silently kneecapped Claude models and I've been getting rate limited on roughly every third request. Not only does the call fail midway, they still charge me for the request.
A $50-per-week Codex Pro/Claude Max plan would be perfect for solo gamedevs/open-source devs who have existing code that would benefit from an occasional review pass or subsystem experiments/brainstorming with the most powerful models, but don't need to use one for a whole month.
I canceled my plan today and wrote my reason as: now that I have a job again I don’t have the time or need for the Pro plan. If there were a $5 a month option, I would gladly take it to make use of Opus for my rare side ideas.
Pricing will soon be structured around energy costs and on/off-peak power rates. I’m actually surprised it hasn’t happened sooner.
Even with behind-the-meter generation, you’re not completely insulated from peak (daily) power prices. Being able to shift at least some demand around will help from a pure energy-cost perspective.
Most of these behind-the-meter generation projects will be gas generation. Guess what happens during a cold snap like the one we experienced in the Northeast US a few weeks ago? Natural gas prices jumped 10X in the daily market. You say that they are hedged? Hedges do not matter during Operational Flow Order (OFO)/Force Majeure/Curtailment pipeline events, when they are exposed to the daily market. (I do this for a living)
Presumably they have unused compute in those hours and figure they may as well enable people to use it and get more invested into their ecosystem.
What I wish Anthropic would do is be a lot more explicit about what windows apply when. Surely they have the data to say "you get X usage from hours A to B, Y usage from B to C"
I just know there has to be some psychology in play with these promos. The promo during December got me to upgrade to the $100 plan, and I know I'm not the only one.
An anecdote: for a while now I've noticed (or imagined) Claude Code becoming ever so slightly dumber around 3-4pm CEST. I've been calling it the "Americans are awake" syndrome: I assume usage goes up while latency is kept the same (which is something Anthropic surely keeps an eye on), and quality drops as a result.
From my understanding:
Peak time (non-promo): UTC 12:00–18:00 / KST (UTC+9): 21:00–03:00
Off-peak time (promo): UTC 18:00–12:00 / KST (UTC+9): 03:00–21:00
I guess I’ll need to do more coding during the daytime.
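If it helps anyone, here's a rough sketch for checking whether a given local time lands in the peak window. It assumes the 12:00-18:00 UTC window from my reading above is right (it may not be), and `is_peak` is just a name I made up, not anything official.

```python
from datetime import datetime, time, timedelta, timezone

# Assumed peak window, taken from the comment above: 12:00-18:00 UTC.
PEAK_START_UTC = time(12, 0)
PEAK_END_UTC = time(18, 0)

KST = timezone(timedelta(hours=9))  # Korea Standard Time, no DST


def is_peak(moment: datetime) -> bool:
    """Return True if a timezone-aware datetime falls in the assumed peak window."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    utc_clock = moment.astimezone(timezone.utc).time()
    # Start is inclusive, end is exclusive (an arbitrary choice for this sketch).
    return PEAK_START_UTC <= utc_clock < PEAK_END_UTC


print(is_peak(datetime(2025, 1, 1, 21, 0, tzinfo=KST)))  # 21:00 KST is 12:00 UTC -> True
print(is_peak(datetime(2025, 1, 1, 10, 0, tzinfo=KST)))  # 10:00 KST is 01:00 UTC -> False
```

Swap `KST` for your own timezone to see when your off-peak hours fall.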
Long ago in the ancient days of punchcards and IBM mainframes, you’d write your programs during the day, then submit them to run overnight and pick up your results in the morning. It would be funny and sort of romantic if time-based LLM pricing returned us to that: write your specs all day, run agents on them overnight, check out the results in the morning.
"One thing I really suspect we'll see a lot more of is much more generous rate limits at 'off peak' times - likely to be early morning UTC - as there is no doubt a lot of "idle" compute sitting there"
I strongly suspect this will end up in the opposite happening, where peak tokens are far more "expensive" (whether that be through usage limits or API costs) than off-peak.
PS: Anthropic have managed to improve reliability but are absolutely shredding Opus tok/s at peak times. It absolutely crawls on the web (maybe 2-3 tok/s?) and I believe that on non-Max plans it's also incredibly slow in Claude Code.
Interesting to see more demand-shaping mechanisms applied to LLM inference, even though the "batch processing" feature is already available. I guess this "promotion" is to test the hypothesis of sliding along the spectrum towards more "real-time" demand shaping.
I need something in between pro and max (about 2-3x pro not 5x). Really hoping this usage promotion is a permanent fixture. I have Claude through work and more tokens than I know what to do with. But on personal projects, I tend to want a lot of tokens all at once at late hours.
Who are these guys even competing with that they are going so hard with the deals? Like the 1M context window, is Gemini offering that? In any case, they seem to have no real competition today.
I don't really understand why AI providers don't charge like the electric company, or AWS. Instead of increasing usage limits, just charge less for off-hours use.
https://claude.com/contact-sales/claude-for-oss
But them even admitting that was possible is a little bit too close to being able to be held accountable...
So much for that plan.
So they could “double” your usage by keeping it the same and then simply halving peak usage.
If they are doing it “right” I think any off peak usage should count 50% toward your weekly limits.
Edit: it does look like they are doing it the "right" way.
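To make the "right" way concrete, here's a minimal sketch of the accounting I have in mind, where off-peak usage counts at 50% toward the weekly limit. The function names, the unit-less numbers, and the 0.5 weight are my assumptions for illustration; I don't know how Anthropic actually meters this.

```python
def weighted_usage(peak_used: float, off_peak_used: float,
                   off_peak_weight: float = 0.5) -> float:
    """Usage counted against the weekly limit, with off-peak usage discounted."""
    return peak_used + off_peak_weight * off_peak_used


def remaining(weekly_limit: float, peak_used: float, off_peak_used: float,
              off_peak_weight: float = 0.5) -> float:
    """Weekly allowance left after applying the off-peak discount."""
    return weekly_limit - weighted_usage(peak_used, off_peak_used, off_peak_weight)


# Hypothetical week: a limit of 100 units, 40 used at peak, 60 used off-peak.
print(weighted_usage(40, 60))   # 40 + 0.5 * 60 = 70.0
print(remaining(100, 40, 60))   # 100 - 70 = 30.0
```

Under this scheme someone who works only off-peak gets through 200 units before hitting a 100-unit limit, and peak usage is untouched. That's the difference from "doubling" by halving what you get at peak.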