We just had a realization during a demo call the other day:
The companies that are entirely AI-dependent may need to raise prices dramatically as AI prices go up. Not being dependent on LLMs for your fundamental product’s value will be a major advantage, at least in pricing.
Yup. Also, regardless of price, they need to spend more and more as the project collapses under the inevitable incidental complexity of 30k lines of code a day.
It's similar to how, if you know what you're doing, you can manage a simple VPS and scale a lot more cost-effectively than something like Vercel.
In a saturated market, margins are everything. You can't necessarily afford to be giving all your margins to Anthropic and Vercel.
Look no further, to be honest, than older-generation programming languages like COBOL and how sought-after good developers for them are.
But I'm also afraid / certain that LLMs are able to figure out legacy code (as long as enough fits in their context window), so it's tenuous at best.
Also, funny you mentioned HTML / CSS because for a while (...in the 90's / 2000's) it looked like nobody needed to actually learn those because of tools like Dreamweaver / Frontpage.
The issue with COBOL code is that it’s hidden: it’s mostly internal systems, so there is little code available for training. HTML, TypeScript, JavaScript, C, etc. are readily available, billions of lines of code.
Well, on the second paragraph: I have no illusions they’ll figure out more as they are trained further. I am thinking more of the custodians (as coders turn into that).
Say you are a good coder now, but you are becoming a custodian; checking the LLM's work will slowly erode your skills. Maybe if you have a good memory or an amazing skillset it will take some time, but if you don’t use it, you lose it.
How are COBOL developers "sought after"? That's an oft-repeated but woefully incorrect meme.
FAANG new grads make more. If the COBOL devs had upskilled throughout their career they'd be Senior Staff/Principal+ and making 5-10x more than they do today.
> The companies that are entirely AI-dependent may need to raise prices dramatically as AI prices go up.
It's not that clear. Sure, hardware prices are going up due to the extremely tight supply, but AI models are also improving quickly to the point where a cheap mid-level model today does what the frontier model did a year ago. For the very largest models, I think the latter effect dominates quite easily.
>> The companies that are entirely AI-dependent may need to raise prices dramatically as AI prices go up.
> It's not that clear. Sure, hardware prices are going up due to the extremely tight supply, but AI models are also improving quickly to the point where a cheap mid-level model today does what the frontier model did a year ago.
I agree; I got some coding value out of Qwen for $10/month (unlimited tokens); a nice harness (and some tight coding practices) lowers the distance between SOTA and six-month-old second-tier models.
If I can get 80% of the way to Anthropic's or OpenAI's SOTA models for $10/month with unlimited tokens, guess what I am going to do...
There's only so far engineers can optimise the underlying transformer technique, which is, and always has been, doing all the heavy lifting in the recent AI boom. It's going to take another genius to move this forward. We might see improvements here and there, but I don't think the magnitude of the data and VRAM requirements will change significantly.
State-space models are already being combined with transformers to form new hybrid models. The state-space part of the architecture is weaker at retrieving information from context (it can't find a needle in the haystack as context gets longer; the details effectively get compressed away, since everything has to fit in a fixed-size state), but computationally it's quite strong: O(N), not O(N^2).
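To make the complexity difference concrete, here's a toy NumPy sketch (a generic linear state-space recurrence, not any specific published architecture): attention builds an N x N score matrix, while the state-space layer makes one pass with a fixed-size state, which is exactly why details get compressed away.

    import numpy as np

    def attention_scores(q, k):
        # Self-attention compares every query with every key:
        # an (N, N) matrix, hence O(N^2) time and memory in context length.
        return q @ k.T

    def ssm_scan(x, A, B, C):
        # A linear state-space layer makes a single O(N) pass while all
        # history is squeezed into one fixed-size state vector h, which is
        # why long-context retrieval degrades: h can only hold so much.
        h = np.zeros(A.shape[0])
        ys = []
        for x_t in x:               # one step per token
            h = A @ h + B @ x_t     # update compressed state
            ys.append(C @ h)        # emit output from state
        return np.stack(ys)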
I’ve read and heard from SemiAnalysis and other best-in-class analysts that the amount of software optimization possible up and down the stack is staggering…
How do you explain that capabilities being equal, the cost per token is going down dramatically?
Optimizations, like I said. They'll never hack away the massive memory requirements, however, or the pre-training... Imagine the memory requirements without the pre-training step... This is just part and parcel of the transformer architecture.
And a lot of these improvements are really just classic automation, or chaining together yet more transformer architectures to fix issues the transformer architecture creates in the first place (hallucinations, limited context).
Exactly this. To actually visualize the sheer scale of the VRAM wall we are hitting, I recently built an LLM VRAM estimator (bytecalculators.com/llm-vram-calculator).
If you play around with the math, you quickly realize that even if we heavily quantize models down to INT4 to save memory, simply scaling the context window (which everyone wants now) immediately eats back whatever VRAM we just saved. The underlying math is extremely unforgiving without fundamentally changing the architecture.
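To illustrate the point, here is my own back-of-envelope arithmetic (not the calculator's exact formulas; the model shape is made up but typical of a 70B-class dense model with grouped-query attention): weights shrink with quantization, but the KV cache grows linearly with context length, so a long window claws the savings right back.

    def vram_gb(params_b, n_layers, n_kv_heads, head_dim, context_len,
                weight_bytes=0.5, kv_bytes=2):
        # weight_bytes=0.5 ~ INT4 weights; kv_bytes=2 ~ FP16 KV cache.
        weights = params_b * 1e9 * weight_bytes
        # K and V, per layer, per token: 2 * n_kv_heads * head_dim values.
        kv = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bytes
        return (weights + kv) / 1e9

    # Hypothetical 70B dense model: 80 layers, 8 KV heads, head_dim 128.
    print(vram_gb(70, 80, 8, 128, 8_000))    # ~37.6 GB (35 weights + ~2.6 KV)
    print(vram_gb(70, 80, 8, 128, 128_000))  # ~77 GB: the KV cache alone
                                             # (~42 GB) outweighs the INT4 savings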
You also have to look at how exposed your vendors are to cost increases as well.
Your company may have the resources to effectively shift to cheaper models without service degradation, but your AI tooling vendors might not. If you pay for 5 different AI-driven tools, that's 5 different ways your upstream costs may increase that you'll need to pass on to customers as well.
Inference prices dropped something like 90 percent in that time (a combination of cheaper models, implicit caching, service levels, different providers, and other optimizations).
Quality went up. Quantity of results went up. Speed went up.
The service level that we provide to our clients went up massively and justified better deals. Headcount went down. What's not to like?
The decline of independent thought, for one. As people become reliant on LLMs to do their thinking for them and solve every problem they stumble upon, they become a shell of their former selves.
There is no decline. Human assets were always too expensive to process some additional information. We are simply processing a lot more low-signal data.
Actually some of our analysts are empowered by the tools at their disposal. Their jobs are safe and necessary. Others were let go.
Clients are happy to get a fuller picture of their universe, which drives more informed decisions. Everybody wins.
> Not being dependent on LLMs for your fundamental product’s value
I think more specifically not being dependent on someone else's LLM hardware. IMO having OSS models on dedicated hardware could still be plenty viable for many businesses, granted it'll be some time before future OSS reaches today's SOTA models in performance.
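As a sketch of how low the switching friction can be: most self-hosted runtimes (vLLM, llama.cpp's server, Ollama) expose an OpenAI-compatible endpoint, so existing client code barely changes. Host, port, and model name below are placeholders.

    from openai import OpenAI

    # Same client library, pointed at your own hardware instead of a vendor.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    resp = client.chat.completions.create(
        model="my-local-model",  # whatever model the server has loaded
        messages=[{"role": "user", "content": "Classify this support ticket: ..."}],
    )
    print(resp.choices[0].message.content)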
What's weird, though, is the bifurcation in pricing in the market: if your app can function on non-frontier-level AI, you can use last year's model at a fraction of the cost.
That'll be (part of) the big market correction, but also, speaking broadly: as investor money dries up and said investors want to see results, many new businesses or products will realise they're not financially viable.
On a small scale that's a tragedy, but there are plenty of analysts who predict an economic crash and recession, because there are trillions invested in this technology.
Indeed, it was clear from the beginning, "AI" companies want to become infrastructure and a critical dependency for businesses, so they can capture the market and charge whatever they want. They will have all the capital and data needed to eventually swallow those businesses too, or more likely sell it to anyone who wants the competitive advantage.
In fact, I am betting on the opposite. Frontier models are not getting that much better anymore, for common business needs at least, but the OSS models keep closing the gap. Which means, if trajectories hold, there will probably be a near-future moment where big-provider prices suddenly drop sharply, once the first viable local models can consistently take over normal tasks on reasonable hardware. Right now frontier providers are probably rushing for as much money as they possibly can before LLMs become a true commodity for the 80% of use cases outside the deep expert areas where they will keep an edge as specialist juggernauts (i.e. a premium cybersecurity model).
So it's all a house of cards now, and the moment the bubble bursts is when local open inference has closed the gap. Looks like Chinese labs and smaller players are already going hard in this direction.
Local open inference can address hardware scarcity by repurposing the existing hardware that users need anyway for their other purposes. But since that hardware is a lot weaker than a proper datacenter setup, it will mostly be useful for running non-time-critical inference as a batch task.
Many users will also seek to go local as insurance against rug pulls from the proprietary-model side (we're not quite sure the third-party inference market will grow enough to provide robust competition), but ultimately, if you want to make good utilization of your hardware as a single user, you'll also be pushed towards mostly running long batch tasks, not realtime chat (except with tiny models) or human-assisted coding.
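A minimal sketch of what that batch-task shape looks like in practice (file names and the run_inference callable are placeholders): queue jobs during the day and let the local GPU grind through them overnight.

    import json

    def run_batch(jobs_path, results_path, run_inference):
        # Throughput, not latency: a weak local GPU still reaches good
        # utilization when no human is waiting on each completion.
        with open(jobs_path) as jobs, open(results_path, "a") as out:
            for line in jobs:
                job = json.loads(line)
                result = run_inference(job["prompt"])  # local model call
                out.write(json.dumps({"id": job["id"], "output": result}) + "\n")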
Absolutely. Pricing exposure is the quiet story under all the waves of AI hype. Build for convenience → subsidise for dependence → meter for margin is a well-worn playbook, and AI-dependent companies are about to find out what phase three feels like.
Hyperscalers are spending a fortune, so we think AI = API, but renting intelligence is a business model, not a technical inevitability.
Shameless link to my post on this: https://mjeggleton.com/blog/AIs-mainframe-moment
How is that surprising? We've been taking that into account for any LLM-related tooling for over a year now: we either have to be able to drop it, or have it designed in a way that lets us switch to a self-hosted model when throwing money at hardware would pay for itself quickly.
It's just another instance of cloud dependency, and people should've learned something from that over the last two decades.
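One way that "designed so we can switch" looks in code, as a minimal sketch (names are illustrative): keep a one-method seam between your product and whichever provider is cheapest this quarter.

    from typing import Protocol

    class Completer(Protocol):
        def complete(self, prompt: str) -> str: ...

    class OpenAICompatCompleter:
        # Works for a hosted vendor or a self-hosted endpoint alike,
        # since both can speak the same chat-completions API.
        def __init__(self, client, model: str):
            self.client, self.model = client, model

        def complete(self, prompt: str) -> str:
            resp = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content

    # Product code depends only on Completer; swapping providers is a
    # one-line change where the app is wired together.
    def summarize(doc: str, llm: Completer) -> str:
        return llm.complete("Summarize:\n\n" + doc)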
Not so much that it was surprising, rather that we looked at a competitor’s site and noticed that a) their prices went way up and b) their branding changed to be heavily AI-first.
So we thought, hmm, “wonder if they are increasing prices to deal with AI costs,” and then projected that into a future where costs go up.
We don’t have this dependence ourselves, so this seems to be a competitive advantage for us on pricing.
I wonder if they won't, because the real mechanism is that AI-wrapper pricing power is weak (switching costs are near zero), while state-of-the-art models make it difficult to lower prices due to their higher cost.
Also: AI dependence could be explicit AI API usage by the product itself, but also anything else, like AI-assisted coding, AI used by humans in other surrounding workflows, etc.
And I don't really mean new businesses that are entirely built around LLMs, rather existing ones that pivoted to be LLM-dependent – yet still have non-LLM-dependent competitors.
Same as Uber… In the beginning, everyone pretty much knew that the cost of rides couldn't possibly be that cheap and that it was subsidized. Once you corner the market, people just get used to “real” prices, to the point that now there are often cheaper alternatives to Uber, but people still Uber…
It's also quite interesting to read about how Uber exploits its drivers and about its discriminating algorithms. Cory Doctorow mentioned it in his latest book; sadly I can't link the direct sources.
Not really; the next move is to establish standards groups requiring the use of AI in product development, a mix of industry and governmental mandates. What you are viewing as COGS instead becomes a barrier to entry.
China already operates like this. Low cost specialized models are the name of the game. Cheaper to train, easy to deploy.
The US has a problem of too much money leading to wasteful spending.
If we go back to the 80s/90s, remember OS/2 vs Windows. OS/2 had more resources, more money behind it, more developers, and they built a bigger system that took more resources to run.
Mac vs Lisa. The Mac team had constraints; the Lisa team didn't. Unlimited budgets are dangerous.
Though I do agree with you. I just came back from a trip to China (Shanghai, more specifically), and while attending a couple of AI events, the overwhelming majority of people there were using VPNs to access Claude Code and Codex :-/
On Mac vs Lisa I generally agree, but wasn't there a strong tension between budget and revenue with Mac vs Apple II? The Apple II team had an even more constrained budget per machine sold, which led to the conflict between the Mac and Apple II teams. (Apple II team: "We bring in all the revenue and profit, we offer color monitors, we serve businesses and schools at scale. Meanwhile, Steve's Mac pirate ship is a money pit that also mocks us as the boring Navy establishment, when we are all one company!")
By the logic of constraints (on a unit basis), Apple II should have continued to dominate Mac sales through the early 90s but the opposite happened.
It has been a very bad bet that hardware will not evolve to exceed the performance requirements of today's software tomorrow, just as it is a bad bet that tomorrow someone will rewrite today's software to be slower.
Eh, but then as hardware evolves, the software will also follow suit. We’ve had an explosion of compute performance and yet software is crawling for the same tasks we did a decade ago.
Better hardware ensures that software that is “finished” today will run at acceptable levels of performance in the future, and nothing more.
I think we won’t see software performance improve until real constraints are put on the teams writing it and leaders who prioritize performance as a North Star for their product roadmap. Good luck selling that to VCs though.
You can fine-tune a model, but there are also smaller models fine-tuned for specific work like structured output and tool calling. You can build automated workflows that are largely deterministic and only slot in these models where you specifically need an LLM to do a bit of inference. If frontier models are a sledgehammer, this approach is the scalpel.
A common example would be that people are moving tasks from their OpenClaw setup off of expensive Anthropic APIs onto cheaper models for simple tasks like tagging emails, summarizing articles, etc.
Combined with memory systems, internal APIs, or just good documentation, a lot of tasks don't actually require much compute.
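A sketch of that scalpel-vs-sledgehammer routing (the task names and the call_model stub are hypothetical; the pattern is the point):

    def call_model(model: str, prompt: str) -> str:
        raise NotImplementedError  # provider/local call of your choice

    CHEAP_TASKS = {"tag_email", "summarize_article", "extract_links"}

    def route(task: str, prompt: str) -> str:
        # Routine, low-stakes work goes to a small cheap model; the
        # expensive frontier model is reserved for the hard cases.
        if task in CHEAP_TASKS:
            return call_model("small-cheap-model", prompt)
        return call_model("frontier-model", prompt)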
Harness is a big one; Claude Code still has trouble editing files with tabs. I wonder how many tokens per day are wasted on Claude attempting multiple times to edit a file.
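One plausible harness-side mitigation, sketched below (this is not how Claude Code actually works internally): match the model's search block with tabs and spaces treated as equivalent, so a whitespace-mangled edit still lands on the first try instead of burning retries.

    import re

    def normalize_ws(line: str) -> str:
        # Collapse runs of spaces/tabs so tab-vs-space mismatches
        # don't prevent a match.
        return re.sub(r"[ \t]+", " ", line)

    def find_block(file_lines, search_lines):
        # Return the index where the search block starts, comparing
        # lines with whitespace runs normalized; -1 if absent.
        n = len(search_lines)
        target = [normalize_ws(l) for l in search_lines]
        for i in range(len(file_lines) - n + 1):
            if [normalize_ws(l) for l in file_lines[i:i + n]] == target:
                return i
        return -1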
As a recent example in the AI space itself: China had scarce GPU resources (quite obvious why) => the DeepSeek training team had to reinvent some wheels and jump through some hoops => some of those methods have since become 'industry standard' and been adopted by Western labs, who now jump through the same hoops despite enjoying massive compute resources, for the sake of added efficiency.
I'm having a hard time getting my mind around this.
> Users should re-tune their prompts and harnesses accordingly.
I read this in the press release and my mind thought it meant a test harness. Then there was a blog post about long-running harnesses with a section about testing, which led to a little more confusion.
Yes, the word 'harness' is consistently used in this context as a wrapper around the LLM, not as 'test harness'.
This field is chock full of people using terms incorrectly, defining new words for things that already had well-known names, and overloading terms already in use. E.g. shard vs partition; TUI, which already meant "telephony user interface"; "client" to mean "server" in blockchain.
Some people also call evaluations "tests". Unexpected things come along with new models: the model in a workflow you'd set up suddenly starts calling a tool and never stops, or decides to no longer call a particular tool. Running your existing evaluations to catch regressions like this, and potentially updating the prompts, is what's considered "testing" your prompts and harnesses.
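A minimal sketch of such an evaluation (the run_agent callable and the case format are assumptions): replay a fixed suite after every model or prompt change and flag exactly those behavioural regressions.

    def eval_tool_use(run_agent, cases, max_tool_calls=10):
        # run_agent(prompt) is assumed to return the list of tool names
        # the agent called, in order.
        failures = []
        for case in cases:
            calls = run_agent(case["prompt"])
            if len(calls) > max_tool_calls:
                failures.append((case["id"], "tool-call loop"))
            for tool in case.get("must_call", []):
                if tool not in calls:
                    failures.append((case["id"], "never called " + tool))
        return failures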
It’s the tool that calls the model, gives it access to the local file system, executes the actual tools and commands for the model, etc., and provides the initial system prompt.
Basically a clever wrapper around the Anthropic / OpenAI / whatever provider API, or local inference calls.
pi vs. claude code vs. codex
These are all agent harnesses which run a model (in pi's case, any model) with a system prompt and their own default set of tools.
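Stripped to its core, a harness is just this loop. A sketch against the common chat-completions tool-calling shape; error handling, context management, and the real tool schemas are omitted:

    import json

    def run_harness(client, model, system_prompt, tool_schemas, tool_impls,
                    user_msg, max_turns=20):
        # The harness owns the system prompt, executes each tool call the
        # model requests, and feeds results back until it answers in text.
        messages = [{"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_msg}]
        for _ in range(max_turns):
            msg = client.chat.completions.create(
                model=model, messages=messages, tools=tool_schemas,
            ).choices[0].message
            if not msg.tool_calls:
                return msg.content  # plain-text answer: we're done
            messages.append(msg)
            for call in msg.tool_calls:
                result = tool_impls[call.function.name](
                    **json.loads(call.function.arguments))
                messages.append({"role": "tool", "tool_call_id": call.id,
                                 "content": str(result)})
        raise RuntimeError("agent did not finish within max_turns")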
Whoever is running and selling their own models with inference is invested down to the last dime available in the market.
Those valuations are already ridiculously high, be it Anthropic or OpenAI: a couple of trillion dollars easily, if combined.
All that investment is seeking return. Correct me if I'm wrong.
Developers and software companies are the only serious users because they (mostly) review output of these models out of both culture and necessity.
Anywhere else? Other fields? There, these models aren't useful, or not as useful, while revenue from software companies is by no means going to bring returns on the trillion-dollar valuations. Correct me if I'm wrong.
To make matters worse, there's a hole in the bucket in the form of open-weight models. When squeezed further, software companies would either deploy open-weight models or resort to writing code by hand, because this is a very skilled and hardworking tribe; they've been doing this all their lives, and whole careers are built on it. Correct me if I'm wrong.
Eventually, ROI might not be what VCs expect, and constant losses might lead to bankruptcies. All that data-center build-out would suddenly be looking for someone to rent the compute capacity, and the result would be dime-a-dozen open-weight model providers with generous usage tiers, capitalizing on capacity whose bankrupt owners can't use it any more and want to liquidate it as much as possible to recoup their investment.
There is a lot of demand still coming, for sure, but I think I'm more optimistic. Ready to eat my hat on this, but:
- higher prices will result in huge demand destruction too. Currently we're burning a lot of tokens just because they're cheap, but a lot of heavy users are going to spend the time moving flows over to Haiku or on-prem micro models the moment pricing becomes a topic.
- data centers do not take that long to build; there are probably bottlenecks in weird places, like transformers, that will cause some hiccups, but Nvidia's new stuff is way more efficient and the overall pipeline of capacity coming online is massive.
- probably we will see some more optimization at the harness level still: better caching, a better mix of smaller models for some uses, etc. (a minimal caching sketch follows below).
These companies have so much money, and at least Anthropic and OpenAI are playing for winner-takes-all stakes, with competition from the smaller players too. I think they're going to keep feeding us for free to win favour for quite a while still. Let's see, though.
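On the caching bullet, the client-side version is almost trivial, as a sketch (providers also offer server-side prefix caching, which this only crudely mirrors; it only helps for repeated, deterministic calls):

    import hashlib, json

    _cache = {}

    def cached_complete(client, model, messages):
        # Key on the exact request; repeated deterministic steps in a
        # pipeline then cost zero tokens on reruns.
        key = hashlib.sha256(
            json.dumps([model, messages], sort_keys=True).encode()
        ).hexdigest()
        if key not in _cache:
            resp = client.chat.completions.create(model=model, messages=messages)
            _cache[key] = resp.choices[0].message.content
        return _cache[key]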
This is probably even the "fun" part of the whole picture. The pure dystopia starts when investment firms just silently grow bigger and bigger data centers, like a cancer. There will be no press releases, no papers, no chance anyone without billions will even know the details, let alone get access. One day we realise the world's resources (maybe not as in the paperclip maximiser, but as in memory, energy, GPUs, water, locations) are consumed by trading models, and the data centres are already guarded by robot armies. While we were distracted fighting with Anthropic and OpenAI, the real war was already over. Mythos is one sign in this direction, but I also met a few people who were claiming to fund fairly large research and training operations just by internal models working on financial markets. I have no way to verify those claims, but this has happened three times now, and the papers/research they were working on looked pretty solid and did not seem like they were running Kimi OpenClaw on Polymarket, but actual models on some significant funds. Would be really interested if anyone here has some details on this reality. I would also not be surprised if this is a thing that people in SF just claim to sound dangerous and powerful.
There might always be LLMs, but the dependence is an interesting topic.
> How are COBOL developers "sought after"?
Even the briefest of Google searches shows they make around the same as any other enterprise dev, if not slightly less.
I think that was about 10 years ago…
> As people become reliant on LLMs to do their thinking for them
Sadly, this is already happening.
This is the “building my entire livelihood on Facebook... oh no, what now?” story all over again.
Oh no, sorry, I forgot: your laptop's LLM can draw a potato, let me invest in you.
> We just had a realization during a demo call the other day
These tools have been around for years now. As they've improved, dependency on them has grown. How is any organization only just realizing this?
That's like only noticing the rising water level once it starts flooding the second floor of the house.
> wonder if they are increasing prices to deal with AI costs
Or they'll price the true cost in from the start, and make massive profits until the VC subsidies end... I know which one I'd do.
* harness design
* small models (both local and not)
I think there is tremendous low hanging fruit in both areas still.
> Low cost specialized models
Can you elaborate on this? Is this something that companies would train themselves?