> The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. That is a reasonable concern, but it largely applies when agents produce code that humans then maintain. Agentic platforms are being iterated upon quickly, and for established patterns and non-business-critical code, which is the majority of what most engineering organizations actually maintain, detailed human familiarity with the codebase matters less than it once did. A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves.
Then I'd wager it's the same for the courses and workshops this guy is selling... an LLM can probably give me at least 75% of the financial insights for not even 0.1% of what this "agile coach" is asking for his workshops and courses.
Maybe the "agile coach LLM" can explain to the "coding LLMs" why they're too expensive, and then the "coding LLMs" can tell the "agile coach LLM" to take the next standby shift, if he knows so much about code?
And then we actual humans can have a day off and relax at the pool.
Conceding the premise that the AGI is gonna eat my job: my job involves reading the spec to be able to verify the code and output, so that there's a human to fire and sue. There are five layers of fluffy management and corporate BS before we get to that part, and the AGI is more competent at those fungible skills.
With the annoying process people out of the picture, even reviewing vibeslop full time sounds kinda nice… Feet up, warm coffee, just me and my agents so I can swear whenever I need to. No meetings, no problems.
Yeah, that paragraph really betrayed the author's ignorance of software development. At the very least, it proves that they have no hands-on experience with LLM-assisted development.
Getting these tools to "understand", or to generate good results in a codebase, is not a function of the number of agents or the time you let them run. Rather, if the tools fail to produce anything useful after a few minutes, you can bet your ass that they're not going to work better after hours, or days. If they come up with a mess, and your reaction is to just let them work on it for a few days, I can confidently predict what you'll end up with.
They come close to grasping what we've learned about where these new tools are useful and where they aren't, only to end up falling for the pretty words these generators use to lipstick their turds. As right as they may be about the financial considerations, there are going to be some very uncomfortable bills to pay for those who share this belief in the magical abilities of LLMs.
This whole article, both the good (the critique of the status quo ante) and the bad (entirely too credulous of LLM boosterism), is missing (or not stressing enough) the most important point, which is that the actual programming is not the hard part. Figuring out what exactly needs to be programmed is the hard part.
For reasons which would take a while to unpack, it is often the case that the best (or sometimes only) way to find out what programming actually needs to be done is to program something that's not it, and then replace it. This may need to be done multiple times. Programming is only occasionally the final product; it is much more often the means of working through what is actually needed. This is very difficult for the people who ask for the software to understand, and it is quite often very difficult for the people doing the programming to understand.
Most of what is being done, during programming, is working through the problem space in a way which will make it more obvious what your mistakes are, in your understanding of the problem and what a solution would look like. Once you have arrived at that understanding, then there are a variety of ways to make what you need, but that is not the rate-limiting step.
> A messy codebase is still cheaper to send ten agents through than to staff a team around
People who say that haven't used today's agents enough or haven't looked closely at what they produce. The code they write isn't messy at all. It's more like asking the agent to build a building from floorplans and spec, and it produces everything in the right measurements and right colours and passes all tests. Except then you find out that the walls and beams are made of foam and the art is load-bearing. The entire construction is just wrong, hidden behind a nice exterior. And when you need to add a couple more floors, the agents can't "get through it" and neither can people. The codebase is bricked.
Today's agents are simply not capable enough - without very close and labour-intensive human supervision - to produce code that can last through evolution over any substantial period of time.
> A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today.
I've been on 2 failed projects that were entirely AI generated, and it's not that the agents slow down and you can just send more of them to work on the project for longer; it's that they become completely unable to make any progress whatsoever, and whatever progress they do make is wrong.
> Software development is one of the most capital-intensive activities a modern company undertakes
The article is definitely written through a "high tech" industry lens. A mid-sized utility might spend $80-$150 million USD on IT capital projects in a year, but $2b on power pole maintenance. Utilities are a strong example, but any large enterprise manufacturing company is spending more on factory upgrades than on programming.
> [...] built a functional replica of approximately 95% of Slack’s core product in fourteen days using LLM agents.
IT and Finance leadership at asset-heavy companies are currently trying to wrap their heads around the economics of their 100+ SaaS contracts, and whether those contracts still make sense with LLM-powered developers. Can they hire developers in house to build the fraction of the tool they actually use from many of these companies, and save on total cost and opex?
I work with these companies a lot, and won't weigh in on the right decision. Bottom line: "it depends" on many factors, some of which are not immediately obvious. The article still holds weight regardless of industry, but there are some nuances (talent availability, internal change cost, etc.) that also have to be considered.
I thought it was a good article, till I saw the Slack example.
The clone doesn't even remotely grasp what the actual Slack software does in terms of scale, reliability, observability, monitorability, maintainability, and pretty surely functionality as well.
The author only writes about the non-dev work as the difference, which suggests he doesn't know what he's talking about at all, or what running an application at that scale actually means.
This "clone" doesn't get you any closer to an actual Slack copy than a blank piece of paper does.
When I see someone just throwing a lot of numbers and graphs at me, I see someone who is in it to win an argument, not to propose an idea.
Of late, I've come across a lot of ideas from Rory Sutherland, and my conclusion from listening to them is that there are some people who are obsessed with numbers because, to them, it's a way to find certainty and win arguments. He calls them "Finance People" (him being a Marketing one). Here's an example:
"Finance people don’t really want to make the company money over time. They just thrive on certainty and predictability. They try to make the world resemble their fantasy of perfect certainty, perfect quantification, perfect measurement.
Here’s the problem. A cost is really quantifiable and really visible. And if you cut a cost, it delivers predictable gains almost instantaneously."
> Choosing to spend three weeks on a feature that serves 2% of users is a €60,000 decision.
I'd really want to hire the oracle of a PM/analyst who can give me that 2% accurately even 75% of the time, and promise that nothing non-linear can come out of the exercise.
I'm not commenting too much on the details of the article, but the premise does resonate with me. I would argue all the engineering teams I've been on do not spend enough time thinking about how much a piece of work will cost to execute, and whether it will generate a return.
I suspect this is most apparent in things like meeting culture. Something happens, and all of a sudden there is another recurring meeting on the calendar, with 15 attendees, costing X dollars in wages, that produces no value for the customers because the lesson was already learned.
Or when reacting to an incident of some sort, it's so easy to have a long list of action items that may theoretically improve the situation but in reality are incredibly expensive for the value they produce (or the risks they reduce). It's too easy to say we'll totally redesign the system to avoid said problem. And what worries me is that those very expansive actions often cause you to overlook realistic but small investments that move the needle more than you would think.
And as a hot topic, I also think costs are an input into taking on tech debt. I know we all hate tech debt with a passion, but honestly, I think of it as a tool that can be wielded responsibly or irresponsibly. If we don't know what our attention costs, we're going to have difficulty making responsible choices about when and where to take on this debt. And if we're not conscious about the debt, when it comes due it stings that much harder to pay down.
I think the only thing that matters is whether the people on the team care deeply about the product; whether they care more about the product than their own careers (in the short term). Without that, any metric or way of thinking can and will be gamed.
Unfortunately, even with all the management techniques in the world, there are just some projects that are impossible to care about. There’s simply a significantly lower cap on productivity on these projects.
Making it solely about the extraction of dollars is a great recipe to make something mediocre. See Hollywood or Microslop.
It's like min-maxing a Diablo build where you want the quality of the product to be _just_ above the "acceptable" threshold but no higher, because that's wasting money. Then you're free to use all remaining points to spec into revenue.
Look! A guy built 95% of Slack in 2 weeks! Very skeptical of that, btw. But also, an organization that justifies every single team by exactly how much $ value it's generating sounds like hell. How would you ever innovate or try out new ideas? It's important to quantify what impact your team is generating, but there are some cases (e.g. UX) that are really hard to quantify in $ yet still very important for the product.
I still don't understand what regular people (like the author) gain from selling how wonderful AI is. I get that the folks at Anthropic and OpenAI shove AI down our throats every day, but nobodies?
The over-simplification rubs me the wrong way, for example:
> Consider a team of eight engineers whose mission is to build and maintain an internal developer platform serving one hundred other engineers. This is a common organizational structure, and it is one where the financial logic is rarely examined carefully.
> The team costs €87,000 per month. To justify that cost, the platform they build needs to generate at least €87,000 per month in value for the engineers who use it. The most direct way to measure that value is through time saved, since the platform’s purpose is to make other engineers more productive.
> At a cost of €130,000 per year, one engineer costs approximately €10,800 per month, or around €65 per working hour. For the platform team to break even, their platform needs to save the hundred engineers they serve a combined total of 1,340 hours per month. That is 13.4 hours per engineer per month, or roughly three hours per week per person.
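For what it's worth, the excerpt's break-even arithmetic is easy to reproduce. Here's a rough sketch in Python; the hours-per-month figure is back-derived from the €65/hour number and is my assumption, not something the article states:

```python
# Back-of-the-envelope check of the quoted platform-team example.
# All inputs come from the excerpt above; WORKING_HOURS_PER_MONTH is inferred.

ANNUAL_COST_PER_ENGINEER = 130_000   # EUR, fully loaded
ENGINEERS_ON_PLATFORM_TEAM = 8
ENGINEERS_SERVED = 100
WORKING_HOURS_PER_MONTH = 166        # implied by the ~EUR 65/hour figure

monthly_cost_per_engineer = ANNUAL_COST_PER_ENGINEER / 12           # ~10_833
hourly_rate = monthly_cost_per_engineer / WORKING_HOURS_PER_MONTH   # ~65

team_monthly_cost = ENGINEERS_ON_PLATFORM_TEAM * monthly_cost_per_engineer

# Hours of other engineers' time the platform must save to pay for itself
breakeven_hours_total = team_monthly_cost / hourly_rate             # ~1_330
breakeven_hours_each = breakeven_hours_total / ENGINEERS_SERVED     # ~13.3

print(f"hourly rate: ~EUR {hourly_rate:.0f}")
print(f"team cost:   ~EUR {team_monthly_cost:,.0f}/month")
print(f"break-even:  ~{breakeven_hours_each:.1f} h saved per served engineer per month")
```

The result matches the article's ~13.4 hours per engineer per month to within rounding, so the arithmetic itself is sound; the questionable part is whether "time saved" is the right measure at all.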
There's a fungibility assumption which is pervasive here. In most cases, a platform team is there not "to save time".
It's there to deal with cross-cutting concerns that would be not only time-consuming but potentially business-threatening, and in some cases you keep more expensive engineers there to ensure that certain critical things are done right.
The "author" used someone's vibecoded Slack clone to justify his conclusions. I think he believes that the majority of Slack's value lies in the slick CSS animations.
I do agree with his thesis in the middle, about how the ZIRP decade and the cultures that were born from that period were outrageous and cannot survive the current era. It's a brave new world, and it's not because of AI. It's because there's just not enough money flowing anymore, and what little is left is sucked up by the big boys (AI).
This is some aggressive consultant fluff. Few companies have such clear-cut "profit" measures. If "the financial logic is rarely examined carefully", then maybe there's a reason, since analysis like this is mostly fantastical and brittle. This is the sort of argument that is both rational and implausible. A manager might use this logic to rationalize firing an engineering team (which is mostly why guys like this get hired), but they won't use it to manage one.
I feel like there is a lot of nuance around this topic that is getting lost in the noise.
The direct and indirect financial impacts of technical decisions are indeed hard to measure. But some technical decisions definitely have greater financial impact than others. Even if it's hard to precisely quantify the costs and benefits of every decision, it is possible to order them relatively: X is likely to make more money than Y, so we do X first and Y later.
There is a significant amount of chance involved in whether a product/feature will even make money at all. So even good plans with measurably positive expected value could end up losing money.
Just because it's impossible to be 100% certain of the outcome of any decision doesn't mean we should throw the baby out with the bathwater.
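A toy sketch of that relative-ordering idea. Every number here (payoffs, success probabilities, costs) is invented purely for illustration:

```python
# Rank two hypothetical investments by expected value. Either option can
# still lose money on a bad draw; the ranking only orders them relatively.

options = {
    "X": {"payoff": 500_000, "p_success": 0.30, "cost": 60_000},
    "Y": {"payoff": 200_000, "p_success": 0.40, "cost": 50_000},
}

def expected_value(option):
    """Expected payoff minus the (certain) cost of building it."""
    return option["payoff"] * option["p_success"] - option["cost"]

ranked = sorted(options, key=lambda name: expected_value(options[name]),
                reverse=True)
print(ranked)  # X first, Y later
```

The point is that a coarse, uncertain model still supports an ordering decision even when it can't support a precise forecast.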
> There is no cohort of senior product leaders who developed their judgment in conditions where their teams were expected to demonstrate financial return, because those conditions did not exist during the years when that cohort was learning the craft.
There totally is such a cohort. There are plenty of bootstrapped companies or startups that took only an angel round and did not benefit from the low rate environment, in fact they suffered because of the very high price of SWE labor. But those engineering managers exist and are out there right now still building efficiently, quietly growing, passionately serving customers, and keeping a close eye on the bottom line and risks because that’s their livelihood.
> The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. That is a reasonable concern, but it largely applies when agents produce code that humans then maintain. Agentic platforms are being iterated upon quickly, and for established patterns and non-business-critical code, which is the majority of what most engineering organizations actually maintain, detailed human familiarity with the codebase matters less than it once did. A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves.
I keep seeing this assumption that "unmanageable" caps out at "kinda hard to reason about", and anyone with experience in large codebases can tell you that's not so. There are software components I own today which require me to routinely explain to junior engineers (and indeed to my own instances of Claude) why their PR is unsound and I won't let them merge it no matter how many tests they add.
This article is not bad overall, but it does over-index on the cost of making software development costs and tradeoffs legible. Of course leadership does need to make decisions, and so the quest for better data and better cost modeling will continue, and rightly so, Goodhart's law notwithstanding.
I do like this bit though:
> A large codebase also carries maintenance costs that grow over time as the system becomes more complex, more interconnected, and more difficult to change safely. Every engineer added to maintain it increases coordination costs, introduces new dependencies, and adds to the organizational weight that slows decision-making. The asset and the liability exist simultaneously, and for most of the past twenty years, the financial environment masked the liability side of that equation.
And the insight that LLMs are exposing this reality is absolutely true. The funny thing is they are exposing it by accelerating both good and bad engineering practices. Teams with good engineering judgement will move faster than ever with fewer people, and teams with bad engineering judgment will bury themselves in technical debt so fast the wheels will come off.
For me, running an engineering org is primarily about talent acquisition and empowering those ICs with judgment to move quickly. How well systems and teams scale depends on the domain, product, and how it allows you to decouple things. With the right talent and empowerment there are often creative ways to make product and system tradeoffs and iterate quickly to change the shape of ROI. Any mapping to financial metrics is a hugely lossy operation that can't account for such changes. It might work in mature companies that are ossified and in the second half of their lifecycle, but in growing companies I think it's fundamentally misguided and would amount to empowering the wrong people.
The argument against platform teams needs to be balanced with the compounding nature of technical debt.
The argument to always go for the biggest return works OK for the first few years of high growth (though the timeline is probably greatly compressed the more you use AI), but it turns into a kind of quicksand later.
> The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. […] The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves.
Maybe there’s some new paradigm that makes this true. But it doesn’t seem obviously true to me.
Humans make the best code long term when everything orbits a vision of the underlying problem space.
LLMs seem to only consider the deeper problem space when I explicitly flag it for them, otherwise they write “good enough for this situation” type code. And that stack of patches type code is exactly how the code becomes messy and complicated in the first place.
I don't understand the urgency around quantifying every aspect of the software process. Surely we are in agreement that money in must at least equal money out if the company is to be viable? This is a simple QuickBooks report, is it not?
Why don't we instead focus our energies on the customer and then work our way backward into the technology. There are a lot of ways to solve problems these days. But first you want to make sure you are solving the right problem. Whether or not your solution represents a "liability" or an "asset" is irrelevant if the customer doesn't even care about it.
One interesting factor that I rarely see discussed is this: Let's say a DevOps person does some improvement to internal tooling and a task that devs had to oversee manually now is automated. Every dev spent about 2 hours per week doing this task and now they don't have to anymore. Now, have we saved 2 hours of salary per dev per week?
Not sure. Because it totally depends on what they do instead. Are they utilizing two hours more every week now doing meaningful work? Or are they just taking things a bit more easy? Very hard to determine and it just makes it harder to reason about the costs and wins in these cases.
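That objection can be made concrete with a hypothetical utilization factor, i.e. the fraction of freed-up time that actually becomes productive work. All figures below are invented for illustration:

```python
# Nominal vs. realized savings from automating a recurring manual task.
# The utilization rate is the unknown the comment above is pointing at.

HOURLY_COST = 65               # EUR, fully loaded (same figure the thread uses)
DEVS = 20
HOURS_SAVED_PER_DEV_WEEK = 2
WEEKS_PER_MONTH = 4.33

nominal_saving = DEVS * HOURS_SAVED_PER_DEV_WEEK * WEEKS_PER_MONTH * HOURLY_COST

for utilization in (1.0, 0.5, 0.2):
    realized = nominal_saving * utilization
    print(f"utilization {utilization:.0%}: ~EUR {realized:,.0f}/month recovered")
```

The spread between the 100% and 20% rows is exactly the uncertainty being described: the automation work is real either way, but its measurable value depends on an input nobody can observe directly.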
This is a very reductionist way to calculate the value of a software team or any team within an organization. That’s because many times the value delivered by a team is not necessarily monetary but strategic.
The estimated cost numbers are for very large companies with massive overhead. Dump the management overhead, the HR machine, and the other things smaller companies do not have, and the number comes down massively.
> their value is, at least in principle, calculable
I feel this article does not spend enough time investigating the challenges of directly measuring the financial output of an engineering team. I agree it is theoretically possible, but I don't think the full answer is that people got lazy on cheap capital and didn't care enough to measure. I think it would be exceedingly difficult to put a dollar amount on the monthly output of most engineering teams due to the variety of tasks they cover, and the extreme challenge of knowing exactly why your customers are behaving a certain way. If you get 1000 more signups in a month, is that directly attributable to the engineering team's output? If anyone could have been concretely answering that question this whole time, I don't think they would have been ignoring the metrics.
> This does not mean that Slack’s engineering investment was wasted, because Slack also built enterprise sales infrastructure, compliance capabilities, data security practices, and organizational resilience that a fourteen-day prototype does not include.
The LLM-agent-team argument also misses the core point that the engineering investment (which actually encompasses business decisions, design, and much more than just programming) is what got Slack (or any other software product) to where it is now and where it's going in the future. Creating a snapshot of the current state is, while maybe not absolutely trivial, still just a tiny fraction of the progress made over the years.
The Anthropic C compiler experiment keeps coming up for a reason. Two weeks, 100KLOC, completely unmaintainable. Faster generation doesn't help if nobody in your org can reason about what was shipped. The bottleneck was never typing speed.
How could they not? When I penciled this out ~18 years ago, I included the amortized cost of all the interviews it took to hire a given engineer as well. It's not rocket surgery, as they say.
There's a 99% chance that the training materials on sale are equally replaceable with a prompt.
Getting these tools to "understand", or be able to generate good results in a codebase, is not a function of the number of agents or the time you let them run. Much rather, if the tools fail to produce anything useful after a few minutes, you can bet your ass that they're not going to work better after hours, or days. If they come up with a mess, and your reaction is to just let them work on it for a few days, I can confidently predict what you'll end up with.
They come close to grasping what we've learned about where these new tools are useful and where they aren't, only to end up falling for the pretty words these generators use to lipstick their turds. As right as they may be about the financial considerations, there are going to be some very uncomfortable bills to pay for those who share this belief in the magical abilities of LLMs.
For reasons which it would take a while to unpack, if is often the case that the best (or sometimes only) way to find out what programming actually needs to be done, is to program something that's not it, and then replace it. This may need to be done multiple times. Programming is only occasionally the final product, it is much more often the means of working through what it is that is actually needed. This is very difficult for the people who ask for the software, to understand, and it is quite often very difficult for the people doing the programming to understand.
Most of what is being done, during programming, is working through the problem space in a way which will make it more obvious what your mistakes are, in your understanding of the problem and what a solution would look like. Once you have arrived at that understanding, then there are a variety of ways to make what you need, but that is not the rate-limiting step.
> A messy codebase is still cheaper to send ten agents through than to staff a team around
People who say that haven't used today's agents enough or haven't looked closely at what they produce. The code they write isn't messy at all. It's more like asking the agent to build a building from floorplans and spec, and it produces everything in the right measurements and right colours and passes all tests. Except then you find out that the walls and beams are made of foam and the art is load-bearing. The entire construction is just wrong, hidden behind a nice exterior. And when you need to add a couple more floors, the agents can't "get through it" and neither can people. The codebase is bricked.
Today's agents are simply not capable enough - without very close and labour-intensive human supervision - to produce code that can last through evolution over any substantial period of time.
> A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today.
I’ve been on 2 failed projects that have been entirely AI generated and it’s not that agents slow down and you can just send more agents to work on projects for longer, it’s that they becoming completely unable to make any progress whatsoever, and whatever progress they do make is wrong.
> Software development is one of the most capital-intensive activities a modern company undertakes
The article is definitely written from a "high tech" industry lens. A mid-sized utility might spend $80-$150 million USD on IT capital projects in a year, but $2b on power pole maintenance. Utilities are a strong example, but any large enterprise manufacturing company is spending more on factory upgrades that programming.
> [...] built a functional replica of approximately 95% of Slack’s core product in fourteen days using LLM agents.
IT and Finance leadership and asset heavy companies are currently trying to wrap their head around the current economics of their 100+ SaaS contracts, and if it still makes sense with LLM powered developers. Can they hire developers in house to build the fraction of the tool they use from many of these companies, save on total cost and Opex?
I work with these companies a lot, and won't weigh in on the right decision. Bottom line "it depends" on many factors, some of which are not immediately obvious. The article still holds weight regardless of industries, but there is some nuance (talent availability, internal change cost, etc.) that also have to be considered.
The copy doesn’t even remotely grasp the scale of what the actual Slack sofware does in terms of scale, relaiability, observability, monitorability, maintability and pretty sure also functionality.
Author only writes about the non-dev work as difference, which seems like he doesn’t know what he’s talking about in all, and what running an application at that scale actually means.
This "clone" doesn’t get you any closer to an actualy Slack copy than a white piece of paper
Of late, I've come across a lot of ideas from Rory Sutherland and my conclusion from listening to his ideas is that there are some people, who're obsessed with numbers, because to them it's a way to find certainty and win arguments. He calls them "Finance People" (him being a Marketing one). Here's an example
"Finance people don’t really want to make the company money over time. They just thrive on certainty and predictability. They try to make the world resemble their fantasy of perfect certainty, perfect quantification, perfect measurement.
Here’s the problem. A cost is really quantifiable and really visible. And if you cut a cost, it delivers predictable gains almost instantaneously."
> Choosing to spend three weeks on a feature that serves 2% of users is a €60,000 decision.
I'd really want to hire the Oracle of a PM/ Analyst that can give me that 2% accurately even 75% of the time, and promise nothing non-linear can come from an exercise.
I suspect this is most apparent on things like meeting culture. Something happens and all of a sudden there is another recurring meeting on the calendar, with 15 attendee's, costing x dollars in wages, that produces no value for the customers because the lesson was already learned.
Or when reacting to an incident of some sort, it's so easy to have a long list of action items that may theoretically improve the situation, but in reality are incredibly expensive for the value they produce (or the risks they reduce). It's too easy to say, we'll totally redesign the system to avoid said problem. And what worries me, is often those very expansive actions, then cause you to overlook realistic but small investments that move the needle more than you would think.
And as a hot topic I also think the costs are an input into taking on tech debt. I know we all hate tech debt with a passion, but honestly, I think of it as a tool that can be wielded responsibly or irresponsibly. But if we don't know what our attention costs, we're going to have difficulty making the responsible choices about when and where to take on this debt. And then if we're not conscious about the debt, when it comes do it stings so much harder to pay down.
Unfortunately, even with all the management techniques in the world, there are just some projects that are impossible to care about. There’s simply a significantly lower cap on productivity on these projects.
Its like min-maxing a Diablo build where you want the quality of the product to be _just_ above the "acceptable" threshold but no higher because that's wasting money. Then, you're free to use all remaining points to spec into revenue.
It's there to deal with cross-cutting concerns that are not only time-consuming but could be business-threatening, and in some cases you keep your more expensive engineers there to ensure that certain critical things are done right.
Too much snake oil for my taste.
I do agree with his thesis in the middle, about how the ZIRP decade and the cultures that were born from that period were outrageous and cannot survive the current era. It's a brave new world, and it's not because of AI. It's because there's just not enough money flowing anymore, and what little is left is sucked up by the big boys (AI).
The direct and indirect financial impacts of technical decisions are indeed hard to measure. But some technical decisions definitely have greater financial impact than others, even if it's hard to precisely quantify the financial costs/benefits of every decision. It is possible to order them relatively: X is likely to make more money than Y, so we do X first and Y later.
There is a significant amount of chance involved in whether a product/feature will even make money at all. So even good plans with measurably positive expected value could end up losing money.
Just because it's impossible to be 100% certain of the outcome of any decision doesn't mean we should throw the baby out with the bathwater.
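The relative-ordering point above can be sketched as a simple expected-value ranking. The project names, payoffs, and probabilities below are entirely illustrative; the point is only that a ranking survives imprecise inputs as long as the relative estimates are roughly right:

```python
# Rank candidate projects by rough expected value (payoff × probability).
# All figures are made-up illustrations, not real forecasts.
projects = {
    "X": {"payoff": 500_000, "probability": 0.3},  # EV: 150,000
    "Y": {"payoff": 200_000, "probability": 0.5},  # EV: 100,000
}

ranked = sorted(
    projects,
    key=lambda name: projects[name]["payoff"] * projects[name]["probability"],
    reverse=True,
)
print(ranked)  # ['X', 'Y'] — do X first, Y later
```

Even if both probabilities are off by a similar factor, the ordering (and therefore the sequencing decision) often holds.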
> There is no cohort of senior product leaders who developed their judgment in conditions where their teams were expected to demonstrate financial return, because those conditions did not exist during the years when that cohort was learning the craft.
There totally is such a cohort. There are plenty of bootstrapped companies or startups that took only an angel round and did not benefit from the low rate environment, in fact they suffered because of the very high price of SWE labor. But those engineering managers exist and are out there right now still building efficiently, quietly growing, passionately serving customers, and keeping a close eye on the bottom line and risks because that’s their livelihood.
> The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. That is a reasonable concern, but it largely applies when agents produce code that humans then maintain. Agentic platforms are being iterated upon quickly, and for established patterns and non-business-critical code, which is the majority of what most engineering organizations actually maintain, detailed human familiarity with the codebase matters less than it once did. A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves.
I keep seeing this assumption that "unmanageable" caps out at "kinda hard to reason about", and anyone with experience in large codebases can tell you that's not so. There are software components I own today which require me to routinely explain to junior engineers (and indeed to my own instances of Claude) why their PR is unsound and I won't let them merge it no matter how many tests they add.
I do like this bit though:
> A large codebase also carries maintenance costs that grow over time as the system becomes more complex, more interconnected, and more difficult to change safely. Every engineer added to maintain it increases coordination costs, introduces new dependencies, and adds to the organizational weight that slows decision-making. The asset and the liability exist simultaneously, and for most of the past twenty years, the financial environment masked the liability side of that equation.
And the insight that LLMs are exposing this reality is absolutely true. The funny thing is they are exposing it by accelerating both good and bad engineering practices. Teams with good engineering judgment will move faster than ever with fewer people, and teams with bad engineering judgment will bury themselves in technical debt so fast the wheels will come off.
For me, running an engineering org is primarily about talent acquisition and empowering those ICs with judgment to move quickly. How well systems and teams scale depends on the domain, product, and how it allows you to decouple things. With the right talent and empowerment there are often creative ways to make product and system tradeoffs and iterate quickly to change the shape of ROI. Any mapping to financial metrics is a hugely lossy operation that can't account for such changes. It might work in mature companies that are ossified and in the second half of their lifecycle, but in growing companies I think it's fundamentally misguided and would amount to empowering the wrong people.
The argument to always go for the biggest return works OK for the first few years of high growth (though the timeline is probably greatly compressed the more you use AI), but it turns into a kind of quicksand later.
> The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. […] The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves.
Maybe there’s some new paradigm that makes this true. But it doesn’t seem obviously true to me.
Humans make the best code long term when everything orbits a vision of the underlying problem space.
LLMs seem to only consider the deeper problem space when I explicitly flag it for them; otherwise they write "good enough for this situation" type code. And that stack-of-patches style of code is exactly how a codebase becomes messy and complicated in the first place.
Why don't we instead focus our energies on the customer and then work our way backward into the technology? There are a lot of ways to solve problems these days, but first you want to make sure you are solving the right problem. Whether your solution represents a "liability" or an "asset" is irrelevant if the customer doesn't even care about it.
Not sure, because it totally depends on what they do instead. Are they now spending those two extra hours a week doing meaningful work? Or are they just taking things a bit easier? Very hard to determine, and it just makes it harder to reason about the costs and wins in these cases.
> their value is, at least in principle, calculable
I feel this article does not spend enough time investigating the challenges of directly measuring the financial output of an engineering team. I agree it is theoretically possible, but I don't think the full answer is that people got lazy on cheap capital and didn't care enough to measure. I think it would be exceedingly difficult to put a dollar amount on the monthly output of most engineering teams due to the variety of tasks they cover, and the extreme challenge of knowing exactly why your customers are behaving a certain way. If you get 1000 more signups in a month, is that directly attributable to the engineering team's output? If anyone could have been concretely answering that question this whole time, I don't think they would have been ignoring the metrics.
> This does not mean that Slack’s engineering investment was wasted, because Slack also built enterprise sales infrastructure, compliance capabilities, data security practices, and organizational resilience that a fourteen-day prototype does not include.
The LLM-agent team argument also misses the core point: the engineering investment (which actually encompasses business decisions, design, and much more than just programming) is what got Slack (or any other software product) to where it is now and where it's going in the future. Creating a snapshot of the current state, while maybe not absolutely trivial, is still just a tiny fraction of the progress made over the years.
> Most engineers do not know this number.
How could they not? When I penciled this out ~18 years ago, I included the amortized cost of all the interviews it took to hire a given engineer as well. It's not rocket surgery, as they say.
Money can be exchanged for goods and services.