When we (the engineering team I work on) started using agents more seriously we were worried about this: that we'd speed up coding time but slow down review time and just end up increasing cycle time.
So far there's no obvious change one way or the other, but it hasn't been very long and everyone is in various states of figuring out their new workflows, so I don't think we have enough data for things to average out yet.
We're finding cases where fast coding really does seem to be super helpful though:
* Experimenting with ideas/refactors to see how they'll play out (often the agent can just tell you how it's going to play out)
* Complex tedious replacements (the kind of stuff you can't find/replace because it's contextual)
* Times where the path forward is simple but also a lot of work (tedious stuff)
* Dealing with edge cases after building the happy path
* EDIT: One more huge one I would add: anywhere the thing you're adding is a close analogue of another branch/PR, the agent seems to do great (which is really a "simple but tedious" case)
The single biggest potential productivity gain though I think is being able to do something else while the agent is coding, like you can go review a PR and then when you come back check out what the agent produced.
I would say we've gone from being extremely skeptical to cautiously excited. I think it's far fetched that we'll see any order of magnitude differences, we're hoping for 2x (which would be huge!).
> The single biggest potential productivity gain though I think is being able to do something else while the agent is coding, like you can go review a PR and then when you come back check out what the agent produced.
I've already passed through this phase and have given up on it. I'm sure everyone's experience will vary, but for me it either introduces so much context switching, or saps so much mental engagement, that I end up introducing more errors, feeling miserable, or just straight up losing productivity and focus. This type of workflow is only viable for me if the cost of mistakes is low, the surface area for changes is small, or the mental context is the same between activities.
The expectation that this is a serviceable workflow will, I fear (and am experiencing), ultimately just create more compressed timelines for everything, while quality, design, and job satisfaction drop. Yes, the code can be written while I look at a PR, but if it's a non-trivial amount of code or a non-trivial PR (both of which become more frequent as more code generation and larger refactors happen), then I'm just context switching between tasks I constantly need to re-zone in on. That's less gratifying and more volatile in a way that hurts my mind and soul, and money doesn't change that in a meaningful way.
That's not to say I'm not using them or seeing no productivity gains, but I'm not reclaiming that much time from being able to do anything concurrently; it's mostly reclaiming time I'd otherwise have spent procrastinating on something.
For years I worked at a large company with so many blockers for everything that I always worked like this: have 5 projects, so that when one becomes blocked on external deps, you have another to pull out and work on. There is a context switch (which led me to context-preservation habits like checking everything I write into git; using tmux so the context is sitting around in the bash history of the shell where I was working; lots of org files and Mac stickies with TODOs, etc.).
I still do this, and don't really think it's avoidable, but when the expectation of compressed timelines arises from the imagined ability to rapidly do non-trivial tasks in parallel, both things get done poorly.
Feels akin to something like driving in stop and go traffic while playing chess with a passenger who's shit talking me.
I've tried the "4 agents running at the same time in different projects/features" and I felt literally dizzy. I still do the "check something else while the agent runs", and I often forget about that terminal window for many minutes, only to remember about it several tasks later.
> The single biggest potential productivity gain though I think is being able to do something else while the agent is coding, like you can go review a PR and then when you come back check out what the agent produced
Ugh, sounds awful. Constantly context switching and juggling multiple tasks is a sure-fire way to burn me out.
The human element in all of this never seems to be discussed. Maybe this will weed out those that are unworthy of the new process but I simply don't want to be "on" all the time. I don't want to be optimized like this.
Often when you are solving a problem, you are never really solving a single problem at a time. Even within a single task there are 4-5 tasks hidden; you could easily put the agent on one task while you do another.
Ask it to implement a simple HTTP PUT/GET with some authentication, an interface, and logging, for example, while you work out the protocol.
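The kind of delegable sub-task meant here can be sketched in a few lines. This is purely illustrative Python stdlib code (the `build_request` helper, URL, and credentials are made up for the example, not anything from the thread):

```python
import base64
import json
import logging
import urllib.request

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("client")

def build_request(url, method="GET", body=None, user=None, password=None):
    """Build an authenticated JSON HTTP request (hypothetical helper)."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    if user:
        # Basic auth: base64("user:password") in the Authorization header.
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        req.add_header("Authorization", f"Basic {token}")
    log.info("%s %s", method, url)
    return req

# urllib.request.urlopen(build_request(...)) would then send it.
```

A trivial chunk of grunt work, but exactly the kind of thing that can run in the background while you think about the protocol itself.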
Insightful and helpful to peer into another company's experience. Mostly agree with your highlighted points.
> The single biggest potential productivity gain though I think is being able to do something else while the agent is coding, like you can go review a PR and then when you come back check out what the agent produced.
This is where unnerving exhaustion comes from though.
I know myself to be on the side of the craftsmen. It does take tons and tons of time to code, but I didn't get exhausted the way I do with AI. AI is productive, and I am pro-AI. But boy is it a different kind of work beast.
> Experimenting with ideas/refactors to see how they'll play out (often the agent can just tell you how it's going to play out)
This has helped me a lot. Normally I'd feel really attached to big refactors because of sunk costs, but when AI does a huge refactor, it's easier to honestly decide that it wasn't worth it and only added unnecessary complexity.
> The bottleneck is understanding the problem. No amount of faster typing fixes that.
Why not? Why can't faster typing help us understand the problem faster?
> When you speed up code output in this environment, you are speeding up the rate at which you build the wrong thing.
Why can't we figure out the right thing faster by building the wrong thing faster? Presumably we were gonna build the wrong thing either way in this example, weren't we?
I often build something to figure out what I want, and that's only become more true the cheaper it is to build a prototype version of a thing.
> You will build the wrong feature faster, ship it, watch it fail, and then do a retro where someone says "we need to talk to users more" and everyone nods solemnly and then absolutely nothing changes.
As human developers, I think we're struggling with "letting go" of the code. The code we write (or agents write) is really just an intermediate representation (IR) of the solution.
For instance, GCC will inline functions, unroll loops, and apply myriad other optimizations that we don't care about. But when we review the ASM that GCC generates (we don't), we are not concerned with the "spaghetti" or the "high coupling" and "low cohesion". We care that it works, is correct for what it is supposed to do, and is a faithful representation of the solution we are trying to achieve.
Source code in a higher-level language is not really different anymore. Agents write the code, maybe we guide them on patterns and correct them when they are obviously wrong, but the code is merely the work-item artifact that comes out of extensive specification, discussion, proposal review, and more review of the reviews.
A well-guided, iterative process and problem/solution description should be able to generate an equivalent implementation whether a human is writing the code or an agent.
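The compiler analogy can be demonstrated without dropping to GCC and assembly. As a small illustration in Python (chosen here just for brevity), CPython constant-folds expressions into the bytecode, an intermediate representation that nobody reviews either:

```python
import dis

# CPython compiles source to bytecode and constant-folds as it goes:
# the multiplication below never happens at runtime.
code = compile("seconds_per_day = 24 * 60 * 60", "<example>", "exec")

# The folded constant is baked directly into the code object.
print(86400 in code.co_consts)

# dis.dis(code) would show a single LOAD_CONST of 86400 -- an IR we
# trust without reading, just like compiler-generated ASM.
dis.dis(code)
```

The argument above is that agent-written source code is drifting toward the same status: an artifact we verify by its behavior and its fidelity to the spec, not by line-by-line aesthetics.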
Yeah, we again have a solution (LLMs) in search of problems.
The proper approach to speeding things up would be to ask: "What are the limiting factors that stop us from doing X, Y, Z?"
--
This situation of management expecting things to become fast because of AI is "vibe management". Why think, why understand, why talk to your people, when you've seen an excited presentation of the magic tool and the only thing you need to do is adopt it?
Companies genuinely don't want good code. Individual teams just get measured by how many things they push around. An employee who warns that something might not work very well gets reprimanded as "down in the weeds" or "too detail oriented," etc. I didn't understand this for a while, but internal actors inside companies really just want to claim success.
> Someone approves a PR they didn’t really read. We’ve all done it (don’t look at me like that). It merges. CI takes 45 minutes, fails on a flaky test, gets re-run, passes on the second attempt (the flaky test is fine, it’s always fine, until it isn’t and you’re debugging production at 2am on a Saturday in your underwear wondering where your life went wrong. Ask me how I know… actually, don’t). The deploy pipeline requires a manual approval from someone who’s in a meeting about meetings. The feature sits in staging for three days because nobody owns the “get it to production” step with any urgency.
This is the company I (soon no longer) work at (anyone hiring?).
The thing is that they don’t even allow the use of AI. I’ve been assured that the vast majority of the code was human-written. I have my doubts but the timeline does check out.
Apart from that, this article uses a lot of words to completely miss the fact that (A) “use agents to generate code” and “optimize your processes” are not mutually exclusive things; (B) sometimes, for some tickets - particularly ones stakeholders like to slide in unrefined a week before the sprint ends - the code IS the bottleneck, and the sooner you can get the hell off of that trivial but code-heavy ticket, the sooner you can get back to spending time on the actual problems; and (C) doing all of this is a good idea completely regardless of whether you use LLMs or not; and anyone who doesn’t do any of it and thinks the solution is to just hire more devs will run into the exact same roadblocks.
I'm a solo dev. In fact I'm hardly a dev; it's just a helpful skill. Code writing speed IS a problem, because it takes valuable time away from other tasks. A bit like doing the dishes.
I just set up Claude Code tonight. I still read and understand every line, but I don't need to Google things, move things around and write tests myself. I state my low-level intent and it does the grunt work.
I'm not going to 10x my productivity, but it'll free up some time. It's just a labour-saving technology, not a panacea. Just like a dishwasher.
How do people ensure that AI doesn't produce subtle yet stupid mistakes that humans usually don't make, like the one at Amazon that deleted an entire production deployment?
When a person writes code, they reason through it multiple times, step by step, so that they at least don't make stupid or obvious mistakes. That level of close examination is not covered in code review. Arguably this is why we can place more trust in human-written code than in AI-produced code, even though AI can probably write better code at a smaller scale.
In contrast, Amazon asked senior engineers to review AI-generated code before merging it. But the purpose of code review was never to catch every bug -- that's the job of test cases, right? Besides, the more senior an engineer at Amazon is, the more meetings they attend, and the less context they have about the code. How can they be effective in code review?
I read this book last year and this application is spot on. There is a point in the narrative when the company automates a step in their manufacturing using an expensive machine and it has the effect developed here: the next step in the process is backed up further.
The points specific to software -- that it might not even be producing in-spec output -- are also very good.
Comments that cite the solo dev/prototype case are of course not what this is getting at, but it's one good use of quick generation.
I would extend this article by saying what The Goal says, namely that the goal of every firm is to make money, and everything is intermediate to that. So whether or not software architecture is grade-A or grade-C, it's only ever in this subservient role to the firm's goal.
One of the main reasons I like vim is that it enables me to navigate code very fast, that the edits are also quick when I've decided on them is a nice convenience but not particularly important.
Same goes for the terminal, I like that it allows me to use a large directory tree with many assorted file types as if it was a database. I.e. ad hoc, immediate access to search, filter, bulk edits and so on. This is why one of the first things I try to learn in a new language is how to shell out, so I can program against the OS environment through terminal tooling.
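A minimal sketch of what "shelling out" buys, in Python (the `sh` helper is my own illustration, not a standard function):

```python
import subprocess

def sh(cmd: str) -> str:
    """Run a shell pipeline and return its stdout, raising on failure.

    This is the small bridge that turns the terminal's search, filter,
    and bulk-edit tooling into something a program can query like a
    database.
    """
    result = subprocess.run(cmd, shell=True, capture_output=True,
                            text=True, check=True)
    return result.stdout

# Ad hoc "queries" over a directory tree then look like:
#   sh("find . -name '*.py' | wc -l")
#   sh("grep -rl 'TODO' src/ | sort")
```

Once a language can do this, the whole OS environment becomes part of its standard library.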
Deciding what and how to edit is typically an important bottleneck, as are the feedback loops. It doesn't matter that I can generate a million lines of code, unless I can also with confidence say that they are good ones, i.e. they will make or save money if it is in a commercial organisation. Then the organisation also needs to be informed of what I do, it needs to give me feedback and have a sound basis to make decisions.
Decision making is hard. This is why many bosses suck. They're bad at identifying what they need to make a good decision, and just can't help their underlings figure out how to supply it. I think most developers who have spent time in "BI" would recognise this, and a lot of the rest of us have been in worthless estimation meetings, retrospectives and whatnot where we ruminate a lot of useless information and watch other people do guesswork.
A neat visualisation of what a system actually contains and how it works is likely of much bigger business value than code generated fast. It's not like big SaaS ERP consultancy shops have historically worried much about how quickly the application code is generated, they worry about the interfaces and correctness so that customers or their consultants can make adequate unambiguous decisions with as little friction as possible.
Code-writing speed is not a bottleneck when the stakes are high. Sometimes, it's better to slow down, plan ahead, and consider the consequences because the cost of a failed iteration is too great.
Take the way AI is being developed as an example. People rush to build giant agents in giant datacenters, aligned to giant corporations and governments. They're building the agentic-organism equivalent of Machiavellian organizations, even though they'd be better off building digital humans aligned to individual humans, running on people's gaming PCs at home. They will find out that the former is the wrong architecture, but the cost of that failed iteration is the future of human civilization, and nobody gets a second try.
Of course, this is an extreme example on one end of the scale. On the other end, it wouldn't matter at all if you're building a small game for yourself as a weekend project with no users to please or societal impacts to consider.
My problem when writing code is mainly executive dysfunction; I constantly succumb to the temptation to take the easy way and do it properly later, and later never comes. For some reason, using a coding agent seems to alleviate this. Things get done the way I think they should be done, not just in a way that's "good enough for now."
I think one of the biggest gains of AI is the second-order effects, and maybe a bit in the third order as well. When coding with Claude, I can focus on the architecture, the big picture of the implementation, while Claude takes care of the details. Every function whose argument order I no longer have to remember is a little bit of extra brain power I can dedicate to thinking about the solution. I can focus on what matters, and I can focus for longer. I find that remembering argument order is much more tiresome than thinking through complex interactions (because the latter is fun, while the former is boring).
Then there's the speedup. A smaller team can now achieve what a larger team was needed for before. This means less communication overhead, in theory fewer and/or shorter meetings. Which all translates to me spending more time and more energy on thinking about the solution. Which is what matters.
I can really relate to this. At the same time, I'm not convinced cycle time always trumps throughput. Context switching is bad, and one solution to it is time boxing, which basically means there will be some wait until the next box of time in which the work is picked up. Done properly, time boxing lowers context switching and increases throughput, but it also increases latency (cycle time). It's a trade-off. But of course, maybe time boxing isn't the best solution to the problem of context switching; maybe it's possible to figure out a way to have our cake and eat it too. And maybe different circumstances call for a different balance between latency and throughput.
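The trade-off can be made concrete with a toy model (the model and numbers are my own illustration, not from the comment above): if a context switch costs `switch_cost` and a time box batches `batch_size` tasks of length `task_time`, batching amortizes the switch cost but delays pickup.

```python
def timebox_tradeoff(task_time: float, switch_cost: float, batch_size: int):
    """Toy model of time boxing: throughput vs. worst-case latency.

    Batching amortizes one context switch over `batch_size` tasks,
    so throughput rises -- but a task arriving just after a box
    closes has to wait out the whole next box, so latency rises too.
    """
    per_task = task_time + switch_cost / batch_size
    throughput = 1.0 / per_task            # tasks completed per unit time
    worst_latency = batch_size * per_task  # wait for a full box to drain
    return throughput, worst_latency

# With task_time=1 and switch_cost=1: boxes of 1 give throughput 0.5,
# boxes of 4 give throughput 0.8 -- at the price of a longer worst-case wait.
```

The numbers are arbitrary, but the shape of the curve is the point: throughput and latency move in opposite directions as the box grows.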
> "When you optimise a step that is not the bottleneck, you don't get a faster system. You get a more broken one."
> Think about it mechanically. If station A produces widgets faster but station B [...]
Tried this at numerous companies, small and large.
The engineers get this, or are willing to learn. Some (by no means most) scrum/agile leads get it.
The problem is the 'product class' don't get it, aren't interested, and by and large don't have the aptitude to understand. Try to explain cycle time or cumulative flow diagrams to a Product Manager, Product Owner, or Service Owner and they most often just brush it away as 'a technical thing'.
The problem only gets worse as the Peter principle begins to kick in and thin out the talent towards the top end of the org.
It's unfair to characterize AI as 'code writing / completion' - it's at minimum a quarter-layer of abstraction above that - and even just at that, it's useful.
So 'writing helper' + 'research helper' + 'task helper' alone is amazing and we are def beyond that.
Even side features like 'do this experiment' where you can burn a ton of tokens to figure things out ... so valuable.
These are cars in the age of horses, it's just a matter of properly characterizing the cars.
The product world has a somewhat accepted idea-to-prototype-to-production framework. AI code generation is great in that world, because there is a process for discovering what problem to solve.
System maintenance doesn't have a clearly defined "what problem to solve" path. Maybe it's smallest deployable increment to confirmed value delivery. But that's harder to systematize. And AI code generation is probably not a helpful tool here.
It’s definitely going to create a lot of problems in orgs that already have an incomplete or understaffed dev pipeline, which happen to often be the ones where executive leadership is already disconnected and not aware of what the true bottlenecks are, which also happen to often be the ones that get hooked by vendor slide decks…
> Ask it to implement a simple HTTP PUT/GET with some authentication, an interface, and logging, for example, while you work out the protocol.

no.
> You will build the wrong feature faster, ship it, watch it fail, and then do a retro where someone says "we need to talk to users more" and everyone nods solemnly and then absolutely nothing changes.
I guess because we're just cynical.
> "When you optimise a step that is not the bottleneck, you don't get a faster system. You get a more broken one." > Think about it mechanically. If station A produces widgets faster but station B [...]
cries in Factorio
> The Goal ... it's also the most useful business book you'll ever read that's technically fiction
Factorio ... it's also the most useful engineering homework that's technically a game
> These are cars in the age of horses, it's just a matter of properly characterizing the cars.

Btw: https://playcode.io
Expedience is the enemy of quality.
Want proof? Everything built as a result of "move fast and break things" 5-10 years ago is a pile of malfunctioning trash. This is not up for debate.
This is simply an observation. I do not make the rules. See my last submission for some CONSTRUCTIVE reading.
Bye for now.