The human cost of 10x: How AI is physically breaking senior engineers

[−] zthrowaway 31d ago

Can definitely attest to this. The frequency of outages at my company has increased drastically over the past year, especially since we incorporated agentic development. I’m seeing all of the dev best practices go out the window. We have a few vibe coders who are posting 15-30 PRs per day. It’s way too much for us to review. We’re not a big shop. I think we’re going to have to hire more people just to review code across the industry. And those people will have to know how to actually write software; otherwise, what are they even reviewing? Maybe the models will get so good they never make a mistake. Doubt it.

[−] bensyverson 31d ago

I wonder if the PR workflow is just unsustainable in the agentic era. Rather than review every new feature or bug fix, we would depend on good test coverage, and hold developers accountable for what they ship.

The result might be more faulty code getting merged, but if you already have outages and can't review every PR, is there currently a meaningful benefit to the PR workflow?

[−] dwattttt 31d ago

This is the "if you're already letting faults through, why not give up trying to stop faults?" approach.

[−] bensyverson 31d ago

The alternative might be "what if we could get the genie back into the bottle?"

We know some people are using LLMs to evaluate PRs; the only question is who, and how strong the incentive is for them to give up.

[−] 01HNNWZ0MV43FF 31d ago

Diogenes carrying a lamp, looking for good test coverage

[−] palmotea 31d ago

> I wonder if the PR workflow is just unsustainable in the agentic era. Rather than review every new feature or bug fix, we would depend on good test coverage, and hold developers accountable for what they ship.

I think what you're describing is setting up the human as the fall guy for the machine.

[−] dhedlund 31d ago

This reminds me a bit of monoliths vs microservices. People would see microservices as the next new shiny thing and bring it with them to their next job, or read a blog post that sounds great in theory but falls apart in practice. People would see it as a purely architectural decision. But the reality was that you had to have the organizational structure to support that development model, or you'd find out that it just doesn't scale the way you expect and introduces its own set of problems. My experience is that most teams that didn't have large orgs got bogged down by the weight of microservices (or things called "microservices"). It required a lot of tooling and orchestration to manage. But there was this promise that you could easily just rewrite that microservice from scratch or change languages and nobody would notice or care.

LLM-generated code feels the same. Reviewing LLM-generated code in the context of a monolith is more taxing than reviewing it in the context of a microservice; the blast radius is larger and the risk is greater, whereas with microservices you can make decisions around how important that service actually is for system-wide stability. You can effectively not care about some services, and can go back and iterate or rewrite them several times over. But more importantly, the organizational structures that are needed to support microservice-like architectures effectively also feel like the organizational structures that are needed to support LLM-generated codebases effectively: more siloing, more ownership, more contract- and spec-based communication between teams, etc. Teams might become one person and an agent in that org structure. But communication and responsibilities feel like they require something similar to what is needed to support microservices... just that services are probably closer in size to what many companies end up building when they try to build microservices.

And then there are majestic monoliths, very well curated monoliths that feel like a monorepo of services with clear design and architecture. If they've been well managed, these are also likely to work well for agents, but still suffer the same cognitive overhead when reviewing their work because organizationally people working on or reviewing code for these projects are often still responsible for more than just a narrow slice, with a lot of overlap with other devs, requiring more eyes and buy-in for each change as a result.

The organizational structures that we have in place today might be forced to adapt over time, to silo in ways that narrow ownership and responsibility to fit within what we can juggle mentally. Or they'll be forced to slow down and accept the limitations of the organizational structure. Personal projects have been the area where people have had a lot of success with LLMs, which feels closer to smaller siloed teams. Open-source collaboration with LLM PRs feels like it falls apart for the same cognitive overhead reasons as existing team structures that adopt AI.

[−] PradeetPatel 31d ago

The proposed industry solution is to use agents to review PRs, so as not to slow down the velocity of delivery...

My current workplace is going through a major "realignment" exercise to replace as many testers with agents as humanly possible, which is proving to be a challenge because the existing process is not well documented.

[−] teaearlgraycold 31d ago

People pushing dozens of PRs per day need to learn to prioritize tasks, and balance a bit more towards quality over quantity.

[−] Madmallard 31d ago

Sounds like people need to speak up to management

[−] sharts 31d ago

Maybe it’s time to have multiple agents and models review the PRs and also provide context for easier human review. That and lots more focus on robust testing.

There’s no way velocity will decrease now that upper management is obsessed with AI.

[−] pants2 31d ago

I really think that software in general is getting buggier, with ChatGPT/Claude being some of the buggiest software I use. I constantly run into quality issues there and I've reported at least a dozen bugs to ChatGPT this year. One kicker I found recently was that Codex PR Reviews, once turned on for a repo, cannot be turned off - I got escalated to engineering who confirmed that they forgot to add a feature to disable code reviews.

[−] K0balt 30d ago

Honestly by the time it gets to review it should be rock solid, so the only thing the reviewer has to think about is the big picture and never “does this actually do what it’s supposed to without abusing any of the interfaces”. Vibe coding makes solid validation, testing, and documentation trivial. The onus of proving your code is good needs to shift downward, not upward. And straight up vibing it is absolutely a terrible idea for anything other than a demo or a simple tool.

[−] ok_dad 31d ago

I love it. I was getting burnt out due to ADHD or autism burnout but with AI tooling I’m able to work a full week without burnout. I think the kind of burnout I get is helped with these tools, but since I’m not neurotypical it’s different from the burnout people are getting from doing too much.

I do see “task expansion” happening often though. If I can do the full feature rather than doing baby steps I’ll often do that now, because wrangling code is easier.

[−] cadamsdotcom 31d ago

You can write your own linters for every dumb AI mistake, add them as pre-commit checks, and never see that mistake in committed code ever again. It’s really empowering.

You don’t even have to code the linters yourself. The agent can write a Python script that walks the AST of the code, or uses regex, or tries to run it or compile it. Give the agent a non-zero exit code and a line number, and it will fix the problem, rerun the linter, and loop until it passes.

Lint your architecture - block any commit which directly imports the database from a route handler. Whatever the coding agent thinks - ask it for recommendations for an approach!

Get out of the business of low level code review. That stuff is automatable and codifiable and it’s not where you are best poised to add value, dear human.
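To make the architecture-lint idea concrete, here is a minimal sketch of such a check in Python. Everything specific in it is an assumption for illustration: that route handlers live under a `routes/` directory, that the database layer is a top-level module named `db`, and that the hook runner passes the staged file paths as command-line arguments. It only inspects absolute imports; relative imports and dynamic imports would need extra handling.

```python
import ast
import sys

# Hypothetical project layout: the database layer is the module "db"
# and route handlers live in files under a "routes/" directory.
FORBIDDEN_MODULE = "db"
ROUTES_DIR = "routes/"


def find_forbidden_imports(source, forbidden=FORBIDDEN_MODULE):
    """Return sorted (line, module) pairs for absolute imports of `forbidden`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Match "db" and "db.anything", but not "dbx".
            if name == forbidden or name.startswith(forbidden + "."):
                hits.append((node.lineno, name))
    return sorted(hits)


def main(paths):
    """Check each route-handler file; return 1 if any violation is found."""
    failed = False
    for path in paths:
        # Only police route handlers; other layers may import the database.
        if ROUTES_DIR not in path.replace("\\", "/"):
            continue
        with open(path, encoding="utf-8") as handle:
            source = handle.read()
        for lineno, name in find_forbidden_imports(source):
            print(f"{path}:{lineno}: route handler imports '{name}' directly")
            failed = True
    return 1 if failed else 0


# Exit non-zero on violations so the commit is blocked.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

The `path:line: message` output plus the non-zero exit code is the feedback loop described above: an agent (or a human) can read the line number, fix the import, and rerun the script until it exits cleanly.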

[−] TuringNYC 31d ago

I can attest to this. Ultimately I don't think it is possible to 10x output with AI and actually keep the traditional quality controls (yet).

IMHO you just need two stacks -- systems where you can play fast and loose and 10x output. And systems where quality matters where you can perhaps 1.5 or 2x. That is still a lot of output.

[−] hgoel 31d ago

Using vibe coding for frequent PRs seems insanely reckless.

In my scientific computing environment, the majority of my vibe coded output goes to one-off scripts, stuff that is not worth committing (correcting outputs, one-off visualizations, consistency checks), and anything worth committing gets further refined to an extent that it pretty much can't be considered vibe coded anymore. It's simply too risky: any bugs would propagate down to decision making for designing new, expensive instruments.

I imagine that the cost and trust risks in enterprise environments are similar, so this seems very reckless.

AI Agents have helped up my productivity, but that's specifically because I can focus on the science, and delegate the auxiliary things to AI. I also believe I get this productivity out of them because my supervisor really drove home how hard I need to go on consistency checks and years of having my visualizations nitpicked (so I am able to do the same to AI and recognize when results are suspicious).

[−] iroddis 31d ago

The “programming is an act of externalizing a mental model” vs “a code review is reverse engineering the model, then verifying its reasoning” framing really hit home. Even before AI, code reviews required a lot of mental effort for me. AI has made an already difficult process much more prevalent.

[−] solomatov 31d ago

Is there any publication which demonstrates that the improvement is really 10x?

[−] Incipient 31d ago

I'm a mostly solo dev, and I'm finding that being purely code-review for an AI is sub-optimal. Too often the AI runs off down bad paths which you only realise later, and unpicking the mess is most likely a productivity loss.

Working more as a pair, or essentially doing code review as you go, in small chunks, is significantly better.

I personally don't have the setup or the tokens to spend to say "go build this entire thing" and then review 15k LOC. I also find even Opus is poor at coming up with tests to justify the business logic it's meant to be implementing.

[−] aetherspawn 31d ago

… how are you getting actual usable output at that scale? I have to baby my AI in 1 minute increments or it just doesn’t arrive at the correct solution at all.

Using Codex 5.2

[−] rvz 31d ago

> The industry calls this “10x productivity.” I call it what it is: a system that generates output at machine speed and forces humans to process it at biological speed.

The question is: can you tolerate the number of PRs thrown at you per day, on top of reviewing the exponentially growing mess of code that continues to double every hour, while being paid less for it?

Just learn to say no and leave. Why do you tolerate the increasing comprehension debt that is loaded on to you?

You will never get that time back. Just give it to someone else who thinks it is worth maintaining that slop for less.

[−] aanet 31d ago

I feel this is not discussed enough. I can attest to this 100%.

Just the past weekend, I was talking with a very senior engineer (~distinguished engineer at a very large tech co) who basically said he's working 8-8-6 (8 am - 8 pm, 6 days/week), "writing code" (more like supervising 8-15 agents) for a product demo in 2 weeks, which otherwise would have taken at least 1 quarter's worth of time with a small team. He's zonked out, fwiw. There are no junior engineers in the team ¯\_(ツ)_/¯, most having been laid off a few months ago.

The toll it takes, and the expectations of AI-driven productivity, have only increased dramatically. At some point, the reality will hit the remaining engg team. Not sure if the company or its leadership realizes, but so far, it's all-AI, all-the-time, human cost of productivity be damned.

[−] kakacik 31d ago

Somebody doesn't know how to regulate their pace, and then various burnout symptoms happen.

Not everybody pushes themselves like that, nor should they; it's anything but healthy and sustainable. In my experience it takes... rather obsessed people, OCD or similar traits, maybe 2 out of 10 intensity of their disease. Highly functional, smart, yet unbalanced.

LLMs just allow this spiral to go further, while human limits remain the same. Each of us creates our own path; don't mess it up just because you can. Your employer doesn't care much about you in the end, you're just another cog in the machine, but health once damaged may not bounce back, ever.

The human cost of 10x: How AI is physically breaking senior engineers (techtrenches.dev)

71 comments