Apart from rediscovering all the problems with distributed systems, I think LLM teams will also rediscover their own version of the mythical man-month, and very quickly too.
There were three core insights: adding people makes the project later, communication cost grows as n^2, and time isn't fungible.
For agents, maybe the first insight won't hold, and adding a new agent won't necessarily increase dev-time. But the second will be worse: communication cost will grow faster than n^2 because of LLM drift and orchestration overhead.
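To put rough numbers on the n^2 part (back-of-envelope, counting channels only and ignoring the drift/orchestration overhead):

    // Brooks' pairwise-channel arithmetic: n workers have n*(n-1)/2
    // potential communication channels, i.e. O(n^2) growth.
    function channels(n: number): number {
      return (n * (n - 1)) / 2;
    }

    for (const n of [2, 5, 10, 50]) {
      console.log(`${n} agents -> ${channels(n)} channels`);
    }
    // 2 -> 1, 5 -> 10, 10 -> 45, 50 -> 1225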
The third doesn't translate cleanly, but I'll try: time isn't fungible for us, and for agents in a team, assumptions and context, however fragmented, aren't fungible either. If they hallucinate at the wrong time, even a little, it could be the equivalent of a human developer doing a side-project on company time.
An agent should write an article on it and post it on moltbook: "The Inevitable Agent Drift"
I've long thought of the analogy as useful for human teams, and it even shows up in the corporate jargon (e.g. "I'm blocked", "We need to align"). It's surprisingly common for whole branches of a large org to be doing net-negative work due to conflicting goals, or to nobody realizing some implication that cuts across teams and local contexts. Sometimes these issues are technical, but just as often they are pure product or business decisions with no explicit dependency until a lightbulb goes off somewhere.
With hand-written code, things generally move slowly enough, and there's enough common sense sprinkled across the org chart, that these issues get uncovered organically. With agent teams, speed increases by several orders of magnitude and common sense is out the window, so I suspect the ceiling on how many agents can be used productively will be far lower than for traditional engineering teams, and it will depend heavily on which humans are plugged in, and where.
One thing I suspect professional researchers underestimate is how much positive output a human team can produce with vague or hand-wavy direction and surprisingly little deep thinking, let alone a robust specification or structure to keep them on track. The reality is that any large team regresses to the mean, and it's usually a few savvy people who actually drive outcomes. These people don't necessarily have official authority, just a nose for the right thing. This won't spontaneously emerge from agents (at least until they become a lot more human-like in terms of big-picture common sense, and dial down the sycophancy to a more "skeptical engineer" level).
We’ve been building exactly this as an open-source ecosystem at consensus-tools. It’s a governance layer for multi-agent systems with a runtime wrapper that intercepts agent decisions before they execute: .consensus(fn, opts).
The coordination and consistency problems the paper describes are exactly what the monorepo is designed around: giving agents an auditable stake in decisions. Happy to share more if anyone's working in this space.
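Roughly, the wrapper looks something like this (a simplified sketch of the shape of the idea for this comment, not the real signatures; the Decision type, approvers option, and quorum rule are illustrative):

    // Sketch of a governance wrapper: intercept an agent's action
    // before it executes and require sign-off from other agents.
    type Decision = { agent: string; action: string; payload: unknown };
    type Approver = (d: Decision) => Promise<boolean>;

    function consensus<T>(
      fn: (d: Decision) => Promise<T>,
      opts: { approvers: Approver[]; quorum: number },
    ) {
      return async (d: Decision): Promise<T> => {
        // Collect votes before letting fn run at all.
        const votes = await Promise.all(opts.approvers.map((a) => a(d)));
        const yes = votes.filter(Boolean).length;
        if (yes < opts.quorum) {
          throw new Error(`decision rejected: ${yes}/${opts.quorum} approvals`);
        }
        return fn(d); // approved and on the record; execute it
      };
    }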
I’ve found an interesting model to think about: production crews, like in the television world. It might be worth using as one’s mental model of how agents, and the people working alongside agents, should coordinate, rather than basing the simulated team on the typical office-worker framework.
This is how we design at HewesNguyen AI. We're both MIS, so once LLMs came out we were like: sweet, whole teams that can each be tasked with one thing done well. Thank you, Unix Philosophy.
I find depth to be far more interesting than breadth with these models.
Descending into a problem space recursively won't necessarily find the best solution, but it's going to tend to find some solution faster than going wide across a swarm of agents. Theoretically it's exponentially faster to have one symbolically recursive agent than to have any number of parallel agents.
I think agent swarm stuff sucks for complex multi-step problems because it's mostly a form of BFS. It never actually gets to a good solution because it's searching too wide and no one can afford to wait for it to strip mine down to something valuable.
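A toy sketch of the asymmetry (the task tree and the cost claims are illustrative, not measured):

    // Depth-first recursion commits to one branch and returns the
    // first acceptable solution; a breadth-first "swarm" expands the
    // whole frontier at each level before going deeper.
    type Node = { solution?: string; children: Node[] };

    function dfs(n: Node): string | undefined {
      if (n.solution) return n.solution; // found something; stop
      for (const c of n.children) {
        const s = dfs(c);
        if (s) return s;
      }
      return undefined;
    }

    function bfs(root: Node): string | undefined {
      const queue: Node[] = [root];
      while (queue.length) {
        const n = queue.shift()!;
        if (n.solution) return n.solution;
        queue.push(...n.children); // every sibling expansion costs an LLM call
      }
      return undefined;
    }
    // With branching factor b and a solution at depth d, BFS touches
    // O(b^d) nodes before reaching depth d; DFS that happens to pick
    // a decent branch touches O(d).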
Everyone wants to be the CEO of their own megacorp managing thousands of AI engineers, I guess. Just like microservices, there’s probably a ton of overhead doing things this way vs monolithic / single agent. Certain types of engineers just love over-engineering hugely complex stuff to see it work. Rube Goldberg architecture was already prevalent and bad enough in enterprise before the AI boom.
Once you run more than one agent in a loop, you inevitably recreate distributed systems problems: message ordering, retries, partial failure, etc.
Most agent frameworks pretend these don’t exist. Some address them partially; none of the frameworks I've seen address all of them.
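For instance, even a minimal two-agent loop ends up needing plumbing like this (a sketch; the message shape and the in-memory inbox are invented for illustration):

    // Once agents exchange messages over anything unreliable, you need
    // ordering and retry-safety: sequence numbers plus idempotency keys,
    // i.e. classic distributed-systems plumbing.
    type Msg = { seq: number; id: string; body: string };

    class Inbox {
      private nextSeq = 0;
      private seen = new Set<string>();
      private buffer = new Map<number, Msg>();

      deliver(m: Msg, handle: (m: Msg) => void): void {
        if (this.seen.has(m.id)) return; // duplicate from a retry; drop it
        this.seen.add(m.id);
        this.buffer.set(m.seq, m);
        // Release messages strictly in sequence order.
        while (this.buffer.has(this.nextSeq)) {
          handle(this.buffer.get(this.nextSeq)!);
          this.buffer.delete(this.nextSeq);
          this.nextSeq++;
        }
      }
    }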
Struggling to find anything interesting or non-obvious about this article. You give a bunch of LLMs various parallelizable tasks, and some models manage to do them well while others don't; no insight as to why. As someone with a distributed-systems background, I find the supposed 'insights' from distributed computing almost trivial.
The current fad for "agent swarms" or "model teams" seems misguided, although it definitely makes for great paper fodder (especially if you combine it with distributed systems!) and gets the VCs hot.
An LLM running one query at a time can already generate a huge amount of text in a few hours, and drain your bank account too.
A "different agent" is just different context supplied in the query to the LLM. There is nothing more than that. Maybe some of them use a different model, but again, this is just a setting in OpenRouter or whatever.
Agent parallelism just doesn't seem necessary and makes everything harder. Not an expert though, tell me where I'm wrong.