Anthropic would be better off letting the community do this. Their harness sucks. Great scientists but not the best app developers. I suspect they just dont want to relinquish control of anything because they think the world cant be trusted with AI, we can only be trusted to pay them.
Custom agents using the low level completion APIs tend to outperform these generic tools, especially when you are working with complex problems.
It's hard to beat domain specific code. I can avoid massive prompts and token bloat if my execution environment, tools and error feedback provide effectively the same constraints.
If I had to pick only one tool for a generic agent to use, it would definitely be ExecuteSqlQuery (or a superset like ExecuteShell). If you gave me an agent framework and this is all it could do, I'd probably be ok for quite a while. SQL can absorb the domain specific concerns quite well. Consider that tool definitions also consume tokens.
Could you go into more details about why their "harness sucks?" This feels like a shared conclusion, but I've used several and theirs is better than many.
I am wondering if this is (or what else will be) the last piece of software + infra that's needed to "automate it all" and have non-technical people build, run, and maintain it? To me the all this agentic workflow automation is headed that way. Am I missing something?
As a newbie, at a high level if I set aside the hype aren’t agents basically bunch of python scripts communicating with remote LLMs with additional prompts and context saved in .md files with some “memory” or is there other magic pixie dust in there ?
It makes sense that anthropic is cranking out these products trying to find and maintain a foothold in the market.
But part of me just wishes they would go back to developing and refining an excellent and user-friendly harness.
I can't imagine what the long term support is for the dozens of products they release every three months.
Meanwhile, they're shipping a more and more buggy and Byzantine Claude code with a million switches and tons of ways to use it wrong.
The subscription play really does feel like a bait and switch lock-in: "we can focus less on the harness because people with subscriptions need to use it, and focus on growth."
24 comments
It's hard to beat domain specific code. I can avoid massive prompts and token bloat if my execution environment, tools and error feedback provide effectively the same constraints.
If I had to pick only one tool for a generic agent to use, it would definitely be ExecuteSqlQuery (or a superset like ExecuteShell). If you gave me an agent framework and this is all it could do, I'd probably be ok for quite a while. SQL can absorb the domain specific concerns quite well. Consider that tool definitions also consume tokens.
Anthropic made the most popular harness for developers.
Anthropic made the most popular desktop tool for AI automation.
But part of me just wishes they would go back to developing and refining an excellent and user-friendly harness.
I can't imagine what the long term support is for the dozens of products they release every three months.
Meanwhile, they're shipping a more and more buggy and Byzantine Claude code with a million switches and tons of ways to use it wrong.
The subscription play really does feel like a bait and switch lock-in: "we can focus less on the harness because people with subscriptions need to use it, and focus on growth."
Interested to see if this works out for them.