Every single AI integration feels under-engineered (or not even engineered in case of tokenslop), as the creators put exactly the same amount of thought that $LLMOFTHEWEEK did into vomiting "You're absolutely right, $TOOL is a great solution for solving your issue!"
We're yet to genuinely standardise bloody help texts for basic commands (Does -h set the hostname, or does it print the help text? Or is it -H? Does --help exist?). Writing man-pages seems like a lost art at this point, everyone points to $WEBSITE/docs (which contains, as you guessed, LLM slopdocs).
We're gonna end up seeing the same loops of "Modern standard for AI" -> "Standard for AI" -> "Not even a standard" -> "Thing of the past" because all of it is fundamentally wrong to an extent. LLMs are purely textual in context, while network protocols are more intricate by pure nature. An LLM will always and always end up overspeccing a /api/v1/ping endpoint while ICMP ping can do that within bits. Text-based engineering, while visible (in the sense that a tech-illiterate person will find it easy to interpret), will always end up forming abstractions over core - you'll end up with a shaky pyramid that collapses the moment your $LLM model changes encodings.
We're yet to genuinely standardise bloody help texts for basic commands (Does -h set the hostname, or does it print the help text? Or is it -H? Does --help exist?). Writing man-pages seems like a lost art at this point, everyone points to $WEBSITE/docs (which contains, as you guessed, LLM slopdocs).
Are you describing under-engineered AI integrations, or just the general state of CLIs for decades now?
A lot of the best tooling around AI we're seeing is adding deterministic gates that the probabilistic AI agents work with. This is why I'm using MCP over http. I'm happy for the the agent to use it's intelligence and creativity to help me solve problems, but for a range of operations, I want a gate past which actions run with the certainty of normal software functions. NanoClaw sells itself on using deterministic filtering of your WhatsApp messages before the agent gets to see them, and proxies API keys so the agent bever gets them - this is a similar type of deterministic gate that allows for more confidence when working with AI.
I follow a similar pattern. My autonomous agent Smith has a service mesh that I plug MCPs into, which gives me a single place to define policy (OPA for life) and monitoring. The service gateway own credentials. This pattern is secure, easy to manage and lets you can programmatically generate a CLI from the tool catalog. https://github.com/sibyllinesoft/smith-gateway if you want to understand the model and how to implement it yourself.
The boundary also needs to hold if the agent is compromised. Proxying keys is the right instinct. We took the same approach at the action layer: cryptographic warrants scoped to the task, delegation-aware, verified at the MCP tool boundary before execution. Open source core. https://github.com/tenuo-ai/tenuo
It seems like we're going back to expert systems in a kind of inverted sense with all of this chaining of deterministic steps. But now the "experts" are specialized and well-defined actions available to something smart enough to compose them to create new, more powerful actions. We've moved the determinism to the right spot, maybe? Just a half-thought.
I'm just trying to learn this stuff now, so I don't the literature. The "trajectory view" through action space is what makes the most sense to me.
Along these lines, another half-baked pattern I see is kind of a time-lagged translation of stuff from modern stat mech to deep learning/"AI". First it was energy based systems and the complex energy landscape view, a-la spin glasses and boltzmann machines. The "equilibrium" state-space view, concerned with memory and pattern storage/retrieval. Hinton, amit, hopfield, mackay and co.
Now, the trajectory view that started in the 90s with jarzynski and crooks and really bloomed in 2010+ with "stochastic thermodynamics" seems to be a useful lens. The agent stuff is very "nonequilibrium"/ "active"-system coded, in the thermo sense... With the ability to create, modify, and exploit resources (tools/memory) on the fly, there's deep history and path dependence. I see ideas from recent wolpert and co.(Susanne still, crooks again, etc.) w.r.t. thermodynamics of computation providing a kind of through line, all trajectory based. That's all very vague I know, but I recently read the COALA paper and was very enchanted and have been trying to combine what I actually know with this new foreign agent stuff.
It's also very interesting to me how the Italian stat mech school, the parisi family, have continuously put out bangers trying to actually explain machine learning and deep learning success.
I'd love to hear if anyone is thinking along similar lines, or thinks I'm way off track, has paper recs please let me know! Especially papers on the trajectory view of agents.
I think we need to just think of agents as people. The same principles around how we authenticate, authorize and revoke permissions to people should apply to agents. We don't leave the server room door open for users to type commands into physical machines for good reason, and so we shouldn't be doing the same with agents, unless fully sandboxed or the blast radius of malign or erroneous action is fully accepted.
MCP is a fixed specification/protocol for AI app communication (built on top of an HTTP CRUD app). This is absolutely the right way to go for anything that wants to interoperate with an AI app.
For a long time now, SWEs seem to have bamboozled into thinkg the only way you can connect different applications together are "integrations" (tightly coupling your app into the bespoke API of another app). I'm very happy somebody finally remembered what protocols are for: reusable communications abstractions that are application-agnostic.
The point of MCP is to be a common communications language, in the same way HTTP is, FTP is, SMTP, IMAP, etc. This is absolutely necessary since you can (and will) use AI for a million different things, but AI has specific kinds of things it might want to communicate with specific considerations. If you haven't yet, read the spec: https://modelcontextprotocol.io/specification/2025-11-25
As soon as MCP came out I thought it was over engineered crud and didn’t invest any time in it. I have yet to regret this decision. Same thing with LangChain.
This is one key difference between experienced and inexperienced devs; if something looks like crud, it probably is crud. Don’t follow or do something because it’s popular at the time.
MCP is fine, particular remote MCP which is the lowest friction way to get access to some hosted service with auth handled for you.
However, MCP is context bloat and not very good compared to CLIs + skills mechanically. With a CLI you get the ability to filter/pipe (regular Unix bash) without having to expand the entire tool call every single time in context.
CLIs also let you use heredoc for complex inputs that are otherwise hard to escape.
CLIs can easily generate skills from the —help output, and add agent specific instructions on top. That means you can give the agent all the instructions it needs to know how to use the tools, what tools exist, lazy loaded, and without bloating the context window with all the tools upfront (yes, I know tool search in Claude partially solves this).
CLIs also don’t have to run persistent processes like MCP but can if needed
I’ve always felt like MCP is way better suited towards consumer usage rather than development environments. Like, yeah, MCP uses a lot of a context window, is more complex than it should be in structure, and it isn’t nearly as easy for models to call upon as a command line tool would be. But I believe that it’s also the most consumer friendly option available right now.
It’s much easier for users to find what exactly a model can do with your app over it compared to building a skill that would work with it since clients can display every tool available to the user. There’s also no need for the model to setup any environment since it’s essentially just writing out a function, which saves time since there’s no need to setup as many virtual machine instructions.
It obviously isn’t as useful in development environments where a higher level of risk can be accepted since changes can always be rolled back in the repository.
If I recall correctly, there’s even a whole system for MCP being built, so it can actually show responses in a GUI much like Siri and the Google Assistant can.
I still think MCP is completely unnecessary (and have from the start). The article correctly points out where CLI > MCP but stops short on 2 points:
1. Documenting the interface without MCP. This problem is best solved by the use of Skills which can contain instructions for both CLIs and APIs (or any other integration). Agents only load the relevant details when needed. This also makes it easy to customize the docs for the specific cases you are working with and build skills that use a subset of the tools.
2. Regarding all of the centralization benefits attributed to remote MCPs - you can get the same benefits with a traditional centralized proxy as well. MCP doesn't inherently grant you any of those benefits. If I use AWS sso via CLI, boom all of my permissions are tied to my account, benefit from central management, and have all the observability benefits.
In my mind, use Skills to document what to do and benefit from targeted progressive disclosure, and use CLIs and REST APIs for the actual interaction with services.
This came up in recent discussions about the Google apps CLI that was recently released. Google initially included an MCP server but then removed it silently - and some people believe this is because of how many different things the Google Workspace CLI exposes, which would flood the context. And it seemed like in social media, suddenly a lot of people were talking about how MCP is dead.
But fundamentally that doesn’t make sense. If an AI needs to be fed instructions or schemas (context) to understand how to use something via MCP, wouldn’t it need the same things via CLI? How could it not? This article points that out, to be clear. But what I’m calling out is how simple it is to determine for yourself that this isn’t an MCP versus CLI battle. However, most people seem to be falling for this narrative just because it’s the new hot thing to claim (“MCP is dead, Long Live CLI”).
As for Google - they previously said they are going to support MCP. And they’ve rolled out that support even recently (example from a quick search: https://cloud.google.com/blog/products/ai-machine-learning/a...). But now with the Google Workspace CLI and the existence of “Gemini CLI Extensions” (https://geminicli.com/extensions/about/), it seems like they may be trying to diminish MCP and push their own CLI-centric extension strategy. The fact that Gemini CLI Extensions can also reference MCP feels a lot like Microsoft’s Embrace, Extend, Extinguish play.
In v0, people can add e.g. Supabase, Neon, or Stripe to their projects with one click. We then auto-connect and auth to the integration’s remote MCP server on behalf of the user.
v0 can then use the tools the integration provider wants users to have, on behalf of the user, with no additional configuration. Query tables, run migrations, whatever. Zero maintenance burden on the team to manage the tools. And if users want to bring their own remote MCPs, that works via the same code path.
We also use various optimizations like a search_tools tool to avoid overfilling context
The problem with MCP isn't MCP. It's the way it's invoked by your agent.
IMO, by default MCP tools should run in forked context. Only a compacted version of the tool response should be returned to the main context. This costs tokens yes, but doesn't blow out your entire context.
If other information is required post-hoc, the full response can be explored on disk.
As someone charged with enabling users across an enterprise with AI tooling, the majority of whom are not in the software dev category, this article is perfectly mirroring my approach. Which is reassuring!
Challenges we are solving with centralised MCP are around brand guardianship, tone of voice, internal jargon and domain context, access to common data sources, and via the resources methods in MCP access to “skills” that prescribe patterns and shims for expected paths and ways of connecting/extracting data.
This seems misguided when you have to work in enterprise settings. MCP is a very natural fit for all the API auditing and domain borders that exist in enterprise environments, because it provides deterministic tooling and auditable interfaces for agents. Nobody wants an AI agent doing random API calls or shell commands.
I have moved towards super-specific scripts (so I guess "CLI"?) for a few reasons:
1. You can make the script very specific for the skill and permission appropriately.
2. You can have the output of the script make clear to the LLM what to do. Lint fails? "Lint rules have failed. This is an important for reasons blah blah and you should do X before proceeding". Otherwise the Agent is too focused on smashing out the overall task and might opt route around the error. Note you can use this for successful cases too.
3. The output and token usage can be very specific what the agent needs. Saves context. My github comments script really just gives the comments + the necessary metadata, not much else.
The downsides of MCP all focus on (3), but the 1+2 can be really important too.
the maintenance burden is the real MCP killer nobody talks about. your agent needs github? now you depend on some npm package wrapping an API that already had good docs. i just shell out to gh cli and curl - when the API changes, the agent reads updated docs and adapts. with MCP you wait on a middleman to update a wrapper.
tptacek nailed it - once agents run bash, MCP is overhead. the security argument is weird too, it shipped without auth and now claims security as chief benefit. chroot jails and scoped tokens solved this decades ago.
only place MCP wins is oauth flows for non-technical users who will never open a terminal. for dev tooling? just write better CLIs.
As yourself: what kind of tool I would love to have, to accomplish the work I'm asking the LLM agent to do? Often times, what is practical for humans to use, it is for LLMs. And the reply is almost never the kind of things MCP exports.
>The LLM has no way of knowing which CLI to use and how it should use it…unless each tool is listed with a description somewhere either in AGENTS|CLAUDE.md or a README.md
This is what the skill file is for.
>Centralizing this behind MCP allows each developer to authenticate via OAuth to the MCP server and sensitive API keys and secrets can be controlled behind the server
This doesn't require MCP. Nothing is stopping you from creating a service to proxy requests from a CLI.
The problem with this article is it doesn't recognize that skills is a more general superset compared with MCP. Anything done with MCP could have an equivalent done with a skill.
I find that skills work very well. The main SKILL file has an overview of all the capabilities of my platform at a high level and each section links to a more specific file which contains the full information with all possible parameters for that particular capability.
Then I have a troubleshooting file (also linked from the main SKILL file) which basically lists out all the 'gotchas' that are unique to my platform and thus the LLM may struggle with in complex scenarios.
After a lot of testing, I identified just 5 gotchas and wrote a short section for each one. The title of each section describes the issue and lists out possible causes with a brief explanation of the underlying mechanism and an example solution.
Adding the troubleshooting file was a game changer.
If it runs into a tricky issue, it checks that troubleshooting file. It's highly effective. It made the whole experience seamless and foolproof.
My platform was designed to reduce applications down to HTML tags which stream data to each other so the goal is low token count and no-debugging.
I basically replaced debugging with troubleshooting; the 5 cases I mentioned are literally all that was left. It seems to be able to quickly assemble any app without bugs now.
The 'gotchas' are not exactly bugs but more like "Why doesn't this value update in realtime?" kind of issues. They involve performance/scalability optimizations that the LLM needs to be aware of.
If it's a remote API, I suppose the argument is that you might as well fetch the documentation from the remote server, rather than using a skill that might go out of date. You're trusting the API provider anyway.
But it's putting a lot of trust in the remote server not to prompt-inject you, perhaps accidentally. Also, what if the remote docs don't suit local conditions? You could make local edits to a skill if needed.
Better to avoid depending on a remote API when a local tool will do.
I've been grading MCP server schemas for quality. 27 servers, 510 tools, 97K tokens measured.
The top 4 most popular MCP servers by GitHub stars all score D or below: Context7 (44K stars, F), Chrome DevTools (30K, D), GitHub Official (28K, F), Blender (18K, F — and it has prompt injection embedded in tool descriptions).
Meanwhile, PostgreSQL's MCP server — 1 tool, 46 tokens — scores a perfect 100. Popularity anti-correlates with quality.
205 comments
We're yet to genuinely standardise bloody help texts for basic commands (Does -h set the hostname, or does it print the help text? Or is it -H? Does --help exist?). Writing man-pages seems like a lost art at this point, everyone points to $WEBSITE/docs (which contains, as you guessed, LLM slopdocs).
We're gonna end up seeing the same loops of "Modern standard for AI" -> "Standard for AI" -> "Not even a standard" -> "Thing of the past" because all of it is fundamentally wrong to an extent. LLMs are purely textual in context, while network protocols are more intricate by pure nature. An LLM will always and always end up overspeccing a /api/v1/ping endpoint while ICMP ping can do that within bits. Text-based engineering, while visible (in the sense that a tech-illiterate person will find it easy to interpret), will always end up forming abstractions over core - you'll end up with a shaky pyramid that collapses the moment your $LLM model changes encodings.
Are you describing under-engineered AI integrations, or just the general state of CLIs for decades now?
I'm just trying to learn this stuff now, so I don't the literature. The "trajectory view" through action space is what makes the most sense to me.
Along these lines, another half-baked pattern I see is kind of a time-lagged translation of stuff from modern stat mech to deep learning/"AI". First it was energy based systems and the complex energy landscape view, a-la spin glasses and boltzmann machines. The "equilibrium" state-space view, concerned with memory and pattern storage/retrieval. Hinton, amit, hopfield, mackay and co.
Now, the trajectory view that started in the 90s with jarzynski and crooks and really bloomed in 2010+ with "stochastic thermodynamics" seems to be a useful lens. The agent stuff is very "nonequilibrium"/ "active"-system coded, in the thermo sense... With the ability to create, modify, and exploit resources (tools/memory) on the fly, there's deep history and path dependence. I see ideas from recent wolpert and co.(Susanne still, crooks again, etc.) w.r.t. thermodynamics of computation providing a kind of through line, all trajectory based. That's all very vague I know, but I recently read the COALA paper and was very enchanted and have been trying to combine what I actually know with this new foreign agent stuff.
It's also very interesting to me how the Italian stat mech school, the parisi family, have continuously put out bangers trying to actually explain machine learning and deep learning success.
I'd love to hear if anyone is thinking along similar lines, or thinks I'm way off track, has paper recs please let me know! Especially papers on the trajectory view of agents.
For a long time now, SWEs seem to have bamboozled into thinkg the only way you can connect different applications together are "integrations" (tightly coupling your app into the bespoke API of another app). I'm very happy somebody finally remembered what protocols are for: reusable communications abstractions that are application-agnostic.
The point of MCP is to be a common communications language, in the same way HTTP is, FTP is, SMTP, IMAP, etc. This is absolutely necessary since you can (and will) use AI for a million different things, but AI has specific kinds of things it might want to communicate with specific considerations. If you haven't yet, read the spec: https://modelcontextprotocol.io/specification/2025-11-25
This is one key difference between experienced and inexperienced devs; if something looks like crud, it probably is crud. Don’t follow or do something because it’s popular at the time.
However, MCP is context bloat and not very good compared to CLIs + skills mechanically. With a CLI you get the ability to filter/pipe (regular Unix bash) without having to expand the entire tool call every single time in context.
CLIs also let you use heredoc for complex inputs that are otherwise hard to escape.
CLIs can easily generate skills from the —help output, and add agent specific instructions on top. That means you can give the agent all the instructions it needs to know how to use the tools, what tools exist, lazy loaded, and without bloating the context window with all the tools upfront (yes, I know tool search in Claude partially solves this).
CLIs also don’t have to run persistent processes like MCP but can if needed
It’s much easier for users to find what exactly a model can do with your app over it compared to building a skill that would work with it since clients can display every tool available to the user. There’s also no need for the model to setup any environment since it’s essentially just writing out a function, which saves time since there’s no need to setup as many virtual machine instructions.
It obviously isn’t as useful in development environments where a higher level of risk can be accepted since changes can always be rolled back in the repository.
If I recall correctly, there’s even a whole system for MCP being built, so it can actually show responses in a GUI much like Siri and the Google Assistant can.
1. Documenting the interface without MCP. This problem is best solved by the use of Skills which can contain instructions for both CLIs and APIs (or any other integration). Agents only load the relevant details when needed. This also makes it easy to customize the docs for the specific cases you are working with and build skills that use a subset of the tools.
2. Regarding all of the centralization benefits attributed to remote MCPs - you can get the same benefits with a traditional centralized proxy as well. MCP doesn't inherently grant you any of those benefits. If I use AWS sso via CLI, boom all of my permissions are tied to my account, benefit from central management, and have all the observability benefits.
In my mind, use Skills to document what to do and benefit from targeted progressive disclosure, and use CLIs and REST APIs for the actual interaction with services.
But fundamentally that doesn’t make sense. If an AI needs to be fed instructions or schemas (context) to understand how to use something via MCP, wouldn’t it need the same things via CLI? How could it not? This article points that out, to be clear. But what I’m calling out is how simple it is to determine for yourself that this isn’t an MCP versus CLI battle. However, most people seem to be falling for this narrative just because it’s the new hot thing to claim (“MCP is dead, Long Live CLI”).
As for Google - they previously said they are going to support MCP. And they’ve rolled out that support even recently (example from a quick search: https://cloud.google.com/blog/products/ai-machine-learning/a...). But now with the Google Workspace CLI and the existence of “Gemini CLI Extensions” (https://geminicli.com/extensions/about/), it seems like they may be trying to diminish MCP and push their own CLI-centric extension strategy. The fact that Gemini CLI Extensions can also reference MCP feels a lot like Microsoft’s Embrace, Extend, Extinguish play.
In v0, people can add e.g. Supabase, Neon, or Stripe to their projects with one click. We then auto-connect and auth to the integration’s remote MCP server on behalf of the user.
v0 can then use the tools the integration provider wants users to have, on behalf of the user, with no additional configuration. Query tables, run migrations, whatever. Zero maintenance burden on the team to manage the tools. And if users want to bring their own remote MCPs, that works via the same code path.
We also use various optimizations like a search_tools tool to avoid overfilling context
IMO, by default MCP tools should run in forked context. Only a compacted version of the tool response should be returned to the main context. This costs tokens yes, but doesn't blow out your entire context.
If other information is required post-hoc, the full response can be explored on disk.
Challenges we are solving with centralised MCP are around brand guardianship, tone of voice, internal jargon and domain context, access to common data sources, and via the resources methods in MCP access to “skills” that prescribe patterns and shims for expected paths and ways of connecting/extracting data.
1. You can make the script very specific for the skill and permission appropriately.
2. You can have the output of the script make clear to the LLM what to do. Lint fails? "Lint rules have failed. This is an important for reasons blah blah and you should do X before proceeding". Otherwise the Agent is too focused on smashing out the overall task and might opt route around the error. Note you can use this for successful cases too.
3. The output and token usage can be very specific what the agent needs. Saves context. My github comments script really just gives the comments + the necessary metadata, not much else.
The downsides of MCP all focus on (3), but the 1+2 can be really important too.
tptacek nailed it - once agents run bash, MCP is overhead. the security argument is weird too, it shipped without auth and now claims security as chief benefit. chroot jails and scoped tokens solved this decades ago.
only place MCP wins is oauth flows for non-technical users who will never open a terminal. for dev tooling? just write better CLIs.
>The LLM has no way of knowing which CLI to use and how it should use it…unless each tool is listed with a description somewhere either in AGENTS|CLAUDE.md or a README.md
This is what the skill file is for.
>Centralizing this behind MCP allows each developer to authenticate via OAuth to the MCP server and sensitive API keys and secrets can be controlled behind the server
This doesn't require MCP. Nothing is stopping you from creating a service to proxy requests from a CLI.
The problem with this article is it doesn't recognize that skills is a more general superset compared with MCP. Anything done with MCP could have an equivalent done with a skill.
Then I have a troubleshooting file (also linked from the main SKILL file) which basically lists out all the 'gotchas' that are unique to my platform and thus the LLM may struggle with in complex scenarios.
After a lot of testing, I identified just 5 gotchas and wrote a short section for each one. The title of each section describes the issue and lists out possible causes with a brief explanation of the underlying mechanism and an example solution.
Adding the troubleshooting file was a game changer.
If it runs into a tricky issue, it checks that troubleshooting file. It's highly effective. It made the whole experience seamless and foolproof.
My platform was designed to reduce applications down to HTML tags which stream data to each other so the goal is low token count and no-debugging.
I basically replaced debugging with troubleshooting; the 5 cases I mentioned are literally all that was left. It seems to be able to quickly assemble any app without bugs now.
The 'gotchas' are not exactly bugs but more like "Why doesn't this value update in realtime?" kind of issues. They involve performance/scalability optimizations that the LLM needs to be aware of.
But it's putting a lot of trust in the remote server not to prompt-inject you, perhaps accidentally. Also, what if the remote docs don't suit local conditions? You could make local edits to a skill if needed.
Better to avoid depending on a remote API when a local tool will do.
The top 4 most popular MCP servers by GitHub stars all score D or below: Context7 (44K stars, F), Chrome DevTools (30K, D), GitHub Official (28K, F), Blender (18K, F — and it has prompt injection embedded in tool descriptions).
Meanwhile, PostgreSQL's MCP server — 1 tool, 46 tokens — scores a perfect 100. Popularity anti-correlates with quality.
Full leaderboard: https://0-co.github.io/company/leaderboard.html
You can grade your own server in browser: https://0-co.github.io/company/report.html?example=notion