Building your AI agent "toolkit" is becoming the equivalent of the perfect "productivity" setup: you spend your time reading blog posts, watching YouTube videos on how to be productive, and creating habits and rituals... only to be overtaken by someone with a simple paper list of tasks they work through.
Plain Claude, ask it to write a plan, review plan, then tell it to execute still works the best in my experience.
Lots of money being made by luring people into this trap.
The reality is that if you actually know what you want, and can communicate it well (where the productivity app can be helpful), then you can do a lot with AI.
My experience is that most people don't actually know what they want. Or they don't understand what goes into what they want. Asking for a plan is a shortcut to gaining that understanding.
Problem is they don’t know how to express themselves and many people, especially those interested in tech, don’t want to learn.
I can’t tell you how many times I have a CS student in my office for advising and they tell me they only want to take technical courses, because anything reading or writing or psychology or history based is “soft”, unrelated to their major, and a waste of their time.
I’ve spent years telling them critical reading and expressive writing skills are very important to being a functioning adult, but they insist what they need to know can only be found in the Engineering college.
Much of my time at work is reading through quickly typed messages from my boss and understanding exactly what questions I need to ask in order to make it easy for him to answer clearly.
Engineers who lack soft skills cannot be effective in team environments.
> My experience is that most people don't actually know what they want. Or they don't understand what goes into what they want.
1000%. This is why people whose job it was to figure out how to make a thing are thriving with AI tools, and those who operate in the conceptual/abstract are flailing and frustrated with it. (it really is a mirror in this case, and the frustration they have is unknowingly directed at themselves)
Or, as I like to put it: I need to activate my personal transformers on my inner embedding space to figure out what it is I really want. And still, quite often, I think in terms of the programming language I'm used to and the library I'm familiar with.
So, to really create something new that I care about, LLMs don't help much.
It's not, though, if you're working in a massive codebase or on a distributed system that has many interconnected parts.
skills that teach the agent how to pipe data, build requests, trace them through a system and datasources, then update code based on those results are a step function improvement in development.
ai has fundamentally changed how productive i am working on a 10m line codebase, and i'd guess less than 5% of that is due to code gen thats intended to go to prod. Nearly all of it is the ability to rapidly build tools and toolchains to test and verify what i'm doing.
Let me give you a counterexample. I'm working on a product for the national market, and I need to do all financial tasks, invoicing, submission to the national fiscal database, etc. through a local accounting firm. So I integrate their API in the backend; this is a 100% custom API developed by this small European firm, with a few dozen RESTful endpoints supporting various accounting operations, and I need to use it programmatically to maintain sync for legal compliance. No LLM has ever heard of it. It has a few hundred KB of HTML documentation that Claude can ingest perfectly fine and generate a curl command for, but I don't want to blow my token use and context on every interaction.
So I naturally felt the need to (tell Claude to) build an MCP server for this accounting API, and now I ask it to do accounting tasks, and then it just does them. It's really ducking sweet.
Another thing I did: after a particularly grueling accounting month close-out, I told Claude to extract the general tasks we accomplished and build a skill that does them at the end of the month, and now it's like having a junior accountant at my disposal - it just DOES the things a professional would charge me thousands for.
So both custom project MCPs and skills are super useful in my experience.
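For anyone curious what the shape of that takes before it's lifted into an MCP server, here's a rough sketch. The base URL, the /invoices endpoint, and the payload fields are all placeholders (the real firm's API is private); the idea is that each plain function wrapping one documented endpoint gets registered as one MCP tool.

```python
import json
import urllib.request

# Placeholder: the real accounting firm's API is private, so this URL,
# the /invoices endpoint, and the payload fields are all invented.
BASE_URL = "https://api.example-accounting.example/v1"

def build_invoice_payload(client_id, lines):
    """Pure helper: shape an invoice body the way the (hypothetical)
    API expects. `lines` is a list of (quantity, unit_price) tuples."""
    return {
        "client_id": client_id,
        "total": round(sum(q * p for q, p in lines), 2),
        "lines": [{"qty": q, "unit_price": p} for q, p in lines],
    }

def post_invoice(token, payload):
    """One documented endpoint per function; each function then becomes
    one MCP tool, so the agent invokes it by name instead of
    re-ingesting hundreds of KB of HTML docs every session."""
    req = urllib.request.Request(
        BASE_URL + "/invoices",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Once wrappers like this exist, exposing them through an MCP server is mostly boilerplate, and the agent only ever sees the tool names and short descriptions.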
All I want is for my agent to save me time, and to become a _compounding_ multiplier for my output. As a PM, I mostly want to use it for demos and prototypes and ideation. And I need it to work with my fractured attention span and saturated meeting schedule, so compounding is critical.
I’m still new to this, but the first obvious inefficiency I see is that I’m repeating context between sessions, copying .md files around, and generally not gaining any efficiency between each interaction. My only priority right now is to eliminate this repetition so I can free up buffer space for the next repetition to be eliminated. And I don’t want to put any effort into this.
How are you guys organizing this sort of compounding context bank? I’m talking about basic information like “this is my job, these are the products I own, here’s the most recent docs about them, here’s how you use them, etc.” I would love to point it to a few public docs sites and be done, but that’s not the reality of PM work on relatively new/unstable products. I’ve got all sorts of docs, some duplicated, some outdated, some seemingly important but actually totally wrong… I can’t just point the agent at my whole Drive and ask it to understand me.
Should I tell my agent to create or update a Skill file every time I find myself repeating the same context more than twice? Should I put the effort into gathering all the best quality docs into a single Drive folder and point it there? Should I make some hooks to update these files when new context appears?
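One low-effort starting point for the repeated-context problem is a single skill file the agent loads on demand. The `name`/`description` frontmatter below follows the skills format; every other detail (role, folder names) is a made-up placeholder to adapt:

```markdown
---
name: my-products
description: Background on my role and products. Load before answering any product question.
---
- Role: PM for two early-stage products (placeholders: ProductA, ProductB)
- Canonical docs live in one curated folder (placeholder: Drive/pm-context/);
  treat anything outside it as possibly stale or wrong
- If two docs conflict, ask me which wins, then record the answer here
```

Updating this one file whenever you catch yourself repeating context a third time keeps the curation cost close to zero.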
This resonates with me. Sometimes I build up some artifacts within the context of a task, but these almost always get thrown away. There are primarily three reasons I prefer a vanilla setup.
1. I have many and sometimes contradictory workflows: exploration, prototyping, bug fixing/debugging, feature work, PR management, etc. When I'm prototyping, I want reward hacking; I don't care about tests or lints, and it's the exact opposite when I manage PRs.
2. I see hard-to-explain, hard-to-quantify problems with over-configuration. The quality goes down, it loses track faster, it gets caught in loops. This is totally anecdotal, but I've seen it across a number of projects. My hypothesis is that it's related to attention: since these get added to the system prompt, they pull the distribution by constantly being attended to.
3. The models keep getting better. Similar to 2, sometimes model gains are canceled out by previously necessary instructions. I hear the Anthropic folks clear their CLAUDE.md every 30 days or so to alleviate this.
Agree. For what it’s worth, in interviews Cherny (Claude Code creator) and Steinberger (OpenClaw creator) say they keep things simple and use none of the workflow frameworks. The latter said he doesn’t even use plan mode, but I find that very useful: exiting plan mode starts clean with compressed context.
> Plain Claude, ask it to write a plan, review plan, then tell it to execute still works the best in my experience.
Working on an unspecified codebase of unknown size using unconfigured tooling with unstated goals found that less configuration worked better than more.
at work i've spent some time setting up our claude.md files and curated the .claude directory with relevant tools such as linear, figma, sentry, LSP, browser testing. sensible stuff anyone using these tools would want, it all works pretty well.
my only machine-specific config is overriding haiku usage with sonnet in claude code. i outline what i want in linear, have claude synthesize into a plan and we iterate until we're both happy, then i let it rip. works great.
then one of my juniors goes and loads up things like "superpowers" and all sorts of stuff that's started littering his PRs. i'm just not convinced this ricing of agents materially improves anything.
Understandable - I find that skills for odd-duck things, plus a simple set of rules you routinely prune, work best for me. Went from crappy code in niche projects to nailing things on the first prompt almost every time now.
I consider it more like people installing oh-my-zsh or whatever, which brings a TON of features they'll never use, just because some tech influencer said it's cool.
The proper way to do this is find a personal pain point, figure out how to fix it, fix it, and then continue.
That's how I built my own system: zero skills, just a git submodule with shared guides on how to do stuff the way _I_ like it. I can just refer any agent to that directory and they'll usually get it on the first go.
This. At work I have described this phenomenon as the equivalent of tinkering with the margins and fonts in your word processor instead of just writing your paper.
This is what I do; frankly I can't be arsed to take the time to write all these commands and skills and whatnot. I did use /init to get Claude to create a CLAUDE.md file, and I occasionally -- very occasionally -- go through it and correct anything that's no longer valid due to code changes (and then ask Claude to do the same).
But beyond that, I just ask it for what I want, and that's it. I'm not convinced that putting more time into building the "toolbox" will actually give me significant returns on that time.
I do think that some of this (commands, skills, breaking up CLAUDE.md into separate rules files) can be useful, but it's highly context-dependent, and I think YAGNI applies here: don't front-load this work. Only set those up if you run into specific problems or situations where you think doing this work will make Claude work better.
I've had the same thought recently and this definitely is a thing that you can do - but there are also cases where you get dramatically better results if you put some more effort into your setup.
e.g. spend time creating a skill about how to query production logs
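A sketch of the kind of helper such a skill might wrap; the log line format ("service timestamp request_id message") is an assumption for illustration, not any real system's format:

```python
import re

def trace_request(log_lines, request_id):
    """Return, in order, every log line mentioning one request id,
    across all services. Assumes each line embeds the id as a whole
    token, e.g. 'api 2024-01-01T00:00:01 req-42 accepted'."""
    pattern = re.compile(r"\b" + re.escape(request_id) + r"\b")
    return [line for line in log_lines if pattern.search(line)]
```

The point of putting something like this behind a skill is that the agent learns the query recipe once instead of rediscovering your log conventions every session.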
If you work on platforms, frameworks, and tools that are public knowledge, then yeah. If there’s nothing unique to your project, or to how you write code in it, build it, deploy it, and operate it, yeah.
But for some projects there will be things Claude doesn’t know about, or things that you repeatedly want done a specific way and don’t want to type it in every prompt.
I’m seeing this more and more: people build this artificial wall you supposedly need to climb before trying agentic coding. That’s not the right way to start at all. You should start with a fresh .claude, an empty AGENTS.md, zero skills, zero MCPs, and learn to operate the thing first.
Feels a little like this is generated and not based on experience. CLAUDE.md should be short. TypeScript strict mode isn't a gotcha; it'll figure that out on its own easily, imo omit things like that. People put far too much stuff in CLAUDE.md; just a few lines and links to docs is all it needs. You can also @AGENTS.md and put everything there instead. Don't skills supersede commands? Subagents are good, especially if you specify model, forked memory, linked skills, etc. Always ask what you can optimize after you see Claude thrashing, then figure out how to encode that (or refactor your scripts or code choices).
Always separate plan from implementation and clear context between them; it's the build-up of context that makes it bad ime.
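For reference, a subagent with a pinned model is just a markdown file in `.claude/agents/`. The frontmatter keys (`name`, `description`, `tools`, `model`) are Claude Code's documented subagent fields; the body content here is a hypothetical example:

```markdown
---
name: pr-reviewer
description: Reviews diffs for style, tests, and lint issues. Use after committing.
tools: Read, Grep, Bash
model: sonnet
---
Review the diff strictly. Flag missing tests and lint problems; never rewrite code yourself.
```

Running the reviewer in a subagent also gives it forked memory, so its context doesn't pollute the main session.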
I wish all model providers would converge on a standard set of files, so I could switch easily from Claude to Codex to Cursor to Opencode depending on the situation
I keep seeing these posts, and here's the most interesting thing, for me.
I get the best results with the least number of skills and unnecessary configuration in place.
People are spending way too much time over-prescribing these documents, but AI is like a competent but nervous adult. The more you give it, the dumber it gets.
Claude Fast has very good alternate documentation for this. [0] I don't understand the hate for defining .claude/. It is quite easy to have the main agent write the files. Then, rather than one-shot coding, iterate quickly by updating .claude/. I'm at the point where .claude/ makes copies of itself, performs the task, evaluates, and updates itself. I'm not coding code, I'm coding .claude/, which does everything else. This is also a mechanism for testing .claude/, agents, and instructions, which would be useful for sharing and reuse in an organization.
The real wall I never see people talking about: yes, you can tell Claude to update whatever file you want, but if it's .claude/INSTRUCTIONS.md or CLAUDE.md, you have to tell Claude to re-read those files. It wrote the contents, but it's not treating them as fresh instructions; it will run off whatever it read the last time it loaded that file, so if the file didn't exist then, it won't know about it. I believe Claude puts those instructions in a very specific part of its context window.
The claim that "whatever you write in CLAUDE.md, Claude will follow" is doing a lot of heavy lifting. In practice CLAUDE.md is a suggestion, not a contract. Complex tasks and compaction will dilute the use of CLAUDE.md, especially once the context window runs out.
Nice! The article didn't mention it, but ~/.claude/plans is where it stores the plan md file when running in plan mode. I find it useful to open or back up plans from that directory.
I've been going heavily in the direction of globally configured MCP servers and composite agents with copilot, and just making my own MCP servers in most cases.
Then all I have to do is let the agents actually figure out how to accomplish what I ask of them, with the highly scoped set of tools and sub agents I give them.
I find this works phenomenally, because all the .agent.md file is, is a description of what the tools available are. Nothing more complex, no LARP instructions. Just a straightforward 'here's what you've got'.
And with agents able to delegate to sub agents, the workflow is self-directing.
Working with a specific build system? Vibe code an MCP server for it.
Making a tool of my own? MCP server for dev testing and later use by agents.
On the flipside, I find it very questionable what value skills and reusable prompts give. I would compare it to an architect playing a recording of themselves from weeks ago when talking to their developers. The models encode a lot of knowledge, they just need orientation, not badgering, at this point.
The .claude folder structure reminds me of how Terraform organizes state files. Smart move putting conversation history in JSON rather than some proprietary format; it makes it trivial to grep through old conversations or build custom analysis tools.
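Since the history is line-delimited JSON, a grep-style analysis tool is only a few lines. This is a sketch: the field names ("type", "message") are assumptions about the session format, so adjust them to whatever your files actually contain:

```python
import json

def grep_sessions(jsonl_lines, needle):
    """Return the 'type' of each session record whose message mentions
    `needle`. Field names are assumed, not guaranteed by any spec."""
    hits = []
    for line in jsonl_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than crash
        if needle in json.dumps(record.get("message", {})):
            hits.append(record.get("type", "?"))
    return hits
```

Point it at the `.jsonl` files under your Claude projects directory and you have a crude "what did I ask it last month" search.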
Nice writeup! Been experimenting a lot with skills, agents and tools like GSD and honestly the biggest lesson for me has been: keep things short and simple.
Not just CLAUDE.md but everything, the prompts you type in a session, skill descriptions, agent configs, all of it. More instructions does not mean better results. Claude actually performs noticeably better with brief focused input. The moment you start over-specifying things quality drops.
Just my observation from using it daily but it's been consistent enough that I figured it's worth sharing.
So that's what "software engineering" has become nowadays? Some cargo cult, basically. Seriously, all of this raises red flags. No statements here are provable. It's just like LangChain, which was praised until everyone realized it's absolute dog water. Just like MCP too. The job in 2026 is really sad.
>Claude Code users typically treat the .claude folder like a black box. They know it exists. They’ve seen it appear in their project root. But they’ve never opened it, let alone understood what every file inside it does.
I know we are living in a post-engineering world now, but you can't tell me that people don't look at PRs anymore, or their own diffs, at least until/if they decide to .gitignore .claude.
Is there a completely free coding assistant agent that doesn't require you to give a credit card to use it?
I recently tried IntelliJ for Kotlin development and it wanted me to give a credit card for a 30 day trial. I just want something that scans my repo and I tell it the changes I want and it does it. If possible, it would also run the existing tests to make sure its changes don't break anything.
I think this does a great job of explaining the .claude directories in a beginner friendly way. And I don’t necessarily read it as “you have to do all this, before you start”.
It has a few issues with outdated advice (e.g. commands have been merged with skills), but overall I might share it with co-workers who need an introduction to the concept.
Completely tangential, but can we please stop putting one million files at the root of the project which have nothing to do with the project? Can we land on a convention like, idk, a .meta folder (not the meta company, the actual word), or whatever, in which all of these Claude.md, .swift-version, Code-of-Conduct.md, Codeowners, Contributing.md, .rubocop.yml, .editorconfig, etc. files would go??
So when Anthropic releases a new model that "breaks compatibility" with some Markdown files, do we call it "refactoring" to find (guess) the required changes to get the desired outcome again? Aren't we creating brittle specifications fitted to one version of a model?
It's shocking how shitty the claude code CLI app is - config is brittle (setting up an LSP plugin means searching through GitHub issues and guessing which parameters you messed up), hooks render errors in the app when there are none, the permission harness is barely documented, and there are zero customization options (would you like the agent config to come from a different folder than the source root? nope). Going through GitHub issues, the same issue you hit has been open since the beginning of 2025 and ignored - their issue tracker is /dev/null; it's basically a user forum.
Here's a question that I hope is not too off-topic.
Do people find the nano-banana cartoon infographics to be helpful, or distracting? Personally, I'm starting to tire seeing all the little cartoon people and the faux-hand-drawn images.
Off topic but earlier today I asked Gemini to read this article and advise how to do the same things for OpenCode. I am fascinated with trying to get good performance from small local models.
If these different agents could agree on a standard location that would be great. The specs are almost the same for .github and Claude but Claude won't even look at the .github location.
> So, to really create something new that I care about, LLMs don't help much.
They are still useful for plenty of other tasks.
We used to have the very difficult task of producing working, scalable, maintainable code describing complex systems that do what we need them to do.
Now, on top of that, we have the difficult task of producing this code using constantly mutating, complex, nondeterministic systems.
We are the circus bear riding a bicycle on a high wire now being asked to also spin plates and juggle chainsaws.
Maybe singularity means that time sunk into managing LLMs is equal to time needed to manually code similar output in assembly or punch cards.
A few of the rules in my CLAUDE.md:
* Claude trying to install packages into my Python system interpreter - (always use uv and venvs)
* Claude pushing to main - (don't push to main ever)
* When creating a PR, completely ignoring how to contribute (always read CONTRIBUTING.md when creating a PR)
* Yellow ANSI text in console output - (Color choices must be visible on both dark and light backgrounds)
Because I got sick of repeating myself about the basics.
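For what it's worth, rules like the ones above fit in a handful of CLAUDE.md lines (the wording here is mine, not any canonical format):

```markdown
# CLAUDE.md
- Python: always use uv and a venv; never install into the system interpreter.
- Git: never push to main.
- PRs: read CONTRIBUTING.md before opening one.
- Console output: colors must be readable on both dark and light backgrounds.
```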
All the fancy frameworks are vibe coded, so why would they do better than something you build yourself?
At most get playwright MCP in so the agent can see the rendered output
[0] https://claudefa.st/blog/guide/mechanics/claude-md-mastery
Just read the official Claude documentation:
https://code.claude.com/docs/
> Do people find the nano-banana cartoon infographics to be helpful, or distracting?
Wouldn't Tufte call this chartjunk?
When you have this performative folder of skills, the AI wastes a bunch of tool calls, gets confused, and doesn't get to the meat of the problem.
beware!