Author here. I built this in a few hours after the Claude Code leak.
I've been working on my own coding agent setup for a while. I mostly use pi [0] because it's minimal and easy to extend. When the leak happened, I wanted to study how Anthropic structured things: the tool system, how the agent loop flows, A 500K line codebase is a lot to navigate, so I mapped it visually to give myself a quick reference I could come back to while adapting ideas into my own harness and workflow.
I'm actively updating the site based on feedback from this thread. If anything looks off, or you find something I missed, lmk.
Of course I expect it is vibe coding. It would be insane to code anything by hand these days. But that doesn't mean there is no creative input by the author here.
There's a lot of errors you can miss by coding by hand, even as a seasoned developer. Try taking Claude Code, point it at your repo, and ask it to find bugs. I bet it will.
Claude is actually a crazy good vuln researcher. If you use it that way, your code might just be more secure than written purely by hand.
Depends on what you’re building and whether it’s recreational or not. Complex architecture vs a ui analysis tool, for example. For a ui analysis tool, the only reason you code by hand is for the joy of coding by hand. Even though you can drive a car or fly in a plane there are times to walk or ride a bike still.
I genuinely believe this. Even if you're inventing a new algorithm it is better to describe the algorithm in English and have AI do the implementation.
One reason, beside basic altruism, is so you can put the projects on your resume. This is especially helpful if the project does very well or gets lots of stars.
This is nice, I really like the style/tone/cadence.
The only suggestion/nit I have is that you could add some kind of asterisk or hover helper to the part when you talk about 'Anthropic's message format', as it did make me want to come here and point out how it's ackchually OpenAI's format and is very common.
Only because I figure if this was my first time learning about all this stuff I think I'd appreciate a deep dive into the format or the v1 api as one of the optional next steps.
I’m using pi and cc locally in a docker container connected to a local llama.cpp so the whole agentic loop is 100% offline.
I had used pi and cc to analyze the unpacked cc to compare their design, architecture and implementation.
I guess your site was also coded with pi and it is very impressive. Wonderful if you can do a visualization for pi vs cc as well. My local models might not be powerful enough.
I thought that early coding assistants came to be written in some Java/TypeScript, because AI companies just had web-devs playing around and then made it a product even though the languages being such a misfit for terminal. Why did you decide for TypeScript?
I know it seems counter-intuitive but are there any agent harnesses that aren’t written with AI? All these half a million LoC codebases seem insane to me when I run my business on a full-stack web application that’s like 50k lines of code and my MvP was like 10k. These are just TUIs that call a model endpoint with some shell-out commands. These things have only been around in time measured in months, half a million LoC is crazy to me.
Who cares about LoC? Its a metric that hasn't mattered since we measured productivity in it in the 1980s. For all we know they made these design choices so they could more easily reuse the code in other codebases. Ideally you'd build the library to do that at the same time, but this is start up time constraints to repay loans and shit.
"
Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".
For each frame our pipeline constructs a scene graph with React then
-> layouts elements
-> rasterizes them to a 2d screen
-> diffs that against the previous screen
-> finally uses the diff to generate ANSI sequences to draw
We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written.
"
60fps is pathetic for a TUI when most terminals worth their salt are GPU accelerated and displays can be up to 240fps or even more. But let’s be real if I can play Quake at >500 fps they have no excuse.
> These are just TUIs that call a model endpoint with some shell-out commands.
Claude Code CLI is actually horrible: it's a full headless browser rendering that's then converted in real-time to text to show in the terminal. And that fact leaks to the user: when the model outputs ASCII, the converter shall happily convert it to Unicode (no latter than yesterday there was a TFA complaining about Unicode characters breaking Unix pipes / parsers expecting ASCII commands).
It's ultra annoying during debugging sessions (that is not when in a full agentic loop where it YOLOs a solution): you can't easily cut/paste from the CLI because the output you get is not what the model did output.
Mega, mega, mega annoying.
What should be something simple becomes a rube-goldberg machinery that, of course, fucks up something fundamental: converting the model's characters to something else is just pathetically bad.
Anyone from Anthropic reading? Get your shit together: if you keep this "headless browser rendering converted to text", at least do not fucking modify the characters.*
A 500k line codebase for an agent CLI proves one thing: making a probabilistic LLM behave deterministically is a massive state-management nightmare. Right now, they're great for prompting simple sites/platforms but they break at large enterprise repos.
If you don't have a rigid, external state machine governing the workflow, you have to brute-force reliability. That codebase bloat is likely 90% defensive programming; frustration regexes, context sanitizers, tool-retry loops, and state rollbacks just to stop the agent from drifting or silently breaking things.
The visual map is great, but from an architectural perspective, we're still herding cats with massive code volume instead of actually governing the agents at the system level.
I find it really strange that there is so much negative commentary on the _code_, but so little commentary on the core architecture.
My takeaway from looking at the tool list is that they got the fundamental architecture right - try to create a very simple and general set of tools on the client-side (e.g. read file, output rich text, etc) so that the server can innovate rapidly without revving the client (and also so that if, say, the source code leaks, none of the secret sauce does).
Overall, when I see this I think they are focused on the right issues, and I think their tool list looks pretty simple/elegant/general. I picture the server team constantly thinking - we have these client-side tools/APIs, how can we use them optimally? How can we get more out of them. That is where the secret sauce lives.
It’s not surprising. There has been quite a bit of industrial research in how to manage mere apes to be deterministic with huge software control systems, and they are an unruly bunch I assure you.
It's hard to tell how much it says about difficulty of harnessing vs how much it says about difficulty of maintaining a clean and not bloated codebase when coding with AI.
Isn't it a simple REPL with some tools and integrations, written in a very high level language? How the hell is it so big? Is it because it's vibecoded and LLMs strive for bloat, or is it meaningful complexity?
I guess they really do eat their own dogfood and vibe code their way through it without care for technical debt? In a way, it’s a good challenge, but it’s fairly painful to watch the current state of the project (which is about a year old now, so it should be in prime shape).
Appreciate the effort, but this is very basic and nothing you need the source code to understand. I was expecting a deep dive into what specific decisions they made, but not how an loop of tool calls works
There's this weird thing about AI generated content where it has the perfect presentation but conveys very little.
For example the whole animation on this website, what does it say beyond that you make a request to backend and get a response that may have some tool call?
Kairos and auto-dream are more interesting than anything in the agent loop section. Memory consolidation between sessions is the actual unsolved problem. The rest is just plumbing tbh
Thanks to Claude Code, we got such a beautifully polished and dazzling website that gives a complete introduction to itself the very moment the leak happened :)
Pardon me, but I think it's rather obvious that it worked this way?
The real value of Anthropic is in the models that they spent hundreds of millions training. Anyone can build a frontend that does a loop, using the model to call tools and accomplish a task. People do it every day.
Sure, they've worked hard to perfect this particular frontend. But it's not like any of this is revolutionary.
519K lines of code for something that is using the baseline *nix tools for pretty much everything important, how do they even manage to bloat it this much? I mean I know how technically, but it's still depressing.
Can't they ask CC to make it good, instead of asking it to make it bigger?
I have no engineering background. I build websites and tools for a living. Claude Code changed what's possible for me in a way that's hard to overstate.
I can't evaluate the source code architecture. What I can say is that before this, I had ideas I couldn't execute without hiring a developer. Now I ship them myself. Not prototypes, not demos. Real products that people use and pay for.
The leaked internals are interesting to engineers. From where I sit, the interesting part is that it works well enough that someone without a CS degree can build production software with it. That's the actual story.
The interesting thing about agent tool use is how binary it is. The agent either calls the tool or doesn't. The harder problem is social agency, where the AI has to decide whether to participate at all. We built a pre-filter for this (cheap model reads the room before the expensive model runs) and the failure modes are fascinating. The model would reason correctly in its chain-of-thought, 'this person is left hanging, I should respond' and then output the opposite boolean. Turned out Haiku has a systematic false-bias on boolean tool outputs. Had to invert the schema
Hello everyone! It's me behind the website.
I launched the site minutes after the leak, obviously vibecoded it.
Kept working on it day and night to fix all the issues.
I was using vercel free plan and did not expect this huge response. The site went down when I took a nap of 3 hours.
Woke up with calls from my team for a meeting. Saw the msgs of people telling me site is down.
Fixed the issue.
And now, I am updating it on regular speed.
Thank you for all the positive and negative feedback, Will consider it all in my future projects.
/stickers:
Displays earned achievement stickers for milestones like first commit, 100 tool calls, or marathon sessions. Stickers are stored in the user profile and rendered as ASCII art in the terminal.
That is not what it does at all - it takes you to a stickermule website.
What is the motivation for someone to put out junk like this?
I like the Claude desktop interface. The color scheme, presentation, fonts, etc. Is there a CSS I can find for the desktop version - I assume it's using some kind of web rendering engine and CSS is part of it.
On the one hand I don't understand why it needs to be half a million lines. However code is becoming machine shaped so the maintenance bloat of titanic amounts of code and state are actually shrinking.
Really nice visualisation of this, makes understanding the flow at a high levle pretty clear. Also the tool system and command catalog, particularly the gated ones are super interesting.
I don't know why people obsess and spend so much time on this codebase. It isn't (and never was)alien technology. It's just mediocre typescript generated by an LLM
I mean, I get it: vibe-coded software deserves vibe-coded coverage. But I would at least appreciate it if the main part of it, the animation, went at a speed that at least makes it possible to follow along and didn't glitch out with elements randomly disappearing in Firefox...
400 comments
I've been working on my own coding agent setup for a while. I mostly use pi [0] because it's minimal and easy to extend. When the leak happened, I wanted to study how Anthropic structured things: the tool system, how the agent loop flows, A 500K line codebase is a lot to navigate, so I mapped it visually to give myself a quick reference I could come back to while adapting ideas into my own harness and workflow.
I'm actively updating the site based on feedback from this thread. If anything looks off, or you find something I missed, lmk.
[0] https://pi.dev/
>> It would be insane to code anything by hand these days.
I strongly disagree, but it made me chuckle a bit, thinking about labeling software as "handmade" or marketing software house as "artisanal".
Claude is actually a crazy good vuln researcher. If you use it that way, your code might just be more secure than written purely by hand.
But egos are involved.
The only suggestion/nit I have is that you could add some kind of asterisk or hover helper to the part when you talk about 'Anthropic's message format', as it did make me want to come here and point out how it's ackchually OpenAI's format and is very common.
Only because I figure if this was my first time learning about all this stuff I think I'd appreciate a deep dive into the format or the v1 api as one of the optional next steps.
I had used pi and cc to analyze the unpacked cc to compare their design, architecture and implementation.
I guess your site was also coded with pi and it is very impressive. Wonderful if you can do a visualization for pi vs cc as well. My local models might not be powerful enough.
Thanks for the hard work!
https://gist.github.com/ontouchstart/d7e3b7ec6e568164edfd482... (cc)
M5 (24G)
> just TUIs
For starters, CC's TUI is React-based.
" Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".
For each frame our pipeline constructs a scene graph with React then -> layouts elements -> rasterizes them to a 2d screen -> diffs that against the previous screen -> finally uses the diff to generate ANSI sequences to draw
We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written. "
> These are just TUIs that call a model endpoint with some shell-out commands.
Claude Code CLI is actually horrible: it's a full headless browser rendering that's then converted in real-time to text to show in the terminal. And that fact leaks to the user: when the model outputs ASCII, the converter shall happily convert it to Unicode (no latter than yesterday there was a TFA complaining about Unicode characters breaking Unix pipes / parsers expecting ASCII commands).
It's ultra annoying during debugging sessions (that is not when in a full agentic loop where it YOLOs a solution): you can't easily cut/paste from the CLI because the output you get is not what the model did output.
Mega, mega, mega annoying.
What should be something simple becomes a rube-goldberg machinery that, of course, fucks up something fundamental: converting the model's characters to something else is just pathetically bad.
Anyone from Anthropic reading? Get your shit together: if you keep this "headless browser rendering converted to text", at least do not fucking modify the characters.*
If you don't have a rigid, external state machine governing the workflow, you have to brute-force reliability. That codebase bloat is likely 90% defensive programming; frustration regexes, context sanitizers, tool-retry loops, and state rollbacks just to stop the agent from drifting or silently breaking things.
The visual map is great, but from an architectural perspective, we're still herding cats with massive code volume instead of actually governing the agents at the system level.
My takeaway from looking at the tool list is that they got the fundamental architecture right - try to create a very simple and general set of tools on the client-side (e.g. read file, output rich text, etc) so that the server can innovate rapidly without revving the client (and also so that if, say, the source code leaks, none of the secret sauce does).
Overall, when I see this I think they are focused on the right issues, and I think their tool list looks pretty simple/elegant/general. I picture the server team constantly thinking - we have these client-side tools/APIs, how can we use them optimally? How can we get more out of them. That is where the secret sauce lives.
> 500k lines of code
Isn't it a simple REPL with some tools and integrations, written in a very high level language? How the hell is it so big? Is it because it's vibecoded and LLMs strive for bloat, or is it meaningful complexity?
I love your implementation.
Here was my first stab:
https://news.ycombinator.com/item?id=47595140
https://brandonrc.github.io/journey-through-claude-code/
I looked at the leaked code expecting some "secret sauce", but honestly didn't found anything interesting.
I don't get the hype around Claude Code. There's nothing new or unique. The real strength are the models.
https://ccprompts.info
Also I definitely want a Claude Code spirit animal
For example the whole animation on this website, what does it say beyond that you make a request to backend and get a response that may have some tool call?
> also related:
https://www.ccleaks.comThis deployment is temporarily paused
The real value of Anthropic is in the models that they spent hundreds of millions training. Anyone can build a frontend that does a loop, using the model to call tools and accomplish a task. People do it every day.
Sure, they've worked hard to perfect this particular frontend. But it's not like any of this is revolutionary.
I use it all day and love it. Don't get me wrong. But it's a terminal-based app that talks to an LLM and calls local functions. Ooookay…
I can't evaluate the source code architecture. What I can say is that before this, I had ideas I couldn't execute without hiring a developer. Now I ship them myself. Not prototypes, not demos. Real products that people use and pay for.
The leaked internals are interesting to engineers. From where I sit, the interesting part is that it works well enough that someone without a CS degree can build production software with it. That's the actual story.
Kept working on it day and night to fix all the issues. I was using vercel free plan and did not expect this huge response. The site went down when I took a nap of 3 hours.
Woke up with calls from my team for a meeting. Saw the msgs of people telling me site is down.
Fixed the issue.
And now, I am updating it on regular speed.
Thank you for all the positive and negative feedback, Will consider it all in my future projects.
Here is another one that goes in depth as well: www.markdown.engineering for anyone going deep on learning.
First command I looked at:
That is not what it does at all - it takes you to a stickermule website.What is the motivation for someone to put out junk like this?
https://github.com/simple10/agents-observe
How is this on the front page?