My friends and I have been running a similar homegrown system on a VM at home: Claude Code in a GNU screen managed by systemd, Cloudflare tunnels, Graphiti memory system, a Discord channel plugged into Claude to drive it, and Temporal for all sorts of workflows and crons that it builds on its own.
It arrived at the same incredibly fun behavior as you talk about in the readme, where the agent just builds all sorts of junk for you autonomously. It has built dozens of web apps, static pages, mini games, etc. all tied back into a central domain that I gave it. I truly have no idea what the system or code looks like but it’s been so much fun just letting it build.
The “For People Who Don't Write Code” section is so true as well. We have someone in Discord who has never written code, but they can ask the agent to build virtually anything; it goes off and churns, then pops back with a link to it running live. It’s honestly been so much fun with friends, highly recommend trying it out.
This is awesome to hear. The "I truly have no idea what the system or code looks like but it's been so much fun just letting it build" resonates hard. That's exactly the experience we had too.
The "For People Who Don't Write Code" angle has been the biggest surprise for us. We had a non-technical user ask for a Chrome extension and the agent built it, packaged it as a zip, and sent the download link. No terminal, no dev environment needed.
If you ever want to formalize your setup, we built Specter (https://github.com/ghostwright/specter) to provision VMs with DNS, TLS, and systemd in under 90 seconds. Makes spinning up new instances trivial. Would love to hear more about your Graphiti memory setup, that's a different approach than our Qdrant-based system.
Graphiti is interesting because it ingests episodes (Discord chat messages), extracts facts and relationships, and then lets the agent query them back with the relationships intact. So rather than getting a flat list of vectors related to a search term, the agent can essentially walk from one fact/concept to another. While plain vector search says something exists, the edges in the graph denote how/why it exists and provide extra context.
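To make the difference concrete, here's a toy sketch in plain Python. It does not use Graphiti's API: the triples, `build_graph`, `flat_search`, and `walk` are all made up for illustration, and a simple term match stands in for vector similarity.

```python
from collections import defaultdict, deque

# Hypothetical facts as (subject, relation, object) triples.
# This is NOT Graphiti's real schema, just an illustration.
FACTS = [
    ("alice", "plays", "factorio"),
    ("factorio", "is_a", "automation game"),
    ("bob", "plays", "factorio"),
    ("bob", "cooks", "ramen"),
]


def build_graph(facts):
    """Index triples as an adjacency list, adding inverse edges so walks go both ways."""
    graph = defaultdict(list)
    for subj, rel, obj in facts:
        graph[subj].append((rel, obj))
        graph[obj].append((f"inverse:{rel}", subj))
    return graph


def flat_search(facts, term):
    """Stand-in for flat retrieval: return the facts that mention the term, nothing more."""
    return [fact for fact in facts if term in (fact[0], fact[2])]


def walk(graph, start, max_hops=2):
    """Breadth-first walk that records the chain of edges used to reach each node."""
    seen = {start}
    queue = deque([(start, [])])
    paths = {}
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, neighbor in graph[node]:
            if neighbor in seen:
                continue
            seen.add(neighbor)
            new_path = path + [(node, rel, neighbor)]
            paths[neighbor] = new_path
            queue.append((neighbor, new_path))
    return paths
```

With these made-up facts, `flat_search(FACTS, "alice")` only returns the one fact that mentions alice, while `walk(build_graph(FACTS), "alice")` also reaches bob, and the recorded path says why: both are connected through the factorio node.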
It’s a bit frightening in practice because it starts building up “knowledge” of what everyone in our group is interested in (games, hobbies, food) and their personalities, politics, etc. Sonnet 4.6 in particular tends to query the graph and make jokes, matching the vibe on discord.
On a more serious use-case: it also stores system topology in the graph so, while it does document the system in various READMEs and CLAUDE.md files, the graph provides a fast, at-a-glance reference for how the systems interact. I have no evidence, but I imagine this could be useful and denser / more token-efficient than massive documentation, even for products, features, etc.
> I truly have no idea what the system or code looks like
Doesn't it concern you that it might have installed a compromised or vulnerable package, or that it has something exposed and is leaking everything to an attacker?
I understand that your personal account is removed from it, but still, it has a direct link to you, and an attacker could be stealthily building up towards it to hit when the time is right. Maybe they gain SSH access to your VM or whatever.
eh I can nuke the VM and start fresh. Everything is in git anyway. As for sensitive data, it has its own accounts and no credit cards etc so the blast radius feels limited. I would say this is a fundamental impediment to being used in serious use-cases but for some friends messing around I’m not worried.
It could have installed, say, that vulnerable version of litellm, and the entire VM is compromised. But it’s on an isolated VLAN anyway, so the worst it can really do is use bandwidth and maybe hurt my IP reputation? I could move it to a cloud VM but the risks seem minimal at the moment. I’m definitely not advocating for no defense in depth, but npm install in an isolated VM feels safer than npm install on my work laptop these days :-)
Some of the other aspects of the project are quite interesting; I particularly liked https://github.com/ghostwright/shadow. I think this has potential, but I am skeptical right now.
What is the actual cost of this? Can you share your real burn rate from using it? I sort of wanna try, but I don't want my API key to go bananas because the agent decided it needed XYZ for "it" and didn't check with me.
I get the appeal of the separate "identity" with email and everything for the agent, but then, if it has little to no supervision, what's the extent of the liability when it goes rogue? Say it DDoSes someone, exploits something, does damage: is this like your child/minor and you're the parent/guardian?
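On the burn-rate worry above, one cheap safety net is a client-side spend cap that the agent loop checks before every model call. This is only a rough sketch under my own assumptions: `BudgetGuard` is a hypothetical helper, the per-token rates are placeholders you would have to fill in from your provider's current pricing, and it assumes your API responses report input and output token counts.

```python
from dataclasses import dataclass


@dataclass
class BudgetGuard:
    """Hypothetical client-side spend tracker; not part of any real SDK."""

    daily_cap_usd: float
    usd_per_million_input: float   # placeholder rate, look up your provider's real pricing
    usd_per_million_output: float  # placeholder rate, look up your provider's real pricing
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add the estimated cost of one completed call and return the running total."""
        cost = (
            input_tokens * self.usd_per_million_input
            + output_tokens * self.usd_per_million_output
        ) / 1_000_000
        self.spent_usd += cost
        return self.spent_usd

    def allow_next_call(self) -> bool:
        """Return False once the cap is reached so the loop can stop and ask a human."""
        return self.spent_usd < self.daily_cap_usd
```

The loop would call `allow_next_call()` before each request and `record(...)` with the reported token counts afterwards. Since the check happens before a call, the last call can overshoot the cap slightly, and a bug in the agent can bypass it entirely, so a hard limit on the provider side (if yours offers one) is the safer backstop.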
The self-tooling capability is the interesting part here, not the VM persistence.
The cost/governance question is real though. I've spent 15 years in product management and the pattern is always the same: autonomous systems that compound capabilities sound great until you need to explain to someone why it did what it did.
The gap isn't "can the agent build things" — it clearly can. The gap is: did it build the thing you actually needed? And how do you verify that at scale without manually reviewing every output?
Self-modifying config is a feature when it's right and a liability when it's wrong. The interesting design question is how you build the verification layer.
So if I understand this correctly, it's an OpenClaw-type system but based on the Claude Code Agent SDK? And they suggest installing it on a VM? Or is there more to it?
> I would say this is a fundamental impediment to being used in serious use-cases
Fair point, so it's really a fancy tamagotchi you got there I guess haha
> Nobody asked it to build any of this. It identified analytics as useful and built the entire stack.
When I read stuff like this I am not sure how to feel.