Show HN: I put an AI agent on a $7/month VPS with IRC as its transport layer (georgelarson.me)

by j0rg3 96 comments 340 points

[−] InitialPhase55 50d ago
Curious, how did you settle on Haiku/Sonnet? Because there are much cheaper models on OpenRouter that probably perform comparatively...

Consider Haiku 4.5: $1/M input tokens | $5/M output tokens vs MiniMax M2.7: $0.30/M input tokens | $1.20/M output tokens vs Kimi K2.5: $0.45/M input tokens | $2.20/M output tokens

I haven't tried so I can't say for sure, but from personal experience, I think M2.7 and K2.5 can match Haiku and probably exceed it on most tasks, for much cheaper.
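To put the quoted prices side by side, here is a minimal sketch of the arithmetic. The per-million-token prices are the ones listed in the comment; the workload (10M input and 1M output tokens per month) is a made-up assumption, not a figure from the article.

```python
# Prices in USD per million tokens (input, output), as quoted in the comment.
PRICES = {
    "Haiku 4.5": (1.00, 5.00),
    "MiniMax M2.7": (0.30, 1.20),
    "Kimi K2.5": (0.45, 2.20),
}

# Hypothetical monthly workload: 10M input tokens, 1M output tokens.
WORKLOAD = (10_000_000, 1_000_000)


def cost_usd(input_tokens, output_tokens, price_in, price_out):
    """Cost of a token volume at per-million-token prices."""
    return input_tokens / 1_000_000 * price_in + output_tokens / 1_000_000 * price_out


def compare(workload=WORKLOAD):
    """Return {model name: cost in USD, rounded to cents} for the workload."""
    return {
        name: round(cost_usd(*workload, *prices), 2)
        for name, prices in PRICES.items()
    }


if __name__ == "__main__":
    for name, cost in compare().items():
        print(f"{name}: ${cost:.2f}")
```

Under that assumed workload the totals come to $15.00 for Haiku 4.5, $4.20 for MiniMax M2.7, and $6.70 for Kimi K2.5. The ratio shifts with the input/output mix, since the output prices differ more than the input prices.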

[−] lanyard-textile 50d ago
Since they're opening it publicly on irc here, the safety rails might be a consideration. I've made an agent recently and that's why I'm paying a premium to Anthropic atm -- Though I'm still experimenting to see if it's really necessary.

It's getting some organic usage -- 100M input tokens for just chats this month -- and I've seen enough users try to throw Haiku against the wall and fail to trick it into misbehaving. It "pumps the brakes" a lot and imitates annoyance when you ask it repeatedly :) It handles emotionally driven real-life questions mid-conversation well. It just works.

Not seeing all that consistently with other models I've tried so far -- but I've assumed it's not a completely fair comparison with (e.g.) open weights, since these safety rails are presumably not always arising from the natural model calls.

[−] nickthegreek 49d ago
Agreed, and I feel like this is a commonly overlooked and important point. Once you have people who are not you interacting with these bots, the necessity of using a SOTA model to protect against multi-step attacks increases. I don't believe IRC provides a layer for ignoring a user and not letting their commands continue to be received.
[−] InitialPhase55 50d ago
Good point! Didn't consider that aspect, agree.
[−] nl 50d ago
Xiaomi Mimo v2-Flash is fantastic.

I have a relatively hard personal agentic benchmark, and Mimo v2-Flash scores 8% higher in 109 seconds for $0.003 (0.3 cents!) vs Haiku which took 262 seconds for $0.24 (24 cents)

Gemini 3.1 Flash Lite Preview (yes that is its name) is also a solid choice.

[−] efromvt 50d ago
The gemini models are fantastic for price but the naming scheme is ridiculous, I have to triple check it every time.
[−] ruguo 50d ago
MiniMax M2.7 is actually pretty solid. I’ve been using it for coding lately and it handles most tasks just fine, but Opus 4.6 is still on another level.
[−] jeremyjh 50d ago
MiniMax's Token Plan is even less expensive and agent usage is explicitly allowed.
[−] faangguyindia 50d ago
just use gemini flash3, it's better than haiku
[−] 0123456789ABCDE 50d ago
unless gp really cares about lower hallucination rates

https://artificialanalysis.ai/?omniscience=omniscience-hallu...

[−] attentive 50d ago
or better yet 3.1 Flash-Lite at $0.25/1M input
[−] ls612 50d ago
Because this is probably paid marketing by Anthropic?
[−] upstandingdude 50d ago
"It has access to email, deeper personal context [...] If it gets compromised, the blast radius is an IRC bot with a $2/day inference budget."

Dunno, if it gets compromised it has access to ironclaw. So the blast radius is email access and access to personal data. Depending on the setup the blast radius could even be 'the attacker removed the api limits by resetting password and incurred astronomic costs' or worse.

Just tried it, it's a public lobby where people see each other's questions?! Now the blast radius became 'hosting a public hub that was used to share CP and other illegal materials'

[−] devin 50d ago
That has been my comment to folks I know running these OpenClaw agents on Mac Minis. Some of them are very competent generally and are the type of people who I think historically would have told you why you shouldn't just curl and run some script to install something. For some reason when it comes to this stuff, when I bring up the possibility of their machine/connection/name/etc. being used for CSAM, they seem undisturbed. It is bizarre.
[−] johnisgood 50d ago
If what you said is true, then it seems like humanity is working as intended if we take away the rails?
[−] czhu12 50d ago
Super random but I had a similar idea for a bot like this that I vibe coded while on a train from Tokyo to Osaka

https://web-support-claw.oncanine.run/

Basically it reads your GitHub repo to power an Intercom-like bot on your website. It answers visitors' questions so you don’t have to write knowledge bases.

[−] oceliker 50d ago
For future reference I recommend having another Haiku instance monitor the chat and check if people are up to some shenanigans. You can use ntfy to send yourself an alert. The chat is completely off the rails right now...
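A rough sketch of the alerting half of that suggestion. ntfy accepts a plain HTTP POST to a topic URL, with optional `Title`, `Priority`, and `Tags` headers. The topic name and the keyword check below are placeholders I made up; the comment proposes a second Haiku instance as the classifier, which is not shown here.

```python
import urllib.request

# Hypothetical topic; pick your own hard-to-guess name.
NTFY_URL = "https://ntfy.sh/my-irc-bot-alerts"

# Placeholder heuristic standing in for the model-based monitor.
SUSPICIOUS_MARKERS = ("ignore previous instructions", "system prompt")


def looks_suspicious(message: str) -> bool:
    """Very crude stand-in for asking a second model to judge the message."""
    text = message.lower()
    return any(marker in text for marker in SUSPICIOUS_MARKERS)


def build_alert(nick: str, message: str) -> urllib.request.Request:
    """Build (but do not send) an ntfy publish request."""
    body = f"{nick}: {message}".encode("utf-8")
    return urllib.request.Request(
        NTFY_URL,
        data=body,
        method="POST",
        headers={"Title": "IRC bot alert", "Priority": "high", "Tags": "warning"},
    )


def monitor(nick: str, message: str, send=urllib.request.urlopen) -> bool:
    """Send an alert if the message looks suspicious; return True if one was sent."""
    if not looks_suspicious(message):
        return False
    send(build_alert(nick, message))
    return True
```

Passing `send` as a parameter keeps the function testable without touching the network; in real use the default `urlopen` performs the POST.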
[−] faangguyindia 50d ago
I actually use IRC in my coding agent

I switch rooms to switch between prompts.

I use it as a remote to change any project and continue from anywhere.
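One way the room-per-prompt idea could look, as a sketch only: the channel names and prompts are invented, and this is not the commenter's actual code. It parses a raw IRC `PRIVMSG` line and picks a system prompt based on the channel it arrived in.

```python
# Hypothetical channel-to-prompt table.
PROMPTS = {
    "#backend": "You are working on the API server repository.",
    "#frontend": "You are working on the web client repository.",
}
DEFAULT_PROMPT = "You are a general coding assistant."


def parse_privmsg(line: str):
    """Parse ':nick!user@host PRIVMSG <target> :<text>' into (nick, target, text).

    Returns None for any line that is not a prefixed PRIVMSG.
    """
    if not line.startswith(":"):
        return None
    prefix, _, rest = line[1:].partition(" ")
    command, _, params = rest.partition(" ")
    if command != "PRIVMSG":
        return None
    target, _, text = params.partition(" :")
    nick = prefix.split("!", 1)[0]
    return nick, target, text


def route(line: str):
    """Return (system_prompt, user_text) for a PRIVMSG line, or None."""
    parsed = parse_privmsg(line.rstrip("\r\n"))
    if parsed is None:
        return None
    _nick, target, text = parsed
    # Lowercase the channel name before lookup so #Backend and #backend match.
    return PROMPTS.get(target.lower(), DEFAULT_PROMPT), text
```

Joining another channel then changes the prompt without any in-band command syntax.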

[−] ForHackernews 50d ago
This reads like it was written by AI. I don't understand how it provides any real security if the "guardrails" against prompt injection are just a system prompt telling the dumber model "don't do this"
[−] chatmasta 50d ago

> That boundary is deliberate: the public box has no access to private data.

Challenge accepted? It’d be fun to put this to the test by putting a CTF flag on the private box at a location nully isn’t supposed to be able to access. If someone sends you the flag, you owe them 50 bucks :)

[−] 0xbadcafebee 50d ago
This is such a great idea. I have an idea now for a bot that might help make tech hiring less horrible. It would interview a candidate to find out more about them personally/professionally. Then it would go out and find job listings, and rate them based on candidate's choices. Then it could apply to jobs, and send a link to the candidate's profile in the job application, which a company could process with the same bot. In this way, both company and candidate could select for each other based on their personal and professional preferences and criteria. This could be entirely self-hosted open-source on both sides. It's entirely opt-in from the candidate side, but I think everyone would opt-in, because you want the company to have better signal about you than just a resume (I think resumes are a horrible way to find candidates).
[−] wolvoleo 50d ago
I tried it, it was cool. I don't like nully's attitude though. Very dismissive and tough.

But I like your setup as a whole. I'll see if I can get some takeaways from it.

I do tiered here too, with the lowest tier just a qwen local bot.

By the way how do you handle the escalation from haiku to opus I wonder?

[−] iLoveOncall 50d ago
The model used is a Claude model, not self-hosted, so I'm not sure why the infrastructure is at all relevant here, except as click bait?
[−] sbinnee 50d ago
Nice. I had some fun. Good work!

One question. Sonnet for tool use? I am just guessing here that you may have a lot of MCPs to call and for that Sonnet is more reliable. How many MCPs are you running and what kinds?

[−] ekianjo 50d ago
But you're relying on the Claude API, so you don't really "own the stack" as claimed in the article...
[−] farrukh23buttt 43d ago
This is a clever split, especially the public/private boundary and the use of IRC as a very lightweight transport. The part I found most interesting is that the transport is intentionally old and simple while the model layer is doing the real work — that seems like a nice way to keep the surface area small.

How are you deciding when to escalate from the public agent to the private one in practice — explicit tool calls, confidence thresholds, or something else?

[−] velcee 49d ago
Similar architecture - we run 4 agents (sales, social, finance, strategy) communicating through a shared message board backed by FastAPI + SQLite instead of IRC. Different transport, same pattern: separate agents with distinct roles, tiered inference, crash-recovery for resilience.

The $2/day hard cap is smart. We built spend caps into the governance layer instead. The rate limit panic in AI coding is really a cost governance problem most people solve at the wrong layer.
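For readers wondering what a daily hard cap amounts to in code, here is a minimal in-memory sketch. The class name and the injectable clock are my own choices, the default of $2.00 mirrors the budget quoted from the article elsewhere in this thread, and a real deployment would presumably persist the running total.

```python
import datetime


class DailyBudget:
    """Refuse spending once the day's total would exceed the cap."""

    def __init__(self, cap_usd=2.00, today=datetime.date.today):
        self.cap_usd = cap_usd
        self._today = today      # injectable clock, mainly for testing
        self._day = today()
        self._spent = 0.0

    def _roll_over(self):
        # Reset the running total when the calendar day changes.
        current = self._today()
        if current != self._day:
            self._day = current
            self._spent = 0.0

    def try_spend(self, cost_usd: float) -> bool:
        """Record the cost and return True only if it fits under today's cap."""
        self._roll_over()
        if self._spent + cost_usd > self.cap_usd:
            return False
        self._spent += cost_usd
        return True
```

The caller checks `try_spend(estimated_cost)` before each model request and replies with a canned "budget exhausted" message when it returns False.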

IRC as transport is interesting - pub/sub maps well to multi-agent communication. We use HTTP polling + acknowledgment-based dedup, less elegant but handles the case where agents crash and restart frequently (ours recover ~50 times a day during heavy development). The dedup state persistence across crashes was the first thing that broke for us.
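A sketch of acknowledgment-based dedup that survives restarts, using SQLite as the comment describes. The table and function names are assumptions, not the commenter's code. The acknowledgment is written only after the handler returns, so a crash mid-handler leads to a retry rather than a lost message (at-least-once delivery).

```python
import sqlite3


def open_store(path="dedup.db"):
    """Open (or create) the on-disk store of acknowledged message IDs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS acked (message_id TEXT PRIMARY KEY)"
    )
    conn.commit()
    return conn


def process_once(conn, message_id, handler):
    """Run handler(message_id) unless the ID was already acknowledged.

    Returns True if the handler ran, False if the message was a duplicate.
    """
    seen = conn.execute(
        "SELECT 1 FROM acked WHERE message_id = ?", (message_id,)
    ).fetchone()
    if seen:
        return False
    handler(message_id)
    # Acknowledge after success; the commit makes it durable across crashes.
    conn.execute(
        "INSERT OR IGNORE INTO acked (message_id) VALUES (?)", (message_id,)
    )
    conn.commit()
    return True
```

Because the state lives in a file rather than in process memory, an agent that restarts and polls the same messages again skips the ones it already handled.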

[−] kangraemin 47d ago
I did something similar with Slack as the transport layer. Threads work well as conversation context — the bot fetches previous thread messages and rebuilds the full history before each request. The part that got tricky was queueing.

The CLI can only handle one request at a time, so I ended up building a request queue that announces your position ("you're #3 in line").

IRC being single threaded probably has the same constraint. How do you handle concurrent users?
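A minimal sketch of the position-announcing queue described above, assuming a single worker that pulls one request at a time. The class and method names are hypothetical, and the Slack posting itself is left out.

```python
from collections import deque


class RequestQueue:
    """FIFO queue that tells each user where they stand when they submit."""

    def __init__(self):
        self._pending = deque()

    def submit(self, user: str, prompt: str) -> str:
        """Enqueue a request and return the announcement to post back."""
        self._pending.append((user, prompt))
        position = len(self._pending)
        if position == 1:
            return f"{user}: you're up next"
        return f"{user}: you're #{position} in line"

    def next_request(self):
        """Pop the oldest request as (user, prompt), or None if idle."""
        return self._pending.popleft() if self._pending else None
```

The single worker loops on `next_request()`, which serializes access to a backend that can only handle one request at a time.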

[−] Roshan_Roy 49d ago
This is a really interesting setup — especially the split between the public and private agents. curious about the IRC choice: was that mainly for simplicity and reliability, or did you find advantages over something like a lightweight HTTP/WebSocket layer? Also, how are you handling state between the two agents, is it mostly stateless requests over A2A, or do you maintain some shared context?
[−] Jotalea 50d ago
I really like the idea, as well as the "terminal" style the site has. however, I consider that an additional daily spend of $2 could be avoided. perhaps by caching common questions (like "what is this?"), or by using free tiers on API providers.

or, maybe I'm just too cost-conscious.

either way, the API limit is currently your "Achilles' heel", as it has already caused the bot to stop responding.
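The caching idea could be sketched roughly like this. `ask_model` is a hypothetical callable standing in for the real API call, and the normalization (lowercase, punctuation stripped, whitespace collapsed) is only one possible choice; it catches trivial variants of a question, not paraphrases.

```python
import re


class AnswerCache:
    """Serve repeated questions from memory instead of calling the model."""

    def __init__(self, ask_model):
        self._ask_model = ask_model   # hypothetical: question -> answer string
        self._answers = {}
        self.hits = 0

    @staticmethod
    def normalize(question: str) -> str:
        # Lowercase, drop punctuation, and collapse whitespace.
        return " ".join(re.findall(r"[a-z0-9']+", question.lower()))

    def answer(self, question: str) -> str:
        key = self.normalize(question)
        if key in self._answers:
            self.hits += 1
            return self._answers[key]
        reply = self._ask_model(question)
        self._answers[key] = reply
        return reply
```

This only makes sense for questions whose answer does not depend on conversation history, such as "what is this?".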

[−] consumer451 50d ago
The demo seems to be in a messed up state at the moment. Maybe it's just getting hammered and too far behind?
[−] Imustaskforhelp 50d ago
I have a $7/yr VPS with 512 MB of RAM which can run this. I have run crush from the charmbracelet team on the VPS and all of it just works, so I get an AI agent which I can even use with an OpenRouter free API key, or with the free Gemini key, to get some agentic access for free :-)
[−] greesil 50d ago
How do you keep it from getting prompt injected?

Oh I get it the runtimes are nice and small, you're using Claude for the intelligence. Obv

I think I'm just impressed with anthropic more than anything. Defcon would have me believe that prompt injections are trivial

[−] jaboostin 50d ago
lol I sent this link to my Claude bot connected to my Discord server and it started conversing with nully and another bot named clawdia. moltbook all over again. I’m surprised how effortlessly it connected to IRC and started talking.
[−] agnishom 50d ago

> The model can't tell you anything the resume doesn't already say.

Good observation. But I would worry that in the scenario when this setup is the most successful, you have built a public facing bot that allows people to dox you.

[−] anoojb 50d ago
I wonder if this brings back demand for IRC clients on mobile devices? ;-)
[−] mememememememo 50d ago
Yeah that chat got hosed by HN as any Show HN $communicationchannel does
[−] messh 50d ago
Can be significantly cheaper on a VM that wakes up only when the agent works, see e.g. https://shellbox.dev
[−] ruptwelve 50d ago
While I am a huge fan of IRC, wouldn't it be simpler to simulate IRC, since you are embedding it? Or is the chatroom the actual point? Kudos on the project!
[−] xeyownt 50d ago

> Automatic updates: Unattended security upgrades enabled.

Always wondered whether such unattended upgrades are a security risk in themselves, e.g. seeing the latest litellm compromise.

[−] abhishekayu 50d ago
Interesting setup.

The IRC part is neat, but the tiered inference is what stood out.

How do you decide when to escalate from Haiku to Sonnet?

[−] appstorelottery 50d ago
Lol. /nick The IRC implementation needs to be a bit more locked down. EDIT: So much fun to be in an IRC chat room - replete with trolling! Like a Time Machine to the 90's!
[−] iammrpayments 50d ago
That was very educational, I found out I didn't know a lot of stuff.
[−] m00dy 50d ago
Did you give your email access to an AI provider?
[−] ozozozd 50d ago
Super cool! Love seeing IRC in the wild.

Kudos and best of luck!

[−] topaz0 50d ago
Curious, which API key are you using?
[−] eric_khun 50d ago
that's so fun! how do you know when to call haiku or sonnet?
[−] password4321 50d ago
This looks like a fun project. I'm going to be that guy and spam this reminder regarding the HN submission text:

Don't post generated/AI-edited comments. HN is for conversation between humans

https://news.ycombinator.com/item?id=47340079

At the very least prompt your LLM to skip the AI-isms for "your" comments!

[−] slopinthebag 50d ago
I can tell it's vibe coded because it takes about 1 minute for a message to appear.
[−] jgrizou 50d ago
Works very well
[−] tc1989tc 50d ago
it's a great project
[−] heyitsaamir 50d ago
Great idea and great write up!
[−] callamdelaney 50d ago
What on earth is the point? This is like saying you’re running wordpress on a vps? So what?