Launch HN: Relvy (YC F24) – On-call runbooks, automated (relvy.ai)

by behat 25 comments 48 points

[−] taoh 36d ago
Congratulations! The distinction between pure agentic exploration and deterministic steps is spot on. Runbooks give ops more confidence in the data exploration and save time/context.

Curious how much savings you observe from using a runbook versus purely letting Claude do the planning at first. Also, how can the runbooks self-heal if results from some steps in the middle are not expected?

[−] behat 36d ago

>> how can the runbooks self-heal if results from some steps in the middle are not expected?

Yeah, this is a very interesting angle. Today, our primary mechanism here is agent-created auto-memories. The agent keeps track of the most useful steps and, more importantly, the dead-end steps as it executes runbooks. We think this offers a great bridge to suggesting runbook updates and keeping them current.
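A minimal sketch of how this kind of step memory might work, to make the idea concrete. This is not Relvy's implementation: the `StepMemory` class, its method names, and the `min_runs` threshold are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StepMemory:
    """Hypothetical per-runbook record of how each step has performed."""

    useful: defaultdict = field(default_factory=lambda: defaultdict(int))
    dead_end: defaultdict = field(default_factory=lambda: defaultdict(int))

    def record(self, step: str, was_useful: bool) -> None:
        # Count each execution of a step as either useful or a dead end.
        if was_useful:
            self.useful[step] += 1
        else:
            self.dead_end[step] += 1

    def suggest_removals(self, min_runs: int = 3) -> list[str]:
        # Flag steps that were a dead end at least `min_runs` times
        # and have never been useful (assumed heuristic).
        return sorted(
            step
            for step, misses in self.dead_end.items()
            if misses >= min_runs and self.useful.get(step, 0) == 0
        )


memory = StepMemory()
for _ in range(3):
    memory.record("check disk usage", was_useful=False)
memory.record("check logs for service frontend", was_useful=True)
print(memory.suggest_removals())
```

Steps flagged this way would be candidates for a suggested runbook update rather than an automatic deletion, so a human can still review the change.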

>> Curious how much savings you observe from using a runbook versus purely letting Claude do the planning at first.

It really depends on runbook quality, so I don't have a straightforward answer. Of course, it's faster and cheaper if you have well-defined steps in your runbooks. As an example, compare "check logs for service frontend, faceted by host_name" with just "check logs". The agent does more exploration in the latter case.

We wrote about the LLM costs of investigating production alerts more generally here, in case it's helpful: https://relvy.ai/blog/llm-cost-of-ai-sre-investigating-produ...

[−] hangrymoon01 36d ago
Re: savings - it depends on the use case. For example, one of our users set up a small runbook to run a group-by-IP query for high-throughput alerts, since that was their most common first response to those alerts. That alone cuts out a couple of minutes of exploration per incident and removes the variability of the agent deciding what data to investigate and how to slice it.

In our experience, runbooks provide a consistent, fast, and reliable way of investigating incidents (or ruling out common causes). In their absence, the AI does its usual open-ended exploration.
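The group-by-IP first step described above can be sketched roughly as follows. This is an illustrative sketch only: the `top_ips` function, the `client_ip` field name, and the sample records are assumptions, not the user's actual query or Relvy's API.

```python
from collections import Counter


def top_ips(records, limit=3):
    """Group request records by client IP and return the busiest IPs first."""
    # `client_ip` is an assumed field name for this example.
    counts = Counter(record["client_ip"] for record in records)
    return counts.most_common(limit)


# Hypothetical request log sample for a high-throughput alert.
records = [
    {"client_ip": "10.0.0.1", "path": "/checkout"},
    {"client_ip": "10.0.0.1", "path": "/checkout"},
    {"client_ip": "10.0.0.2", "path": "/home"},
    {"client_ip": "10.0.0.1", "path": "/cart"},
]

for ip, hits in top_ips(records):
    print(ip, hits)
```

Encoding the step this way fixes both the data source and the slicing dimension up front, which is the variability the runbook is meant to remove.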

[−] brandononchain 35d ago
Congrats on the Relvy launch and YC! Automating on-call runbooks is a massive pain point. Have you considered how generative AI might further enhance the diagnostic or remediation steps, perhaps by suggesting solutions based on past incidents?

[−] hrimfaxi 36d ago
How does this differ from Cursor cloud agents, where I can hook up MCPs, etc., and even launch the agent in my own cloud to connect directly to internal hosts like dbs?

[−] abmittall 34d ago
This is a great tool for enterprises, specifically customer support teams, which can quickly triage customer escalations and take a first stab at the issues without escalating to internal teams. All the best guys!! Rooting for you!

[−] willchen 36d ago
Interesting! tbh, we don't have any runbooks and have pretty minimal telemetry set up (we're a very small team :). Do you have any recommendations on which telemetry service to use to get started? Right now, our services run on a combination of GCP Cloud Run + Vercel.

[−] Sicarius07 34d ago
Amazing product guys! I tried it on a few of my issues and it was pretty spot on in finding the root cause. Loved it! Are you planning to support auto PRs for fixes? Would be a cool addition.

[−] atarus 36d ago
Interesting! In my experience, custom harnesses have worked better, e.g. Stripe etc. all did it custom, largely because of the sensitive integrations. How would you handle that?

[−] ramon156 36d ago
Congrats on the launch! I dig the concept, seems like a good tool :)

[−] Harnoor_Kaur 36d ago
This is a big one!! Congratulations guys :) Rooting for you!
[−] rishav 36d ago
Woohoo!!! Congrats on the big launch y'all