Show HN: Smol machines – subsecond coldstart, portable virtual machines (github.com)

by binsquare 152 comments 499 points

[−] binsquare 27d ago
Hello, I'm building a replacement for Docker containers: a virtual machine with the ergonomics of containers and subsecond start times.

I previously worked at AWS in the container space and with Firecracker. I realized the container is an unnecessary layer that slows things down, and Firecracker was a technology designed around AWS's org structure and use case.

So I ended up building a hybrid that combines the best of containers with the best of Firecracker.

Let me know your thoughts, thanks!

[−] PufPufPuf 27d ago
Hey this is super cool. I've been researching tech like this for my AI sandboxing solution, ended up with Lima+Incus: https://github.com/JanPokorny/locki

My problem with microVMs was that they usually won't run Docker/Kubernetes. I work on apps that consist of whole Kubernetes clusters and want the sandbox to contain all of that.

Does your solution support running k3s for example?

[−] fqiao 27d ago
We will evaluate it. I created this issue to track it: https://github.com/smol-machines/smolvm/issues/150

Really appreciate the feedback!

[−] jaytheseveloper 26d ago
Curious what prevented you from running Docker in a microVM? I've successfully run Docker in QEMU and Firecracker microVMs.
[−] PufPufPuf 22d ago
I've been experimenting with krunvm, and there the init system is weird. Didn't evaluate Firecracker since it doesn't support macOS hosts.
[−] fqiao 24d ago
Originally we just wanted the VM to be lightweight and start fast, so we removed Docker. @binsquare has a follow-up change that adds Docker back.
[−] mkagenius 27d ago
With instavm (https://instavm.io), you can provide an OCI image built from a dockerfile.
[−] PufPufPuf 22d ago
That's not the same as running the docker daemon in the sandbox.
[−] topspin 27d ago
What is the status of supporting live migration?

That's the one feature of similar systems that always gets left out. I understand why: it's not a priority for "cloud native" workloads. The world, however, has workloads that are not cloud native, because that comes at a high cost, and it always will. So if you'd like a real value-add differentiator for your micro-VM platform (beyond what I believe you already have), there you go.

Otherwise this looks pretty compelling.

[−] genxy 27d ago
It helps if you offer a concrete use case, as in how large the heap is, what kind of blackout period you can handle, and whether the app can handle all of its open connections being destroyed, etc. The more an app can handle resetting some of its own state, the easier LM is going to be to implement. If your workload jibes with CRIU https://github.com/checkpoint-restore/criu, you could do this already.
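
For reference, a CRIU-based move usually boils down to dump, copy, restore. The following is a hypothetical illustration, not something from the thread. The PID, image directory, and destination host are placeholders, and with `DRY_RUN=1` (the default) it only prints the commands. `criu dump -t`, `-D`, and `--tcp-established` are real CRIU options. In general, established TCP connections only survive if the destination can take over the same IP address, which is exactly the "open connections being destroyed" question.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: move a process with CRIU (dump, copy images, restore).
# TARGET_PID, CKPT_DIR, and DST are placeholders; real runs need root and
# CRIU on both hosts. DRY_RUN=1 (default) only prints the commands.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

# Print the pipeline (dry run) or execute it.
run() { if [ "$DRY_RUN" = "1" ]; then echo "$1"; else bash -c "$1"; fi; }

criu_move() {
  local pid="$1" dir="$2" dst="$3"
  # Checkpoint the process tree rooted at $pid into image files in $dir.
  # By default CRIU kills the dumped tasks afterwards (--leave-running keeps them).
  run "criu dump -t ${pid} -D ${dir} --tcp-established"
  # Copy the image files to the destination host.
  run "rsync -a ${dir}/ ${dst}:${dir}/"
  # Recreate the process tree from the images on the destination.
  run "ssh ${dst} criu restore -D ${dir} --tcp-established"
}

criu_move "${TARGET_PID:-12345}" "${CKPT_DIR:-/tmp/ckpt}" "${DST:-node2}"
```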

By what I assume is your definition, there are plenty of "non cloud native" workloads running on clouds that need live migration. Azure and GCP use LM behind the scenes to give the illusion of long-uptime hosts. Guest VMs are moved around for host maintenance.

[−] topspin 27d ago
"Azure and GCP use LM behind the scenes"

As does OCI, and (relatively recently) AWS. That's a lot of votes.

Use case: some legacy database VM needs to move because the host needs maintenance, the database storage (as opposed to the database software) is on an iSCSI/NFS/NVMe-oF array somewhere, and clients are just smart enough to transparently handle a brief disconnect/reconnect (which is built into essentially every such database connection pool stack today).

Use case: a web app platform (node/spring/django/rails/whatever) with a bunch of cached client state needs to move because the host needs maintenance. The developers haven't done all the legwork to make the state survive a restart, and they'll likely never get the time needed to do that. That's essentially the same use case as the previous one. It's also rampant.

Use case: a long-running batch process (training, etc.) needs to move because reasons, and ops can't wait for it to stop, and they can't kill it because time==money. It doesn't matter that it takes an hour to move because big heap, as long as the previous 100 hours isn't lost.

"as in how large the heap is"

That's an undecidable moving target, so let the user worry about it. Trust them to figure out what is feasible given the capabilities of their hardware and talent. They'll do fine if you provide the mechanism. I've been shuffling live VMs between hosts for 10+ years successfully, and Qemu/KVM has been capable of it for nearly 20, never mind VMware.

"CRIU"

Dormant, and still containers. Also, it's re-solving solved problems once you're running in a VM, but with more steps.

[−] linsomniac 27d ago
Somewhat related: I have a branch of Ganeti that has first-class ZFS support baked in, including using ZFS snapshot replication to do live migration without shared storage or Ceph: https://github.com/linsomniac/ganeti

Current status: I'm looking for more feedback. In a few weeks, when Ubuntu 26.04 comes out, I'm going to set up my dev/stg clusters at work with it. At the moment I've only tested it in a test cluster at home.

It works this way: It creates a snapshot of the zvol, and replicates it to the secondary machine. When that's done, it does another snapshot and does a "catch up" replication (the first replication could take hours on large volumes). Pause the VM, do a final snapshot+replication. Replicate the working RAM. Start up the VM on the new host.
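
Here is a rough sketch of that sequence using stock `zfs send`/`zfs receive`, with incremental sends via `-i`. This is not code from the Ganeti branch. The zvol name, host, and snapshot names are hypothetical. The pause, RAM copy, and restart steps are hypervisor-specific, so they are only printed as notes. With `DRY_RUN=1` (the default) it just prints the plan.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of snapshot-replication migration for one zvol.
# Placeholders: zvol name, destination host, snapshot names. VM pause,
# RAM copy, and restart are hypervisor-specific and only printed as notes.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands instead of executing them

run()  { if [ "$DRY_RUN" = "1" ]; then echo "$1"; else bash -c "$1"; fi; }
note() { echo "# $1"; }   # steps this sketch does not implement

zfs_migrate_plan() {
  local vol="$1" dst="$2"
  # 1. Full copy of a first snapshot; may take hours on large volumes.
  run "zfs snapshot ${vol}@mig1"
  run "zfs send ${vol}@mig1 | ssh ${dst} zfs receive -F ${vol}"
  # 2. Catch-up: send only blocks changed since mig1 while the VM still runs.
  run "zfs snapshot ${vol}@mig2"
  run "zfs send -i ${vol}@mig1 ${vol}@mig2 | ssh ${dst} zfs receive -F ${vol}"
  # 3. Pause the VM so the disk stops changing, then send the final small delta.
  note "pause the VM"
  run "zfs snapshot ${vol}@mig3"
  run "zfs send -i ${vol}@mig2 ${vol}@mig3 | ssh ${dst} zfs receive -F ${vol}"
  # 4. Move RAM and resume on the destination.
  note "replicate RAM and start the VM on ${dst}"
}

zfs_migrate_plan "${VOL:-tank/vm1-disk0}" "${DST:-node2}"
```

The catch-up round is what keeps the paused window short: by the time the VM is stopped, only the blocks written since the previous snapshot remain to be sent.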

[−] fqiao 27d ago
Really appreciate the suggestion! By "live migration", do you mean keeping the existing files and migrating them elsewhere with the VM?

Thanks

[−] topspin 27d ago
I mean making any given VM stop on host A and appear on host B; e.g. standard Qemu/KVM:

    virsh migrate --live GuestName DestinationURL

This is feasible when network storage is available and useful when a host needs to be drained for maintenance.
[−] benswerd 27d ago
Live migration, and the tech powering it, was the hardest thing I ever built. It's something that I think will come naturally to projects like smolVM as more of the hypervisors build it in, but it's a deeply challenging task to do in userspace.

My team spent 4 months on our implementation of VM memory that let us do it, and it's still our biggest time suck. We were also able to make assumptions (like RDMA) that aren't generally available.

All that to say — as someone not working on smolVMs — I am confident smolVMs and most other OSS sandbox implementations will get live migration via hypervisor upgrades in the next 12 months.

Until then, there are enterprise-y providers that have it, and great OSS options that already solve this, like Cloud Hypervisor.

[−] fqiao 27d ago
I see. So right now a smolvm can be stopped, then "packed" (think of it as compressed), and restarted on a different host. Files on the disks are preserved, but memory snapshotting is still hard, tbh.
[−] fragmede 26d ago
Ultimately the original does get stopped, but with additional techniques, we're talking milliseconds of downtime between when the old one stops and the new one resumes. (For live migration technology in general, no clue about smol machines.)
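To see why the downtime can be that small, here is a toy model of iterative pre-copy. Pre-copy is a generic live-migration technique; nothing here confirms smol machines does this. The model assumes constant link bandwidth and dirty rate, sizes in MB, and that everything dirtied during a round must be resent. Each round resends what was dirtied during the previous one. Once the remainder drops below a threshold, the guest is paused for a final stop-and-copy.

```shell
#!/usr/bin/env bash
# Toy model of iterative pre-copy live migration (integer MB and ms).
# Assumptions: constant bandwidth and dirty rate, no compression, and every
# page dirtied during a round is resent. Real hypervisors track dirty pages
# with dirty-page bitmaps and add throttling/post-copy; none of that is modeled.
set -euo pipefail

# Usage: precopy ram_mb bandwidth_mb_per_s dirty_mb_per_s stop_mb max_rounds
# Prints: "<pre-copy rounds> <stop-and-copy downtime in ms>"
precopy() {
  local remaining=$1 bw=$2 dirty=$3 stop=$4 max=$5 round=0 t_ms
  while (( remaining > stop && round < max )); do
    t_ms=$(( remaining * 1000 / bw ))     # time to send the current dirty set
    remaining=$(( dirty * t_ms / 1000 ))  # data dirtied while sending it
    round=$(( round + 1 ))
  done
  # Pause the guest and send whatever is left: this is the downtime.
  echo "$round $(( remaining * 1000 / bw ))"
}

precopy 4096 1000 100 10 30   # prints "3 4"
precopy 4096 1000 2000 10 5   # prints "5 131072" (never converges)
```

With 4096 MB of RAM, 1000 MB/s of bandwidth (about 8 Gbit/s), and 100 MB/s of dirtying, the remainder shrinks roughly 10x per round, which leaves a pause of a few ms. When the guest dirties memory faster than the link can send it, the loop never converges. That is why hypervisors add tricks such as QEMU's auto-converge throttling or post-copy mode.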
[−] sureglymop 27d ago
It's also feasible without network storage, --copy-storage-all will migrate all disks too.
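Here is a small sketch that assembles that command with or without the storage-copy flag. The guest name, destination host, and `qemu+ssh://HOST/system` transport are placeholder assumptions. `--live` and `--copy-storage-all` are real `virsh migrate` options. The script only prints the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the libvirt live-migration command from the thread,
# optionally with --copy-storage-all for hosts without shared storage.
# Guest name, destination host, and qemu+ssh transport are placeholders.
set -euo pipefail

migrate_cmd() {
  local guest="$1" dest_host="$2" copy_storage="${3:-no}"
  local -a cmd=(virsh migrate --live)
  if [ "$copy_storage" = "yes" ]; then
    # Also stream the guest's disks to the destination.
    cmd+=(--copy-storage-all)
  fi
  cmd+=("$guest" "qemu+ssh://${dest_host}/system")
  echo "${cmd[*]}"
}

# Print the command; run it by hand to actually migrate.
migrate_cmd GuestName dest-host yes
```

Copying all storage turns the migration into a full disk transfer, so on large volumes it can take far longer than a shared-storage migration.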
[−] harshdoesdev 27d ago
+1. I built something similar called shuru.run because I wanted an easy way to set up microVM sandboxes to run some of my AI apps, and Firecracker wasn't available for macOS (and, as you said, it is just too heavy for normal user-level workloads).
[−] sahil-shubham 27d ago
Nice work on Shuru — I remember looking at it when I was researching this space. You went with a Rust wrapper on Apple’s Virtualization framework right?

I have been working on something similar but on top of firecracker, called it bhatti (https://github.com/sahil-shubham/bhatti).

I believe anyone with a spare linux box should be able to carve it into isolated programmable machines, without having to worry about provisioning them or their lifecycle.

The documentation’s still early, but I have been using it for orchestrating parallel work (with deploy previews), offloading browser automation for my agents, etc. An auction-bought Hetzner server is serving me quite well :)

[−] harshdoesdev 27d ago
bhatti's cli looks very ergonomic! great job!

Also, yes, shuru is still a wrapper over Virtualization.framework, but it now supports Linux too (a wrapper over KVM, lol).

[−] davidcollantes 26d ago
Is there a way to store configuration/data of applications running on a Bhatti VM on the host, à la Docker volumes?
[−] sahil-shubham 26d ago
Yes! Check out the bhatti volume.

They are ext4 blocks which exist independent of sandboxes.

[−] fqiao 27d ago
Yes, having a lightweight solution for local devices is one of the primary goals of the design. Another is to make it easy to host, whether self-hosted or managed.
[−] gavinray 27d ago
The feature that lets you create self-contained binaries seems like a potentially simpler way to package JVM apps than GraalVM Native.

Probably a lot of other neat use cases for this, too.

  smolvm pack create --image python:3.12-alpine -o ./python312
  ./python312 run -- python3 --version
  # Python 3.12.x — isolated, no pyenv/venv/conda needed
[−] mrbluecoat 27d ago
Can a .smolmachine file be digitally signed and self-authenticate when run? Similar to https://docs.sylabs.io/guides/main/user-guide/signNverify.ht...
[−] chwzr 27d ago
I see the alpine and python:3.12-alpine images in your CLI docs. Where do these come from? Are they from a Docker-like registry, or are they built in? Can I create my own images, or is this purely done with the smolfile? Is there an Ubuntu image available?

Looks really nice btw. Hot resize mem/cpu would be nice. This could become a nice tech for a one-backend-per-customer infra orchestrator then.

[−] cr125rider 27d ago
Great job with the comparison table. Immediately I was like “neat sounds like firecracker” then saw your table to see where it was similar and different. Easy!

Nice job! This looks really cool

[−] nonameiguess 27d ago
What are you actually doing on top of libkrun? Providing really small machine images that boot quickly? If I run the smolvm run --image alpine example, what is "alpine?" Where is that image coming from? Does this have some built-in default registry of machine images it pulls from? Does it need an Internet connection that allows outbound access to wherever this registry runs? Is it one of a default set of pre-built images that comes with the software itself and is stored on my own filesystem? Where are the builds for these images? Where do these machine images end up? ~/.local/share/smolvm/?
[−] lambdanodecore 27d ago
Basically any open source project nowadays runs its software stack in containers, often requiring Docker Compose. Unfortunately, Smol machines do not support Docker inside the microVMs, and they also do not support nested VMs for things that use Vagrant. I think this is a big drawback.
[−] Palmik 27d ago
Could it be made even faster using some of the ideas from https://github.com/zerobootdev/zeroboot ?
[−] irickt 27d ago
Is there a relation to the similarly-purposed and similarly-named https://github.com/CelestoAI/SmolVM
[−] sureglymop 27d ago
What I really like about containers is quickly being able to spin one up without having to specify resources (e.g. RAM limit). I hope this would let me do that also.
[−] simonreiff 27d ago
Hey this is pretty neat! I definitely would try using this for benchmarks and other places where I need strong isolation as Docker is just too bloated and slow, but sadly I don't think I can run this natively on my Windows laptop. I hope you extend to WSL! Good luck and congrats on launch.
[−] isterin 27d ago
We’re using smolmachines to create environments for our agents to execute code. It’s been great so far and the team is super responsive. The dev ergonomics are also great.
[−] zekenie 27d ago
This project is very cool! One readme nit: "Pack a stateful virtual machine into a single file (.smolmachine) to rehydrate on any supported platform." For a while I thought this meant that you could rehydrate a machine's memory like you can with a Firecracker VM, but as far as I can tell you can't? It's stateful == disk?
[−] akoenig 27d ago
smolvm is awesome. The team is highly responsive and very experienced. They clearly know what they’re doing.

I’m currently evaluating smolvm for my project, https://withcave.ai, where I’m using Incus for isolation. The initial integration results look very promising!

[−] estetlinus 27d ago
Why would I prefer smol machines over docker sandbox? Do you have an elevator pitch?
[−] 0cf8612b2e1e 27d ago
This looks very cool. Does the VM machinery still work if I run it in a bubblewrap? Can it talk to a GPU?

Can you pipe into one? It would be cute if I could wget in machine 1 and send that result to offline machine 2 for processing.

[−] traceroute66 27d ago
Sounds very similar to the various unikernel implementations floating around? Such as Unikraft [1].

[1] https://unikraft.org

[−] fqiao 27d ago
Give it a try, folks. Would really love to hear all the feedback!

Cheers!

[−] samhclark 27d ago
This is a very cool project and I'm happy to see it getting traction here. I stumbled upon it when I was looking to build something similar and surveying the state of the art...then I realized you built _exactly_ what I wanted!

Thank you, great work!

[−] gigatexal 26d ago
I'm keen to check this out. Since I've moved 100% to the Mac [1], I've been keen to move away from Docker to something like Apple Containers [2], which runs each "container" as an isolated VM. So I wanna try this out, too.

[1] shameful self plug: https://gigatexal.blog/pages/i-heart-my-macbook/i-heart-my-m...

[2] https://github.com/apple/container

[−] brianjlogan 27d ago
Any integration with existing orchestrators? Plans to support any or building your own?
[−] ukuina 27d ago
Doesn't Docker's sbx do this?

https://docs.docker.com/reference/cli/sbx/

[−] bch 27d ago
See also [0][1] for projects in a similar* vein, including a historical account.

*Yes, the FreeBSD work is specifically developed against Firecracker, which "Smol machines" specifically avoids, but it's interesting nonetheless.

[0] https://github.com/NetBSDfr/smolBSD

[1] https://www.usenix.org/publications/loginonline/freebsd-fire...

[−] rkagerer 27d ago
I see you support Linux and macOS hosts. Any Windows support planned?
[−] timsuchanek 27d ago
This is very exciting. It enables a cross platform, language agnostic plugin system, especially for agents, while being safe in a VM.
[−] 2001zhaozhao 27d ago
Wow, this seems very useful for coding agent sandbox environments that have full browser installations and the like.
[−] harshdoesdev 27d ago
It's a really innovative idea! Very interested in the subsecond cold start claim. How does it achieve that?
[−] parasitid 27d ago
Hi! Congrats on your work, that's really nice.

Question: why do you report that QEMU is 15s

Thanks a lot!

[−] rcarmo 27d ago
Would love to have this as a Proxmox guest type
[−] akdev1l 27d ago
How does it compare to podman with crun-vm ?
[−] chrisweekly 27d ago
This looks awesome. Thanks for sharing!
[−] geniium 27d ago
Congrats that looks really amazing!
[−] messh 27d ago
https://shellbox.dev is a hosted version of something very similar
[−] todotask2 27d ago
How many smolvm can you find?
[−] rawoke083600 27d ago
I like the name ! :)
[−] ljcoco 25d ago
congrats on the launch binbin, this is cool!
[−] dimitry12 27d ago
https://github.com/earendil-works/gondolin is another project addressing a similar use-case.
[−] cperciva 27d ago
See also SmolBSD -- similar idea, similar name, using NetBSD.