> In March 2026, I migrated to self-hosted object storage powered by Versity S3 Gateway.
Thanks for sharing this, I wasn't even aware of Versity S3 from my searches and discussions here. I recently migrated my projects from MinIO to Garage, but this seems like another viable option to consider.
First time hearing about Versity for me too. I thought "S3 Gateways" were an Amazon-only service rather than something mere mortals could set up.
I've been trying to give some containers (LXC/LXD and OCI) unprivileged access to a network-accessible ZFS filesystem, and this might be what I need. Managing UID/GID through bind mounts from the host into the container (i.e., NFS mounted on the host) has been trickier than I was expecting.
It sounds like you're not using Incus, but for those who are, it looks like there is an S3-compatible object storage API built in. I haven't used it for anything yet but it seems relevant: https://linuxcontainers.org/incus/docs/main/howto/storage_bu...
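For the curious, a minimal sketch of using it (the exact commands and the core.storage_buckets_address key are from my reading of the Incus docs, so treat them as assumptions and double-check before relying on this):

    # expose the S3-compatible listener, then create a bucket on an existing pool plus an access key for it
    incus config set core.storage_buckets_address :8555
    incus storage bucket create default my-bucket
    incus storage bucket key create default my-bucket my-app-key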
Remember that UID mapping in namespaces is just a facade with an offset and a range, typically based on subuid [0] and subgid [1] today. In the container you can run cat /proc/self/uid_map, or look at cat /proc/$PID/uid_map from the host, to see what those offsets are. For example, you might see that UID 0 in the container maps to UID 1000 on the host with a length of 1, while the container UID offset of 1 maps to the host offset of 100000 for a length of 65536. With subuid/subgid you assign those ranges to the user that is instantiating the container, e.g. two different users that each launch their own containers.
Assuming you pass in the host UID/GID, that is how you can configure a compatible user at the entrypoint.
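As a hypothetical illustration of those ranges (usernames and numbers here are made up), the host-side assignment lives in /etc/subuid and /etc/subgid, one line per launching user:

    # /etc/subuid (same shape in /etc/subgid)
    svc-web:100000:65536
    svc-db:165536:65536

Containers launched by svc-web get their UIDs translated into the 100000-165535 range on the host, disjoint from svc-db's range.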
But note that only highly trusted containers should ever really use host bind mounts; it is often much safer to mount NFS internally.
Host bind mounts of network filesystems, if that is what you are doing, are also fragile as far as data loss goes. I am an object store fan, but I just wanted to give you the above info as it seems hard for people to find.
I would highly encourage you to look into the history of security problems with host bind mounts, and the whack-a-mole required to keep up with them, to see if it fits your risk appetite. But if you choose to use them, setting up dedicated uid/gid mappings and setting the external host side to the expected effective ID of the container users is a better way than using idmapped mounts etc...

[0] https://man7.org/linux/man-pages/man5/subuid.5.html
[1] https://man7.org/linux/man-pages/man5/subgid.5.html
I appreciate the detailed explanation of UID/GID mapping.
> it is often much safer to mount NFS internally
This is the config I'm trying to move away from! I don't see how an unprivileged LXC with a bind mount is worse than a privileged container with NFS, FUSE, and nesting enabled (I need all of that if I can't aggregate on the host).
NFS and CIFS within the container require kernel-level access, and therefore the LXC must be privileged. I'd rather have a single defined path.
I tried to get around this using FUSE but it creates its own issues with snapshots/backups (fsfreeze).
If my solution works for a regular LXC it will probably work for Podman.
I can't comment directly on LXC, but it is very different from runc/crun/your-CRI here; not better or worse, just different.
With Podman, unfortunately, we don't have the k8s Container Storage Interface (CSI), so you have to work with what you have.
When I said:
> it is often much safer to mount NFS internally
What is more correct is having the container runtime or container manager mount them, not the user inside the container.
But as you are trying to run unprivileged or at least with minimal privileges, which is all we can do with namespaces, you are cutting across the grain.
I do use podman pods and containers, mostly for the ease of development, but on more traditional long lived hosts.
I have a very real need to separate UIDs between co-hosted products, but don't need to actually run a VM for these specific use cases.
So I have particular rootful tasks that have to be done as the root user in Ansible:
1) Install OS packages
2) Create service admin and daemon user
3) Assign subuid/subgid ranges to those user security domains as needed
4) For specific services, add NFS data directories to /etc/fstab with the 'user' and 'noauto' flags (a sketch of such an entry is below)
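A sketch of what such an fstab entry might look like, reusing the share from the commands below (the mount point and NFS options are illustrative, not a tested recipe):

    # /etc/fstab -- 'user' lets the unprivileged service user mount it, 'noauto' keeps it out of the boot sequence
    192.168.1.84:/path/to/share  /srv/app-data  nfs  user,noauto,vers=4.2  0  0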
In Podman I would then create
podman volume create --driver local --opt type=nfs --opt device=192.168.1.84:/path/to/share --opt o=addr=192.168.1.84,....
podman run -d --name nfs_test -v nfs-shared:/opt docker.io/library/debian:latest
Which, if you don't have the fstab entry, will give you:
Error: mounting volume nfs-shared for container ...: mount.nfs: Operation not permitted for 192.168.1.84:/path/to/share on /home/user/.local/share/containers/storage/volumes/nfs-shared/_data
That _data path is one of the hints of the risk of host bind mounts: the risk is either having an inode that the host cares about, or issues across containers, etc...
While imperfect, this follows the named volume pattern, which keeps the volume under the engine's control and doesn't expose the host mount inode to the container.
What does happen inside the container entrypoint is validating that the expected UID is reachable, adding a user with the right UID offset, and switching to that user.
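A rough sketch of that entrypoint pattern, in case it helps (the variable names and the use of setpriv are my own illustration, not anyone's production script):

    #!/bin/sh
    # hypothetical entrypoint: create a user matching the UID/GID handed in, then switch to it
    set -eu
    : "${APP_UID:?APP_UID must be set}" "${APP_GID:?APP_GID must be set}"
    # create matching group/user entries if they don't already exist
    getent group "$APP_GID" >/dev/null || groupadd -g "$APP_GID" app
    getent passwd "$APP_UID" >/dev/null || useradd -u "$APP_UID" -g "$APP_GID" -M app
    # drop to that user for the real workload
    exec setpriv --reuid "$APP_UID" --regid "$APP_GID" --clear-groups "$@"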
A misconfigured host bind mount, or leaking data because you can't see who has access, are the most common problems. And since containers run with elevated privileges until you drop them, they can get around those protections: even if they aren't elevating to root in a rootless situation, they can still access the data of any running container given just a few trivial mistakes or newly discovered vulnerabilities.
While NFS is absolutely a whole new ball of wax with other issues, one nice thing is that the servers (at least the ones I know of) don't even support the concept of user namespaces and UID mapping. That makes it fragile and dangerous if you start mapping uid/gids in, but it can be an advantage if you can simply isolate uid/gid ranges.
IMHO it will be horses for courses and depend on your risk appetite; all the options are least-worst and there simply is no best option, especially with OCI.
Wow, I really appreciate you coming back for the follow-up! It's too late for me to read through it in detail at this moment but:
In the end I discovered that I can combine a "mapall/squash" on the NFS server, a regular NFS mount on Proxmox, and then an lxc.mount.entry in the LXC config, and the combined effect is an unprivileged container with read-write permissions for the UID/GID specified on the NFS server. If I need more UID/GID combinations I can just create bind mounts and then export those with the appropriate mapall/squash settings. Thanks again :)
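For anyone who lands here later, a sketch of that combination (paths, subnet, and IDs are illustrative, not the actual config; this uses Linux exports syntax, where "mapall" corresponds to all_squash):

    # on the NFS server: squash every client identity to one fixed UID/GID
    /tank/appdata  192.168.1.0/24(rw,all_squash,anonuid=1000,anongid=1000)

    # on the Proxmox host: a regular NFS mount, e.g. at /mnt/appdata

    # in the container's LXC config: bind the host mount into the unprivileged container
    lxc.mount.entry: /mnt/appdata srv/appdata none bind,create=dir 0 0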
For this project, where you have 120GB of customer data and thirty requests a second for ~8 KB objects (0.25 MB/s of object reads), you'd seem to be able to get 100x the throughput by vertically scaling on one machine with a filesystem and an SSD and never thinking about object storage. Would love to see why the complexity is worth it.
I don't get it: if it's running on the same ("local") machine, why does it even need the S3 API? It could just be plain I/O on the local drive(s).
As someone who has dealt with wacky storage issues/designs, a lot of this "felt" strange to me. Btrfs? Rsync? Then I got to the bottom and saw that they were only handling about 100 GB of data! At that scale, nearly anything will work great and TFA was right to just pick the thing with the fewest knobs.
At a previous job years ago, we had a service that was essentially a file server for something like 50TB of tiny files. We only backed it up once a week because just _walking_ the whole filesystem with something like du took more than a day. Yes, we should have simply thrown money at the problem and just bought the right solution from an enterprise storage vendor or dumped them all into S3. Unfortunately, these were not options. Blame management.
A close second would have been to rearchitect dependent services to speak S3 instead of a bespoke REST-ish API, deploy something like SeaweedFS, and call it a day. SeaweedFS handles lots of small files gracefully because it doesn't just naively store one object per file on the filesystem like most locally-hosted S3 solutions (including Versity) do. And we'd get replication/redundancy on top of it. Unfortunately, I didn't get buy-in from the other teams maintaining the dependent services ("sorry, we don't have time to refactor our code, guess that makes it a 'you' problem").
What I did instead was descend into madness. Instead of writing each file to disk, all new files were written to a "cache" directory which matched the original filesystem layout of the server. And then every hour, that directory was tarred up and archived. When a read was required, the code would check the cache first. If the file wasn't there, it would figure out which tarball was needed and extract the file from there instead. This only worked because all files had a timestamp embedded in the path. Read performance sucked, but that didn't matter because reads were very rare. But the data absolutely had to be there when needed.
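A sketch of what that read path amounts to (paths and the timestamp layout are invented for illustration; the real code was internal):

    #!/bin/sh
    # hypothetical lookup: serve from the hot cache dir, else pull the file out of its hourly tarball
    rel="$1"                                   # e.g. 2016/03/14/09/abc123.dat (timestamp embedded in path)
    if [ -f "cache/$rel" ]; then
        cat "cache/$rel"
    else
        hour="$(dirname "$rel" | tr / -)"      # 2016/03/14/09 -> 2016-03-14-09
        tar -xOf "archive/$hour.tar" "$rel"    # -O extracts the member to stdout
    fi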
Most importantly, backups took less than an hour for the first time in years.
> The costs have increased: renting an additional dedicated server costs more than storing ~100GB at a managed object storage service. But the improved performance and reliability are worth it.
Were your users complaining about reliability and performance? If it costs more, adds more work (backup/restore management), and the users aren't happier, then why make the change in the first place? On a separate note, what tool is the final benchmark screenshot from?
It's interesting that cloud providers are unable to provide stable S3 as a service. Hetzner is unable to deliver stable object storage, and per the article neither are OVHCloud and UpCloud.
Moved object storage from AWS to CloudFlare and have been pretty happy. No problems with performance so far. Bills were 90% cheaper too (free bandwidth)
"Our current (April 2026) object usage is: 14 million objects, 119GB"
I mean, I appreciate the openness about the scale, but for context, my home's personal backup managed via restic to S3 is 370GB. Fewer objects, but still, we're not talking a big install here.
This is pretty much like that story of, if it fits on your laptop, it's not big data.
This is the kind of setup that lets you actually go to bed without checking your phone every 20 minutes.
> Our S3 API is now served by Versity S3 Gateway and backed by a plain simple Btrfs filesystem.
With apologies to the SRE Book ("hope is not a strategy")... Btrfs is not a strategy.
Versity does not include any erasure coding or replication...