This is essentially S3FS using EFS (AWS's managed NFS service) as a cache layer for active data and small random accesses. Unfortunately, this also means that it comes with some of EFS's eye-watering pricing:
— All writes cost $0.06/GB, since everything is first written to the EFS cache. For write-heavy applications, this could be a dealbreaker.
— Reads hitting the cache get billed at $0.03/GB. Large reads (>128kB) get directly streamed from the underlying S3 bucket, which is free.
— Cache is charged at $0.30/GB/month. Even though everything is written to the cache (for consistency purposes), it seems like it's only used for persistent storage of small files (<128kB), so this shouldn't cost too much.
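For a rough sense of how those three line items interact, here's a back-of-the-envelope estimate in Python (the rates are the ones above; the workload numbers are made up):

    # Rough monthly cost estimate for the EFS-backed cache layer.
    # Rates are the figures quoted above; the workload is a made-up example.
    WRITE_PER_GB = 0.06        # every write lands in the EFS cache first
    CACHED_READ_PER_GB = 0.03  # reads served from the cache
    CACHE_PER_GB_MONTH = 0.30  # persistent cache storage (small files < 128 kB)

    def monthly_cost(gb_written, gb_read_cached, gb_resident_in_cache):
        # Large reads streamed straight from the S3 bucket are not counted here.
        return (gb_written * WRITE_PER_GB
                + gb_read_cached * CACHED_READ_PER_GB
                + gb_resident_in_cache * CACHE_PER_GB_MONTH)

    # Example: 500 GB written, 200 GB of cached reads, 50 GB of small files resident.
    print(monthly_cost(500, 200, 50))  # 500*0.06 + 200*0.03 + 50*0.30 = 51.0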
Thanks for the analysis. Interestingly, when we first released our low-latency S3-compatible storage (1M IOPS, p99 ~5ms)[1], a lot of people asked the same question: why bring file system semantics (atomic object/folder rename) to S3? We also got feedback from people who really needed FS semantics, so we added POSIX FS support.
AWS S3FS uses the normal FUSE interface, which would be super heavy due to the inherent overhead of copying data back and forth between user space and kernel space; that was our initial concern when we tried to add POSIX support to the original object storage design. Fortunately, we found and open-sourced a solution [2]: using FUSE_OVER_IO_URING + FUSE_PASSTHROUGH, we can keep the same high-performance architecture as our original object storage. We'd like to put out a new blog post explaining the details and our performance numbers if anyone is interested.
[1] https://fractalbits.com/blog/why-we-built-another-object-sto...
[2] https://crates.io/crates/fractal-fuse
This was my concern too. The whole point of using S3 as a file system instead of EBS / EFS (for me at least) is to minimize cost and I don't really see why I would use this instead of s3fs.
One advantage over S3FS would be that multiple filesystem mounts would see a consistent view of the filesystem, but it looks like this advantage disappears when mixing direct bucket access with filesystem mounts. Given the famously slow small file performance of EFS it might have been better (and cheaper) to send all files to S3 and only use EFS for the metadata layer. Not having atomic rename is also going to be a problem for any use that expects a regular filesystem.
S3 Files was launched today without support for atomic rename. This is not something you can bolt on. Can you imagine running Claude Code on your S3 Files and it just wants to do a little house cleaning, renaming a directory and suddenly a full copy is needed for every file in that directory?
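For anyone who hasn't run into it: plain S3 has no rename primitive at all, so "renaming" a prefix means one copy plus one delete per object under it. A rough boto3 sketch of what that directory rename turns into (bucket and prefix names are made up):

    # "Renaming" a directory on S3: there is no rename API, so every object
    # under the old prefix has to be copied to the new key and then deleted.
    import boto3

    s3 = boto3.client("s3")
    bucket, old_prefix, new_prefix = "my-bucket", "reports/2024/", "archive/2024/"  # hypothetical

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=old_prefix):
        for obj in page.get("Contents", []):
            new_key = new_prefix + obj["Key"][len(old_prefix):]
            # One server-side copy per object; large objects get rewritten in full.
            s3.copy_object(Bucket=bucket, Key=new_key,
                           CopySource={"Bucket": bucket, "Key": obj["Key"]})
            s3.delete_object(Bucket=bucket, Key=obj["Key"])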
The hardest part in building a distributed filesystem is atomic rename. It's always rename. Scalable metadata file systems, like Colossus/Tectonic/ADLSv2/HopsFS, are either designed around how to make rename work at scale* or how to work around it at higher levels in the stack.
* https://www.hopsworks.ai/post/scalable-metadata-the-new-bree...
> we locked a bunch of our most senior engineers in a room and said we weren’t going to let them out till they had a plan that they all liked.
That's one way to do it.
> When you create or modify files, changes are aggregated and committed back to S3 roughly every 60 seconds as a single PUT. Sync runs in both directions, so when other applications modify objects in the bucket, S3 Files automatically spots those modifications and reflects them in the filesystem view automatically.
That sounds about right given the above. I have trouble seeing this as something other than a giant "hack." I already don't enjoy projecting costs for new types of S3 access patterns, and I feel like this has the potential to double the complication I already experience here.
Maybe I'm too frugal, but I've been in the cloud for a decade now, and I've worked very hard to prevent any "surprise" bills from showing up. This seems like a great feature, if you don't care what your AWS bill is each month.
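To make the quoted sync behaviour concrete, here's a toy model of that kind of write-back loop (just an illustration of the documented semantics, not AWS's actual implementation): anything overwritten more than once within a flush window reaches the bucket only once, as whatever it contained at flush time.

    # Toy model of the described write-back behaviour: local edits are staged,
    # and once per interval each dirty file is pushed to S3 as a single PUT.
    # Illustration only -- this is not how S3 Files is actually built.
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-bucket"  # hypothetical
    dirty = {}            # key -> latest local bytes; intermediate versions are overwritten

    def write_local(key, data):
        dirty[key] = data  # staged only; nothing hits S3 yet

    def flush_once():
        for key, data in list(dirty.items()):
            s3.put_object(Bucket=BUCKET, Key=key, Body=data)  # one PUT per dirty file
            del dirty[key]

    # write_local("report.csv", b"v1"); write_local("report.csv", b"v2")
    # flush_once()  # only v2 ever reaches the bucket; call this every ~60 seconds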
> For example, suppose you edit /mnt/s3files/report.csv through the file system. Before S3 Files synchronizes your changes back to the S3 bucket, another application uploads a new version of report.csv directly to the S3 bucket. When S3 Files detects the conflict, it moves your version of report.csv to the lost and found directory and replaces it with the version from the S3 bucket.
> The lost and found directory is located in your file system's root directory under the name .s3files-lost+found-file-system-id.
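My guess at the kind of check behind that behaviour (purely speculative, not AWS's code): remember the ETag you last synced, and if the bucket's current ETag differs when you go to push, park the local copy in lost and found instead of clobbering either side.

    # Speculative sketch of a conflict check: compare the ETag we last synced
    # against the bucket's current ETag before pushing our local edit.
    import os, shutil
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-bucket"                                     # hypothetical
    LOST_FOUND = "/mnt/s3files/.s3files-lost+found-example"  # illustrative path

    def push_with_conflict_check(local_path, key, last_synced_etag):
        current_etag = s3.head_object(Bucket=BUCKET, Key=key)["ETag"]
        if current_etag != last_synced_etag:
            # The object changed in the bucket since our last sync: keep the
            # bucket's version, move our local edit aside instead of overwriting.
            os.makedirs(LOST_FOUND, exist_ok=True)
            shutil.move(local_path, os.path.join(LOST_FOUND, os.path.basename(local_path)))
            s3.download_file(BUCKET, key, local_path)
            return "conflict: local copy moved to lost+found"
        s3.upload_file(local_path, BUCKET, key)
        return "pushed"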
I wish they offered some managed bridging to local NVMe storage. AWS NVMe is super fast compared to EBS, and EBS (node-exclusive access as block device) is faster than EFS (multi-node access). I imagine this can go fast if you put some kind of further-cache-to-NVMe FS on top, but a completely vertically integrated option would be much better.
The problem with using S3 as a filesystem is that it’s immutable, and that hasn’t changed with S3 Files. So if I have a large file and change 1 byte of it, or even just rename it, it needs to upload the entire file all over again. This seems most useful for read-heavy workflows of files that are small enough to fit in the cache.
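Concretely, there is no partial-write API on an object: even a one-byte change round-trips the whole thing (boto3 sketch, names made up):

    # Changing a single byte of an S3 object still means downloading and
    # re-uploading the whole object; there is no in-place or partial write.
    import boto3

    s3 = boto3.client("s3")
    bucket, key = "my-bucket", "big-file.bin"  # hypothetical

    body = bytearray(s3.get_object(Bucket=bucket, Key=key)["Body"].read())  # full download
    body[0] ^= 0xFF                                                         # flip one byte
    s3.put_object(Bucket=bucket, Key=key, Body=bytes(body))                 # full re-upload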
The interesting part isn't the file abstraction itself, it's that this pushes the dividing line between "object store" and "filesystem" another notch toward filesystem. The absence of in-place updates was always the load-bearing wall keeping S3 cheap and durable in the way it is — if Files preserves that and only makes the read API friendlier, fine.
If they ever ship in-place writes I'd want to see what happens to the consistency model and pricing first. That's where the actual simplicity lived, not in the API surface. Half the appeal of S3 over a real filesystem was that you couldn't shoot yourself in the foot with partial overwrites.
this post will probably never be read but.. I was on the team that was trying to make the marriage of S3 and EFS work a year ago. it's a pretty hard problem. At one point we proposed this solution (which seems like a caching layer) but it got shot down for a more complex system that would have attempted to rebuild EFS on faster S3 blob storage. I left before this engineering monstrosity made significant progress, and it clearly died at some point.
Looks like they went back to a simpler solution they could deliver, but with some obvious warts. Good to see something get launched, but the sausage making here was brutal.
The reality is that if you read https://www.allthingsdistributed.com/2026/04/s3-files-and-th..., it sounds like the great minds at S3 figured out that a caching layer was the way to go. We (EFS) fucking proposed that years ago. But we had to deal with Seattle and the S3 braintrust who didn't want to do that. I know we wrote a PRFAQ that was close to this concept probably four years ago. The political story is that EFS was taken over by S3, and the EFS folks didn't have the agency or political backing to build a more workable solution. So we wasted a shit ton of time tackling something that was never going to work, and many of the tenured EFS engineers left.
I was prototyping with S3 mounted as a filesystem for Docker volumes and evaluating solutions for that. The GeeseFS CLI is the fastest one; here's a script I made to mount a folder from S3-compatible storage with it:
https://gist.github.com/huksley/44341276d7c269f092e10784959e...
Does anyone have solutions or suggestions for mounting an S3 bucket as a read-only filesystem? I don't need any writes.
Previously I have done a periodic script that would simply re-sync the directory which works well enough. But curious if there's anything else out there.
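If periodic re-sync stays good enough, a minimal boto3 version of that script might look like this (only re-downloads objects whose ETag changed; bucket and paths are placeholders):

    # One-way "re-sync" of a bucket prefix into a local read-only mirror:
    # re-download only the objects whose ETag changed since the last pass.
    import os
    import boto3

    s3 = boto3.client("s3")
    BUCKET, PREFIX, DEST = "my-bucket", "data/", "/srv/s3-mirror"  # placeholders
    seen_etags = {}  # key -> ETag from the previous pass

    def sync_once():
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
            for obj in page.get("Contents", []):
                key, etag = obj["Key"], obj["ETag"]
                if key.endswith("/") or seen_etags.get(key) == etag:
                    continue  # folder placeholder or unchanged since last sync
                local = os.path.join(DEST, key)
                os.makedirs(os.path.dirname(local), exist_ok=True)
                s3.download_file(BUCKET, key, local)
                seen_etags[key] = etag

    # Run sync_once() from cron or a loop; deletions in the bucket aren't handled here.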
This is a good alternative for those who want to store petabytes of historical logs, metrics, or traces in VictoriaLogs, VictoriaMetrics, and VictoriaTraces, and want to save 2x-4x on persistent storage pricing (compare EBS pricing to S3 pricing).
Eagerly awaiting the first blog post where developers didn't read the "eventually consistent" part, lost the data, and made some "genius" workaround with help from the LLM that got them into that spot in the first place.
This could be useful. We use EFS, I like the benefits but I think it’s overkill for what we need. I’ve been thinking of switching to s3 but not looking forward to completely changing how we upload and download.
> Large reads (>128kB) get directly streamed from the underlying S3 bucket, which is free.
Always uncached? S3 has pretty bad latency.
> directly streamed from the underlying S3 bucket, which is free.
No reads from S3 are free. All outgoing traffic from AWS is charged no matter what.
You can write into one and read out from the other, and vice versa. Consistency guarantees are kept within each, but not between them.
Built-in cache, CDN compatible, JSON metadata, concurrency safe, and it targets all S3-compatible storage systems.
You might want to play with memory params for GeeseFS for better results
> changes are aggregated and committed back to S3 roughly every 60 seconds as a single PUT
Single PUT per file I assume?
My guess is this would only enable a read-replica and not backups as Litestream currently does?
we run datalakes using DuckLake and this sounds really useful. GCP should follow suit quickly.
Sell the benefits.
I have around 9 TB in 21m files on S3. How does this change benefit me?