This is essentially S3FS using EFS (AWS's managed NFS service) as a cache layer for active data and small random accesses. Unfortunately, this also means that it comes with some of EFS's eye-watering pricing:
— All writes cost $0.06/GB, since everything is first written to the EFS cache. For write-heavy applications, this could be a dealbreaker.
— Reads hitting the cache get billed at $0.03/GB. Large reads (>128kB) get directly streamed from the underlying S3 bucket, which is free.
— Cache is charged at $0.30/GB/month. Even though everything is written to the cache (for consistency purposes), it seems like it's only used for persistent storage of small files (<128kB), so this shouldn't cost too much.
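For a rough sense of how those three line items interact, here's a back-of-the-envelope estimate in Python (the rates are the ones above; the workload numbers are made up):

    # Rough monthly cost estimate for the EFS-backed cache layer.
    # Rates are the figures quoted above; the workload is a made-up example.
    WRITE_PER_GB = 0.06        # every write lands in the EFS cache first
    CACHED_READ_PER_GB = 0.03  # reads served from the cache
    CACHE_PER_GB_MONTH = 0.30  # persistent cache storage (small files < 128 kB)

    def monthly_cost(gb_written, gb_read_cached, gb_resident_in_cache):
        # Large reads streamed straight from the S3 bucket are not counted here.
        return (gb_written * WRITE_PER_GB
                + gb_read_cached * CACHED_READ_PER_GB
                + gb_resident_in_cache * CACHE_PER_GB_MONTH)

    # Example: 500 GB written, 200 GB of cached reads, 50 GB of small files resident.
    print(monthly_cost(500, 200, 50))  # 500*0.06 + 200*0.03 + 50*0.30 = 51.0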
Thanks for the analysis. Interestingly, when we first released our low-latency S3-compatible storage (1M IOPS, p99 ~5ms)[1], a lot of people asked the same question: why bring file system semantics (atomic object/folder rename) to S3? We also got feedback from people who really needed FS semantics, so we added POSIX FS support.
AWS S3FS uses the normal FUSE interface, which would be super heavy due to the inherent overhead of copying data back and forth between user space and kernel space; that was our initial concern when we tried to add POSIX support to the original object storage design. Fortunately, we found and open-sourced a solution [2]: using FUSE_OVER_IO_URING + FUSE_PASSTHROUGH, we can keep the same high-performance architecture as our original object storage. We'd like to put out a new blog post explaining the details and our performance numbers if anyone is interested.
[1] https://fractalbits.com/blog/why-we-built-another-object-sto...
[2] https://crates.io/crates/fractal-fuse
This was my concern too. The whole point of using S3 as a file system instead of EBS / EFS (for me at least) is to minimize cost and I don't really see why I would use this instead of s3fs.
One advantage over S3FS would be that multiple filesystem mounts would see a consistent view of the filesystem, but it looks like this advantage disappears when mixing direct bucket access with filesystem mounts. Given the famously slow small file performance of EFS it might have been better (and cheaper) to send all files to S3 and only use EFS for the metadata layer. Not having atomic rename is also going to be a problem for any use that expects a regular filesystem.
S3 Files was launched today without support for atomic rename. This is not something you can bolt on. Can you imagine running Claude Code on your S3 Files and it just wants to do a little house cleaning, renaming a directory and suddenly a full copy is needed for every file in that directory?
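For anyone who hasn't run into it: plain S3 has no rename primitive at all, so "renaming" a prefix means one copy plus one delete per object under it. A rough boto3 sketch of what that directory rename turns into (bucket and prefix names are made up):

    # "Renaming" a directory on S3: there is no rename API, so every object
    # under the old prefix has to be copied to the new key and then deleted.
    import boto3

    s3 = boto3.client("s3")
    bucket, old_prefix, new_prefix = "my-bucket", "reports/2024/", "archive/2024/"  # hypothetical

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=old_prefix):
        for obj in page.get("Contents", []):
            new_key = new_prefix + obj["Key"][len(old_prefix):]
            # One server-side copy per object; large objects get rewritten in full.
            s3.copy_object(Bucket=bucket, Key=new_key,
                           CopySource={"Bucket": bucket, "Key": obj["Key"]})
            s3.delete_object(Bucket=bucket, Key=obj["Key"])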
The hardest part in building a distributed filesystem is atomic rename. It's always rename. Scalable metadata file systems, like Colossus/Tectonic/ADLSv2/HopsFS, are either designed around how to make rename work at scale* or how to work around it at higher levels in the stack.
* https://www.hopsworks.ai/post/scalable-metadata-the-new-bree...
> we locked a bunch of our most senior engineers in a room and said we weren’t going to let them out till they had a plan that they all liked.
That's one way to do it.
> When you create or modify files, changes are aggregated and committed back to S3 roughly every 60 seconds as a single PUT. Sync runs in both directions, so when other applications modify objects in the bucket, S3 Files automatically spots those modifications and reflects them in the filesystem view automatically.
That sounds about right given the above. I have trouble seeing this as something other than a giant "hack." I already don't enjoy projecting costs for new types of S3 access patterns, and I feel like this has the potential to double the complication I already experience here.
Maybe I'm too frugal, but I've been in the cloud for a decade now, and I've worked very hard to prevent any "surprise" bills from showing up. This seems like a great feature, if you don't care what your AWS bill is each month.
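To make the quoted sync behaviour concrete, here's a toy model of that kind of write-back loop (just an illustration of the documented semantics, not AWS's actual implementation): anything overwritten more than once within a flush window reaches the bucket only once, as whatever it contained at flush time.

    # Toy model of the described write-back behaviour: local edits are staged,
    # and once per interval each dirty file is pushed to S3 as a single PUT.
    # Illustration only -- this is not how S3 Files is actually built.
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-bucket"  # hypothetical
    dirty = {}            # key -> latest local bytes; intermediate versions are overwritten

    def write_local(key, data):
        dirty[key] = data  # staged only; nothing hits S3 yet

    def flush_once():
        for key, data in list(dirty.items()):
            s3.put_object(Bucket=BUCKET, Key=key, Body=data)  # one PUT per dirty file
            del dirty[key]

    # write_local("report.csv", b"v1"); write_local("report.csv", b"v2")
    # flush_once()  # only v2 ever reaches the bucket; call this every ~60 seconds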
> For example, suppose you edit /mnt/s3files/report.csv through the file system. Before S3 Files synchronizes your changes back to the S3 bucket, another application uploads a new version of report.csv directly to the S3 bucket. When S3 Files detects the conflict, it moves your version of report.csv to the lost and found directory and replaces it with the version from the S3 bucket.
> The lost and found directory is located in your file system's root directory under the name .s3files-lost+found-file-system-id.
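My guess at the kind of check behind that behaviour (purely speculative, not AWS's code): remember the ETag you last synced, and if the bucket's current ETag differs when you go to push, park the local copy in lost and found instead of clobbering either side.

    # Speculative sketch of a conflict check: compare the ETag we last synced
    # against the bucket's current ETag before pushing our local edit.
    import os, shutil
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-bucket"                                     # hypothetical
    LOST_FOUND = "/mnt/s3files/.s3files-lost+found-example"  # illustrative path

    def push_with_conflict_check(local_path, key, last_synced_etag):
        current_etag = s3.head_object(Bucket=BUCKET, Key=key)["ETag"]
        if current_etag != last_synced_etag:
            # The object changed in the bucket since our last sync: keep the
            # bucket's version, move our local edit aside instead of overwriting.
            os.makedirs(LOST_FOUND, exist_ok=True)
            shutil.move(local_path, os.path.join(LOST_FOUND, os.path.basename(local_path)))
            s3.download_file(BUCKET, key, local_path)
            return "conflict: local copy moved to lost+found"
        s3.upload_file(local_path, BUCKET, key)
        return "pushed"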
I wish they offered some managed bridging to local NVMe storage. AWS NVMe is super fast compared to EBS, and EBS (node-exclusive access as block device) is faster than EFS (multi-node access). I imagine this can go fast if you put some kind of further-cache-to-NVMe FS on top, but a completely vertically integrated option would be much better.
The problem with using S3 as a filesystem is that it’s immutable, and that hasn’t changed with S3 Files. So if I have a large file and change 1 byte of it, or even just rename it, it needs to upload the entire file all over again. This seems most useful for read-heavy workflows of files that are small enough to fit in the cache.
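Concretely, there is no partial-write API on an object: even a one-byte change round-trips the whole thing (boto3 sketch, names made up):

    # Changing a single byte of an S3 object still means downloading and
    # re-uploading the whole object; there is no in-place or partial write.
    import boto3

    s3 = boto3.client("s3")
    bucket, key = "my-bucket", "big-file.bin"  # hypothetical

    body = bytearray(s3.get_object(Bucket=bucket, Key=key)["Body"].read())  # full download
    body[0] ^= 0xFF                                                         # flip one byte
    s3.put_object(Bucket=bucket, Key=key, Body=bytes(body))                 # full re-upload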
The interesting part isn't the file abstraction itself, it's that this pushes the dividing line between "object store" and "filesystem" another notch toward filesystem. The absence of in-place updates was always the load-bearing wall keeping S3 cheap and durable in the way it is — if Files preserves that and only makes the read API friendlier, fine.
If they ever ship in-place writes I'd want to see what happens to the consistency model and pricing first. That's where the actual simplicity lived, not in the API surface. Half the appeal of S3 over a real filesystem was that you couldn't shoot yourself in the foot with partial overwrites.
this post will probably never be read but.. I was on the team that was trying to make the marriage of S3 and EFS work a year ago. it's a pretty hard problem. At one point we proposed this solution (which seems like a caching layer) but it got shot down for a more complex system that would have attempted to rebuild EFS on faster S3 blob storage. I left before this engineering monstrosity made significant progress, and it clearly died at some point.
Looks like they went back to a simpler solution they could deliver, but with some obvious warts. Good to see something get launched, but the sausage making here was brutal.
The reality is that if you read https://www.allthingsdistributed.com/2026/04/s3-files-and-th..., it sounds like the great minds at S3 figured out that a caching layer was the way to go. We (EFS) fucking proposed that years ago. But we had to deal with Seattle and the S3 braintrust who didn't want to do that. I know we wrote a PRFAQ that was close to this concept probably four years ago. The political story is that EFS was taken over by S3, and the EFS folks didn't have the agency or political backing to build a more workable solution. So we wasted a shit ton of time tackling something that was never going to work, and many of the tenured EFS engineers left.
I was prototyping with S3 mounted as a filesystem for Docker volumes and evaluating solutions for that. The GeeseFS CLI is the fastest one; here's a script I made to mount a folder from S3-compatible storage with it:
https://gist.github.com/huksley/44341276d7c269f092e10784959e...
Does anyone have solutions or suggestions for mounting an S3 bucket as a read-only filesystem? I don't need any writes.
Previously I have done a periodic script that would simply re-sync the directory which works well enough. But curious if there's anything else out there.
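If periodic re-sync stays good enough, a minimal boto3 version of that script might look like this (only re-downloads objects whose ETag changed; bucket and paths are placeholders):

    # One-way "re-sync" of a bucket prefix into a local read-only mirror:
    # re-download only the objects whose ETag changed since the last pass.
    import os
    import boto3

    s3 = boto3.client("s3")
    BUCKET, PREFIX, DEST = "my-bucket", "data/", "/srv/s3-mirror"  # placeholders
    seen_etags = {}  # key -> ETag from the previous pass

    def sync_once():
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
            for obj in page.get("Contents", []):
                key, etag = obj["Key"], obj["ETag"]
                if key.endswith("/") or seen_etags.get(key) == etag:
                    continue  # folder placeholder or unchanged since last sync
                local = os.path.join(DEST, key)
                os.makedirs(os.path.dirname(local), exist_ok=True)
                s3.download_file(BUCKET, key, local)
                seen_etags[key] = etag

    # Run sync_once() from cron or a loop; deletions in the bucket aren't handled here.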
This is a good alternative for those who want to store petabytes of historical logs, metrics, or traces in VictoriaLogs, VictoriaMetrics, and VictoriaTraces, and want to save 2x-4x on persistent storage pricing (compare EBS pricing to S3 pricing).
Eagerly awaiting the first blog post where developers didn't read the "eventually consistent" part, lost the data, and made some "genius" workaround with help from the LLM that got them into that spot in the first place.
This could be useful. We use EFS, I like the benefits but I think it’s overkill for what we need. I’ve been thinking of switching to s3 but not looking forward to completely changing how we upload and download.
> Large reads (>128kB) get directly streamed from the underlying S3 bucket, which is free.
Always uncached? S3 has pretty bad latency.
> directly streamed from the underlying S3 bucket, which is free.
No reads from S3 are free. All outgoing traffic from AWS is charged no matter what.
You can write into one and read out from the other, and vice versa. Consistency guarantees are kept within each, but not between them.
Built-in cache, CDN compatible, JSON metadata, concurrency safe, and it targets all S3-compatible storage systems.
You might want to play with memory params for GeeseFS for better results
> changes are aggregated and committed back to S3 roughly every 60 seconds as a single PUT
Single PUT per file I assume?
My guess is this would only enable a read-replica and not backups as Litestream currently does?
we run datalakes using DuckLake and this sounds really useful. GCP should follow suit quickly.
Sell the benefits.
I have around 9 TB in 21m files on S3. How does this change benefit me?