Just like that author, many years ago, I went through the process of understanding the DEFLATE compression standard and producing a short and concise decompressor for gzip+DEFLATE. Here are the resources I published as a result of that exploration:
This puts me in mind of one of my early experiments in Rust. It would be interesting to compare an iterator-based form that just called .take(need)
I haven't written a lot of Rust, but one thing I did was to write an iterator that took an iterator of bytes as input and provided bits as output. Then used an iterator that gave bytes from a block of memory.
It was mostly as a test to see how much high level abstraction left an imprint on the compiled code.
The disassembly showed it pulling in 32 bits at a time and shifting out the bits pretty much the same way I would have written it in ASM.
I was quite impressed. Although I tested that it was working by counting the bits, and someone criticized it for not using popcount, so I guess you can't have everything.
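For anyone curious what such an adapter might look like, here is a minimal sketch. It is not the commenter's code: the names `Bits`, `read_bits`, `count_set_bits`, and `count_set_bits_popcount` are made up for illustration, and the bit order (least-significant bit first, as DEFLATE uses for most fields) is an assumption. It also shows the `.take(need)` form and the bit-counting check next to the built-in `count_ones`.

```rust
/// Hypothetical adapter: turns an iterator of bytes into an iterator of bits,
/// least-significant bit first (assumed order, matching DEFLATE's packing).
struct Bits<I: Iterator<Item = u8>> {
    bytes: I,
    current: u8,   // byte currently being drained
    remaining: u8, // number of bits left in `current`
}

impl<I: Iterator<Item = u8>> Bits<I> {
    fn new(bytes: I) -> Self {
        Bits { bytes, current: 0, remaining: 0 }
    }
}

impl<I: Iterator<Item = u8>> Iterator for Bits<I> {
    type Item = bool;

    fn next(&mut self) -> Option<bool> {
        if self.remaining == 0 {
            // Refill from the underlying byte iterator; stop when it is empty.
            self.current = self.bytes.next()?;
            self.remaining = 8;
        }
        let bit = self.current & 1 == 1;
        self.current >>= 1;
        self.remaining -= 1;
        Some(bit)
    }
}

/// Read `need` bits (at most 32) into an integer using `.take(need)`.
/// Returns None if the input runs out before `need` bits were read.
fn read_bits<I: Iterator<Item = bool>>(bits: &mut I, need: usize) -> Option<u32> {
    let mut value = 0u32;
    let mut got = 0;
    for (i, bit) in bits.by_ref().take(need).enumerate() {
        value |= (bit as u32) << i;
        got += 1;
    }
    if got == need { Some(value) } else { None }
}

/// Count set bits the slow way, by walking the bit iterator.
fn count_set_bits(data: &[u8]) -> usize {
    Bits::new(data.iter().copied()).filter(|&b| b).count()
}

/// The same count using the standard `count_ones` method on each byte.
fn count_set_bits_popcount(data: &[u8]) -> u32 {
    data.iter().map(|b| b.count_ones()).sum()
}
```

Whether the compiler turns this into the same 32-bits-at-a-time code described above will depend on the optimization level and target, so it is worth checking the disassembly rather than assuming.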
I had a similar experience implementing SIMD instructions in my emulator, where I needed to break apart a 64-bit value into four eight-bit values, do an operation on each value, then pack it back together. My first implementation did it with all the bit shifts you’d expect, but my second one used two helpers to unpack into an array, map over the array to produce a second array, and pack the array again. The optimized output was basically the same.
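A rough sketch of the two styles being compared, under stated assumptions: the lane layout here is eight 8-bit lanes in a `u64` (the emulator's actual layout is not given), the per-lane operation is a wrapping increment chosen purely for illustration, and the helper names `unpack`, `pack`, and `map_lanes` are hypothetical.

```rust
/// Split a 64-bit value into eight 8-bit lanes (lane 0 is the lowest byte).
fn unpack(value: u64) -> [u8; 8] {
    value.to_le_bytes()
}

/// Inverse of `unpack`: reassemble eight lanes into one 64-bit value.
fn pack(lanes: [u8; 8]) -> u64 {
    u64::from_le_bytes(lanes)
}

/// Apply `f` to every lane via unpack -> map -> pack.
fn map_lanes(value: u64, f: impl Fn(u8) -> u8) -> u64 {
    pack(unpack(value).map(f))
}

/// Lane-wise wrapping increment written with explicit shifts.
fn increment_lanes_shifts(value: u64) -> u64 {
    let mut out = 0u64;
    for lane in 0..8u32 {
        let shift = lane * 8;
        let x = (value >> shift) as u8; // truncating cast extracts the lane
        out |= (x.wrapping_add(1) as u64) << shift;
    }
    out
}

/// The same operation using the helpers above.
fn increment_lanes_mapped(value: u64) -> u64 {
    map_lanes(value, |x| x.wrapping_add(1))
}
```

Both functions should produce identical results; whether they also compile to the same machine code is something to confirm in the optimized output.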
>the only flag we care about is FNAME
The specification does not define an encoding for the file name. Different file systems may impose restrictions on certain names, so FNAME should not be used.
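For context, here is a minimal sketch of walking the gzip header and skipping the optional fields, following the flag layout in RFC 1952. It is not the article's code; `parse_gzip_header` is a hypothetical name, and the file name is handed back as raw bytes with no encoding chosen.

```rust
// Flag bits in the FLG byte, per RFC 1952.
const FHCRC: u8 = 1 << 1;
const FEXTRA: u8 = 1 << 2;
const FNAME: u8 = 1 << 3;
const FCOMMENT: u8 = 1 << 4;

/// Returns the raw file name bytes (if FNAME is set) and the offset at which
/// the compressed data begins, or None if the header is malformed or truncated.
fn parse_gzip_header(data: &[u8]) -> Option<(Option<&[u8]>, usize)> {
    // Fixed 10-byte header: ID1 ID2 CM FLG MTIME(4) XFL OS
    if data.len() < 10 || data[0] != 0x1f || data[1] != 0x8b || data[2] != 8 {
        return None;
    }
    let flags = data[3];
    let mut pos = 10;

    if flags & FEXTRA != 0 {
        // Two-byte little-endian length, followed by that many bytes.
        let xlen = u16::from_le_bytes([*data.get(pos)?, *data.get(pos + 1)?]) as usize;
        pos = pos.checked_add(2 + xlen)?;
    }

    let mut name = None;
    if flags & FNAME != 0 {
        // Zero-terminated name; kept as undecoded bytes.
        let len = data.get(pos..)?.iter().position(|&b| b == 0)?;
        name = Some(&data[pos..pos + len]);
        pos += len + 1;
    }

    if flags & FCOMMENT != 0 {
        // Zero-terminated comment; skipped.
        let len = data.get(pos..)?.iter().position(|&b| b == 0)?;
        pos += len + 1;
    }

    if flags & FHCRC != 0 {
        pos += 2; // CRC16 of the header; not verified in this sketch
    }

    if pos > data.len() {
        return None;
    }
    Some((name, pos))
}
```

A decompressor that ignores the stored name can simply discard the first element of the tuple and start reading DEFLATE data at the returned offset.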
That reminds me of a zip file creator in a few lines of JS. Now that CompressionStream is a built-in feature of the browser and Node, there is no need to use some bloated npm lib. But momentum and popularity (and LLMs) will keep people using JSZip for eternity.
Another dev who doesn’t show respect for what has been done and expects a particular language to do wonders for him. Also, I don’t see how this is much better in terms of readability.
> so i wrote a gzip decompressor from scratch
After skimming through the author's Rust code, it appears to be a fairly straightforward port of puff.c (included in the zlib source): https://github.com/madler/zlib/blob/develop/contrib/puff/puf...
It makes me wonder if there was some LLM help, based on how similar the fn structure and identifier names are.
> It makes me wonder if there was some LLM help
I would bet there was
> This feels like it should have been mentioned in the article.
With an entire section complaining about how many lines of code existing implementations are, it looks like they did find a good simple implementation to clone in Rust and then deliberately did not mention it.
https://github.com/madler/infgen
EDIT: Didn’t notice that it’s even by the same person; of course it’s very similar.
* https://www.nayuki.io/page/deflate-specification-v1-3-html
* https://www.nayuki.io/page/simple-deflate-decompressor
* https://github.com/nayuki/Simple-DEFLATE-decompressor
PSA: Rust exposes the popcnt intrinsic via the count_ones method on integer types: https://doc.rust-lang.org/std/primitive.u32.html#method.coun...
> twenty five thousand lines of pure C not counting CMake files. ...
Keep in mind this is also 31 years of cruft and lord knows what.
Plan 9 gzip is 738 lines total:
Even the zipfs file server that mounts zip files as file systems is 391 lines.
edit - post a link to said code: https://github.com/9front/9front/tree/front/sys/src/cmd/gzip
> ... (and whenever working with C always keep in mind that C stands for CVE).
Sigh.