Gzip decompression in 250 lines of Rust (iev.ee)

by vismit2000 45 comments 127 points

stgn 49d ago

> so i wrote a gzip decompressor from scratch

After skimming through the author's Rust code, I'd say it's a fairly straightforward port of puff.c (included in the zlib source): https://github.com/madler/zlib/blob/develop/contrib/puff/puf...
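For readers who haven't looked at puff.c: its `bits()` helper buffers input bytes and hands out bits least-significant-first, which is how DEFLATE packs every field except Huffman codes. Below is a minimal Rust sketch modeled on that idea. It is not the article's code. The `BitReader` name, the `u32`/`Option` signature (instead of the article's `i32`), and returning `None` at end of input (where puff.c uses `longjmp`) are all assumptions of this sketch.

```rust
/// Minimal LSB-first bit reader modeled on puff.c's `bits()` (hypothetical name).
/// Running out of input returns `None` instead of puff.c's `longjmp`.
struct BitReader<'a> {
    input: &'a [u8],
    pos: usize,  // index of the next unread byte
    bitbuf: u64, // buffered bits, least significant bit first
    bitcnt: u32, // number of valid bits in `bitbuf`
}

impl<'a> BitReader<'a> {
    fn new(input: &'a [u8]) -> Self {
        BitReader { input, pos: 0, bitbuf: 0, bitcnt: 0 }
    }

    /// Return the next `need` bits (need <= 32); the first bit read ends up
    /// in the lowest position of the result.
    fn bits(&mut self, need: u32) -> Option<u32> {
        assert!(need <= 32, "at most 32 bits per call");
        // Refill one byte at a time until enough bits are buffered.
        // At most 31 + 8 = 39 bits are ever held, so u64 cannot overflow.
        while self.bitcnt < need {
            let byte = *self.input.get(self.pos)?;
            self.pos += 1;
            self.bitbuf |= (byte as u64) << self.bitcnt;
            self.bitcnt += 8;
        }
        // Hand out the low `need` bits and drop them from the buffer.
        let val = (self.bitbuf & ((1u64 << need) - 1)) as u32;
        self.bitbuf >>= need;
        self.bitcnt -= need;
        Some(val)
    }
}
```

For example, reading 3 bits from a buffer starting with the byte 0xAD (0b1010_1101) yields 5, its low three bits 0b101.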

dymk 49d ago
This feels like it should have been mentioned in the article.

It makes me wonder if there was some LLM help, based on how similar the fn structure and identifier names are.

cmovq 49d ago
Even the function names are identical :/
bitbasher 49d ago
You could say it was a “puff” piece, eh, eh!?
ieviev 49d ago
Oh, I didn't even know about this. But I got a lot of help from this one:

https://github.com/madler/infgen

EDIT: I didn’t notice that it’s by the same person, so of course it’s very similar.

pseudohadamard 49d ago
Meh, I've written a gzip decompressor in two lines of shell script:

  #!/bin/sh
  gzip -d "$@"
nayuki 50d ago
Just like that author, many years ago, I went through the process of understanding the DEFLATE compression standard and producing a short and concise decompressor for gzip+DEFLATE. Here are the resources I published as a result of that exploration:

* https://www.nayuki.io/page/deflate-specification-v1-3-html

* https://www.nayuki.io/page/simple-deflate-decompressor

* https://github.com/nayuki/Simple-DEFLATE-decompressor
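One piece every gzip+DEFLATE decompressor needs is rebuilding the Huffman codes from the code lengths stored in the stream. A minimal sketch of the canonical code assignment described in RFC 1951, section 3.2.2. It is not taken from nayuki's code or the article, and `canonical_codes` is a hypothetical name.

```rust
/// Assign canonical Huffman codes from code lengths (RFC 1951, section 3.2.2).
/// A length of 0 means the symbol is unused; its slot in the result is 0.
fn canonical_codes(lengths: &[u8]) -> Vec<u32> {
    const MAX_BITS: usize = 15; // DEFLATE code lengths never exceed 15

    // Step 1: count how many codes have each length.
    let mut bl_count = [0u32; MAX_BITS + 1];
    for &len in lengths {
        assert!((len as usize) <= MAX_BITS, "DEFLATE code lengths are at most 15");
        bl_count[len as usize] += 1;
    }
    bl_count[0] = 0; // unused symbols do not take up code space

    // Step 2: compute the smallest code for each length.
    let mut next_code = [0u32; MAX_BITS + 1];
    let mut code = 0u32;
    for bits in 1..=MAX_BITS {
        code = (code + bl_count[bits - 1]) << 1;
        next_code[bits] = code;
    }

    // Step 3: give symbols of the same length consecutive codes, in symbol order.
    let mut codes = vec![0u32; lengths.len()];
    for (sym, &len) in lengths.iter().enumerate() {
        if len != 0 {
            codes[sym] = next_code[len as usize];
            next_code[len as usize] += 1;
        }
    }
    codes
}
```

With the lengths (3, 3, 3, 3, 3, 2, 4, 4) this yields the codes 010, 011, 100, 101, 110, 00, 1110, 1111, the worked example from the RFC. Note that DEFLATE packs Huffman codes starting with their most significant bit, the opposite of its other fields.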

Lerc 49d ago
The function

  fn bits(&mut self, need: i32) -> i32 { ....
put me in mind of one of my early experiments in Rust. It would be interesting to compare it with an iterator-based version that just called .take(need).

I haven't written a lot of Rust, but one thing I did was write an iterator that took an iterator of bytes as input and provided bits as output. Then I fed it an iterator that yielded bytes from a block of memory.

It was mostly a test to see how much of an imprint high-level abstraction left on the compiled code.

The disassembly showed it pulling in 32 bits at a time and shifting out the bits pretty much the same way I would have written it in ASM.

I was quite impressed. I tested that it was working by counting the bits, though, and someone criticized it for not using popcount, so I guess you can't have everything.
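The byte-to-bit iterator described above can be sketched as follows. This is a guess at its shape, not the commenter's code: `Bits` and `take_bits` are hypothetical names, bits come out least-significant-first to match DEFLATE, and `take_bits` builds a value using `.take(need)`. Whether this particular version compiles down to 32-bit refills has not been checked here.

```rust
/// Adapts an iterator of bytes into an iterator of bits, least significant
/// bit of each byte first (hypothetical name).
struct Bits<I: Iterator<Item = u8>> {
    bytes: I,
    current: u8,   // byte currently being shifted out
    remaining: u8, // bits of `current` not yet yielded
}

impl<I: Iterator<Item = u8>> Bits<I> {
    fn new(bytes: I) -> Self {
        Bits { bytes, current: 0, remaining: 0 }
    }
}

impl<I: Iterator<Item = u8>> Iterator for Bits<I> {
    type Item = bool;

    fn next(&mut self) -> Option<bool> {
        // Pull the next byte only when the current one is used up.
        if self.remaining == 0 {
            self.current = self.bytes.next()?;
            self.remaining = 8;
        }
        let bit = self.current & 1 == 1;
        self.current >>= 1;
        self.remaining -= 1;
        Some(bit)
    }
}

/// Read `need` bits (need <= 32) via `.take(need)`, first bit in the lowest
/// position. Returns `None` if the input runs out first.
fn take_bits<I: Iterator<Item = u8>>(bits: &mut Bits<I>, need: u32) -> Option<u32> {
    assert!(need <= 32, "at most 32 bits per call");
    let mut val = 0u32;
    let mut got = 0u32;
    // `by_ref` lets `take` borrow the iterator instead of consuming it.
    for (i, bit) in bits.by_ref().take(need as usize).enumerate() {
        val |= (bit as u32) << i;
        got += 1;
    }
    if got == need { Some(val) } else { None }
}
```

On the popcount point: counting the `true` bits this iterator yields gives the same answer as summing `u8::count_ones` over the bytes, and `count_ones` can lower to a hardware popcount instruction where the target supports one.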

MisterTea 50d ago

> twenty five thousand lines of pure C not counting CMake files. ...

Keep in mind this is also 31 years of cruft and lord knows what.

Plan 9 gzip is 738 lines total:

  gzip.c 217 lines
  gzip.h 40 lines
  zip.c  398 lines
  zip.h  83 lines
Even the zipfs file server that mounts zip files as file systems is 391 lines.

Edit: here's a link to said code: https://github.com/9front/9front/tree/front/sys/src/cmd/gzip

> ... (and whenever working with C always keep in mind that C stands for CVE).

Sigh.

jmmv 49d ago
I was reading this and couldn't stop thinking of https://en.wikipedia.org/wiki/Literate_programming
carlos256 49d ago

> the only flag we care about is FNAME

The specification does not define an encoding for the file name. Different file systems may impose restrictions on certain names, so FNAME should not be used.
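For context on the FNAME point: in a gzip header (RFC 1952, section 2.3), FNAME is bit 3 of the FLG byte and, when set, marks a zero-terminated string that follows the optional FEXTRA field. A minimal header-parsing sketch with hypothetical names (`GzipHeader`, `parse_gzip_header`). It keeps the name as raw bytes rather than assuming an encoding, skips MTIME/XFL/OS, and does not verify the FHCRC checksum.

```rust
/// What this sketch extracts from a gzip member header (hypothetical type).
#[derive(Debug, PartialEq)]
struct GzipHeader {
    name: Option<Vec<u8>>, // raw FNAME bytes, deliberately left undecoded
    data_start: usize,     // offset of the first byte of DEFLATE data
}

// FLG bits from RFC 1952 (FTEXT, bit 0, is irrelevant here).
const FHCRC: u8 = 0x02;
const FEXTRA: u8 = 0x04;
const FNAME: u8 = 0x08;
const FCOMMENT: u8 = 0x10;

/// Read a zero-terminated field at `pos`: returns its bytes (without the NUL)
/// and the offset just past the NUL.
fn read_cstr(buf: &[u8], pos: usize) -> Result<(&[u8], usize), &'static str> {
    let rest = buf.get(pos..).ok_or("truncated header")?;
    let end = rest.iter().position(|&b| b == 0).ok_or("unterminated string field")?;
    Ok((&rest[..end], pos + end + 1))
}

fn parse_gzip_header(buf: &[u8]) -> Result<GzipHeader, &'static str> {
    if buf.len() < 10 {
        return Err("truncated header");
    }
    if buf[0] != 0x1f || buf[1] != 0x8b {
        return Err("not a gzip stream");
    }
    if buf[2] != 8 {
        return Err("compression method is not DEFLATE");
    }
    let flg = buf[3];
    if flg & 0xe0 != 0 {
        return Err("reserved FLG bits are set");
    }
    // Bytes 4..10 hold MTIME (4), XFL (1) and OS (1); skipped here.
    let mut pos = 10;

    // Optional fields appear in this fixed order: FEXTRA, FNAME, FCOMMENT, FHCRC.
    if flg & FEXTRA != 0 {
        let x = buf.get(pos..pos + 2).ok_or("truncated FEXTRA")?;
        pos += 2 + u16::from_le_bytes([x[0], x[1]]) as usize;
    }
    let mut name = None;
    if flg & FNAME != 0 {
        let (s, next) = read_cstr(buf, pos)?;
        name = Some(s.to_vec());
        pos = next;
    }
    if flg & FCOMMENT != 0 {
        pos = read_cstr(buf, pos)?.1;
    }
    if flg & FHCRC != 0 {
        pos += 2; // CRC16 of the header; not checked in this sketch
    }
    if pos > buf.len() {
        return Err("truncated header");
    }
    Ok(GzipHeader { name, data_start: pos })
}
```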

socalgal2 49d ago
That reminds me of a zip file creator in a few lines of JS, now that CompressionStream is a built-in feature of browsers and Node. No need to use some bloated npm lib. But momentum and popularity (and LLMs) will keep people using JSZip for eternity.
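Besides the compressed bytes, a zip writer has to record a CRC-32 of each entry's uncompressed data in its headers, and gzip stores the same checksum in its trailer. A minimal bitwise sketch, written in Rust to match the article rather than JS. `crc32` is a hypothetical helper, and production implementations use lookup tables or hardware instructions for speed.

```rust
/// Bitwise CRC-32 using the reflected polynomial 0xEDB88320, the variant
/// used by both gzip and zip. Table-free, so slow but easy to verify.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32; // standard initial value
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // Shift right; if the bit shifted out was 1, XOR in the polynomial.
            if crc & 1 == 1 {
                crc = (crc >> 1) ^ 0xEDB8_8320;
            } else {
                crc >>= 1;
            }
        }
    }
    !crc // final inversion
}
```

`crc32(b"123456789")` returns 0xCBF43926, the standard check value for this CRC.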
up2isomorphism 50d ago
Another dev who doesn’t show respect for what has been done and expects a particular language to do wonders for him. Also, I don’t see how this is much better in terms of readability.