The DCT is a cool primitive. By extracting the low-frequency coefficients, you can get a compact, blurry representation of an image. This is used by preload-thumbnail algorithms like blurhash and thumbhash. It's also used by some image watermarking techniques to target changes to a detail level that will be less affected by scaling or re-encoding.
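A minimal sketch of that low-frequency trick (assuming NumPy/SciPy; this shows the general idea only, not blurhash's actual encoding, which packs a handful of coefficients into a short string):

```python
import numpy as np
from scipy.fft import dctn, idctn  # SciPy's N-dimensional DCT / inverse DCT

def blurry_preview(gray_image, keep=4):
    """Keep only the top-left `keep` x `keep` (lowest-frequency) DCT
    coefficients of a grayscale image and transform back.

    The result is a smooth, blurhash-style approximation of the input."""
    coeffs = dctn(gray_image, norm="ortho")   # full 2D DCT of the image
    mask = np.zeros_like(coeffs)
    mask[:keep, :keep] = 1.0                  # low frequencies live in the corner
    return idctn(coeffs * mask, norm="ortho") # invert the truncated spectrum

# e.g. img = np.asarray(Image.open("photo.png").convert("L"), dtype=float)
#      preview = blurry_preview(img, keep=6)
```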
I used to look at those 'blocky' artifacts on a low-quality image and just think of them as 'bad tech.' After reading this, it's cool to realize you're actually seeing the 8x8 DCT grid itself. You're literally seeing the math break down because there wasn't enough bit budget to describe those high-frequency cosine waves. It's like looking at the brushstrokes on a digital painting.
Also, if your software is for whatever reason using the original libjpeg in its modern (post-classic-6b) incarnation [1]: from version 7 onwards, the new (and still current) maintainer switched the algorithm for chroma up-/downsampling from classic pixel interpolation to DCT-based scaling, claiming it's mathematically more beautiful and (apart from the unavoidable information loss on the first downscaling) perfectly reversible [2].
The problem with that approach, however, is that DCT scaling is block-based. With classic 4:2:0 subsampling, each 16x16 chroma block in the original image is individually downscaled to 8x8 and, perhaps more importantly, later individually upscaled back to 16x16 on decompression.
Compared to classic image-resizing algorithms (bilinear scaling or the like), this block-based upscaling can and does introduce additional visual artefacts at the block boundaries, which, while somewhat subtle, are still borderline visible even without quite pixel-peeping. ([3] notes that the visual differences between libjpeg 6b/turbo and libjpeg 7-9 on image decompression are indeed of borderline-visible magnitude.) A rough sketch of what DCT-domain block scaling does follows below, after the footnotes.
I stumbled across this detail after finally upgrading my image editing software [4] from the old freebie version I'd been using for years (it came bundled with a computer magazine at some point) to its current incarnation, which brought a libjpeg version upgrade under the hood. Not long afterwards I noticed that, for quite a few images, the new version introduced some additional blockiness when decoding JPEGs (further exacerbated by some of the post-processing I was doing on those images), and then I stumbled across this article [3], which noted the change in chroma subsampling and provided the crucial clue to the riddle.
Thankfully, the developers of that image editor were (and still are) very friendly and responsive and agreed to switch the JPEG library to libjpeg-turbo, which resolved the issue. Luckily, few other programs and operating systems seem to use modern libjpeg anyway, usually preferring libjpeg-turbo or something else that sticks with regular image-scaling algorithms for chroma subsampling.
[1] Instead of libjpeg-turbo or whatever else is around these days.
[2] Which might be true in theory, but I tried de- and recompressing images in a loop with both libjpeg 6b and 9e, and didn't find a significant difference in the number of iterations required until the image converged to a stable compression result.
[3] https://informationsecurity.uibk.ac.at/pdfs/BHB2022_IHMMSEC....
[4] PhotoLine
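Here's that rough sketch of the DCT-domain block scaling described above (my own simplification in Python under the orthonormal-DCT convention, not libjpeg's actual code):

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_downscale_16_to_8(block16):
    """Downscale one 16x16 chroma block to 8x8 in the DCT domain: keep only
    the 8x8 low-frequency corner of the 16x16 spectrum and invert it as an
    8x8 block. The 0.5 factor compensates for the change in transform size
    under the "ortho" normalization."""
    coeffs = dctn(block16, norm="ortho")
    return idctn(coeffs[:8, :8] * 0.5, norm="ortho")

def dct_upscale_8_to_16(block8):
    """The reverse step on decompression: embed the 8x8 spectrum into a
    zero-padded 16x16 one and invert. Every block is handled entirely on
    its own, so nothing smooths the seams between neighbouring 16x16
    blocks -- which is where the extra blockiness shows up."""
    coeffs = np.zeros((16, 16))
    coeffs[:8, :8] = dctn(block8, norm="ortho") * 2.0
    return idctn(coeffs, norm="ortho")
```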
It's a perfectly pragmatic engineering choice. Blocking is visible only when the compression is too heavy; when the degradation is imperceptible, the block edges are imperceptible too, and the problem doesn't need to be solved (and in JPEG, imperceptible still means roughly a 10:1 reduction in data size).
Later compression algorithms were focused on video, where the aim was to have good-enough low-quality approximations.
Deblocking is an inelegant hack.
Deblocking hurts high-quality compression of still images, because it makes it harder for codecs to precisely reproduce the original image. The blurring removes detail that the blocks produced, so the codec has to either disable deblocking or compensate with exaggerated contrast (which is still an approximation). It also adds a dependency across blocks, which turns the problem from an independent per-block computation into finding a global optimum that flips between the frequency domain and pixel-space hacks. It's no longer a neat mathematical transform with a closed-form solution, but a pile of iterative guesswork (or it's just not taken into account at all, and the codec wins PSNR benchmarks and looks good in side-by-side comparisons at the 10% quality level, but becomes an auto-airbrushing, texture-destroying annoyance when used on real images).
The Daala project tried to reinvent it with better mathematical foundations (lapped transforms), but in the end a post-processing pass of blurring the pixels has won.
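To make that cross-block dependency concrete, here's a toy deblocking pass (a deliberately naive illustration, not any real codec's filter): once you filter across seams, a pixel's decoded value depends on the neighbouring block too.

```python
import numpy as np

def toy_deblock(img, block=8, threshold=12.0):
    """Naive deblocking along vertical block seams: where the step across a
    seam is small (likely an artifact rather than a real edge), pull the two
    boundary pixels a quarter of the way towards each other. Run it again on
    img.T to handle horizontal seams."""
    out = np.asarray(img, dtype=float).copy()
    for x in range(block, out.shape[1], block):
        left = out[:, x - 1].copy()
        right = out[:, x].copy()
        step = right - left
        weak = np.abs(step) < threshold   # only smooth artifact-sized steps
        out[:, x - 1] = np.where(weak, left + 0.25 * step, left)
        out[:, x] = np.where(weak, right - 0.25 * step, right)
    return out
```

Even this toy version shows the point above: the filtered image is no longer something each 8x8 block can reproduce on its own, so an encoder that wants pixel-exact output has to reason across blocks.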
Having played a bit with the discrete Fourier transform (via FFTW, on 2D images, in a Shake plugin we made at work ages ago) makes the DCT coefficients make so much more sense! I really wonder whether the frequency decomposition could happen at multiple scale levels though? Sounds a lot like wavelets, and maybe that's how JPEG 2000 works?.. Yeah, I looked it up: it uses a DWT, so it kind of is! Shame it hasn't taken off so far. Or maybe there's an even better way?
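That multi-scale idea is exactly what a discrete wavelet transform does. A minimal sketch using the simple Haar wavelet (JPEG 2000 itself uses the CDF 5/3 and 9/7 wavelets, so treat this purely as an illustration):

```python
import numpy as np

def haar_step(x):
    """One level of a 1D Haar transform along the last axis: pairwise
    averages (low-pass) and pairwise differences (high-pass)."""
    lo = (x[..., 0::2] + x[..., 1::2]) / 2.0
    hi = (x[..., 0::2] - x[..., 1::2]) / 2.0
    return lo, hi

def haar2d(img, levels=3):
    """Multi-level 2D Haar decomposition: each level splits the current
    coarse band into a new coarse (LL) band plus three detail bands, then
    recurses on LL only. Image sides should be divisible by 2**levels."""
    ll, details = np.asarray(img, dtype=float), []
    for _ in range(levels):
        lo, hi = haar_step(ll)                        # transform each row
        ll_new, lh = haar_step(lo.swapaxes(-1, -2))   # then columns of the low band
        hl, hh = haar_step(hi.swapaxes(-1, -2))       # and columns of the high band
        ll = ll_new.swapaxes(-1, -2)
        details.append(tuple(b.swapaxes(-1, -2) for b in (lh, hl, hh)))
    return ll, details
```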
I've been working on a pure-Ruby JPEG encoder, and a bug led me to an effect I actually wanted: the output looked just like the "crunchy" JPEGs my 2000-era Kodak digital camera used to put out. It turned out the encoder wasn't following the zig-zag pattern properly and was just writing coefficients in raster order. I'm now on a quest to figure out whether some early digital cameras had similar encoding bugs, because their JPEG output was often horrendous compared to what you'd expect for the file size.
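For reference, the zig-zag scan walks the 8x8 block along its anti-diagonals so the low-frequency coefficients come first and the (mostly zero) high frequencies cluster at the end, which is what the run-length coding stage expects. A little sketch of the visit order (in Python here, though the encoder above is Ruby):

```python
def zigzag_order(n=8):
    """(row, col) visit order of the JPEG zig-zag scan for an n x n block:
    walk the anti-diagonals, reversing direction on every other one."""
    order = []
    for d in range(2 * n - 1):
        diag = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        order.extend(reversed(diag) if d % 2 == 0 else diag)
    return order

# Raster order, by contrast, is plain row-major:
raster_order = [(i, j) for i in range(8) for j in range(8)]

# zigzag_order() begins (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), ...
# Writing coefficients in raster order while the decoder assumes zig-zag
# shuffles energy into the wrong frequencies, and it also interleaves the
# zeroed-out high frequencies with the significant ones, so zero-run coding
# is much less effective -- plausibly the "crunchy" look described above.
```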
The part about green getting 58.7% of the weight in the luminance calculation is one of those details that seems arbitrary until you realize it's modeled on how the human eye responds to the primaries (we're far more sensitive to green than to blue, roughly tracking the retina's cone response). The whole algorithm is basically a map of what human eyes can't see.
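For the record, the full-range JFIF/BT.601 conversion that JPEG typically uses looks like this (an illustrative snippet, not taken from any particular codec):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range JFIF/BT.601 RGB -> YCbCr, with rgb as float values in 0..255.

    Green dominates luma (0.587) because the eye is most sensitive there;
    Cb/Cr carry the colour differences that chroma subsampling later throws
    most of the resolution away from."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)
```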
What would happen if the Cr and Cb channels used different chroma subsampling patterns? E.g. Cr would use the 4:2:0 pattern, and Cb would use the 4:1:1 pattern.
I've seen many a JPEG explainer, but this one wins for most aesthetic. The interactive visuals were also nice. My only criticism is the abrupt ending; it should have concluded with the "now let's put it all together" slider.
This is a really great article, and I really appreciate how it explains the different parts of how JPEG works with so much clarity and interactive visualizations.
However, I do have to give one bit of critique: it also makes my laptop fans spin like crazy even when nothing is happening at all.
Now, this is not intended as a critique of the author. I'm assuming that she used some framework to get the results out quickly, and that there is a bug in how that framework handles events and reactivity. But it would still be nice if whatever causes this issue could be fixed. It would be sad if the website had the same issue on mobile and caused my phone battery to drain quickly when 90% of the time is spent reading text and watching graphics that don't change.
Really enjoyed this. It's easy to forget how much engineering went into JPEG. The explanation of compression and quality tradeoffs was clear without oversimplifying. Impressive how well the format still holds up today. Curious how you think it compares to newer formats like AVIF or WebP in everyday use.
Maybe it's because I've read a few pieces on JPEG before, so I have some prior knowledge, but I went in looking to review this and found the presentation one of the clearest I've seen. Good job!
I made a notebook a few years back which lets you play with / filter the DCT coefficients of an image: https://observablehq.com/d/167d8f3368a6d602
This tool uses more clever math to replace what's missing: https://github.com/victorvde/jpeg2png
You're not seeing the actual details either way.
The blurred version feels honest -- it's not showing you anything more than what has been encoded.
The sharp image feels confusing -- it's showing you a ton of detail that is totally wrong. "Detail" that wasn't in the original, but is just artifacts.
Why would you prefer distracting artifacts over a blurred version?
Get a picture of grass, save it as a JPEG at 15% quality... It still looks like grass. Then run it through jpeg2png... The output looks like a green smear. You might not even be able to tell that it's supposed to be grass. jpeg2png just blurs the hell out of images.
Here's a side-by-side: https://ibb.co/99C0F34d
I wonder if other species would look at our images or listen to our sounds and register with horror all the gaping holes everywhere.
> Application error: a client-side exception has occurred (see the browser console for more information).
Seems like the website doesn't work without WebGL enabled... why?
This reminds me of the sort of work Nayuki does: https://www.nayuki.io
OK, but what does this have to do with JPEG compression?