This video is of special interest to me.
Approximately a bazillion years ago, I “invented” JPEG compression: standing on the shoulders of people who had thought to apply the DCT to images, I wrote a program to do the same, plus some on-the-fly perceptual filtering based on the color and spatial resolution of the human eye. (JPEG instead summarizes this with a “quality” slider; Photoshop used to show you a preview and let you set it manually.)
Basically, my algorithm looked at the difference between the compressed and uncompressed output and decided whether the difference would be noticeable. This is non-trivial as you have to look at the H, S, and V of the context to decide — the eye has (for example) a greater ability to distinguish hue in the greens than in the reds.
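For the curious, the "compress, compare, decide" loop can be sketched roughly as follows. This is not the original program, only a minimal illustration on a single 8x8 grayscale block. The names `compress_block` and `coarsest_invisible_step` are made up for the example, and the single `tolerance` number is a crude stand-in for the real judgment, which would have to depend on the H, S, and V of the surrounding pixels.

```python
import math

def dct_1d(v):
    """Orthonormal DCT-II of a sequence."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1 / n)
        s += sum(c[k] * math.sqrt(2 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def transpose(m):
    return [list(r) for r in zip(*m)]

def apply_2d(block, f):
    """Apply a 1-D transform to every row, then to every column."""
    rows = [f(r) for r in block]
    cols = [f(c) for c in transpose(rows)]
    return transpose(cols)

def compress_block(block, step):
    """Quantize the DCT coefficients with one uniform step, then reconstruct."""
    coeffs = apply_2d(block, dct_1d)
    quantized = [[round(c / step) for c in row] for row in coeffs]
    restored = apply_2d([[q * step for q in row] for row in quantized], idct_1d)
    return quantized, restored

def coarsest_invisible_step(block, tolerance, steps=(64, 32, 16, 8, 4, 2, 1)):
    """Try coarse-to-fine steps; stop at the first whose worst pixel error
    stays within `tolerance` (hypothetical stand-in for a perceptual model).
    If none qualifies, the finest step tried is returned."""
    for step in steps:
        quantized, restored = compress_block(block, step)
        worst = max(abs(a - b)
                    for ra, rb in zip(block, restored)
                    for a, b in zip(ra, rb))
        if worst <= tolerance:
            break
    return step, quantized
```

A perceptual version would replace the fixed `tolerance` with a function of the local color context, for instance allowing less error where the eye is more sensitive.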
I was hamstrung by the hardware available to me at the time; it took hours to encode even a small segment of an image. And my Atari ST’s monitor could only show 16 of 512 colors at once at 320x200, so I used time-based multiplexing (e.g., flicker red-red-black to get two-thirds red), which my dad took long-exposure (analog) pictures of to average the colors. I also reverse-engineered print screening to try to stipple a higher color resolution, which didn’t look great. I ended up only being able to demonstrate an 80x50 ‘pixel’ section of the image before and after compression.
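The flicker trick is simple arithmetic: show the color on some frames and black on the rest, and a long exposure averages them. A minimal sketch, assuming a three-frame cycle and one on/off channel (the function names are invented for illustration):

```python
from fractions import Fraction

def flicker_pattern(target, frames=3):
    """Return an on/off frame list whose time average approximates
    `target` (a brightness between 0 and 1)."""
    on = round(target * frames)
    return [1] * on + [0] * (frames - on)

def perceived(pattern):
    """Average brightness a long exposure would record for this cycle."""
    return Fraction(sum(pattern), len(pattern))

pattern = flicker_pattern(Fraction(2, 3))  # red, red, black
```

With three frames per cycle, each channel gains only the intermediate levels one-third and two-thirds, so the extra color resolution is bought with visible flicker.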
It took about a day to compress a full image…but resulted in 20x to 50x compression, with minimal visual disruption. This was unheard of in that era of TIFF files, which would usually only achieve 2x–3x compression using RLE.
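For comparison, run-length encoding only wins when neighboring pixels repeat exactly, which is why it does so little for photographs. Below is a generic byte-oriented RLE sketch, not the exact scheme any particular TIFF writer used; the (count, value) pair layout and the 255 cap on run length are assumptions for the example.

```python
def rle_encode(data):
    """Collapse runs of equal values into (count, value) pairs, count <= 255."""
    runs = []
    for value in data:
        if runs and runs[-1][1] == value and runs[-1][0] < 255:
            runs[-1][0] += 1
        else:
            runs.append([1, value])
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Expand (count, value) pairs back into the original sequence."""
    return [value for count, value in runs for _ in range(count)]

def ratio(data):
    """Compression ratio, charging two bytes per encoded run."""
    return len(data) / (2 * len(rle_encode(data)))

flat_row = [255] * 12 + [0] * 4 + [255] * 8   # 24 pixels in 3 runs
noisy_row = [10, 11, 10, 12, 11, 10, 13, 11]  # no repeats at all
```

The flat row shrinks from 24 bytes to 6, while the noisy row doubles in size, since every pixel becomes its own two-byte run.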
This was 1991, and I guess it was beyond the ability of the reviewers to understand, so I forgive them for only landing me a semi-finalist spot in the National Science Search; I lost out to someone who folded paper into regular polyhedra, which looks good and is comprehensible. (The math behind folding is really pretty cool, but I don’t recall any of that in their project; or maybe it was there, and it was one of the OrigaMIT guys.)
So it’s a topic of special interest to me, and I’ve often wondered about the same differencing for MP3s. Compression is audible when it’s done wrong — on a MiniDisc player, you could hear the algorithm killing the less-important details like the shimmer of cymbals.
In most songs, and on modern phone calls, breathing is largely removed by some sort of noise gate. I find this annoying: you use the other person’s intake of air as a cue to stop talking, and if it’s not there, you barge in on each other’s conversation a lot more.
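A noise gate in its crudest form just mutes any stretch of audio whose level stays under a threshold, which is exactly how a quiet breath disappears. A toy sketch, assuming samples normalized to the range -1 to 1 and a fixed window; real gates add attack and release smoothing, and I don't know what any given phone codec actually does.

```python
def noise_gate(samples, threshold, window=4):
    """Silence every window whose peak amplitude is below `threshold`."""
    out = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        peak = max(abs(s) for s in chunk)
        out.extend(chunk if peak >= threshold else [0.0] * len(chunk))
    return out

breath = [0.02, -0.03, 0.01, -0.02]  # quiet intake of air
voice = [0.5, -0.6, 0.4, -0.3]       # speech at normal level
gated = noise_gate(breath + voice, threshold=0.1)
```

Here the breath window is zeroed out and the voice passes through untouched, so the listener loses the cue that the speaker is about to talk.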
That observation is borne out by the linked video; you can hear Vega’s intake of breath before each lyric quite clearly. Very cool stuff.