And please, guys, I'm not saying that you ARE wrong. I'm just saying that I don't get it, and I'd really like someone to explain to me in technical terms why dithering is desirable. At best, it seems to me that it could be argued that dithering is inconsequential (which seems unlikely, since some of the golden ear set claim to be able to hear 20 kHz sounds).
Dithering is desirable because it's random (or pseudo-random) in nature. When you reduce a signal to a lower bit depth, you create errors in the waveform because it is no longer equal to the original digital signal. This is inherent to the process, and there's nothing "bad" about these errors themselves. However, when you just truncate the signal, for example, you are creating an error that is predictable. Humans are very good at pattern recognition, and many can distinguish this pattern in audio.
The errors that occur are at the very bottom of the signal (the LSB). Dithering blends this bit with random noise so that there is no longer a discernible pattern in the quantization errors that are inherent to bit depth reduction. (This is where my knowledge gets fuzzy, pardon the pun: I'm not sure whether dithering exclusively targets the LSB.)
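To make the "pattern" claim concrete, here is a minimal sketch, not a definitive test. It assumes a test tone well below 1 LSB, plain rounding as the requantizer, and triangular (TPDF) dither built from two uniform random values; none of those choices come from this thread, and the names quantize and correlation are just illustrative. It measures how strongly the requantization error tracks the signal with and without dither.

```python
import math
import random

def quantize(x):
    # Round to the nearest integer LSB (halves round up).
    return math.floor(x + 0.5)

def correlation(a, b):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# Assumed test signal: a sine with a peak of 0.4 LSB, in units of the target LSB.
signal = [0.4 * math.sin(2 * math.pi * i / 100) for i in range(n)]

# No dither: every sample rounds to 0, so the error is exactly -signal.
plain_error = [quantize(x) - x for x in signal]

# TPDF dither (assumed scheme): sum of two uniform values, each 1 LSB wide.
dithered_error = [
    quantize(x + rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)) - x
    for x in signal
]

plain_corr = correlation(plain_error, signal)
dither_corr = correlation(dithered_error, signal)
print("error/signal correlation, no dither:", plain_corr)
print("error/signal correlation, TPDF dither:", dither_corr)
```

In this setup the undithered error is a mirror image of the tone itself, while the dithered error shows almost no correlation with it. Whether that trade is worth the added noise is exactly what is being argued here.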
Wait a minute... Dithering is desirable because it's random? So is noise. The way I understand it, dithering schemes essentially add bandlimited noise to the signal in such a fashion as to alter the LSB (Least Significant Bit) and, in some schemes, even the next to the least significant bit (NTTLSB maybe?). We're told that it is not audible, but it's still noise. Adding noise is not something that seems prudent to me unless you can give me a reason to do it. And saying that it's not audible is not sufficient reason to do so, in my opinion.
I also disagree with your assertion that truncation results in an error that is predictable. Please elaborate. I believe that truncation error is, by its very nature, unpredictable and uncorrelated from sample to sample as long as the recorder's anti-aliasing filter is properly designed.
Now, when you merely truncate a signal, I can see that this is not desirable because some of the resultant errors will be greater than 1/2 LSB in size. Here's an example:
(Keep in mind that there is an implied binary point to the left of all of these numbers. 0000000000000000 is the lowest 16 bit number available and 1111111111111111 is the largest 16 bit number available. The midpoint is 0111111111111111 or 1000000000000000, and most schemes will encode silence as a string of samples whose values are all 0111111111111111.)
Original 24 bit sample
011010110111011011110011
Truncates to this
0110101101110110
Which is an error of
000000000000000011110011 (more than 00000000000000001, which is 1/2 LSB at 16 bits, otherwise known as the 17th bit. This error is approximately -97 dB with respect to full scale.)
If you round the 24 bit sample to a 16 bit sample instead you get this:
0110101101110111
Which is a much smaller error of
000000000000000000001101 (This error is approximately -122 dB with respect to full scale.)
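The arithmetic in this example can be checked with a short sketch. It assumes the samples are treated as unsigned integers and that full scale is 2^24 in 24 bit units; the helper name to_db_fs is made up for illustration, and overflow at the very top of the range is ignored.

```python
import math

def to_db_fs(error_24bit):
    # Error magnitude in dB relative to full scale (assumed to be 2**24).
    return 20 * math.log10(error_24bit / 2**24)

sample_24 = 0b011010110111011011110011

# Truncation: drop the low 8 bits.
truncated_16 = sample_24 >> 8
# Rounding: add half of a 16 bit LSB (0x80 in 24 bit units), then drop the low 8 bits.
rounded_16 = (sample_24 + 0x80) >> 8

# Errors measured in 24 bit units.
trunc_error = sample_24 - (truncated_16 << 8)
round_error = abs(sample_24 - (rounded_16 << 8))

print(format(truncated_16, "016b"), trunc_error, round(to_db_fs(trunc_error), 1))
print(format(rounded_16, "016b"), round_error, round(to_db_fs(round_error), 1))
```

The truncation error comes out as 243 (binary 11110011) and the rounding error as 13 (binary 1101), which matches the bit patterns and the approximate dB figures quoted above.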
If you always round instead of truncating, your error will ALWAYS be equal to or less than 1/2 LSB. My claim is that this is the way things should be done. Presumably the error in each sample in the 24 bit waveform is uncorrelated to the error in any other sample and this will be true as long as the anti-aliasing filter on the front end of the recorder is designed correctly. If you always truncate, then your error will always be between 0 and 1 LSB. Rounding will yield an expected error per sample that is -108 dB with respect to full scale, whereas truncation will yield an expected error per sample that is -102 dB with respect to full scale. When you add or subtract 0000000000000001 to or from your 16 bit sample during the dithering process, your resulting error will be between 0 and 2 LSB and the expected error per sample will be -96 dB with respect to full scale. That's like losing 6 dB of S/N with respect to what you get with simple truncation and it's like losing 12 dB of S/N with respect to what you'd get by rounding. Again, I'll admit that the noise is added in the portion of the spectrum that is least audible, but unless it provides a real benefit (and I'm not convinced that it does) I don't see the point of dithering.
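The -108 dB and -102 dB figures for rounding and truncation can be reproduced with a small sketch. It assumes the 8 discarded bits are equally likely to take any of their 256 values and that "expected error" means the mean absolute error; both are assumptions for illustration, and mean_abs_error_db is a made-up name. It does not model the dither case.

```python
import math

def mean_abs_error_db(errors):
    # Mean absolute error, in dB relative to full scale (assumed 2**24).
    mean_error = sum(errors) / len(errors)
    return 20 * math.log10(mean_error / 2**24)

# All possible values of the 8 bits discarded going from 24 to 16 bits.
low_bytes = range(256)

# Truncation throws the low byte away, so the error is the low byte itself.
trunc_errors = [r for r in low_bytes]

# Rounding goes to the nearer 16 bit value (halves round up).
round_errors = [r if r < 128 else 256 - r for r in low_bytes]

trunc_db = mean_abs_error_db(trunc_errors)
round_db = mean_abs_error_db(round_errors)
print("truncation:", round(trunc_db, 1), "dB FS")
print("rounding:  ", round(round_db, 1), "dB FS")
```

Under those assumptions the mean error is about 1/2 LSB for truncation and 1/4 LSB for rounding, which is where the roughly 6 dB gap between the two comes from.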