I kinda got lost in there but I think you do need to go close to full scale even if you only want 16 bits. When going from 24>16 it's the least significant bits that get dropped. So if you're down 12 dB, you're at 22 bits on a 24 bit scale, or 14 bits on a 16 bit scale. Maybe dithering can push you up a bit, but I don't think you can say hey, we've got 22 bits so there should be at least 16 available. With 16 bit on this box, you're using bits 9 thru 24 and throwing away the first 8 (ignoring dither for the moment).
Ok, I guess we're even, 'cause I'm totally lost in what you're saying.
I think you've got a fundamental misunderstanding of what dithering does (or perhaps I do, but we're definitely not in agreement). When you say "I don't think you can say hey, we've got 22 bits so there should be at least 16 available", I think you're mistaken. My understanding of dithering is that this is exactly what happens. So you're right that if you're 12 dB down, you've only got 22 bits on a 24-bit scale, or 14 bits on a 16-bit scale. But you'll have 16 bits on a 16-bit scale if it has been arrived at through dithering from 24 bits (or 22 bits in this case). So, yes, the exact purpose of the dither scheme is to maintain 16 true significant bits when the original signal is 24 bits (or 22 bits if we don't record hot enough to get 24 significant bits). Even if you only have 18 bits of info, you should be able to dither down to get 16 bits of true significant data.
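For anyone who wants to check the "12 dB down = 2 bits" arithmetic we both seem to agree on, here's a quick sketch. It assumes the usual rule of thumb that each bit is worth 20*log10(2), about 6.02 dB; the function name is just something I made up for illustration.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB of level per bit


def bits_lost(headroom_db):
    """Bits left unused when the peak sits headroom_db below full scale."""
    return headroom_db / DB_PER_BIT


lost = bits_lost(12.0)
print(f"12 dB down costs about {lost:.2f} bits")
print(f"24-bit word: about {24 - lost:.0f} bits in use")
print(f"16-bit word: about {16 - lost:.0f} bits in use")
```

So 12 dB of headroom is very close to 2 bits, which is where the 22 and 14 figures come from.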
The purpose of dither is essentially to add noise. I'm going from a somewhat foggy memory here, but it is something like this. If you have 24 bits of info, the least significant bit (the 24th) will vary between a digital 1 and a 0. Same thing with the 16th bit of a 16-bit word. But at any given sample, the 16th bit of the 24-bit word is set to either a 1 or a 0. Simply lopping off 8 bits from the 24-bit word to get a 16-bit word leaves this 16th bit in whatever state it happened to be in, ignoring everything below it. This makes for a harsh sound and a poor representation of the analog signal, since the least significant bit should vary between 1 and 0 and not be stuck at one or the other. The purpose of dither is to add just the right amount of noise to the signal so that this 16th bit (the new least significant bit) varies between 1 and 0, without adding so much noise that you affect the 14th or 15th bits. Anything random will work, including just having enough noise in the electronic signal path of your recorder that the 16th bit is random. But the idea is to create a dither scheme that uses the information present in the (say, 8) least significant bits that you're tossing aside to correctly add noise to the 16th bit of the now-dithered word so that it will vary between 0 and 1.
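Here's a rough sketch of the difference between lopping off the bits and dithering first. To be clear about the assumptions: this uses plain TPDF dither (two uniform random numbers added together, then rounding), which is a common textbook scheme and not necessarily what ANSR or the Apogee box does. The function names and the test value of 100 are made up for the example.

```python
import math
import random

SHIFT = 8           # going from 24 to 16 bits drops the 8 lowest bits
STEP = 1 << SHIFT   # one 16-bit LSB measured in 24-bit units (256)


def truncate(sample24):
    """Lop off the low 8 bits with no dither."""
    return sample24 >> SHIFT


def dither_and_round(sample24, rng):
    """Add TPDF noise spanning +/- one 16-bit LSB, then round to 16 bits."""
    noise = rng.uniform(-STEP / 2, STEP / 2) + rng.uniform(-STEP / 2, STEP / 2)
    return math.floor((sample24 + noise) / STEP + 0.5)


if __name__ == "__main__":
    rng = random.Random(0)
    quiet = 100      # a level that lives entirely below the 16th bit
    n = 20000
    truncated = [truncate(quiet) for _ in range(n)]
    dithered = [dither_and_round(quiet, rng) for _ in range(n)]
    print("true level in 16-bit LSBs:", quiet / STEP)
    print("average after truncation: ", sum(truncated) / n)
    print("average after dithering:  ", sum(dithered) / n)
```

With truncation, the low-level value just disappears (every output is 0). With dither, the 16th bit toggles, and on average the output lands close to the true level, so the information from the discarded bits survives as noise-like variation instead of being thrown away.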
It all depends on the exact specifics of the dither algorithm that is used, but done well, a good dither algorithm should be able to make a good 16-bit signal (with 16 bits of significant info) from an 18-bit word (the Apogee AD500e did this), or a 22-bit word, or a 24-bit word. So yes, it is my understanding that the dithering scheme does exactly what you think it doesn't do -- that is, it builds a 16-bit word out of the significant data from a 24-bit word, even if that 24-bit word only has 22 bits of significant data. It would depend on the specifics of the ANSR dither scheme, but it would be a pretty poor dither scheme if it couldn't get 16 bits of significant data out of 22 significant bits.
Edit: stupid emoticons: the number 8 followed by a ) should not always =