I've heard this stuff before about 24-bit > mp3, and I'd really like to see a link to a scientific/academic report on the issue (no offense meant echo, I just don't know where you're getting these facts from).
On the face of it, I have to doubt that skipping the dither down to 16 bits would create any discernible difference. There is no such thing as a 24-bit mp3, but there is no such thing as a 16-bit mp3 either. From what I understand, mp3s store the audio as floating-point, frequency-domain values, while wav/flac/pcm are integer sample representations, so they really are different beasts entirely.
Regardless, mp3s -- MPEG-1 Audio Layer 3 files -- are governed by a few ISO/IEC standards. Notably, the first Layer 3 standard, ISO/IEC 11172-3, was published in 1993. It was extended in 1995 by ISO/IEC 13818-3 (MPEG-2 Audio). You can read these mp3 specifications by searching on the standard name/number on the IEC website at
www.iec.ch.
Section 0.2.3.2 of the 13818 standard deals with audio inputs: an mp3 encoder meeting the IEC standard can accept PCM audio input at sample rates of 32 kHz, 44.1 kHz, and 48 kHz, and can accept quantization up to 24 bits per sample. There isn't any 24 > 16 bit truncation going on; the encoder takes the whole 24-bit audio sample and converts it to a new, compressed audio representation. An mp3 encoder meeting the IEC standard will not simply truncate the input audio down to 16 bits and then do the mp3 encoding.
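To make the "no truncation" point concrete, here's a minimal numpy sketch (my own illustration, not anything taken from the standard) showing that moving 24-bit integer samples into the floating-point domain an encoder works in loses nothing, while chopping them to 16-bit integers does:

    import numpy as np

    # Hypothetical 24-bit PCM samples (range -2^23 .. 2^23 - 1).
    rng = np.random.default_rng(0)
    pcm24 = rng.integers(-2**23, 2**23, size=1000, dtype=np.int32)

    # Encoder-style path: scale to floating point. float64 (and even the
    # 24-bit mantissa of float32) represents every 24-bit value exactly,
    # so nothing is thrown away before the transform/quantization stages.
    as_float = pcm24.astype(np.float64) / 2**23
    recovered = np.round(as_float * 2**23).astype(np.int32)
    print(np.array_equal(recovered, pcm24))   # True -- no loss going to float

    # Naive truncation path: drop the bottom 8 bits to get 16-bit samples.
    pcm16 = (pcm24 >> 8).astype(np.int16)
    print(np.array_equal(pcm16.astype(np.int32) << 8, pcm24))  # False -- data lost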
Think of it another way: dithering is essentially just adding noise to your signal to ensure that the 16th bit (the LSB) is randomized when you step down from 24 bits, rather than being left stuck at 0 or 1 for longer stretches, since that correlated error can sound grating. Saying you need to dither a 24-bit file to 16 bits to avoid the same issues in mp3s would imply that an mp3, decoded back to wav/flac, is bit-exact out to 16 bits against the original 24-bit wav/flac file, so that the problems of truncation would carry through. Since mp3s will not have that level of faithfulness to the original, there will be inherent noise from the encoding itself that has the same effect as dithering.

If that were not the case, we could take a 24-bit file, add dither noise so it only had 15 bits of non-noise information, save the dither-added file still at 24 bits (essentially a dithered 16-bit file with an additional 8 bits of noise tacked on), and then do the mp3 encoding. If the encoder worked in such a way that truncation really were an issue, the result would have to be faithful out to 16 bits (otherwise there wouldn't be a truncation issue in the first place). So after encoding and decoding we would have something faithful to 16 bits, with only the lossy compression affecting the extra 8 bits; we could then chop off the last 8 bits and be left with a perfectly intact 16-bit dithered recording faithful to the original (the original 24 > 16 bit dithered recording, that is). Clearly this is not the case.
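For anyone who wants to see what the dither step actually does, here's a rough numpy sketch of the usual textbook TPDF approach to taking a 24-bit signal down to 16 bits (the signal and numbers are purely illustrative, nothing mandated by any standard):

    import numpy as np

    rng = np.random.default_rng(1)

    # Pretend 24-bit source: a quiet 1 kHz sine at 48 kHz, scaled to 24-bit integers.
    t = np.arange(48000) / 48000.0
    signal24 = np.round(np.sin(2 * np.pi * 1000 * t) * 2**10).astype(np.int32)

    # Plain truncation: just drop the bottom 8 bits. The rounding error is
    # correlated with the signal, which is the "grating" distortion.
    truncated16 = (signal24 >> 8).astype(np.int16)

    # TPDF dither: two uniform sources of +/- half a 16-bit LSB (128 at the
    # 24-bit level) sum to triangular noise of +/- 1 LSB, added before
    # requantizing, so the error becomes benign random noise instead.
    tpdf = rng.uniform(-128, 128, signal24.size) + rng.uniform(-128, 128, signal24.size)
    dithered16 = np.round((signal24 + tpdf) / 256.0).astype(np.int16)

    print(truncated16[:8])   # adjacent quiet samples tend to stick at the same value
    print(dithered16[:8])    # dither randomizes the last bit instead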
The same IEC standard also notes that mp3 decoding supports PCM output at up to 24-bit quantization. So if you took mp3s made from a 24-bit source and then converted those mp3s back to 16-bit PCM wave files (for instance, to burn onto a CDR), you might run into truncation and degraded sound quality at that step. (Supposedly mp3s can support up to 24-bit PCM output, with an effective dynamic range of about 20 bits.) But if you listen to the mp3s directly on an mp3 player, truncation is not an issue.
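On that output side, the decoder typically hands you floating-point samples, and the float-to-16-bit step for the CDR is where you would choose between dithering and plain truncation. A rough sketch of that conversion, assuming a decoder that gives you floats in the -1.0 .. 1.0 range (the function name is mine, not any library's API):

    import numpy as np

    def float_to_int16(samples, dither=True, seed=None):
        """Quantize decoder output (floats in -1.0 .. 1.0) to 16-bit PCM,
        optionally adding TPDF dither at the 16-bit LSB level first."""
        rng = np.random.default_rng(seed)
        x = np.asarray(samples, dtype=np.float64) * 32767.0
        if dither:
            # two uniform sources of +/- half an LSB sum to triangular dither
            x = x + rng.uniform(-0.5, 0.5, x.shape) + rng.uniform(-0.5, 0.5, x.shape)
        return np.clip(np.round(x), -32768, 32767).astype(np.int16)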
As to the original question: according to the IEC standard, mp3s can take input and produce output at 24 bits, with about 20 bits of effective dynamic range. Based on this, it seems you would get better sound from converting 24-bit PCM files directly to mp3 rather than dithering to 16 bits first and then encoding. Given the lossy nature of mp3, I doubt most people would hear a difference either way you did the encoding, but if you weren't planning on dithering your 24-bit PCM files down to 16-bit files anyway, encoding directly would save you some time.