When you do any processing on the file (including resampling), you want to do it at the highest bit depth possible, to keep "rounding errors" in the mathematical calculations as small as you can. Every operation has to round its result to the nearest value the format can hold, and those small errors add up from one step to the next. You'll get more accurate results resampling with 24 bits of info than with 16 (many programs actually do their internal calculations at bit depths of 32 or even 64 bits).
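To make the rounding-error point concrete, here is a rough sketch in Python. It is only an illustration: the test tone, the two gain changes, and the function names are made up for the example and don't come from any particular audio program. It runs the same two gain steps twice, once rounding back to whole sample values after every step, and once keeping full floating-point precision and rounding only at the end.

```python
import math

def sine_16bit(n=1000, freq=441.0, rate=44100.0, amp=12000):
    """Hypothetical test tone, stored as whole 16-bit sample values."""
    return [round(amp * math.sin(2 * math.pi * freq * i / rate)) for i in range(n)]

def process_rounding_each_step(samples, gains):
    """Low-precision path: round to an integer after every gain change."""
    out = list(samples)
    for g in gains:
        out = [round(s * g) for s in out]
    return out

def process_float_then_round(samples, gains):
    """High-precision path: stay in 64-bit floats, round once at the end."""
    out = [float(s) for s in samples]
    for g in gains:
        out = [s * g for s in out]
    return [round(s) for s in out]

# Turn the level down to a tenth, then back up by ten (net gain of 1.0).
gains = [0.1, 10.0]
original = sine_16bit()
low = process_rounding_each_step(original, gains)
high = process_float_then_round(original, gains)

err_low = max(abs(a - b) for a, b in zip(original, low))
err_high = max(abs(a - b) for a, b in zip(original, high))
print("worst error, rounding every step:", err_low)
print("worst error, rounding once:      ", err_high)
```

The net gain is 1.0, so ideally you'd get the original samples back. The path that rounds after each step can't, because the last digit of every sample was thrown away when the level was turned down.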
Once all the processing is done, you have a file that's sampled at 44.1 kHz but is still 24-bit (or 32-bit, or whatever internal bit depth was used), and you need to get it down to 16-bit. The only reason dither is applied to the file is to mask the distortion that would occur if the bit depth were just "truncated" to 16 bits (i.e. only the 16 most significant bits kept, and everything else thrown away). Truncation error follows the signal, so it is heard as distortion; dither trades it for a steady, low-level noise that is much less objectionable.
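Here is a minimal sketch of that last step, again in Python. Some assumptions to flag: the dither is TPDF (triangular) noise with a peak of one 16-bit step, which is a common choice but not something stated above, and the test signal is a made-up sine so quiet that its peak is less than one 16-bit step. The function names are hypothetical too.

```python
import math
import random

LSB16 = 256  # one 16-bit step expressed in 24-bit units (2**8)

def truncate_to_16(sample24):
    """Keep only the 16 most significant bits (arithmetic shift right by 8)."""
    return sample24 >> 8

def dither_to_16(sample24, rng):
    """Add TPDF dither (an assumption here), then round to the nearest 16-bit step."""
    noise = (rng.random() - rng.random()) * LSB16  # triangular, +/- 1 step peak
    q = math.floor((sample24 + noise) / LSB16 + 0.5)
    return max(-32768, min(32767, q))  # clamp to the 16-bit range

PERIOD, CYCLES = 100, 200
rng = random.Random(0)  # fixed seed so the run is repeatable

# Very quiet 24-bit sine: the peak is 100 units, well under one 16-bit step.
signal = [round(100 * math.sin(2 * math.pi * (i % PERIOD) / PERIOD))
          for i in range(PERIOD * CYCLES)]

truncated = [truncate_to_16(s) for s in signal]
dithered = [dither_to_16(s, rng) for s in signal]

def mean_per_phase(samples16):
    """Average all cycles together, converted back to 24-bit units."""
    return [LSB16 * sum(samples16[p::PERIOD]) / CYCLES for p in range(PERIOD)]

def worst_error(samples16):
    """Largest gap between the averaged output and one cycle of the input."""
    avg = mean_per_phase(samples16)
    return max(abs(a - s) for a, s in zip(avg, signal[:PERIOD]))

print("distinct truncated values:", sorted(set(truncated)))
print("worst averaged error, truncated:", worst_error(truncated))
print("worst averaged error, dithered: ", worst_error(dithered))
```

Truncation turns the quiet sine into a two-level square wave, and no amount of averaging brings the original shape back. The dithered version is noisy sample by sample, but the sine is still in there underneath the noise, which is why the averaged error comes out much smaller. The averaging is only a way to show that in numbers; on playback you just hear a faint steady hiss instead of distortion.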