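In plain English, the job of the dither algorithm (when reducing to 16 bits, in this example) is to add a small amount of noise, on the order of the 16th (least significant) bit, to each sample before it is rounded or truncated to 16 bits, so that the quantization noise is masked: it becomes a steady, low-level hiss rather than distortion that follows the music. I *think* this noise is what largely differs from one algorithm to the next. Some shape the spectrum of the noise so that most of it lands where the ear is least sensitive (noise shaping), others use pseudo-random noise, and at least one uses analog random noise.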
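Here is a minimal sketch of the plain (non-noise-shaped) case. It assumes 24-bit integer input and triangular (TPDF) dither spanning plus or minus one 16-bit step. That is one common choice, not necessarily what any particular product uses, and the names `requantize_24_to_16`, `STEP` and `seed` are made up for illustration. A 1 kHz tone only 0.4 of a 16-bit step tall disappears completely when it is simply rounded, but it survives (buried in hiss) when it is dithered first.

```python
import math
import random

STEP = 256  # one 16-bit step spans 2**8 = 256 steps of the (assumed) 24-bit input


def requantize_24_to_16(samples_24, dither=True, seed=None):
    """Reduce 24-bit integer samples to 16 bits by rounding (hypothetical helper).

    With dither=True, triangular (TPDF) noise spanning -1..+1 of a 16-bit
    step is added before rounding; with dither=False the samples are simply
    rounded to the nearest 16-bit value.
    """
    rng = random.Random(seed)
    out = []
    for s in samples_24:
        x = s / STEP  # sample value measured in 16-bit steps
        if dither:
            # Sum of two independent uniform values -> triangular PDF on [-1, 1].
            x += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        y = math.floor(x + 0.5)  # round half up to the nearest 16-bit step
        out.append(max(-32768, min(32767, y)))  # clip to the 16-bit range
    return out


if __name__ == "__main__":
    rate = 48_000
    # A 1 kHz tone whose peak is only 0.4 of a 16-bit step.
    tone = [round(0.4 * STEP * math.sin(2 * math.pi * 1000 * i / rate))
            for i in range(rate)]
    plain = requantize_24_to_16(tone, dither=False)
    dithered = requantize_24_to_16(tone, dither=True, seed=1)
    print(set(plain))  # {0}: without dither the tone vanishes entirely
    # Project the dithered output onto the tone to estimate its amplitude.
    ref = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(rate)]
    amp = sum(y * r for y, r in zip(dithered, ref)) / sum(r * r for r in ref)
    print(round(amp, 2))  # close to 0.4: the tone survives under the noise
```

The trick is that TPDF dither makes the *average* of the rounded output equal the input, so detail smaller than one step is kept as a statistical tendency instead of being thrown away.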
Quantization (to 16 bits in this case) inherently creates some error, and the added noise is a means of making that error inaudible or unobjectionable in one way or another: by pushing it toward the upper limit of human hearing, where our ears are least sensitive, by making it random and thus not 'picked up' by our brains as distortion, or by some other means.
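The "pushing it toward the upper limit of hearing" idea can be sketched with the simplest form of noise shaping, first-order error feedback. This again assumes 24-bit integer input and TPDF dither, and `requantize`, `low_freq_error_power` and `STEP` are hypothetical names. Each sample's error is subtracted from the next sample before rounding, so the error in the output becomes e[n] - e[n-1]. That difference cancels at low frequencies and piles up toward the Nyquist frequency. Real noise shapers typically use higher-order filters tuned to hearing curves; this is only the textbook starting point.

```python
import math
import random

STEP = 256  # one 16-bit step spans 256 steps of the (assumed) 24-bit input


def requantize(samples_24, shape=True, seed=None):
    """Reduce 24-bit integer samples to 16 bits with TPDF dither (hypothetical helper).

    With shape=True, each sample's error is subtracted from the next sample
    before rounding (first-order error feedback), so the output error is
    e[n] - e[n-1]: suppressed at low frequencies, boosted near Nyquist.
    """
    rng = random.Random(seed)
    out = []
    prev_err = 0.0  # e[n-1], measured in 16-bit steps
    for s in samples_24:
        v = s / STEP - (prev_err if shape else 0.0)
        d = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)  # TPDF dither
        y = math.floor(v + d + 0.5)  # round to the nearest 16-bit step
        prev_err = y - v  # total error this sample (dither + rounding)
        out.append(max(-32768, min(32767, y)))  # clipping is not handled carefully
    return out


def low_freq_error_power(out, samples_24, block=32):
    """Mean square of the error summed over blocks: a crude low-pass measure."""
    err = [y - s / STEP for y, s in zip(out, samples_24)]
    sums = [sum(err[i:i + block]) for i in range(0, len(err), block)]
    return sum(v * v for v in sums) / len(sums)


if __name__ == "__main__":
    rate = 48_000
    tone = [round(0.4 * STEP * math.sin(2 * math.pi * 1000 * i / rate))
            for i in range(rate)]
    flat = requantize(tone, shape=False, seed=1)
    shaped = requantize(tone, shape=True, seed=1)
    # The shaped error should show far less low-frequency power than the flat one.
    print(low_freq_error_power(flat, tone), low_freq_error_power(shaped, tone))
```

The total amount of noise does not go down (first-order shaping actually increases it); it is only moved to where it is less audible, which matches the description above.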
Perhaps those more versed in psychoacoustics or FFT-based math can explain this better than I just tried to.
-Matt