alright, having taken many systems and signals classes and a lot of digital signal processing, here is my guess as to why it will sound better:
when a PCM wave is recorded, the 24 bits (or 16) from the A/D converter are stored somewhere. as the recording progresses, each sample is taken at a distinct time (k/f, where k >= 0 is the sample index and f is the sampling rate). this means that when it goes to play it back, there has to be some interpolation: it places each sample next to the previous one and connects them with a line (or a best-guess curve based on previous values). then it goes to the next sample and repeats the process, forever guessing what exactly lies between the samples. i know most people think that at such a high sampling rate it wouldn't make much difference, but it will.
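to make the "guessing between samples" idea concrete, here is a rough python sketch. it assumes straight-line interpolation, which is only my simplified picture of playback, and the names (sample_times, linear_interp), the 44.1 kHz rate and the 1 kHz test tone are all made up for illustration:

```python
import math

def sample_times(f, n):
    """Return the first n sample instants t_k = k / f (in seconds)."""
    return [k / f for k in range(n)]

def linear_interp(samples, f, t):
    """Estimate the value at time t by drawing a straight line
    between the two stored samples on either side of t."""
    pos = t * f                          # position measured in samples
    k = min(int(pos), len(samples) - 2)  # index of the sample just before t
    frac = pos - k                       # how far we are between k and k+1
    return samples[k] + frac * (samples[k + 1] - samples[k])

f = 44100.0      # assumed sample rate (illustrative)
tone = 1000.0    # assumed 1 kHz test tone (illustrative)
times = sample_times(f, 64)
samples = [math.sin(2 * math.pi * tone * t) for t in times]

t_mid = 10.5 / f                         # halfway between samples 10 and 11
guess = linear_interp(samples, f, t_mid)
actual = math.sin(2 * math.pi * tone * t_mid)
error = abs(guess - actual)              # how far off the straight-line guess is
print(guess, actual, error)
```

the error printed at the end is the gap between the straight-line guess and the real sine value at that instant. it is small but not zero, which is the point being argued here.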
now DSD uses a 1-bit delta scheme. this means that it doesn't interpolate, but just adds or subtracts one unit (i'm guessing it's somewhere in the range of a millionth of full scale) from the previous value. this means that it will always be smoother than PCM because it never "guesses". this would also account for the reason it must sample so fast. if the sound went from 0 dB to -20 dB, it needs to be able to complete that transition very quickly. if it lags (that is, doesn't get down to the desired value fast enough), the sound will suffer.
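the add-or-subtract-one-unit idea and the lag problem can be sketched like this. it is plain delta modulation exactly as described above, not a claim about what a real DSD encoder does internally. the function names, the step of 0.01 and the 1 kHz tone are assumptions for illustration; 2822400 Hz is 64 x 44.1 kHz:

```python
import math

def delta_encode(signal, step):
    """1-bit delta encoder: emit 1 if the input is above the running
    estimate (then step up), otherwise emit 0 (then step down)."""
    bits, estimate = [], 0.0
    for x in signal:
        bit = 1 if x > estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits

def delta_decode(bits, step):
    """Rebuild the staircase by adding or subtracting one step per bit."""
    out, estimate = [], 0.0
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out

def max_tracking_error(rate, tone=1000.0, step=0.01, seconds=0.002):
    """Encode and decode a sine at the given bit rate and return the
    largest gap between the input and the decoded staircase."""
    n = int(rate * seconds)
    signal = [math.sin(2 * math.pi * tone * k / rate) for k in range(n)]
    decoded = delta_decode(delta_encode(signal, step), step)
    return max(abs(a - b) for a, b in zip(signal, decoded))

slow = max_tracking_error(rate=44100.0)    # too few steps per second: it lags
fast = max_tracking_error(rate=2822400.0)  # 64 x 44.1 kHz: it keeps up
print(slow, fast)
```

at the slow rate the staircase can only move 0.01 per sample, so it falls far behind the sine. at the fast rate the same tiny step is enough to stay close to the input, which is the "it must sample so fast" argument in code form.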
please correct me if i'm wrong, but i think this may help some people understand the difference between the two.