SparkE, I am just trying to bring these things down to a layman's level. A perfect brickwall filter does not exist, sad but true. If we need perfect (inaudible) filtering of the frequencies near the sample-rate limit, OK, use 48 kHz and not 44.1 kHz; that keeps the minuscule time-domain problems at around 20 kHz at bay. 96 kHz is overkill. But the final products are often 44.1 kHz anyway (CDs...), so what is the point?
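To put rough numbers on why 48 kHz gives the filter more breathing room than 44.1 kHz, here is a small sketch. It assumes the audible range tops out at 20 kHz and that the filter has to do its work between there and half the sample rate (the Nyquist frequency); the function name is just made up for illustration.

```python
# Room available to the anti-aliasing / reconstruction filter:
# from the top of the audible range up to half the sample rate (Nyquist).
AUDIBLE_TOP_HZ = 20_000  # assumption: upper limit of hearing


def transition_band_hz(sample_rate_hz, audible_top_hz=AUDIBLE_TOP_HZ):
    """Return the width in Hz between the audible top and Nyquist."""
    nyquist_hz = sample_rate_hz / 2
    return nyquist_hz - audible_top_hz


for fs in (44_100, 48_000, 96_000):
    print(f"{fs} Hz: Nyquist {fs / 2:.0f} Hz, "
          f"room for the filter {transition_band_hz(fs):.0f} Hz")
```

Under those assumptions, 48 kHz leaves roughly twice the room that 44.1 kHz does, and 96 kHz leaves far more than the filter realistically needs.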
About this "when a note plays" stuff: if the system reproduces the audible frequency range perfectly, the timing information is also perfect as far as we can hear it. This talk about "timing" and "notes" not being in full sync is total BS invented by golden-ear believers who lost their hobby when turntable tweaking went out of fashion. It has no scientific basis and no test evidence to support it.
About time coherence: if there were some timing problems in these systems, they would have to be on the order of 1/50000 sec. (why else would people demand sample rates of 96 kHz and above?). Half a wavelength at that frequency is 3.3 mm. So shifting your head by 1/8 of an inch while listening would throw the image out of whack, or what? We all know perfectly well that it does not happen. Hearing is not that precise. The audible range from about 16 to 20000 hertz contains all the information we humans can use: time, amplitude, transients, everything. There is nothing out there beyond it.
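The 3.3 mm figure is easy to check. This is just a back-of-the-envelope sketch; the speed of sound used here (330 m/s) is my assumption, and with the more usual room-temperature value of about 343 m/s you get closer to 3.4 mm, which changes nothing in the argument.

```python
# Half wavelength of a 50 kHz tone in air, compared with 1/8 inch.
SPEED_OF_SOUND_M_S = 330.0  # assumption; about 343 m/s at 20 degrees C
INCH_MM = 25.4


def half_wavelength_mm(frequency_hz, speed_m_s=SPEED_OF_SOUND_M_S):
    """Return half the acoustic wavelength in millimetres."""
    wavelength_m = speed_m_s / frequency_hz
    return wavelength_m / 2 * 1000


print(f"half wavelength at 50 kHz: {half_wavelength_mm(50_000):.2f} mm")
print(f"1/8 inch:                  {INCH_MM / 8:.3f} mm")
```

So a head movement of an eighth of an inch is already about the same size as the half wavelength in question.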
All this has nothing to do with bit depth. As previously said, 24 bits is convenient compared to 16 when recording and editing; for the final product 16 bits is plenty. Just to recall the original question.
Echo1434: you have one major flaw in your thinking about analog versus digital. Even a 16-bit system has infinitely many possible values at the final output; lowpass filtering after the D/A conversion smooths out the waveform, so there are no 65000 steps in the sound you listen to. And besides, in 24-bit systems those imaginary 16 million steps do not replace the 65000. They reside outside of that first 65000-step area, because adding 8 more bits just gives more dynamic range, not "resolution". 12, 14 or 16 bits define the waveform perfectly and steplessly within their dynamic range windows. Adding more bits adds dynamic range; it gives more resolution only to the quietest sounds, while loud sounds are already perfectly taken care of. Graphic representations of digitized waveforms give a wrong idea about the workings of the system.
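To show what "more bits means more dynamic range" looks like in numbers, here is a rough sketch. It assumes an ideal quantizer and uses the plain 20*log10(2^bits) rule of thumb (about 6 dB per bit), ignoring dither, noise shaping and the small extra term that applies to sine-wave signals; the function name is only for illustration.

```python
import math


def dynamic_range_db(bits):
    """Theoretical dynamic range of an ideal N-bit quantizer, in dB."""
    return 20 * math.log10(2 ** bits)


for bits in (12, 14, 16, 24):
    print(f"{bits:2d} bits: {2 ** bits:>10,d} levels, "
          f"about {dynamic_range_db(bits):.1f} dB")

# The 8 extra bits of a 24-bit system extend the range downward,
# below the 16-bit noise floor, by roughly this much:
extra_db = dynamic_range_db(24) - dynamic_range_db(16)
print(f"24 vs 16 bits: about {extra_db:.1f} dB more range at the quiet end")
```

The extra 48 dB or so all sits underneath what 16 bits already covers, which is the point: the loud part of the signal gains nothing from it.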