rairun, what you say is entirely correct with regard to sampling and reconstruction per se. However, any significant signal energy at or above 1/2 the sampling frequency will cause aliasing distortion, and thus we are all sternly commanded to filter out such energy prior to (or as part of) the conversion process.
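To put a number on where that out-of-band energy ends up, here is a small Python sketch. It is my own illustration (the function name alias_frequency is made up), and it assumes ideal sampling with no anti-aliasing filter at all:

```python
def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after ideal, unfiltered sampling."""
    # Sampling cannot distinguish frequencies that differ by a multiple of fs,
    # so reduce modulo fs first ...
    folded = tone_hz % sample_rate_hz
    # ... then mirror the upper half of that range back around fs/2.
    return min(folded, sample_rate_hz - folded)


for tone in (10_000, 25_000, 30_000):
    print(tone, "Hz ->", alias_frequency(tone, 44_100), "Hz")
```

So a 25 kHz tone sampled at 44.1 kHz comes back as 19.1 kHz, inside the audible band, which is why the energy has to be removed before conversion rather than afterwards.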
Significant energy at such frequencies is rare in real-world sound, as opposed to specially generated test signals designed to challenge a recording system. But in order to handle the worst cases without audible aliasing, digital audio systems conventionally use low-pass filtering with stopband attenuation of 60 to 80 dB or even more. This forces those filters to be awfully steep at sampling rates where there isn't much margin between their turnover point and the Nyquist limit (e.g. the narrow interval between 20 kHz and 22.05 kHz in the case of 44.1 kHz sampling).
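As a rough feel for how much that narrow margin costs, the sketch below estimates the filter order needed to get from a 20 kHz turnover point to 80 dB of attenuation at the Nyquist limit. Big assumption: it uses a plain Butterworth response, only because that has a simple closed-form formula; it is not what any particular converter uses, and the function name is hypothetical.

```python
import math


def butterworth_order(passband_edge_hz, stopband_edge_hz, stopband_atten_db):
    """Smallest Butterworth order that is -3 dB at the passband edge and
    reaches the requested attenuation at the stopband edge."""
    ratio = stopband_edge_hz / passband_edge_hz
    # Butterworth attenuation in dB is 10*log10(1 + ratio**(2*n)); solve for n.
    n = math.log10(10 ** (stopband_atten_db / 10) - 1) / (2 * math.log10(ratio))
    return math.ceil(n)


for fs in (44_100, 48_000, 96_000):
    print(fs, "Hz sampling: order", butterworth_order(20_000, fs / 2, 80))
```

Under these assumptions the 44.1 kHz case needs an order in the nineties, while 96 kHz needs about eleven. Real designs reach the same attenuation with much lower orders by using steeper filter families that accept ripple in exchange, but the trend with sampling rate is the same.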
The anti-aliasing filters in the PCM-F1 were 9th-order IIRC, and the ones in the PCM-1600 and 1610 were 11th or even 13th. That's way more than would almost ever be needed in my opinion, but it's been the general practice for decades. Such filters can have audible effects on impulse response. They can do things that have no counterpart in the real world of sound, such as ringing that starts before an impulse has actually begun (as well as continuing after the sound has stopped, as one might expect).
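For anyone who wants to see the "ringing before the impulse" effect in numbers, here is a minimal sketch. It assumes a linear-phase FIR low-pass (a Hann-windowed sinc), which is one common kind of digital filter that behaves this way; the tap count and the 1e-4 threshold are arbitrary choices of mine, not taken from any of the recorders mentioned above.

```python
import math


def windowed_sinc_lowpass(cutoff_hz, sample_rate_hz, num_taps):
    """Linear-phase FIR low-pass taps: ideal sinc response shaped by a Hann window."""
    center = (num_taps - 1) / 2
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        t = n - center
        # Ideal low-pass impulse response, with the t == 0 limit handled separately.
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


# The taps are the filter's response to a single-sample click.
taps = windowed_sinc_lowpass(20_000, 44_100, 101)
peak = max(range(len(taps)), key=lambda i: abs(taps[i]))
pre_ring = [h for h in taps[:peak] if abs(h) > 1e-4]
print(f"main peak at tap {peak}; {len(pre_ring)} earlier taps exceed 1e-4")
```

The response is symmetric about its main peak, so whatever ringing follows the click also precedes it by the same amount.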
At higher sampling rates, on the other hand, the filters don't need to be nearly as steep--and even if they are, with their turnover point an octave higher there will still be far less time-domain nonsense below 20 kHz. Thus the impulse response of a system with a higher sampling frequency can be better--even (occasionally) audibly so under certain conditions.
That said, you are also perfectly right about the limitations of most playback systems--especially conventional dynamic loudspeakers. Most of the time when people have provably, repeatably heard differences between 44.1 and 96 kHz in controlled tests, they have been listening over electrostatic headphones to specially generated test signals: various chirps and clicks, rather than real-world music, speech or even nature sounds (some of which are much more demanding than ordinary music). Those listeners have also generally had some training in how to listen critically to such test signals and hear the differences. Some percussion instruments generate impulses that might be "diagnostic" for filter problems, though, and we don't know whether playback systems might get better some day.
There is also a vague general belief that certain post-processing algorithms will generate less distortion if the sampling rate is higher. To me, if that is so, it sounds like a defect in the software. I've also never seen anyone actually pin down which algorithm or which software this is supposed to happen with, so I think we may be veering into urban-myth territory where that's concerned.
I still generally record at either 44.1 or 48 kHz when I record at all these days, depending on whether my recording will be synched up to video or not. But if I were a recording company (the two or three that are left nowadays) I suppose I would record at 96/24, because who knows--some day bandwidth may be so cheap and playback systems so good that it will matter a little to some people. I don't think that will happen during my lifetime, but that doesn't mean it can never happen.
--best regards