For playback, as long as the dynamic range of the recording "fits" into 16 bits, which is probably true for virtually all live recordings, there won't be an improvement with 24 bit; the extra bits are just noise. Since most people are recording in 24 bit, though, I can see the convenience factor of just releasing in that format. As for frequency, as long as you hit the Nyquist frequency, there is nothing to be gained from higher rates. You can always upsample for post.
This isn’t really how discretization of audio signals (or any signal, for that matter) works.
Increasing bit depth isn’t adding more bits below a certain threshold. Rather, it’s subdividing the range between 0.0 and 1.0 into more “bins,” if that makes sense, so you can represent amplitude values in the time domain more accurately, with less quantization error. Audio converters also don’t do this linearly anymore - they use an encoding process called delta-sigma modulation, whose details are outlined here:
https://en.m.wikipedia.org/wiki/Delta-sigma_modulation I won’t go through all the nuts and bolts of how it works, but its practical effect on audio lies in being able to represent data more accurately on a logarithmic scale, which is useful given that we hear pressure (and thus dynamic range) logarithmically.
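To make the “bins” idea concrete, here’s a minimal uniform-quantizer sketch in Python (the `quantize` helper, the NumPy usage, and the test tone are purely my own illustration, not anything a real converter does): each extra bit doubles the number of bins, and the measured quantization error drops accordingly.

```python
import numpy as np

def quantize(x, bits):
    """Uniformly quantize a signal in the range [-1.0, 1.0] to the given bit depth."""
    steps = 2 ** (bits - 1)           # number of bins on each side of zero
    return np.round(x * steps) / steps

# A quiet 1 kHz tone, 60 dB below full scale, sampled at 48 kHz for one second
fs = 48_000
t = np.arange(fs) / fs
x = 0.001 * np.sin(2 * np.pi * 1000 * t)

for bits in (16, 24):
    err = quantize(x, bits) - x       # what the quantizer got wrong
    err_db = 10 * np.log10(np.mean(err ** 2))
    print(f"{bits}-bit: quantization error power ~ {err_db:.1f} dBFS")
```

This is the idealized uniform quantizer; as noted above, real converters use delta-sigma modulation rather than rounding each sample directly like this.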
A second benefit shows up in post production: plugins can operate at a higher degree of precision, yielding fewer errors when they perform their calculations, so filter math, for example, comes out more accurate.
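As a contrived but concrete illustration of the precision point (plain NumPy, my own sketch, not any plugin’s actual code): mix a full-scale sample with a component 160 dB down. In 32-bit floats the quiet part simply disappears; in 64-bit it survives.

```python
import numpy as np

# Toy illustration: add a very quiet component (-160 dBFS, i.e. 1e-8) to a
# full-scale sample. The 32-bit result rounds the quiet part away entirely.
loud, quiet = 1.0, 1e-8

sum32 = np.float32(loud) + np.float32(quiet)
sum64 = loud + quiet

print("float32:", repr(float(sum32)))   # 1.0 -- the quiet component is gone
print("float64:", repr(sum64))          # 1.00000001 -- it survives
```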
Likewise, increasing sample rate isn’t the same as recording and playing back natively at the higher resolution. Assume a clean up/downsample ratio of, say, 2:1 (so, going from 96k to 48k or 48k to 96k). If you downsample, you’re just throwing out every other sample (after filtering out anything above the new Nyquist frequency), which is still an accurate, distortion-free representation of what occurred. Upsampling from 48k to 96k means adding samples in between where two already exist - you’re adding information that wasn’t there before. Regardless of the process used to do this, adding in a sample will induce some error - and thus distortion - in the signal. The degree of this distortion is up for debate, and it can arguably be minimized to the point of falling below the threshold of perception, but in the end you are still trying to add info that wasn’t previously there. The problem is compounded if you’re upsampling at a non-integer ratio (44.1 to 96, for example), as the two signals share samples less frequently.
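Here’s a rough sketch of that error, assuming the crudest possible upsampler (plain linear interpolation; real sample-rate converters use band-limited filters and do far better, so treat the numbers as an upper bound, not typical performance):

```python
import numpy as np

# Upsample a 1 kHz sine from 48 kHz to 96 kHz by inserting a sample halfway
# between each existing pair, then compare the inserted samples against the
# values a native 96 kHz capture would have seen at those instants.
f, fs_lo = 1000.0, 48_000
n_lo = np.arange(480)                               # 10 ms at 48 kHz
x_lo = np.sin(2 * np.pi * f * n_lo / fs_lo)

inserted = 0.5 * (x_lo[:-1] + x_lo[1:])             # naive new in-between samples
true_mid = np.sin(2 * np.pi * f * (n_lo[:-1] + 0.5) / fs_lo)

err = inserted - true_mid
print("worst-case error of the inserted samples:", np.max(np.abs(err)))
print("roughly", 20 * np.log10(np.max(np.abs(err))), "dBFS")
```

With this naive approach the inserted samples for a 1 kHz tone come out wrong by roughly -53 dBFS at worst; a proper polyphase/sinc resampler pushes that error far lower, which is where the “up for debate / below perception” part comes in.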
Sample rate decisions should be made after listening to the quality of the converter being used. Some sound noticeably better at specific rates, others do not.
This is particularly clever, and a methodology I actually agree with. The differences in the sound of gear as a function of sample rate are partially a result of what I’ve mentioned above (i.e., how accurately the original signal is being rendered), and partially a function of jitter - that is, how accurate the master clock controlling the converter is. Effectively, increased jitter = larger deviations from when a sample is “supposed” to be captured = larger error in when, on playback, the DAC expects its next batch of info and doesn’t get it within its own specs = (arguably, as far as perception goes) distortion. The lower the jitter, the better the signal will sound. Most clocks in audio converters are optimized to run with the lowest jitter at one specific sample rate; it is preferable to use that sample rate for playback.
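For a sense of scale, the usual back-of-the-envelope bound (my own sketch, not from the posts above) says a clock timing error dt on a tone of frequency f and amplitude A can shift a sample’s value by up to the signal’s slew rate times dt, i.e. 2*pi*f*A*dt:

```python
import numpy as np

# Worst-case sample-value error caused by clock jitter on a 10 kHz full-scale tone.
f, A = 10_000.0, 1.0

for jitter in (1e-9, 100e-12, 10e-12):     # 1 ns, 100 ps, 10 ps of timing error
    err = 2 * np.pi * f * A * jitter       # slew-rate bound on the amplitude error
    print(f"{jitter*1e12:7.0f} ps jitter -> worst-case error {20*np.log10(err):6.1f} dBFS")
```

So nanosecond-level jitter on a 10 kHz tone sits around -84 dBFS, improving by 20 dB for every tenfold reduction in jitter, which is one reason clock quality matters independently of the nominal sample rate.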