I want to go against the grain a bit here for what's being said. Before we get too bogged down on a few topics here, my thesis statement: you should record in as high a digital fidelity (both bit depth and sample rate) as it is convenient for you to store. If you are recording to massive internal hard drives or digital memory, you have little reason not to record at 24/96.
There are a lot of misconceptions about digital audio, stemming both from its misrepresentation to the public and from its misinterpretation by the public. It's a bit esoteric from the academic side, but that's because really understanding what numbers like bit depth and sample rate mean requires a few semesters of calculus plus a course in discrete math and another in linear algebra; then you need courses in continuous-time (analog) and discrete-time (digital) signal processing and information theory to understand the why. And in the other direction, academia is pretty deaf to the questions, criticisms, and comments of the layperson - for example, a claim like "I don't hear a difference between 16/44.1 and 24/96" is rarely met with anything but scoffs from engineers.
Before going further, a few points I want to highlight from this thread:
I’m unclear on quality difference between the rates, and whether downsampling algorithms degrade sound quality.
A downsampling algorithm, used incorrectly, can introduce phase distortion for non-integer downsampling ratios. You might perceive that as less treble detail, a shift in soundstage (or worse - a constantly shifting soundstage for some instruments), or some amount of "splatter" and "time smearing" - sort of like those cheap 90s karaoke microphones that sounded like they had a spring inside them.
If you are doing integer downsampling of 2:1 or 4:1 - for example, going down to 48 kHz from a 96 kHz or 192 kHz sample rate - the job is much simpler: your downsampler only needs a linear-phase anti-aliasing (low-pass) filter before it throws out the samples it no longer needs, with no fractional interpolation between sample points. Done that way, you should hear no discernible difference between the higher and lower sample rates of the exact same audio content, assuming the digital-to-analog converter (DAC) performs identically at all sample rates. (Throwing out every other sample with no filter at all is only transparent if the source has no content above the new Nyquist frequency; otherwise that content aliases down into the audible band.)
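To make that caveat concrete, here's a quick Python sketch (scipy assumed as a readily available tool; the helper name is mine): a 30 kHz component - above the new 24 kHz Nyquist - folds down to an audible 18 kHz if you just drop every other sample, while a linear-phase anti-alias filter ahead of the decimation knocks it out.

```python
# Sketch, not anyone's product: 2:1 decimation of a 96 kHz signal that
# contains a 30 kHz ultrasonic component alongside a 1 kHz tone.
import numpy as np
from scipy.signal import decimate

fs = 96_000
t = np.arange(fs) / fs                       # 1 second of signal
x = np.sin(2 * np.pi * 1_000 * t) + 0.5 * np.sin(2 * np.pi * 30_000 * t)

naive = x[::2]                               # drop samples, no filter
proper = decimate(x, 2, ftype="fir")         # linear-phase FIR anti-alias filter first

def tone_level(sig, f, fs_new=48_000):
    """Normalized magnitude of the FFT bin nearest f Hz (1 Hz bins here)."""
    spec = np.abs(np.fft.rfft(sig)) / len(sig)
    bin_hz = fs_new / len(sig)
    return spec[int(round(f / bin_hz))]

# 30 kHz is above the new 24 kHz Nyquist, so it folds to 48 - 30 = 18 kHz.
print("naive  18 kHz alias level:", tone_level(naive, 18_000))
print("proper 18 kHz alias level:", tone_level(proper, 18_000))
```

In the naive path the alias shows up at essentially the original component's level; after proper filtering it's buried far below it, while the 1 kHz tone passes through both paths untouched.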
As far as 44.1 vs. 48 kHz for an original recording is concerned, there shouldn't be enough sonic difference to be detectable by a human listener. If there is, the equipment (or your process of making the judgment) is suspect IMO. Once the sampling rate exceeds 2x the highest frequency being sampled, the only audible differences should be in the side effects of the anti-imaging and anti-aliasing filters. The whole decades-long saga over sampling rates is really about filtering, not sampling. In general there are tradeoffs between the frequency-domain and time-domain behavior of any filter (analog or digital). There are dynamic range and distortion considerations as well.
For digital filters, some important optimization constraints relax if the sampling frequency is raised. But the <10% difference between 44.1 and 48 kHz isn't enough to matter in that way. You'd need to go to (maybe) 60 or 64 kHz before you're totally "in the clear" for real-world program material. Thus we have 96 kHz as a professional standard--plus the usual assortment of people who think that no sampling rate is ever high enough, because they misunderstand sampling theory, i.e. they imagine that it becomes "closer to analog" the higher you raise the rate, which isn't how digital audio works.
All the potential sonic problems of filters become less if their design is less aggressive (fewer "poles"). Few live, acoustic signals have significant energy at 20 kHz or above, and those that do are rarely recorded close-up by consumers using microphones capable of conveying 20+ kHz signal components. So as a generality, there should be less need for aggressive filtering, and correspondingly less "need" for 96 kHz sampling in consumer recording equipment (i.e. to shove the filter problems brute-force up out of the audible range). For better or worse, though, manufacturers tend to design equipment for worst-case scenarios, and unfortunately, this aspect of recording equipment, though readily measurable, isn't usually described in spec sheets or on-line reviews.
One reason I love DSatz - that first paragraph is BANG right on the money.
To expand on the second paragraph, I want to clear the air here once and for all: why 48 kHz vs. 44.1? Or why 96 vs. 88.2? Well, in the 80s, when much of digital media was still the Wild West, we established independent standards for audio and video - rightfully so, as they are two very different media. Digital sampling theory (going back to Nyquist-Shannon in Nineteen Twenty Fucking Seven - it always blows my mind how far ahead of the curve that paper really was) tells us a few things. Namely: if we have continuous, analog media, capture it at discrete intervals, and then try to reproduce it accurately and without distortion, we must capture samples at a rate at least twice the highest frequency (or twice as fast as the shortest period, in the time domain) contained within the original continuous media. The top of healthy, young human hearing is approximately 20 kHz; so per Nyquist-Shannon, we need to sample at a rate of at least 40 kHz. As David points out, once "real-world/environmental" factors come into play we can't design a perfect reconstructor for the digital-to-analog step, so we oversample a little bit, at 44.1 kHz, to account for this. We call the 44.1 kHz sample rate "Red Book," and it stuck because it was possible in the 80s to produce converters between the digital and analog domains relatively inexpensively at 44.1 kHz - any faster, and you needed a more robust (and thus more expensive) converter.
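The "fold-down" arithmetic behind Nyquist-Shannon is simple enough to sketch in a few lines of Python (the helper name is my own, not from any library):

```python
# Sketch: where a pure tone lands after sampling. Any component above
# fs/2 "folds" back into the 0..fs/2 band instead of disappearing.
def alias_freq(f, fs):
    """Apparent frequency of a pure tone at f Hz when sampled at fs Hz."""
    return abs(((f + fs / 2) % fs) - fs / 2)

# A 1 kHz tone is safe at 44.1 kHz; a 25 kHz tone is not - it shows up
# at 44.1 - 25 = 19.1 kHz, squarely inside the audible band.
print(alias_freq(1_000, 44_100))    # 1000.0
print(alias_freq(25_000, 44_100))   # 19100.0
```

This is why the anti-alias filter ahead of the converter matters so much: anything it lets through above half the sample rate doesn't vanish, it lands somewhere you can hear it.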
48 kHz comes from the video world. Playing by the same rules, video engineers' standard is to capture data at 24 frames per second. For those of you playing along at home, I want to highlight this mismatch - not just in number, but in magnitude - since I regularly fight video engineers about this point and about why audio processing is a much bigger and more ignored challenge than video processing.
Audio: 44,100 Hz
Video: 24 Hz
Yeah. Digital audio filtering - especially if you want to do it in real time, or "on the fly" without inducing any lag - is a helluva lot harder.
But where was I... Ah, RIGHT! Well, as you've gathered, resampling algorithms are a MASSIVE can of worms, and they aren't guaranteed to be free of phase distortion unless the resampling ratio is a simple integer (or the resampler is very carefully designed).
I think you can see where this is going.
48 kHz became a standard in audio because it's just flat-out easier to sync up with video feeds. The numbers 24 and 48,000, though a few orders of magnitude apart, form a nice, elegant integer ratio - exactly 2,000 samples per frame. For that reason alone, 48 kHz became another standard in audio, and to this day it tends to be the common format for video and gaming platforms.
44.1 is a carryover from when digital audio first became a thing in the 80s - it was the standard, and today we have enough accumulated knowledge of how signals behave at this sample rate - it's even baked into many digital audio standards - that we frequently still record, process, and listen to music at 44.1 kHz. 48 is slowly taking over, especially as mobile-phone DACs move to 48 for standardization with video platforms - people watching YouTube and Netflix on their phones. Neither is inherently better than the other; it comes down to what you prefer and whether you work with video engineers.
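Incidentally, 44.1 and 48 aren't related by an integer, but they are related by a rational ratio, which is what makes clean conversion between them possible at all. A quick sketch, with scipy's polyphase resampler standing in as one readily available "done right" implementation:

```python
# Sketch: 48 kHz <-> 44.1 kHz is not an integer ratio, but it IS a
# rational one, so a polyphase resampler can handle it with a single
# linear-phase filter.
from math import gcd
import numpy as np
from scipy.signal import resample_poly

g = gcd(48_000, 44_100)                  # common factor of the two rates
up, down = 44_100 // g, 48_000 // g      # 147 and 160
print(f"48 kHz -> 44.1 kHz is up {up} / down {down}")

fs = 48_000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1_000 * t)        # 1 kHz test tone, 1 second at 48 kHz
y = resample_poly(x, up, down)           # same second of audio at 44.1 kHz
print(len(y))                            # 44100 samples
```

resample_poly applies one linear-phase FIR across the whole 147:160 conversion, which is the sort of implementation that avoids the phase problems described above; the sloppy shortcuts are where the trouble comes from.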
For the last paragraph above, DSatz says
"All the potential sonic problems of filters become less if their design is less aggressive (fewer "poles")." This is a broad generalization that needs a lot of caveats to be strictly true; it's in the right ballpark, but given the layperson's penchant for misinterpreting its implications, I'd flag it as "true, with conditions."
id take native 44.1 over 48K resampled to 44.1 personally
Again, that's heavily dependent on the resampling algorithm and method used. If it's a pretty high order linear phase filter, I think you'd be hard-pressed to find a massive difference between native 44.1 kHz content and properly resampled 48 kHz down to 44.1 kHz content on the same converter.
Is anyone still burning CD's?!?
Following on what DSatz said, there IS a lot of suspect equipment out there, very EXPENSIVE suspect equipment you wouldn't suspect of being.....suspect. I've heard lots of cases where cymbals and acoustic instrument treble was noticeably better at 48 over 44K1, observing the long pattern results from multiple recording sessions with the same equipment, different sample rates. As to high rates, I used to have multiple converters that clearly sounded their best at 88K2 or higher, now there's much less observable difference.
Yep - me. Though I am admittedly pretty niche. Speaking of which, if anyone has old music CDs officially released by artists, or blank CD-Rs taking up space in their home, drop me a line - I'll take 'em off your hands.
Regarding that second paragraph - this most likely has to do with how the actual converters were designed. The analog circuitry around them is probably optimized to run with lower distortion at 48 rather than 44.1 kHz, and that's likely what you're hearing, more so than any property of the sample rate of the file itself.
If your target audience includes dogs and birds, or the intended use of your recordings is to pitch shift them downwards more than one octave, then I suggest using f(s) rates higher than 48k, else not.
I have finally settled at recording at 24/48 and distribution of finalized work at 16/48, as none of my recordings exceed 96dB s/n.
Quick side tangent: there are lots of signal processing techniques we can use to get around the octave limit for pitch shifting up or down outside of oversampled material. At the risk of accidentally breaking some NDAs, I'll just comment that companies have been doing pitch shifting of significant ratios since the late 70s, and getting it down pretty clean (even by 2020 standards) by the 80s. What we can do today with pitch shifting is really darn cool, but the methods to clean it up are held pretty tight by industry since a good pitch shifter is a lucrative algorithm.
The mention of bit depth and SNR of 96 dB is useful, but I'll loop back around to that later when I discuss why I think 24 is better than 16 for recording.
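Since the "96 dB" figure came up: that's the textbook dynamic-range rule of thumb of roughly 6.02 dB per bit (plus a small constant when you quantize a full-scale sine). A two-line sketch of the arithmetic:

```python
# Sketch: the textbook numbers behind "96 dB" for 16-bit audio.
# Quantizing a full-scale sine with N bits gives an SNR of roughly
# 6.02*N + 1.76 dB; the plain 6.02*N figure is the commonly quoted one.
def quant_snr_db(bits):
    return 6.02 * bits + 1.76

for bits in (16, 24):
    print(f"{bits}-bit: ~{quant_snr_db(bits):.1f} dB SNR "
          f"({6.02 * bits:.1f} dB dynamic range)")
```

That extra ~48 dB between 16 and 24 bits is the raw material behind the headroom argument I'll get to below.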
I can't tell a difference and I have pretty good playback gear.
You had me at the first half, and lost me at the second. At the end of the day, this really boils down to whether YOU can hear a difference - that's all that should matter. But you can't generalize from your hearing and your playback equipment (as good as it may be) to everyone and every playback rig, even the very expensive ones. On this particular note: is your converter designed to run with lower distortion at one clock rate than another?
you can make equally good recordings with 16 bit, but with good equipment, 24-bit is useful as you can achieve comparable results without riding the levels as much, which is risky
Yep, one of a few great reasons to record at 24 bits. The extra headroom to avoid clipping is a great perk.
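To put rough numbers on that headroom perk (the 18 dB below is just my example of a conservative recording level, not any standard; 6.02 dB per bit is the textbook figure):

```python
# Sketch: why 24-bit lets you stop riding the levels. Leave generous
# headroom to dodge clipping, and see how much usable range remains
# between your peaks and the quantization noise floor.
def usable_range_db(bits, headroom_db):
    dynamic_range = 6.02 * bits          # textbook dB-per-bit figure
    return dynamic_range - headroom_db

print("16-bit, 18 dB headroom:", usable_range_db(16, 18))   # ~78 dB left
print("24-bit, 18 dB headroom:", usable_range_db(24, 18))   # ~126 dB left
```

At 16 bits, conservative levels start eating visibly into your usable range; at 24 bits you can record timidly and still have dynamic range to burn.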
Any of the MOTU converters before the 2014 updates, for example, sound obviously worse up top at 48 and under. Conversely, the Aurora converters of then and earlier apparently did sound best at 48.
Yep, sounds about right - the converter circuitry is probably optimized for lowest distortion at 48 kHz, knowing that's likely to be the most commonly used. FWIW, I've never met a MOTU product I liked so this doesn't surprise me, but I want to avoid bashing them - I'll just say I've never had one of their units work easily or correctly as-advertised in any system I've worked on.
ANYWAY... I mentioned you should record as high as possible for both bit depth and sample rate. So why do I recommend you do that, even if you can't hear a huge difference?
Post production
I come to taping from studio production. I started recording my own music on a computer in high school, then started producing in college as I learned more about digital filtering and computer music. Friends eventually asked me to produce for them too, since they liked my production sound, and I've been down the digital-filter wormhole ever since. Then one day, before a show, I just walked up to the tapers' section and happened across some very kind faces who answered all my questions, and I leveraged my production chops to start doing remix and mastering work.
The way digital filters work - ESPECIALLY a digital filter emulating an older analog filter - is by approximation. They can internally upsample to help improve their performance, and there's more wizardry we can run under the hood, but even still: the higher the sample rate, the more precisely and accurately the filter behaves. It just sounds SO much cleaner, better, and brighter. This filter accuracy CAN be heard both while working at 96 kHz and after that 96 kHz master has been down-converted. It's not as subtle as you'd think - it's very audible.
A good way to think about this: if a filter is a knife, then the higher your bit depth, the more precise your cut location, and the higher your sample rate, the sharper your knife. Both affect how accurately the knife cuts - the correct location AND the shape of the cut: straight lines come out straighter, and curved lines land closer to the curve's spec.
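Here's a small sketch of the bit-depth half of that analogy (scipy assumed; this is a generic illustration I made up, not any particular product's filter): round a well-designed FIR lowpass's coefficients to a 16-bit grid, and the deep stopband the design asked for partly collapses, while the passband barely moves.

```python
# Sketch: coefficient precision as "cut location." Design a sharp
# linear-phase lowpass with a very deep stopband, then simulate storing
# its coefficients in 16-bit fixed point.
import numpy as np
from scipy.signal import firwin, freqz

taps = firwin(255, 0.25, window=("kaiser", 12.0))   # float64 design
taps_q16 = np.round(taps * 2**15) / 2**15           # round to a 16-bit grid

def stopband_peak_db(h):
    """Worst-case stopband level (dB) well past the cutoff."""
    w, H = freqz(h, worN=4096)
    stop = np.abs(H)[w > 0.35 * np.pi]
    return 20 * np.log10(stop.max() + 1e-12)

print("float64 coefficients:", round(stopband_peak_db(taps), 1), "dB")
print("16-bit  coefficients:", round(stopband_peak_db(taps_q16), 1), "dB")
```

The exact numbers depend on the design, but the pattern - rounding error surfacing as a raised stopband floor - is general, and it's one reason processing chains keep far more internal precision than the delivery format.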
I encourage you to try this out for yourself, and trust your own ears. That's a better sell for the general idea than any amount of math and numbers I could throw at you. I've been meaning to make a comp to help prove my point to tapers in general - maybe this is good motivation to set aside some time in the next month or two and put together an A-B comp of both raw tapes AND tapes with processing...
So take it from me. You might not hear a huge difference today... but do you know how much I would MURDER for 24/96 recordings of my favorite Phish and Dead shows from the 90s? And I wasn't a thought on any taper's mind back then.
A few more points, so I can wrap this up for the moment and cook dinner:
-I never went deep into bit depth and why it's important. A quick note: if you are adding or multiplying two digital numbers together, the more bits you have, the more accurate the result - and that (delaying, multiplying, adding) is the fundamental operation of filtering.
-How much more space are these files taking for you? Digital storage is cheap.
-Sure, it takes more work to arrange and join files (especially if you record more than two channels), but the extra bit of work will be worth it. I think 24/192 is a bit unwieldy given WAV's 4-gig file-size limit - needing to splice a set even for just two channels can be frustrating - plus, if I'm recording an entire run of shows, those file sizes add up quickly and fill a 64-gig SD card. HENCE my statement at the beginning: record in as high a fidelity as is convenient for you. For me, that's 24/96.
-For the raw tape, if you're doing your resampling and bit reduction correctly, you shouldn't hear a significant difference. In double-blind tests I can consistently sniff out the higher-resolution files, but to my ear the differences in the raw files with no processing applied are slight, and I'm not convinced I'm hearing a benefit of the file itself rather than of converters designed to operate best at higher sample rates. My personal DAC at home internally upsamples to a ridiculous degree (and domain) anyway, so 16/44.1 usually works just as well as 24/96 for playback for me. AS SUCH, I personally record at 24/96, do any processing at that rate, and then bounce down to 16/44.1 or 16/48 for release.
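Since "how much space" is a fair question, here's the arithmetic for uncompressed stereo PCM (plain WAV, no compression; the 4 GB figure is the format's file-size ceiling mentioned above):

```python
# Sketch: storage cost of uncompressed stereo PCM at common formats,
# and how long until a single WAV hits its 4 GB ceiling.
def wav_rate_bytes_per_sec(bits, fs, channels=2):
    return bits // 8 * fs * channels

for bits, fs in ((16, 44_100), (24, 96_000), (24, 192_000)):
    rate = wav_rate_bytes_per_sec(bits, fs)
    gb_per_hour = rate * 3600 / 1e9
    hours_to_4gb = (4 * 2**30) / rate / 3600
    print(f"{bits}/{fs // 1000}k: {gb_per_hour:.2f} GB/hour, "
          f"4 GB WAV limit hit after {hours_to_4gb:.1f} h")
```

Roughly: 16/44.1 gives you most of a full evening in one file, 24/96 about two hours, and 24/192 barely an hour - which is exactly why 192 gets annoying for full sets while 96 stays manageable.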
If anyone has questions, or criticisms of any of this: fire away.