Author Topic: 48 kHz vs 44.1 kHz sample rate (Read 107539 times)

Gutbucket · « **Reply #30 on:** September 23, 2020, 10:38:37 AM »

Quote from: wforwumbo on September 22, 2020, 07:35:26 PM

Quote from: Gutbucket on September 22, 2020, 12:35:55 PM
Does it make any difference if the entire raw recording is upsampled upon loading in the editor prior to filtering, or is doing so essentially the same as upsampling within the filter being applied, except less efficient?

The short answer is yes there's a difference, and no the same result is not achieved by upsampling the raw data before you hit it with a filter. The underlying intuition for this is as follows:

Take a discrete sequence (in this case, the samples containing audio data). Now throw out every other sample; you have just simulated a downsampling rate of 2:1. Now, try to up-sample back to the original sample rate. The resampling algorithm here used to go back up to the original sample rate may or may not deviate from the original samples you had. That error is HEAVILY dependent on the content of the original signal, and signals are generally mixed in the probability the content between two samples is linear/exponential/sinusoidal/a mathematically elegant reconstruction function.

I've seen some resampling algorithms leverage properties of the DAC to get lower overall error, but once you lose the samples the error is there - and that error gets compounded down the effects chain (for example, by the equalizer).

I think you misunderstood my question. I'm not asking about the difference between processing a recording originally made at, say, 96kHz sample rate without oversampling performed within the plugin filter and one made at 48kHz sample rate with 2X oversampling applied within the filter (which is what I'd love to listen for in the potential comparisons you floated in your original post), but rather about two different ways of processing the same 48kHz source recording: Is there any difference between upsampling the entire file to 96kHz first and processing without oversampling by the filter, versus processing the 48kHz file with 2X upsampling done within a filter?

Follow up question-
Up sampling seeks to determine the value of additional points along the waveform using mathematical interpolation, and reflects a mathematically rigorous determination of what the new values should be. That's a logical approach to limiting error. However, I can imagine other predictive schemes which might better reflect actual acoustic behavior, albeit at the risk of introducing other forms of error. Adding harmonic distortion above the cutoff frequency, excited by a range of signal just beneath cutoff for instance. Or perhaps in a more targeted way by identifying harmonic series' which appear to have been truncated by the anti-aliasing filtering and extending those in such a way as to mimic natural harmonic progressions and decays typical to real world sounds, without adding spurious harmonic distortion to the entire signal in that range just below stop-band cutoff.

Do any up-sampling techniques do this? Perhaps it is a hidden proprietary plugin filter upsampling strategy?

wforwumbo · « **Reply #31 on:** September 23, 2020, 12:51:09 PM »

Quote from: Gutbucket on September 23, 2020, 10:38:37 AM

Quote from: wforwumbo on September 22, 2020, 07:35:26 PM
Quote from: Gutbucket on September 22, 2020, 12:35:55 PM
Does it make any difference if the entire raw recording is upsampled upon loading in the editor prior to filtering, or is doing so essentially the same as upsampling within the filter being applied, except less efficient?

The short answer is yes there's a difference, and no the same result is not achieved by upsampling the raw data before you hit it with a filter. The underlying intuition for this is as follows:

Take a discrete sequence (in this case, the samples containing audio data). Now throw out every other sample; you have just simulated a downsampling rate of 2:1. Now, try to up-sample back to the original sample rate. The resampling algorithm here used to go back up to the original sample rate may or may not deviate from the original samples you had. That error is HEAVILY dependent on the content of the original signal, and signals are generally mixed in the probability the content between two samples is linear/exponential/sinusoidal/a mathematically elegant reconstruction function.

I've seen some resampling algorithms leverage properties of the DAC to get lower overall error, but once you lose the samples the error is there - and that error gets compounded down the effects chain (for example, by the equalizer).

I think you misunderstood my question. I'm not asking about the difference between processing a recording originally made at, say, 96kHz sample rate without oversampling performed within the plugin filter and one made at 48kHz sample rate with 2X oversampling applied within the filter (which is what I'd love to listen for in the potential comparisons you floated in your original post), but rather about two different ways of processing the same 48kHz source recording: Is there any difference between upsampling the entire file to 96kHz first and processing without oversampling by the filter, versus processing the 48kHz file with 2X upsampling done within a filter?

Follow up question-
Up sampling seeks to determine the value of additional points along the waveform using mathematical interpolation, and reflects a mathematically rigorous determination of what the new values should be. That's a logical approach to limiting error. However, I can imagine other predictive schemes which might better reflect actual acoustic behavior, albeit at the risk of introducing other forms of error. Adding harmonic distortion above the cutoff frequency, excited by a range of signal just beneath cutoff for instance. Or perhaps in a more targeted way by identifying harmonic series' which appear to have been truncated by the anti-aliasing filtering and extending those in such a way as to mimic natural harmonic progressions and decays typical to real world sounds, without adding spurious harmonic distortion to the entire signal in that range just below stop-band cutoff.

Do any up-sampling techniques do this? Perhaps it is a hidden proprietary plugin filter upsampling strategy?

Ah, now I see your question. I had assumed you were asking with the condition that the filter was performing internal upsampling regardless.

It's a hard question to generalize, as the answer is dependent on how the filter goes about its business. Depending on the method of coefficient calculation, it could have an impact. But *generally* speaking, you shouldn't hear as big of a difference. The exceptions are for specific digital filters doing analog modeling which try to incorporate some of the non-linear and time-variant properties of said real-world analog filters.

Regarding filter error on your follow up question, the problem we run into is that filter error is Gaussian. This means that the error induces distortion even below the cutoff frequency. The level of this error goes down with some signal processing techniques (sigma-delta modulators in conversion and converting from PCM can minimize this), but the error will almost always be Additive White Gaussian Noise, given enough samples for the error to be analyzed. Side note, worthy of another thread sometime: noise analysis is VERY different in the short-time (i.e. frame-to-frame) as opposed to the long-term, integrated noise response.

morst · « **Reply #32 on:** September 24, 2020, 01:08:28 PM »

Quote from: wforwumbo on September 22, 2020, 07:35:26 PM

Take a discrete sequence (in this case, the samples containing audio data). Now throw out every other sample; you have just simulated a downsampling rate of 2:1.

Is this how downsampling is implemented? I would have guessed that averaging two (or three?) samples would be the method, although it would take more CPU.
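To make the quoted thought experiment concrete, here is a minimal Python sketch. It is not from the thread: the function names and the 481-sample test length are assumptions chosen for illustration. It drops every other sample of a 48 kHz sine, fills the gaps back in with the simplest "average the two neighbours" interpolation, and reports the worst-case deviation from the original samples.

```python
import math

def tone(freq_hz, rate_hz, n):
    """n samples of a unit-amplitude sine at freq_hz, sampled at rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

def drop_every_other(x):
    """Naive 2:1 downsampling: keep samples 0, 2, 4, ..."""
    return x[::2]

def upsample_2x_linear(y):
    """Rebuild the dropped samples as the average of their two neighbours."""
    out = []
    for a, b in zip(y, y[1:]):
        out.append(a)             # sample that survived the decimation
        out.append((a + b) / 2)   # interpolated guess for the dropped one
    out.append(y[-1])
    return out

def round_trip_peak_error(freq_hz, rate_hz=48000, n=481):
    """Worst-case difference between the original and the round-tripped tone."""
    x = tone(freq_hz, rate_hz, n)  # odd n so the lengths line up again
    rebuilt = upsample_2x_linear(drop_every_other(x))
    return max(abs(a - b) for a, b in zip(x, rebuilt))

if __name__ == "__main__":
    for f in (1000, 5000, 10000):
        print(f, "Hz ->", round_trip_peak_error(f))
```

For a pure tone and this particular interpolator the error on each rebuilt sample scales with 1 - cos(2*pi*f/fs), so it stays small for low frequencies and grows quickly as the tone gets closer to the halved Nyquist limit. That is one simple way to see why the error depends so heavily on the content of the signal.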

wforwumbo · « **Reply #33 on:** September 24, 2020, 01:22:16 PM »

Quote from: morst on September 24, 2020, 01:08:28 PM

Quote from: wforwumbo on September 22, 2020, 07:35:26 PM
Take a discrete sequence (in this case, the samples containing audio data). Now throw out every other sample; you have just simulated a downsampling rate of 2:1.

Is this how downsampling is implemented? I would have guessed that averaging two (or three?) samples would be the method, although it would take more CPU.

The answer is “sometimes”

Good downsampling algorithms that can recognize an integer ratio will do this. But more often than not, they don’t.

What’s more common in a downsampling algorithm is:
- A block of samples is up-sampled to a common multiple of the native (old) and target (new) sample rates.
- The samples “in between”, where no native data exists, are interpolated. This can vary from “connect the dots”, which is similar to the averaging you mention, to a “create a quadratic (second order function, which can create curves) or higher order function to model this block of 3-5 samples and use this function to ‘fill in the blanks’” method. Splines are common, and there are some advanced techniques from noise communications like using Lagrange polynomials to generate the missing sample data.
- The function you just generated is used to calculate the value at the target sample rate where no data previously existed.
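The steps above can be sketched in Python roughly as follows. This is a simplification under stated assumptions, not how any particular resampler works: plain linear interpolation ("connect the dots") stands in for the spline or Lagrange fits, no anti-aliasing filter is applied, and the helper names `common_rate` and `resample_linear` are hypothetical.

```python
import math
from fractions import Fraction

def common_rate(old_rate, new_rate):
    """Smallest rate that is a whole multiple of both sample rates."""
    return old_rate * new_rate // math.gcd(old_rate, new_rate)

def resample_linear(samples, old_rate, new_rate):
    """Evaluate a straight line between neighbouring input samples at each
    output instant. Positions are exact fractions of an input sample, so
    they all fall on the common-rate grid without building it explicitly."""
    step = Fraction(old_rate, new_rate)   # input samples per output sample
    out = []
    k = 0
    while True:
        pos = k * step                    # output instant, in input samples
        i = math.floor(pos)               # input sample just before it
        if i + 1 >= len(samples):         # no right-hand neighbour left
            break
        frac = float(pos - i)             # how far between the neighbours
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        k += 1
    return out

if __name__ == "__main__":
    print(common_rate(48000, 44100))      # common grid for 48 -> 44.1
    print(Fraction(44100, 48000))         # reduced output/input ratio
    ramp = [float(i) for i in range(481)]
    print(len(resample_linear(ramp, 48000, 44100)))
```

For 48 kHz to 44.1 kHz the reduced ratio is 147/160, which means only one output sample in every 147 lands exactly on an input sample. Every other output value is an interpolated guess, and the quality of that guess depends on the function used to connect the dots.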

The upside of this method is that it safely assumes you’re not resampling by an integer ratio (for example, resampling from 48 to 44.1) and thus there WILL be some level of error, so you can pick your poison of error in the function used to “connect those dots” - which themselves will be wrong, but you can expect and minimize the wrong-ness. The downside is that complex source material - such as music, which contains lots of concurrent frequency and phase data - can behave erratically and unpredictably as samples get shuffled forward/backwards in time and error. This can create the perception of slightly muted treble, moving sources, or just pure mush in the soundstage. Effectively, the recording sounds unstable.

We can play some tricks to try and recover or mitigate the distorted phase, but that’s also a “pick your poison” deal and brings its own caveats. The issue is that resampling distortion is not inherently linear across frequency, and thus no ideal filter exists to perfectly reconstruct the signal except for at integer ratios and that’s almost never guaranteed. Look at this thread - most people (myself included) are likely to go from 96 to 44.1 and there’s no plan to change that; the implication is that unless you control the resampling manually yourself, even going from 88.2 to 44.1 will induce some phase distortion due to common resampling algorithms with phase compensation that doesn’t need to be there and sadly can’t be undone after the fact.

wforwumbo · « **Reply #34 on:** September 24, 2020, 07:25:04 PM »

For some additional useful thoughts on bit depth, I dug up an old post of mine: http://taperssection.com/index.php?topic=184569.msg2251598#msg2251598

Copy-pasting it here, for posterity's sake and a consolidated/collected base of this information. Note that, like most of what I have said in this thread so far, this is a simplification of the problem meant to demonstrate the concept and not by any means a complete reflection of the underlying principle.

Quote from: wforwumbo on January 09, 2018, 01:41:09 PM

For a simple example, let's say I had a system that wanted to add two numbers. The numbers are 1571 and 0448. The result for any human adding these two numbers together is 2019. The effect of a 16-bit system would be having the two numbers as 1500 and 0400, which adds up to 1900. SOME 16-bit systems can try and adjust this during recording as 1600 and 0400, which gives us 2000 and is a bit closer, but that requires some additional processing power on the A/D converter, most of which doesn't occur except in more expensive 16-bit recorders. If your converter is capable of 24-bit or 32-bit float and you're recording in 16-bit mode or manually converting to 16-bit in post, you're probably getting the former. A 24-bit system is the same as going to 1570 and 0440, which adds up to 2010 - a lot closer than 1900. Your Samplitude system, at 32-bit float and importing in 24-bit files, would be similar to having 1570.00 and 0440.00, which allows for decimal rounding at the end after processing - not of note during recording, but DEFINITELY useful in post, particularly pre-bit reduction and dither, as the equalizers can come MUCH MUCH closer to what the real values should be.

This is obviously an imperfect example, and a touch exaggerated. But it gets the point across.
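The arithmetic in this analogy is easy to reproduce. Below is a small Python sketch of it; the function names are made up for illustration, and the decimal "step" values (100, 10, 1) are stand-ins for coarser or finer word lengths, not real bit depths.

```python
def truncate(value, step):
    """Throw away everything finer than `step` (1571 -> 1500 for step 100)."""
    return (value // step) * step

def round_nearest(value, step):
    """Round to the nearest multiple of `step` (1571 -> 1600 for step 100)."""
    return ((value + step // 2) // step) * step

def add_with(quantize, step, a=1571, b=448):
    """Quantize both inputs first, then add them."""
    return quantize(a, step) + quantize(b, step)

if __name__ == "__main__":
    print("truncate to hundreds:", add_with(truncate, 100))
    print("round to hundreds:   ", add_with(round_nearest, 100))
    print("truncate to tens:    ", add_with(truncate, 10))
    print("exact:               ", add_with(truncate, 1))
```

The four cases give 1900, 2000, 2010 and 2019, matching the figures in the quoted post: the finer the step, the closer the sum lands to the true value.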

morst · « **Reply #35 on:** September 25, 2020, 01:57:47 PM »

Fascinating, I would have guessed that the answer was almost never or never. Thanks for the detailed reply, explaining the use of quadratic / higher order interpolation.

Quote from: wforwumbo on September 24, 2020, 01:22:16 PM

Quote from: morst on September 24, 2020, 01:08:28 PM
Quote from: wforwumbo on September 22, 2020, 07:35:26 PM
Take a discrete sequence (in this case, the samples containing audio data). Now throw out every other sample; you have just simulated a downsampling rate of 2:1.

Is this how downsampling is implemented? I would have guessed that averaging two (or three?) samples would be the method, although it would take more CPU.

The answer is “sometimes”
Good downsampling algorithms that can recognize an integer ratio will do this. But more often than not, they don’t.

aaronji · « **Reply #36 on:** September 25, 2020, 04:39:47 PM »

^^ Can you please elaborate on your example a little? Maybe I am being a little thick, but I don't understand how it applies to the recording itself. If the dynamic range of the source (plus headroom) "fits" into 16 bits, wouldn't that be sufficient? Can't both 1571 and 0448 be described exactly with 16 bits? I can see that performing operations on that data might push it outside of the 16-bit range, but, at least in the case of tapers, barely any of us are doing that during the recording process. When we import it into a DAW to process, the extra bits to handle operations will be available there.

I do understand the benefit of 24-bit recording and I can see the point for processing, but I don't get the reason why a less than 16-bit source would benefit from a longer word (assuming properly set gain).

[EDIT: Removed a typo.]

Gordon · « **Reply #37 on:** September 25, 2020, 05:27:57 PM »

Quote from: rippleish20 on September 14, 2020, 10:12:11 AM

I record and post at 24/48; it seems to me in this day and age people should be able to handle this...

I just started only releasing 24/48 about a year ago. Exactly one person has asked for a 16/44.1!

jerryfreak · « **Reply #38 on:** September 25, 2020, 07:47:40 PM »

Quote from: aaronji on September 25, 2020, 04:39:47 PM

^^ Can you please elaborate on your example a little? Maybe I am being a little thick, but I don't understand how it applies to the recording itself. If the dynamic range of the source (plus headroom) "fits" into 16 bits, wouldn't that be sufficient? Can't both 1571 and 0448 be described exactly with 16 bits? I can see that performing operations on that data might push it outside of the 16-bit range, but, at least in the case of tapers, barely any of us are doing that during the recording process. When we import it into a DAW to process, the extra bits to handle operations will be available there.

I do understand the benefit of 24-bit recording and I can see the point for processing, but I don't get the reason why a less than 16-bit source would benefit from a longer word (assuming properly set gain).

[EDIT: Removed a typo.]

that example is a bit exaggerated. 24 vs 16 bit is more like adding 1571.36453235439801090341 and 0448.59372906135484319835 vs 1571.364532354398 and 0448.593729061354, when in the end your required result doesn't depend on much of anything past maybe the 9th or 10th decimal digit anyway

now an example of recording at very low levels (assuming no hardware limitations): 0000.0000 0000 0123 4567 8901 and 0000.0000 0000 0987 6543 2109 would look like 0000.0000 0000 0123 and 0000.0000 0000 0987 respectively in 16 bit.

if you add those and THEN normalize, in the 16 bit case you end up with 1110.00000000000000 vs 1111.111110100000

that difference would be more understandable as distortion - ie % of difference in data from input to output of device or process
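a rough way to try the "record low, then normalize" case in Python. this is only a sketch with assumed numbers (a 997 Hz tone peaking at 0.001 of full scale, 1000 samples, plain rounding with no dither), and the function names are hypothetical:

```python
import math

def quantize(x, bits):
    """Round x (full scale = +/-1.0) to the nearest step of a `bits`-bit word.
    No dither and no clipping; the test signal stays far below full scale."""
    scale = 2 ** (bits - 1)
    return round(x * scale) / scale

def normalized_peak_error(bits, peak=0.001, n=1000, freq_hz=997.0, rate_hz=48000.0):
    """Quantize a low-level tone, apply the gain that would bring its peak
    to full scale, and return the worst error after that gain."""
    gain = 1.0 / peak
    worst = 0.0
    for i in range(n):
        x = peak * math.sin(2 * math.pi * freq_hz * i / rate_hz)
        worst = max(worst, abs(gain * quantize(x, bits) - gain * x))
    return worst

if __name__ == "__main__":
    for bits in (16, 24):
        print(bits, "bit:", normalized_peak_error(bits))
```

the gain raises the quantization error by the same factor as the signal, so the error relative to the signal is fixed at capture time. at this level the 16 bit error can be as large as half a step times the gain, about 1.5% of the normalized peak, while the 24 bit limit is 256 times smaller.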

aaronji · « **Reply #39 on:** September 27, 2020, 02:49:56 PM »

Maybe my last post wasn't clear enough, but I am not really interested in operations. What I am asking is this: "If the source is less than 16-bits and properly recorded to ISOs at 16-bits, is there any advantage to recording at a higher bit depth?" As I mentioned, I see the advantages of not having to ride levels or for processing. Also, aren't PCM samples signed integers?

DSatz · « **Reply #40 on:** September 30, 2020, 11:54:15 PM »

aaronji, to your previous question, if a recording "fits" within 16 bits then by definition there is no benefit to be gained by extending it to 24 bits, as long as you don't alter the recording otherwise.

If, however, the recording really uses the full range of the available 16 bits (I think it happened to me only once ever, at a percussion ensemble concert), then if you wish to do certain kinds of processing on it, there might be a tiny, tiny, tiny (and almost certainly not audible) theoretical advantage to adding another bit or two. To make the difference even potentially audible, you would need to play back the recording at a level allowing you to hear those bottom bits, and then the highest-order bits would be about 100 dB louder, which is not gonna happen in a critical listening situation.
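The "about 100 dB" figure can be sanity-checked with the usual rule of thumb of roughly 6 dB per bit. A small Python sketch (the helper name is an assumption, and it only computes the ratio between full scale and a single quantization step, ignoring dither and noise shaping):

```python
import math

def dynamic_range_db(bits):
    """Ratio, in dB, between the largest and smallest step of a `bits`-bit word."""
    return 20 * math.log10(2 ** bits)

if __name__ == "__main__":
    for bits in (1, 16, 24):
        print(bits, "bits:", round(dynamic_range_db(bits), 2), "dB")
```

Each extra bit doubles the number of levels, which adds about 6.02 dB. Sixteen bits therefore span a little over 96 dB, and twenty-four bits a little over 144 dB.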

--You can view the digital samples as signed integers if you want, but then you end up with full scale = +32767 or -32768, which are rather arbitrary numbers if you're not a binary geek. To me it makes more sense to view the samples as binary fractions, i.e. analogous to decimal fractions such as 0.45 or -0.234, with the bit values based on the series 1/2, 1/4, 1/8, etc. -- then full scale can be reckoned either as +/- 1 or, if you prefer, +/- 1/2 so that the entire range is then 1 from peak to peak.

In either case, because of the particular binary notation involved (so-called "two's complement" arithmetic), the 0 sign bit for the value 0 places that value in the positive half of the range. So there is one possible extreme sample value in the negative direction that is one step farther than exists on the positive side. This is equivalent to saying that in integer arithmetic (say, for an eight-bit byte), the possible values are from -128 through 0 to +127.
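Both views of the same sample, and the one-step asymmetry, can be seen in a few lines of Python. This is only an illustrative sketch: the helper names are assumptions, and the byte decoding assumes little-endian two's-complement words.

```python
def int_range(bits):
    """Smallest and largest two's-complement integer for a given word length."""
    half = 2 ** (bits - 1)
    return -half, half - 1

def to_fraction(sample, bits):
    """View a signed-integer sample as a binary fraction with full scale +/-1."""
    return sample / 2 ** (bits - 1)

def decode_le(raw):
    """Decode little-endian two's-complement bytes into a signed integer."""
    return int.from_bytes(raw, byteorder="little", signed=True)

if __name__ == "__main__":
    print(int_range(8), int_range(16))
    print(to_fraction(-32768, 16), to_fraction(32767, 16))
    print(decode_le(b"\x00\x80"), decode_le(b"\xff\x7f"))
```

The most negative code maps to exactly -1.0, while the most positive code falls one step short of +1.0, because zero takes up one of the codes whose sign bit is 0.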

Many years ago I worked with a computer system (the "Adage Graphics Terminal") that used "ones' complement" arithmetic; it was weird because it had both +0 and -0 that were logically distinct values, even though they were quantitatively equal. And its maximum positive and maximum negative values were identically far from 0. But that type of arithmetic was rarely used even then, and by convention PCM audio is always two's complement.

aaronji · « **Reply #41 on:** October 01, 2020, 09:54:41 AM »

Quote from: DSatz on September 30, 2020, 11:54:15 PM

aaronji, to your previous question, if a recording "fits" within 16 bits then by definition there is no benefit to be gained by extending it to 24 bits, as long as you don't alter the recording otherwise.

If, however, the recording really uses the full range of the available 16 bits (I think it happened to me only once ever, at a percussion ensemble concert), then if you wish to do certain kinds of processing on it, there might be a tiny, tiny, tiny (and almost certainly not audible) theoretical advantage to adding another bit or two. To make the difference even potentially audible, you would need to play back the recording at a level allowing you to hear those bottom bits, and then the highest-order bits would be about 100 dB louder, which is not gonna happen in a critical listening situation.

Thank you, DSatz. That was my intuitive understanding and exactly the sort of answer for which I was looking. Lots of dubious "information" on this topic out there on the web...

With respect to the binary, binary fractions are fine with me. The difference with respect to signed integers is just the scale (as you mentioned). I was actually specifically referring to the example that was using numbers that can't be represented in binary at all, or, rather, only as infinitely long binary ~~numbers~~ fractions.
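For what it's worth, the representability point is easy to check. A decimal value has a finite binary expansion only if its reduced denominator is a power of two. The Python sketch below uses a helper name I made up, and takes the values as text so nothing is rounded before the check:

```python
from fractions import Fraction

def is_finite_binary_fraction(decimal_text):
    """True if the decimal value can be written with finitely many binary digits."""
    den = Fraction(decimal_text).denominator   # reduced to lowest terms
    return (den & (den - 1)) == 0              # power-of-two test

if __name__ == "__main__":
    for text in ("1571", "0.5", "0.375", "0.1", "1571.36453235439801090341"):
        print(text, is_finite_binary_fraction(text))
```

Whole numbers such as 1571 and values such as 0.375 pass, while 0.1 and the long decimals from the earlier example do not.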

[EDIT: For clarity.]

morst · « **Reply #42 on:** October 01, 2020, 02:56:26 PM »

Quote from: DSatz on September 30, 2020, 11:54:15 PM

In either case, because of the particular binary notation involved (so-called "two's complement" arithmetic), the 0 sign bit for the value 0 places that value in the positive half of the range. So there is one possible extreme sample value in the negative direction that is one step farther than exists on the positive side. This is equivalent to saying that in integer arithmetic (say, for an eight-bit byte), the possible values are from -128 through 0 to +127.

In order to have a "zero crossing" in a binary system, there is a necessary asymmetry!?
This is obviously true but it makes my brainzzzz explode!

PS the Sound Summit day one is complete and will continue to be available at this link https://www.youtube.com/watch?v=E1iUbxfJmQQ

328 · « **Reply #43 on:** October 03, 2020, 01:30:51 AM »

When intent is only CD, I record at 24/88.2
These days, with media so inexpensive, I record at 24/192, and archive the master.
With a high initial sampling rate, I'm totally comfortable with a resample to 44.1.
24/96 --> 16/44.1 (or 20/44.1) has been industry standard for decades.

Worthy of mention, DAT 16/48 was responsible for at least a few fast tapes in the 80's.
I'm not sure how the consumer decks did it, but the 48kHz rate was cited as the reason in a few DATHeads posts.

morst · « **Reply #44 on:** October 03, 2020, 10:49:19 AM »

Quote from: 108Ω on October 03, 2020, 01:30:51 AM

When intent is only CD, I record at 24/88.2
These days, with media so inexpensive, I record at 24/192, and archive the master.
With a high initial sampling rate, I'm totally comfortable with a resample to 44.1.
24/96 --> 16/44.1 (or 20/44.1) has been industry standard for decades.

It ain't about the rate, it's about a whole number ratio being completely different and much simpler math
Since you keep your masters you can always redo everything later.

Edit on Mon oct 5:
YOW!
Looks like my statement, now highlighted in red, is an incorrect assumption on my part!
See DSatz post on the next page.
