Author Topic: 48 kHz vs 44.1 kHz sample rate (Read 27551 times)

heathen · « **Reply #60 on:** October 10, 2020, 08:32:43 AM »

This thread is like a free graduate school course in digital audio. Thanks all!

checht · « **Reply #61 on:** October 10, 2020, 04:56:09 PM »

Quote from: kuba e on October 08, 2020, 06:54:22 PM

...... What's good I learned here is that if I find some treasure in the archive, whoever would do quality digital post-processing in the future can resample my original with good a/d convertors. And the loss will not be so big.

Curious why you seem to prefer the dac -> adc path to convert vs upsampling in a DAW. Seems to me that there are more possibilities for problems going out of the digital domain and depending on perfect cables and connections in the analog domain, and also more digital processes in that process, versus a single upsample process.

DSatz · « **Reply #62 on:** October 11, 2020, 01:11:06 AM »

Let's please keep in mind that audio sampling is just slicing a continuous signal into discrete ... samples. It doesn't include the subsequent conversion of those samples into numerical values; that's quantization. Back when sampling theory was first developed, quantization at speeds useful for audio wasn't readily available. But sampling itself is inherently analog. So maybe it can be simpler if people just think about sampling WITHOUT quantization just for the duration of this one message.

Like, you all know (or will know in a moment) that a capacitor can hold an electrical charge constant for some amount of time. If you take a capacitor and connect its two terminals to each other, any charge that it was holding will very soon be discharged, because the two "plates" of the capacitor will then be at the same level of charge as each other, and thus will have zero voltage (potential) relative to each other. (This is ignoring some strange low-level effects that happen in the real-world; let's say this is an ideal capacitor that wants to be included in thought experiments and has signed all kinds of release forms, so leakage and internal resistance and dielectric absorption and inductance are gonna sit this one out, OK?)

OK. So let's say that you get yourself a VERY large number of these ideal capacitors, and you set them all up in a row and tie one side of each to a common ground. They're sitting there like brand-new babies, just waiting for the world to give them information to store in their bellies. That information will be in the form of a certain level of charge relative to ground--but initially, they've all been discharged fully. Tabula rasa.

Now let's take a varying signal voltage that consists entirely of frequencies below, say, 1 kHz. Shannon's sampling theorem says that if you connect the signal at low impedance to each capacitor in turn, switching (noiselessly and in zero time--good luck) from one capacitor to the next every 0.5 msec, then after (say) 10 seconds you will have charged 20,000 capacitors in sequence, each according to the amplitude of the signal during its special little interval* of time. And since these are all perfect capacitors that volunteered to be part of this test, they can all hold their little breaths for as long as you want. Go get a sandwich or something.

Then you can come back and do the same thing in reverse: Discharge each capacitor in turn, spending exactly 0.5 msec on each one, into an amplifier that has a reasonably high input impedance. You'll get switching noise and several other kinds of crap above 1 kHz, but below 1 kHz where the original signal was, you should get a perfect** reconstruction of it--which you can recover for practical use by filtering at 1 kHz again. Since you can never do that perfectly in the real world, that's why there's a somewhat substantial guard band (that's part of why the CD sampling rate is 44.1 kHz rather than 40.0000000000000000000000000001 kHz).

In this process, time got divided into discrete intervals, but the amplitude of the signal remained continuous; no quantization or digitization occurred in the making of this thought experiment, which was supervised by the local chapter of the American Society for the Prevention of Stairstep-levels in Audio (ASPSA).

And that type of scenario is what the sampling theorem was originally about, and what the discussion of sampling rates is based on. Not digitization, which is a further dimension of processing that depends on sampling.
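For anyone who wants to poke at the capacitor picture numerically, here is a minimal sketch. It makes some assumptions that are not in the post: the test signal (100 Hz and 440 Hz tones), the helper names, and the use of ideal instantaneous samples with textbook Whittaker-Shannon (sinc) interpolation instead of real switches and filters. The samples stay amplitude-continuous floats, so nothing is quantized on purpose.

```python
import math

FS = 2000.0      # 1 kHz band limit -> one "capacitor" every 0.5 ms
T = 1.0 / FS     # sampling interval in seconds

def signal(t):
    """Test signal with content only below 1 kHz (100 Hz and 440 Hz tones)."""
    return (0.6 * math.sin(2 * math.pi * 100 * t)
            + 0.3 * math.sin(2 * math.pi * 440 * t + 0.5))

def sinc(x):
    """Normalized sinc, sin(pi x) / (pi x), with the x = 0 case handled."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, t):
    """Whittaker-Shannon interpolation: rebuild the value at any time t."""
    return sum(s * sinc(t / T - n) for n, s in enumerate(samples))

# "Charge the capacitors": 2000 amplitude-continuous samples = 1 second.
samples = [signal(n * T) for n in range(2000)]

# Probe times that mostly fall between sample instants, mid-recording.
probe_times = [0.5 + k * T / 7 for k in range(1, 50)]
max_error = max(abs(reconstruct(samples, t) - signal(t)) for t in probe_times)
print(f"{len(samples)} samples, worst reconstruction error: {max_error:.2e}")
```

The error is small but not zero, because only a finite stretch of samples is summed. A longer run shrinks it further, which fits the "any level of quality that you're willing to pay for" footnote.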

--best regards

______________________
* note "interval of time", not "moment in time"--that's key to one of the great misconceptions about digital audio, but I'll save that for another message some day.
** "perfect" here means: You can achieve any level of quality that you're willing to pay for; the process of sampling as such won't limit you--though available real-world electrical components, etc., surely will.

aaronji · « **Reply #63 on:** October 11, 2020, 08:54:05 AM »

Quote from: DSatz on October 11, 2020, 01:11:06 AM

In this process, time got divided into discrete intervals, but the amplitude of the signal remained continuous; no quantization or digitization occurred in the making of this thought experiment, which was supervised by the local chapter of the American Society for the Prevention of Stairstep-levels in Audio (ASPSA).

Where can I sign up???

Just kidding. Thanks for the interesting post...

kuba e · « **Reply #64 on:** October 11, 2020, 05:35:49 PM »

I read DSatz's post. I disobeyed, I didn't go to get a sandwich, but I kept reading. It was a mistake, it took me a while to deal with the capacitors (and their full bellies). I finally understood, thanks a lot!

Checht, if I understood Wforwumbo well, accurately upsampling the signal in the digital domain (in a DAW) is not an easy task. Simulating a dac and adc in a DAW is probably difficult. But it is possible, see Wforwumbo's post. Professionals can probably upsample in a DAW. For me, I think, it would be easier and safer to use a dac and adc.

Quote from: wforwumbo on October 08, 2020, 05:35:55 PM

Quote from: kuba e on October 08, 2020, 05:23:38 PM
Thank you very much Wforwumbo for nice explanation. I understand.

My question about resampling in DAW was meant with regard to - If someone has a good DAW with good resampling algorithm, can this replace d/a and a/d physical converters to resample the recording from original 44.1k to 96k due to fine digital post-processing?

Short answer is "no", the less succinct answer is "with lots of caveats, it's possible", and the best answer is "just record at 96k in the first place to avoid such a headache."

There are tricks we can play in terms of "modeling the transfer from digital back to the analog domain" with upsampling, but our tricks can only get us so far. There are some standards that try to model this (such code is used to accurately display dB meters in DAWs, for example), but because of the shortcomings of resampling algorithms they're already inherently a bit flawed.

Edit: If your question is more about understanding the philosophy behind sample rates and resampling, then yes your intuition is barking up the right tree here. But in practice... just record at the higher sample rate.
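One concrete reason resampling between these rates takes some care is that the ratio is not an integer. The sketch below only works out the arithmetic. The function name is hypothetical, and "interpolate by L, then decimate by M" is the generic textbook scheme, assumed here for illustration rather than taken from any particular DAW.

```python
from fractions import Fraction

def resampling_ratio(source_hz, target_hz):
    """Return (L, M): interpolate by L, then decimate by M."""
    ratio = Fraction(target_hz, source_hz)   # reduced to lowest terms
    return ratio.numerator, ratio.denominator

for src, dst in [(44100, 96000), (44100, 48000), (48000, 96000)]:
    up, down = resampling_ratio(src, dst)
    print(f"{src} Hz -> {dst} Hz: interpolate by {up}, decimate by {down}")
```

Going from 48k to 96k is a clean factor of 2, while anything starting at 44.1k involves a factor of 147, so the filtering has a lot more work to do.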

kuba e · « **Reply #65 on:** October 11, 2020, 07:22:43 PM »

I would also like to ask about quantization. These are only theoretical questions. I have recorded in 24-bit, and I used to convert it to 16-bit. This was due to space and also due to easier file handling. I normalized master recordings to 0 dBFS and then converted them to 16-bit. Now, out of curiosity, I am thinking about it in more detail:

16 bits give 96 dB of dynamic range. Quantization noise is half of the last bit, so it is 3 dB. This is less than microphone or background noise. When the master recording does not exceed a dynamic range of 96 dB, can I always save it to 16-bit after level normalization without degrading it?

What happens when the recording has a dynamic range greater than 96 dB and I save it in 16-bit? Will the quantization noise increase? E.g. with a dynamic range of 108 dB (96 dB + 12 dB), is the quantization noise 12 dB?

Am I thinking about it right?

checht · « **Reply #66 on:** October 11, 2020, 10:05:40 PM »

Quote from: kuba e on October 11, 2020, 05:35:49 PM

Checht, if I understood Wforwumbo well, accurately upsampling the signal in the digital domain (in a DAW) is not an easy task. Simulating a dac and adc in a DAW is probably difficult. But it is possible, see Wforwumbo's post. Professionals can probably upsample in a DAW. For me, I think, it would be easier and safer to use a dac and adc.

Kuba, thanks for your reply, and your great questions.

DSatz · « **Reply #67 on:** October 11, 2020, 11:56:40 PM »

kuba e, when you convert a 24-bit recording to 16 bits, your software should definitely redither at the 16-bit level prior to truncation; otherwise the truncation process would add quantization noise. It's highly improbable that you would actually hear that noise, but not impossible; certainly you can create situations in which the noise is audible if you raise the playback gain far enough, and it's not at all a nice kind of noise, since its level is highly program-dependent (i.e. it sounds like analog tape modulation noise that's having a particularly bad day). It's not too "expensive" (in terms of dynamic range lost, CPU time, or anything else) to use dither to ensure that no quantization noise whatsoever is added--so fortunately, that has become standard practice.
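A minimal sketch of what "dither at the 16-bit level" can look like in code. Everything here is an assumption for illustration: the function name, the fixed seed, triangular (TPDF) dither spanning plus or minus one 16-bit step, and rounding to nearest instead of plain truncation. It is not a description of what any particular DAW does.

```python
import math
import random

def to_16_bit(sample_24, rng, dither=True):
    """Reduce one 24-bit integer sample to 16 bits.

    One 16-bit step equals 256 24-bit steps. With dither=True, triangular
    (TPDF) noise spanning +/- one 16-bit step is added before rounding.
    """
    scaled = sample_24 / 256.0                     # units of one 16-bit step
    if dither:
        scaled += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    quantized = math.floor(scaled + 0.5)           # round to nearest step
    return max(-32768, min(32767, quantized))      # clip to the 16-bit range

rng = random.Random(1234)   # fixed seed so the run is repeatable

# A steady 24-bit value sitting a quarter of a 16-bit step above zero.
quiet = [64] * 20000
plain = [to_16_bit(s, rng, dither=False) for s in quiet]
dithered = [to_16_bit(s, rng, dither=True) for s in quiet]

mean_plain = sum(plain) / len(plain)
mean_dithered = sum(dithered) / len(dithered)
print(f"no dither: mean = {mean_plain:.3f} steps")
print(f"dithered:  mean = {mean_dithered:.3f} steps (input was 0.25)")
```

Without dither the quarter-step detail simply vanishes. With dither each output sample is noisier, but the average still carries the original level.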

The exact numbers for the maximum theoretical dynamic range of a 16-bit recording, and the minimum noise increase for adequate dither, aren't exactly 96 and 3 dB for technical/math reasons, but those are within a dB or two of the right numbers. The discrepancy is partly because a factor of 2 isn't exactly 6 dB but rather 20 * log₁₀ 2 = 6.0206... dB, and partly because the greatest undistorted amplitude that can be represented is the rms value of a sinusoid whose peaks just reach full scale, rather than a sinusoid whose rms amplitude itself is at full scale. If I recall correctly the actual value is 97.3 dB maybe, while the minimum lost dynamic range due to dither can be closer to 2 dB if you're clever about its characteristics ("probability density"), but it's been years since I saw the math spelled out completely.
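For reference, the arithmetic behind those two corrections can be sketched as follows. This uses the usual textbook model, which is an assumption here: an ideal undithered quantizer whose error has rms value step / sqrt(12), measured against a sine whose peaks just reach full scale. The function names are made up for the example.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # a factor of 2 in amplitude, in dB

def level_ratio_db(bits):
    """Ratio of full scale to one step, in dB: 20 * log10(2 ** bits)."""
    return bits * DB_PER_BIT

def sine_sqnr_db(bits):
    """Textbook ratio of a full-scale sine to uniform quantization noise.

    Sine rms = full-scale peak / sqrt(2); noise rms = step / sqrt(12).
    The two square roots combine into the extra 10 * log10(1.5) term.
    """
    return bits * DB_PER_BIT + 10 * math.log10(1.5)

print(f"{DB_PER_BIT:.4f} dB per bit")
print(f"16 bits: {level_ratio_db(16):.2f} dB from one step to full scale")
print(f"16 bits: {sine_sqnr_db(16):.2f} dB sine to quantization noise")
```

Under this simplified model the 16-bit figure lands a little above 98 dB before any dither is added, so the recalled value is in the right neighborhood.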

But those are theoretical values based on simplified models of both the signal and the noise, both in terms of the distribution across the frequency spectrum, and the relative amount of time that each one spends at each possible level within its range (i.e. its varying levels over time). A random noise will have occasional moments (which may last only one sampling interval, but still ...) of being significantly higher in level than its long-term average. The ear/brain does a certain amount of time-averaging, but is also sensitive to sudden shifts. As a result, rms values are rather poor indicators of what we hear in live situations, unless our life consists of listening to sine waves. This is why marketing people love to quote A-weighted, rms values of equivalent noise for microphones--typically those are 10 to 12 dB lower than the much more revealing CCIR-weighted quasi-peak values, even though the latter also contain some averaging (it's just a much more perceptually relevant, carefully tested, shorter-term form of averaging).

As a result, there can be real-world cases in which the noise floor of a recording system may be exposed in a given range of frequencies somewhat more readily than you might expect from the simplified/schematized ideal values alone. This is why I was a fan of Dolby "A"-type noise reduction even when I was recording live concerts at 15 ips on a well-maintained Nagra recorder, and eventually telcom c4 noise reduction, which is even more powerful. And it's one reason I get impatient with people who say that The Right Way is to set your levels so that the highest signal peaks are at -12 or even lower (I've seen someone say -16 to -18), and don't worry because "24 bits". That approach could work in a given case (i.e. not add any unnecessary audible noise to the recording) but it could very definitely fail (i.e. add unnecessary audible noise to the recording, possibly in quite significant amounts); without further details, it's impossible to know which.
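To put rough numbers on what those conservative peak levels cost in terms of word length alone, here is a tiny sketch (the function name is hypothetical). It only converts dB of unused headroom into bits. It says nothing about the analog noise floor ahead of the converter, which is the part that decides whether the approach works in a given case.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB per bit

def bits_unused(peak_dbfs):
    """Word-length equivalent of leaving the highest peaks at peak_dbfs."""
    return -peak_dbfs / DB_PER_BIT

for peak in (-12, -16, -18):
    print(f"peaks at {peak} dBFS leave about {bits_unused(peak):.1f} bits unused")
```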

--best regards

DSatz · « **Reply #68 on:** October 12, 2020, 12:40:11 AM »

kuba e, after I wrote my previous message about dynamic range I re-read your messages, and I think I should explain what is and isn't meant by quantization noise. You appear to be thinking in terms of a steady noise floor, which it isn't inherently.

Imagine a 16-bit A/D converter that's perfect (no noise, no non-linearity of any kind) except that it also has no dither. Say you feed it a steady, pure 400 Hz tone at a level so low that only the lowest-order three bits of the A/D toggle on and off in response. There are only eight possible signal levels available from three bits, so you definitely have the famous stairstep problem; at any given moment, there's likely to be a rather significant inaccuracy in the sample value relative to the (very low!) amplitude of the tone--but numerically, that inaccuracy is always less than the value of your lowest bit, so you're stuck with it.

Let's stop at this point and look more closely at that inaccuracy as if it were a signal all its own--an "error signal". Subtract the original, pure sinusoid from the stairstep result (or vice versa, doesn't matter which) and you'll get basically a patterned series of broken curves peaking just below the level of the lowest-order bit. I say patterned because it's not at all random; its shape and content will vary cyclically as the sinusoid's momentary amplitude passes through each of the eight available values that the system can represent exactly. So the error signal is a recurring (but very strange and "toothy") waveform at 400 Hz, plus it has rather strong overtones at integer multiples of 400 Hz. Thus you can think of it as noise or you can think of it as distortion. Neither view is wrong, nor do the two views have to exclude one another.

If you were to view the frequency content of this A/D's output on a spectrum analyzer, you would see a vertical line at 400 Hz, and further vertical lines at 800 Hz, 1200 Hz, 1600 Hz and so on, with their heights gradually decreasing toward higher frequencies, but continuing several octaves for sure. It's messy. What dither does is to take that error signal and randomize it. Not drown it out, since the dither is applied to the analog signal BEFORE quantization, so the patterned noise/distortion never has a chance to occur in the first place.

If you were to repeat the experiment and gradually inject dither in slowly increasing amounts prior to quantization, what you'd see on the spectrum analyzer is quite remarkable (I saw this done, and intellectually I knew what was going to happen, but it was still remarkable to experience it). At first you see what I described earlier--the forest of regularly-spaced vertical lines. As the dither creeps up in level, you can see a tiny, tiny turbulence that it causes in the noise floor--but the vertical lines above 400 Hz, which are well above that noise floor, start to get shorter and shorter while the 400 Hz line stays the same height. And when the optimal level of dither is reached, all the other vertical lines have shrunken to where you can't see them any more. It's not that the noise floor has risen to their level--rather, the distortion components have decreased (a lot farther and a lot faster!) to below its level.

So quantization noise is also known as quantization distortion, and it basically is that "error signal". It will consist of benign noise when dither is properly applied, while it will be program-dependent noise (that calls huge attention to itself and sounds grainy/granular beyond what you would ever expect, given how low in level it is on an rms basis) when dither is absent or insufficient, etc.
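The "patterned versus randomized" distinction can be checked with a small simulation. This is a sketch under assumptions that are not in the post: a 48 kHz sample rate, a peak level of 3.3 steps, TPDF dither spanning plus or minus one step, and a crude "repeat score" (correlation of the error with itself one tone period later) standing in for the spectrum analyzer.

```python
import math
import random

FS = 48000            # sample rate in Hz (assumed for this sketch)
F0 = 400              # tone frequency in Hz
PERIOD = FS // F0     # 120 samples per cycle of the tone
AMPLITUDE = 3.3       # peak level in steps, so only the lowest bits move
N = FS                # one second of samples

def quantize(x):
    return math.floor(x + 0.5)   # ideal rounding to the nearest step

def error_signal(dither, rng):
    """Quantized output minus the pure tone, in steps."""
    errors = []
    for n in range(N):
        x = AMPLITUDE * math.sin(2 * math.pi * F0 * n / FS)
        d = (rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)) if dither else 0.0
        errors.append(quantize(x + d) - x)   # dither goes in BEFORE quantizing
    return errors

def repeat_score(errors, lag=PERIOD):
    """Correlation of the error with itself one tone period later."""
    num = sum(errors[n] * errors[n + lag] for n in range(len(errors) - lag))
    den = sum(errors[n] ** 2 for n in range(len(errors) - lag))
    return num / den

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

rng = random.Random(2020)   # fixed seed so the run is repeatable
plain = error_signal(False, rng)
dithered = error_signal(True, rng)
print(f"no dither: rms {rms(plain):.3f}, repeat score {repeat_score(plain):.3f}")
print(f"dithered:  rms {rms(dithered):.3f}, repeat score {repeat_score(dithered):.3f}")
```

Without dither the error repeats cycle after cycle, which is why it shows up as harmonics of 400 Hz. With dither the error is somewhat larger on an rms basis but no longer tracks the tone.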

--best regards

kuba e · « **Reply #69 on:** October 12, 2020, 08:55:07 AM »

Thank you DSatz. It's kind of you to explain all this clearly. Also thanks for the alert about dither. I used to remember the lesson that we always have to use dither when we change the bit depth. But I forgot about it. I thought that my DAW (Reaper) internally calculates in 32-bit and dithers automatically. Looks like it doesn't; I have to check it.

Quote from: DSatz on October 12, 2020, 12:40:11 AM

Let's stop at this point and look more closely at that inaccuracy as if it were a signal all its own--an "error signal". Subtract the original, pure sinusoid from the stairstep result (or vice versa, doesn't matter which) and you'll get basically a patterned series of broken curves peaking just below the level of the lowest-order bit. I say patterned because it's not at all random; its shape and content will vary cyclically as the sinusoid's momentary amplitude passes through each of the eight available values that the system can represent exactly. So the error signal is a recurring (but very strange and "toothy") waveform at 400 Hz, plus it has rather strong overtones at integer multiples of 400 Hz. Thus you can think of it as noise or you can think of it as distortion. Neither view is wrong, nor do the two views have to exclude one another.

I like this sentence. I would slightly improve it:
Neither view is wrong, neither view is right, nor do the two views have to exclude one another, nor do the two views have to complement one another.
I found pictures of what DSatz describes: http://www.skillbank.co.uk/SignalConversion/quanterror.htm

I record music in pubs or clubs. There it is possible to hide many of my mistakes behind not ideal sound, background noise, etc. Those who record in a strict environment, for them, these mistakes are much bigger than for me. But the theory is also important to me because when I am recording then it is more interesting. And everyone records sometimes in a critical environment.

Heathen, you are right. Also for me, this thread is like a free graduate school course in digital audio.

wforwumbo · « **Reply #70 on:** October 12, 2020, 02:38:59 PM »

Quote from: aaronji on October 08, 2020, 04:55:13 PM

Quote from: wforwumbo on October 08, 2020, 12:02:41 PM
Digital signals DO have data outside of that bandwidth of 0-20 kHz. They actually have a lot of information outside of that range defined from -infinity to +infinity, and even more technically the baseband has info from -20k to +20k (don’t think too hard about that one...) but this is another conversation for another day.

Would you be so kind as to explain this in greater detail? If not here then in a new thread? I can understand why this would be the case in the mathematical realm, but I can't wrap my head around the idea of negative Hz in the physical realm. It seems, basically, impossible to me.

So, I wanted to pick back up on this since I never got around to answering it.

I spent a few days drawing up proofs and theorems, and realized most of the math would likely confuse more than it would assist. So I want to try simplifying the concept a bit more, before going "down the mathematical rabbit hole" - which I am happy to do, but want to use as a last resort here.

When you look at a frequency response plot, you are actually looking at the magnitude frequency response plot. This is because the way we "remap" data from the time domain to the frequency domain is through the Fourier transform. The Fourier operator is very useful, and the mapping variable for that is actually a complex (as in, sqrt(-1) complex/imaginary) exponential. The Fourier transform of a signal looks like:

$F(\omega) = \int_{-\infty}^{\infty} f(t) \, e^{-j \omega t} \, dt$

where F(w) is the frequency content, f(t) is your test function, j is sqrt(-1) (quick historical side note: mathematicians use i for sqrt(-1), but electrical engineers already use i for current, so j gets the sqrt(-1) honors in our equations), w or omega is the angular frequency (2 * pi * f), t is time, and dt is the differential with respect to time. NOW... what this means, is that you are actively re-mapping the data using that exponential construct. When you are looking at a frequency response plot (for example, from a manufacturer) you are often looking at the absolute value of this function, and it is frequently limited from 20 Hz - 20 kHz, since looking at data outside of that is rarely relevant.

BUT... what happens when we evaluate the magnitude spectrum outside of that range? Well, the negative frequency range can be thought of as an expression or manifestation of the negative phase of waves. I don't say they are explicitly representative of both positive and negative phases of the waves, since that has implications for power analysis (which is another topic for another time...). But that is part of how you can wrap your head around what the negative frequency data is getting at - when you take the absolute value of a part-real, part-imaginary number, you get the magnitude by using the Pythagorean theorem on the real and imaginary parts to get the "total magnitude" effect of a system on the magnitude-frequency spectrum.

THAT is often what we are looking at.
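A small numerical sketch of the magnitude idea, using a discrete Fourier transform over a short window instead of the continuous integral (that swap, the window length, and the helper name are assumptions made for the example). A real cosine shows up at both +K and -K, the two components are complex conjugates, and the magnitude is the Pythagorean combination of the real and imaginary parts.

```python
import cmath
import math

N = 64          # number of samples in the analysis window
K = 5           # the tone completes exactly 5 cycles in the window
PHASE = 0.7     # arbitrary phase offset in radians

x = [math.cos(2 * math.pi * K * n / N + PHASE) for n in range(N)]

def dft_bin(samples, k):
    """One bin of the discrete Fourier transform; k may be negative."""
    n_total = len(samples)
    return sum(s * cmath.exp(-2j * math.pi * k * n / n_total)
               for n, s in enumerate(samples))

positive = dft_bin(x, K)     # component at +K cycles per window
negative = dft_bin(x, -K)    # component at -K cycles per window

# Magnitude via the Pythagorean theorem on real and imaginary parts.
magnitude = math.hypot(positive.real, positive.imag)
print(f"+K: {positive:.3f}, magnitude {magnitude:.3f}")
print(f"-K: {negative:.3f}, magnitude {abs(negative):.3f}")
print(f"phase at +K: {cmath.phase(positive):.3f} rad")
```

Both bins have the same magnitude, so a magnitude-only plot cannot tell them apart. The phase is where the +K and -K components differ.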

Phase response is something that is harder to explain without a bit of experience, I'll leave it off the table and comment for now that when I'm doing engineering work about 75% of the time the real answer to debugging my systems comes from the phase response and not the magnitude response. But it's something that hurts to wrap your head around, especially if you're not familiar and comfortable with imaginary numbers.

I don't want to distract too much from the topic at-hand, but as a fun thought experiment, what happens mathematically speaking if we remove the negative frequency domain information, so that our signal is exclusively 20 Hz - 20 kHz, and NOTHING else? Or, what if it's 0 Hz - 20 kHz? Well to construct a time-domain signal for which the above Fourier transform holds true, you actually need an infinitely long signal stretching out to before "t = 0" - which means you have a non-causal system that requires knowledge of events before measurement as well as theoretically infinite energy. These signals do exist, but they often are not represented by simple frequency domain analysis - you need to expand to the aforementioned power analysis to get an idea of what's up. I will leave that to another topic.

Now... everything I have mentioned thus far is for analysis of a finite, causal (meaning, no info is processed before we "hit the record button"), band-limited, analog signal. We can think of the magnitude spectrum of our ideal audio signal plotted in frequency as a rectangle - flat magnitude with unit 1 between -20 kHz and +20 kHz, and zero outside of that.

Sampling has this interesting property in the frequency domain in that it copies that magnitude spectrum and pastes it infinitely in either direction, centered around every integer multiple of the sample rate. So for a quick thought experiment, let's sample the signal at 50 kHz. The bandwidth of our signal is 40 kHz, so for every integer n we see the frequency spectrum pasted at n * 50 kHz. This means we have our nice rectangle from -20 to +20 kHz, centered at 0 Hz, pasted and showing up centered at 50 kHz from +30 to +70 kHz, centered at 100 kHz from +80 to +120 kHz, etc. going in the positive direction, and pasted again going in the negative direction centered at -50 kHz and going from -70 to -30 kHz, centered at -100 kHz going from -120 to -80 kHz, and so on.

A semi-reasonable graphical interpretation looks like this, except imagine the content is a flat rectangle (it's drawn as-such for other mathematical reasons not worth getting into right now...):

Sampling theory in the frequency domain ensures that the negative data from a copy does not interfere with the upper end of the positive spectrum of our original baseband signal (so that the first copy that starts at +30 kHz does not interfere with our data going up to +20 kHz). IF we were to lower our sample rate to, say, 35 kHz, then the data from +15 to +20 kHz in our original signal gets distorted by the copy in the frequency domain that is a result of sampling.
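The copy-and-paste bookkeeping from the last two paragraphs can be sketched in a few lines. The function names are hypothetical and the spectrum is treated as an ideal rectangle, as in the thought experiment.

```python
def image_bands(sample_rate_khz, band_edge_khz, copies=2):
    """Bands (low, high) in kHz occupied by the spectral copies.

    Copy n is the baseband (-band_edge .. +band_edge) shifted by
    n * sample_rate, for n = -copies .. +copies, skipping n = 0.
    """
    return [(n * sample_rate_khz - band_edge_khz,
             n * sample_rate_khz + band_edge_khz)
            for n in range(-copies, copies + 1) if n != 0]

def overlap_with_baseband(sample_rate_khz, band_edge_khz):
    """Range in kHz where the first positive copy lands on the baseband, or None."""
    low = sample_rate_khz - band_edge_khz   # lower edge of the first positive copy
    return (low, band_edge_khz) if low < band_edge_khz else None

print(image_bands(50, 20))              # copies around -100, -50, +50, +100 kHz
print(overlap_with_baseband(50, 20))    # no overlap at 50 kHz
print(overlap_with_baseband(35, 20))    # overlap once the rate drops to 35 kHz
```

At 50 kHz the first copy starts at +30 kHz and stays clear of the baseband. At 35 kHz it starts at +15 kHz and lands on top of the 15 to 20 kHz region.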

Another good illustration that google is able to provide quickly:

BUT WUMBO! Why then do we sample at 44.1 kHz? Why that extra 4.1 kHz?

Well, this is all still theoretical. Again, our digital signal theoretically has information above 20 kHz and below -20 kHz; when it gets converted back to the analog domain, that info is still there. To prevent that energy from sending signals at frequency ranges outside of what our equipment is rated for, we generally low-pass filter the data so that back in the analog domain we ONLY have data from -20 kHz to +20 kHz. We cannot do that in the digital domain, ONLY the analog domain. And real-world filters can be pretty sharp and steep, but they have implications - especially in the phase spectrum. To ensure minimal distortion, that extra 4.1 kHz of sampling is a bit of a buffer to assume audio equipment may not have the sharpest filters.
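The size of that buffer is easy to work out. The sketch below assumes a 20 kHz audio band edge and uses a hypothetical helper name. It reports the distance from the top of the audio band to the bottom of the first spectral copy, which is the room the reconstruction filter has to roll off in.

```python
def guard_band(sample_rate_hz, audio_edge_hz=20_000):
    """Room between the top of the audio band and the first spectral copy."""
    first_image_edge = sample_rate_hz - audio_edge_hz
    return {
        "nyquist_hz": sample_rate_hz / 2,
        "first_image_starts_hz": first_image_edge,
        "filter_transition_hz": first_image_edge - audio_edge_hz,
    }

for rate in (40_000, 44_100, 48_000):
    print(rate, guard_band(rate))
```

At exactly 40 kHz the filter would need to drop from full level to nothing in zero Hz. At 44.1 kHz it gets 4.1 kHz to do it, and at 48 kHz it gets 8 kHz.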

I know I threw a lot of conceptual and cerebral info out here, and I know that some of it may not be thoroughly explained or concretely approachable. Feel free to ask follow-ups, this stuff takes everyone a while to fully wrap their heads around.

aaronji · « **Reply #71 on:** October 13, 2020, 07:26:58 AM »

^ Thanks for this post! I have some knowledge of Fourier transforms, so it even (kind of) makes sense. I will have to read it again, and think about it for a minute, but this is what I was getting at when I said I could only conceive of it mathematically and not physically.

I also appreciated the note on imaginary numbers. I have always seen, and been taught, that i is the square root of -1. It's interesting to see how the same thing often has different notation in different fields...

kuba e · « **Reply #72 on:** October 14, 2020, 05:16:21 AM »

It is a very nice explanation! It's nice when someone can simply explain in words a theory that is based on complex mathematics. These are explanations of the details, but when I form at least a rough idea of these details and of what is happening when sampling and quantizing, it's useful. From a practical point of view, it may not be necessary, but it's good to know what I do and why I do it when I record. Then recording is much more interesting for me.

seethreepo · « **Reply #73 on:** October 14, 2020, 12:34:07 PM »

I kept up with the thread for the first few pages, but I could now use a "Sample rate for dummies" version.

wforwumbo · « **Reply #74 on:** October 14, 2020, 12:38:42 PM »

Another comment I want to make here:

DSatz is as expected 100% right on the money in all of his posts.

However, to avoid a VERY commonly confused misconception... stair stepping is not exactly a 100% accurate representation of the signal in the digital domain, nor is it necessarily a representation of the signal from the digital domain back into the analog domain. It's a useful visual tool, but does not inherently or necessarily exist in either the digital signal or the converted-to-analog signal.

The expansions of points in this video run slightly counter to the advice I tend to give, but it's a good explanation and overview of LOTS of concepts realized and visualized in DSP. I NEED to reinforce this here, for you to keep in the back of your mind while watching this video - ignoring any differences in the actual performance of a digital to analog converter, it is highly unlikely you would sonically hear an appreciable or noticeable difference between 16/44.1 and 24/96 (or higher) ON PLAYBACK for rock music. There are other reasons outlined earlier in this thread for recording at higher bit depths and sample rates that I encourage.

https://www.youtube.com/watch?v=cIQ9IXSUzuM
