Author Topic: Dithering and mixdown to 44.1 from 48kHz - why not just record in 44.1? (Read 15469 times)

morst · « **Reply #30 on:** January 29, 2011, 05:21:33 AM »

Quote from: taperj on January 27, 2011, 07:24:57 PM

Could you elaborate on this? I guess I'm confused as to why the Sound Devices manual, Wikipedia, and all the other documentation I can find say it's truncation, then. I'm just looking for clarification, since most trusted sources say otherwise.

I guess they're just not being fully pedantic!

Lil Kim Jong-Il · « **Reply #31 on:** January 29, 2011, 11:49:58 AM »

Quote from: taperj on January 27, 2011, 07:24:57 PM

Quote from: Lil' Kim Jong-Il on January 18, 2011, 10:22:18 AM
If I may be pedantic, it's linear scaling. Truncation of the samples would result in a discontinuous waveform.

Could you elaborate on this? I guess I'm confused as to why the Sound Devices manual, Wikipedia, and all the other documentation I can find say it's truncation, then. I'm just looking for clarification, since most trusted sources say otherwise.

As was pointed out before, I was being a smartass.

Truncation lops off precision. If you have a 24-bit ADC and save the output as a 16-bit value by eliminating the lower 8 bits, you lose precision, but you still get a suitable approximation of the input waveform, because all of the samples are shifted by the same proportion to fit within the range of values permitted by the size of the sample word. Lopping off the lower 8 bits scales the sample's value by 1/256. It's still linear scaling; it's just a brute-force way of doing it, one that leaves the peak value of the result in the same proportion to the maximum of its range as the original peak had to its range.
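To make "lopping off the lower 8 bits" concrete, here is a minimal Python sketch, assuming signed 24-bit integer samples held in ordinary Python ints (the function name is hypothetical). Discarding the low byte of a two's-complement value is an arithmetic right shift by 8, which is the same as floor division by 256, so every sample is divided by the same factor:

```python
def truncate_24_to_16(sample_24: int) -> int:
    """Drop the lower 8 bits of a signed 24-bit sample.

    Python's >> on a negative int is an arithmetic shift, so this equals
    floor(sample_24 / 256), which is what discarding the low byte of a
    two's-complement word does.
    """
    if not -(1 << 23) <= sample_24 < (1 << 23):
        raise ValueError("not a signed 24-bit value")
    return sample_24 >> 8


# A few 24-bit samples, including full-scale positive and negative ones.
samples_24 = [8_388_607, 4_194_304, 1_000, 255, -1, -8_388_608]
samples_16 = [truncate_24_to_16(s) for s in samples_24]

for s24, s16 in zip(samples_24, samples_16):
    # Each sample's fraction of its own format's full scale is (nearly) preserved.
    print(f"{s24:>10} -> {s16:>7}   {s24 / 2**23:+.6f} vs {s16 / 2**15:+.6f}")
```

Note that -1 maps to -1 rather than 0: discarding bits rounds toward negative infinity, so there is a small constant offset of about half a 16-bit step on average.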

In post-production, the conversion from 24-bit to 16-bit for CD usually involves normalization prior to truncation, so that the final values make almost complete use of the range available to represent a sample. Normalization linearly scales the sample values (usually up), and then truncation scales them down. The result is linearly scaled by a factor that is not simply a power of 2, as it would be with byte truncation alone.
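A rough sketch of "normalize, then truncate", assuming plain peak normalization of 24-bit integer samples (the function name and the test signal are made up for illustration; real mastering tools also apply dither and often leave some headroom). The two linear steps combine into a single gain of `target_peak_16 / peak`, which is generally not a power of 2:

```python
def normalize_then_truncate(samples_24: list[int], target_peak_16: int = 32767) -> list[int]:
    """Peak-normalize signed 24-bit samples, then drop the low byte.

    Both steps are linear, so the overall effect is one multiplication by
    target_peak_16 / peak (not a power of 2 in general).
    """
    peak = max(abs(s) for s in samples_24)
    if peak == 0:
        return [0] * len(samples_24)
    # Gain in 24-bit units: maps the peak to target_peak_16 * 256.
    gain = (target_peak_16 * 256) / peak
    # Round to the nearest 24-bit integer so float error can't knock the peak down a step.
    normalized_24 = [round(s * gain) for s in samples_24]
    return [s >> 8 for s in normalized_24]  # byte truncation, as in the field


quiet_24 = [2_097_152, -1_048_576, 500_000, -2_000_000]  # peaks at 1/4 of full scale
plain_16 = [s >> 8 for s in quiet_24]                    # truncation only
normalized_16 = normalize_then_truncate(quiet_24)
print(plain_16, normalized_16)
```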

The ADC in the field simply truncates because it can't anticipate future input values, which it would need in order to calculate a normalizing factor. Without bothering to look at the SD manual, I'm pretty sure that is what SD is referring to. You can choose to simply truncate your 24-bit recording to get 16-bit samples in post. If your recording has peaks well below the maximum allowed sample value, the result is a poorer approximation of the input than if you had normalized first. Both are linear scaling, so I was just being a smart ass.
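The cost of truncating a quiet recording without normalizing can be put in rough numbers. This is a back-of-the-envelope sketch, assuming the recording's peak level in dBFS is known and ignoring dither: a signal that only reaches a fraction of full scale uses only that fraction of the available 16-bit codes, and each halving of the peak (about 6.02 dB) costs one bit of resolution. The helper name is hypothetical.

```python
import math


def effective_bits_used(peak_dbfs: float, bits: int = 16) -> float:
    """Approximate resolution (in bits) left for a signal peaking at peak_dbfs.

    The peak reaches 10**(peak_dbfs / 20) of full scale, so the signal spans
    that fraction of the codes; log2 of the fraction is the number of bits lost.
    """
    fraction_of_full_scale = 10 ** (peak_dbfs / 20)
    return bits + math.log2(fraction_of_full_scale)


for peak in (0.0, -6.0, -12.0, -20.0):
    print(f"peak {peak:6.1f} dBFS -> about {effective_bits_used(peak):.1f} of 16 bits")
```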

taperj · « **Reply #32 on:** January 29, 2011, 12:37:42 PM »

Awesome, thanks for the detailed explanation, Lil' Kim Jong-Il. I was just curious; I'm always trying to learn everything I can about the ins and outs of audio. Much appreciated.

J

ghellquist · « **Reply #33 on:** January 30, 2011, 07:30:16 AM »

I have found that there is a much better way to think of sampled signals than how it is generally presented.

The idea starts by defining the signal as ranging between -1 and +1: not quite reaching 1.0, but very close. The signal swings between positive and negative values, up and down with the sound that makes up the music, never quite reaching 1.0 or -1.0, since that is where the A/D converter starts clipping.

One sample could then be represented as, for example, the value 0.123. In another system, the same sample could be represented as 0.1234334, clearly a much more accurate description of the value.

So if we simply chop off the last few decimals we can go from 0.1234334 to 0.123, losing precision in describing the sample value. This is how truncation works on the sample level.
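In binary, the "decimals" are bits. A minimal Python sketch of the same chopping, assuming float samples in the -1 to +1 range described above (the helper name is made up for illustration). Each bit depth gives a step size of 2^-(bits-1), and everything finer than one step is discarded:

```python
import math


def truncate_to_bits(x: float, bits: int) -> float:
    """Keep only `bits` bits of precision for a sample in [-1.0, 1.0).

    math.floor discards the finer detail, which matches dropping low-order
    bits of a two's-complement word (it rounds toward negative infinity,
    not toward zero as chopping decimal digits does).
    """
    scale = 2 ** (bits - 1)
    return math.floor(x * scale) / scale


sample = 0.1234334
for bits in (24, 16, 8):
    print(bits, truncate_to_bits(sample, bits))
```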

But our ears do not listen to samples; our ears hear the resulting sound waves, and then things change a bit. The ear never hears one sample by itself; it can only hear the sound created by many samples, maybe even hundreds.

Our ears hear the truncation as two different things.
One effect is an increase in the background noise level. This can be expressed as a reduction of the signal-to-noise ratio (SNR).
One way of looking at it is that below the third decimal (below 0.001) there is only noise.
Another effect is the addition of some very low-level sound artifacts. Mostly these are masked by all the other sound, but sometimes they can be heard.
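The first effect, the rise in background noise, can be measured directly. This sketch truncates a test tone and compares the signal power with the power of the error; the tone frequency, amplitude, and helper name are arbitrary choices for illustration. The error's average (a DC offset of about half a step, caused by floor truncation) is removed first, since it is not audible noise. The textbook figure for a full-scale sine is about 6.02·N + 1.76 dB for N bits, so expect roughly 97 dB at 16 bits and roughly 49 dB at 8 bits for this slightly-below-full-scale tone.

```python
import math


def truncation_snr_db(bits: int, amplitude: float = 0.9, freq_hz: float = 997.0,
                      rate_hz: int = 48_000, n: int = 48_000) -> float:
    """Signal-to-noise ratio (dB) left after truncating a sine to `bits` bits."""
    scale = 2 ** (bits - 1)
    signal = [amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]
    truncated = [math.floor(s * scale) / scale for s in signal]
    error = [t - s for t, s in zip(truncated, signal)]
    mean_err = sum(error) / n  # DC offset from always rounding down
    noise_power = sum((e - mean_err) ** 2 for e in error) / n
    signal_power = sum(s * s for s in signal) / n
    return 10 * math.log10(signal_power / noise_power)


for bits in (8, 16):
    print(f"{bits}-bit truncation: SNR about {truncation_snr_db(bits):.1f} dB")
```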

Notice that this truncation does not involve any scaling or other processing; it simply removes decimals from the representation of each sample.

What we want to do with sound samples, instead of simply truncating (cutting off) the decimals, is a process called dithering. The idea is to add a small, specially formed noise signal before cutting off the decimals in the representation. This noise signal is designed to make the result sound better to the ear. There are several different kinds of dither signals; suffice it to say that they all work in about the same way but sound slightly different.
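One common form of "small, specially formed noise" is TPDF (triangular probability density function) dither. The sketch below assumes float samples in the -1 to +1 range, TPDF noise spanning ±1 step, and plain rounding to the nearest step; the function name is hypothetical, and the noise-shaped dithers used by mastering tools are more elaborate than this.

```python
import math
import random


def tpdf_dither_quantize(x: float, bits: int, rng: random.Random) -> float:
    """Reduce a sample in [-1.0, 1.0) to `bits` bits with TPDF dither.

    Triangular noise spanning +/-1 step is added first, then the result is
    rounded to the nearest step and clipped to the legal signed range.
    """
    scale = 2 ** (bits - 1)
    # The difference of two uniform [0, 1) values is triangular on (-1, 1).
    dither = rng.random() - rng.random()
    q = math.floor(x * scale + dither + 0.5)  # round to the nearest step
    q = max(-scale, min(scale - 1, q))        # clip to the signed range
    return q / scale


rng = random.Random(1)  # fixed seed so runs are repeatable
print([tpdf_dither_quantize(0.1234334, 16, rng) for _ in range(4)])
```

Each individual output is noisier than plain truncation, but on average the outputs land on the true value, which is what lets detail finer than one step survive as part of the noise rather than being thrown away.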

The effect of the dithering is to make the sound more pleasing to the ear, in two different ways.
First, the noise floor the ear perceives is moved down a bit, and the ear can actually start hearing things a small way down into the noise floor.
Second, some of the artifacts we hear when truncating are no longer heard.
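The point about hearing "a small way down into the noise floor" can be checked numerically. In this sketch (assumptions: a 16-bit target, TPDF dither with plain rounding, and a least-squares fit standing in for the ear picking out the tone; all names are hypothetical), a tone smaller than one quantization step vanishes completely without dither, but survives, buried in noise, with dither:

```python
import math
import random

BITS = 16
STEP = 2.0 ** -(BITS - 1)  # one 16-bit step on the -1..+1 scale


def quantize(x: float, rng: random.Random, dither: bool) -> float:
    """Round x to the 16-bit grid, optionally adding TPDF dither first."""
    d = (rng.random() - rng.random()) if dither else 0.0
    return math.floor(x / STEP + d + 0.5) * STEP


def recovered_gain(out: list, ref: list) -> float:
    """Least-squares estimate of how much of `ref` survives in `out`."""
    return sum(o * r for o, r in zip(out, ref)) / sum(r * r for r in ref)


rng = random.Random(7)
n = 48_000
# A 1 kHz tone only 0.4 of a step tall, i.e. below the smallest 16-bit step.
tone = [0.4 * STEP * math.sin(2 * math.pi * 1000 * i / 48_000) for i in range(n)]

plain = [quantize(s, rng, dither=False) for s in tone]
dithered = [quantize(s, rng, dither=True) for s in tone]

print("without dither:", recovered_gain(plain, tone))  # every sample rounds to 0
print("with dither:   ", recovered_gain(dithered, tone))
```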

// Gunnar

ghellquist · « **Reply #34 on:** January 30, 2011, 07:39:21 AM »

This is a complicated thing to fully get a grip on. I know from personal experience (from a slightly different but related field) that anything short of a fully formal mathematical treatment of sampling will distort aspects of the full process. Not that I know how to do that; I have only read papers written by people who actually know the subject, and half-understood some of them.

I believe, however, that we should avoid some of the pitfalls of trying to visualize things on a "popular science level".

Quote from: Lil' Kim Jong-Il on January 29, 2011, 11:49:58 AM

If you have a 24-bit ADC and save the output as a 16-bit value by eliminating the lower 8 bits, you lose precision, but you still get a suitable approximation of the input waveform, because all of the samples are shifted by the same proportion to fit within the range of values permitted by the size of the sample word. Lopping off the lower 8 bits scales the sample's value by 1/256. It's still linear scaling; it's just a brute-force way of doing it, one that leaves the peak value of the result in the same proportion to the maximum of its range as the original peak had to its range.

There is no need to introduce any aspect of scaling at this popular-science level of description. It is much better to work with the assumption that a 16- or 24-bit representation of a sound sample has a value range between -1 and +1. One sample could then have a value of 0.123 in a 16-bit representation and 0.12343342 when represented by more bits. The number of bits then simply describes the degree of detail, the precision, with which each sample can be described.

This representation is actually how it is programmed inside just about any modern sound-handling program: inside the program, samples are handled as "floating point".
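A minimal sketch of that floating-point view, assuming the common convention of dividing a signed N-bit integer by 2^(N-1) (conventions vary between programs; some scale positive and negative values slightly differently, and the function names here are hypothetical). The same sample value maps to a 16-bit and a 24-bit code that differ only in how finely they pin the value down:

```python
def pcm_to_float(code: int, bits: int) -> float:
    """Map a signed `bits`-bit integer sample onto the -1..+1 float scale."""
    return code / 2 ** (bits - 1)


def float_to_pcm(x: float, bits: int) -> int:
    """Map a float sample back to a signed integer: round to nearest, then clip."""
    full_scale = 2 ** (bits - 1)
    return max(-full_scale, min(full_scale - 1, round(x * full_scale)))


x = 0.12343342
code16 = float_to_pcm(x, 16)
code24 = float_to_pcm(x, 24)
print(code16, pcm_to_float(code16, 16))  # same value, coarser steps
print(code24, pcm_to_float(code24, 24))  # same value, finer steps
```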

// Gunnar

Lil Kim Jong-Il · « **Reply #35 on:** January 31, 2011, 01:32:57 PM »

Gunnar, my labeling of the 24-bit to 16-bit conversion by truncation as linear scaling is really very straightforward. Nothing about it is complicated, nor is it a "popular science" view. You suggest additional concerns that were not part of my post; those do result in a different situation, which is why I excluded them. The thread has wandered off topic, so I'll send my lengthy and good-natured follow-up by PM. If anyone else is interested, let me know and I'll copy you.

ghellquist · « **Reply #36 on:** January 31, 2011, 04:35:02 PM »

Hi.
The forum software won't let me send a reply to Lil' Kim Jong-Il's address.

I agree, however, that the discussion has gone off topic. Suffice it to say that we may both be correct but not agree on how to describe the same thing. Not unheard of in the world.

// Gunnar
