Digital audio and misconceptions about hi-res audio, sample rates, and bitdepth

lm4der · Jul 17, 2016

I thought that it would be useful to clarify some things about "hi-res" audio. There seem to be a lot of misconceptions about how higher sample rates and bit depths actually work, and how they play into audio quality.

I know some of you guys have a deeper understanding of this stuff than me, so please don't hesitate to correct anything I get wrong here.

I would like to enter this topic by discussing one of the main misconceptions about hi-res audio, which is:

That higher sample rates mean higher resolution, like DVD vs Blu-ray.
Or, in another variation: higher sample rates carry more signal information, because 96,000 samples per second (or whatever) is more samples than 44,100 samples per second, so it must be more accurate and can sound better.

To be clear, those are misconceptions.

I see this a lot. Many people are attracted to hi-res because of the flawed (although intuitive) assumption that signal information is lost when we sample it for the conversion to digital.

Note: (Edited for clarity)

1) For now I am only talking about the sample rate, not bit depth. Anything less than infinite bit depth does introduce error in the reconstructed signal. I'll talk briefly about bit depth at the end.
2) When discussing Shannon/Nyquist and reconstructing a signal "perfectly", that is only true in the mathematical domain. The actual real world implementation of a reconstruction filter is always an approximation.

So mathematically speaking, it has been shown by the Nyquist-Shannon sampling theorem that you can perfectly reconstruct the original signal (music) from samples, as long as your signal is bandwidth limited to half the sampling rate. What does that mean? CD audio, aka the Redbook standard, chose a sampling rate of 44.1 kHz (44,100 samples per second). This means that you can perfectly reconstruct the original music signal as long as you limit the frequency bandwidth to half that, i.e. 22.05 kHz. (Limiting bandwidth is a fancy way of saying that we have to cut off (filter out) any content with frequencies above the limit frequency, 22.05 kHz for CD audio.)

This is important to repeat - the original audio waveform can be perfectly reconstructed from CD audio if you low-pass filter it at 22.05 kHz. To reinforce this point, I am cutting and pasting a relevant section from the Wikipedia article on Nyquist-Shannon:

"Intuitively we expect that when one reduces a continuous function to a discrete sequence and interpolates back to a continuous function, the fidelity of the result depends on the density (or sample rate) of the original samples. The sampling theorem introduces the concept of a sample rate that is sufficient for perfect fidelity for the class of functions that are bandlimited to a given bandwidth, such that no actual information is lost in the sampling process. It expresses the sufficient sample rate in terms of the bandwidth for the class of functions. The theorem also leads to a formula for perfectly reconstructing the original continuous-time function from the samples."

So, finally, the point about a bandwidth limited signal is that if you don't do that, i.e. if you don't filter out the signal above the bandwidth limit (22.05 kHz for the Redbook example), then when you reconstruct the wave you get what are called aliases, which are like reflections of the audio signal from above the bandwidth limit into the audible range. We can hear that stuff, so we do need to filter it out as best we can. (BTW, this type of aliasing is not the same thing as what we think of when we talk about graphics cards that do anti-aliasing. That's more about interpolating pixel shades to accommodate the fact that a display has discrete pixels, so diagonal lines look jagged.)

Humans don't hear above 20 kHz, so the Redbook standard makes sense. It sets the sample rate just above what it needs to be to capture the range of human hearing. So, other than talking about bit depth, we're done, right? Our sampled CD quality music can be converted back to analog perfectly.
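As a rough illustration of the "reflection" idea, the sketch below computes where a tone above the bandwidth limit lands after sampling. The function name `alias_frequency` and the test frequencies are made up for this example.

```python
def alias_frequency(f, fs):
    """Frequency (Hz) at which a tone of frequency f appears after sampling at fs."""
    f_mod = f % fs                 # sampling cannot tell f from f plus any multiple of fs
    return min(f_mod, fs - f_mod)  # fold into the range 0 .. fs / 2

fs = 44100.0  # CD sample rate in Hz
for f in (10000.0, 22050.0, 25000.0, 30000.0):
    print(f"{f:8.0f} Hz tone -> appears at {alias_frequency(f, fs):8.0f} Hz")
```

A 25 kHz tone that is not filtered out before sampling at 44.1 kHz shows up at 19.1 kHz, inside the audible band.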

And that would be true... The only wrench in this plan is that it turns out that it is not so easy to do the bandwidth limiting perfectly with such tight requirements as those imposed by the choice of the 44.1 kHz sampling rate - we want to keep everything up to the human hearing limit of 20 kHz, but then anything above 22.05 kHz must be filtered out. This makes for a very steep filter (called a brickwall low-pass filter), because there is only a small ~2 kHz amount of space from the frequency where we want to hear up to (20 kHz), to where we need to cut the signal off (22.05 kHz). It turns out that building really steep filters of this nature introduces its own problems that distort the signal and therefore affect audio quality.

If it weren't hard to build the steep filter we would be done. Unicorns and butterflies, audio nirvana. (Well, you still need a very accurate converter to get those analog voltages out of the digital domain, but that's a different problem, not related to sample rates or bit depth.) So, to try to avoid having to build such a steep filter, the idea of upsampling, or oversampling (nearly synonymous), was introduced. Upsampling DACs are designed such that they upsample the 44.1 kHz signal to a higher sample rate. This is done by inserting extra samples in between each existing sample. These extra samples have to be computed to match the shape of the real waveform - this is called interpolation, and is generally handled by an iterative algorithm that successively refines the computed approximation of the waveform. The more iterations the algorithm performs, the more accurate the interpolated samples will be. The interpolation functions converge to perfectly recreating the waveform at infinity (infinite number of iterations). These iterations are often called "taps" in a hardware implementation (so, Yggdrasil/Gungnir Multibit have 18,000 taps or iterations, BifrostMB 9,000).
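For readers who want to see what "inserting extra samples" can look like, here is a minimal sketch of one common textbook approach: insert zeros between the original samples, then run a low-pass FIR filter over the result. In this formulation each "tap" is simply one filter coefficient. The Hann-windowed sinc design, the 63-tap length, and the function names are assumptions picked for illustration, not a description of what any particular DAC does.

```python
import math

def windowed_sinc_taps(num_taps, factor):
    """Low-pass FIR taps (Hann-windowed sinc) with cutoff at the original Nyquist."""
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        x = (k - mid) / factor
        ideal = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def upsample(samples, factor, num_taps=63):
    """Insert factor - 1 zeros after each sample, then smooth with the FIR filter."""
    stuffed = []
    for s in samples:
        stuffed.append(s)
        stuffed.extend([0.0] * (factor - 1))
    taps = windowed_sinc_taps(num_taps, factor)
    delay = (num_taps - 1) // 2
    out = []
    for n in range(len(stuffed)):
        acc = 0.0
        for k, h in enumerate(taps):
            idx = n + delay - k  # centre the filter so its delay is cancelled
            if 0 <= idx < len(stuffed):
                acc += h * stuffed[idx]
        out.append(acc)
    return out

fs = 44100.0
tone = [math.sin(2 * math.pi * 1000.0 * n / fs) for n in range(200)]
doubled = upsample(tone, 2)  # same tone, now at 88.2 kHz
print(len(tone), len(doubled))
```

The original samples pass through unchanged and the new in-between samples are filled in by the filter. A longer filter narrows the gap between the ideal and the actual result, at the cost of more computation per output sample.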

If you upsample the 44.1 kHz signal to double the sample rate, 88.2 kHz, now your filter needs to only be at 44.1 kHz (half of 88.2) for Shannon-Nyquist perfection. And since we know that we actually only care about frequencies up to 20 kHz, now we have lots of room - we can use a low-pass filter with a gentle downward slope between ~20 kHz and ~44.1 kHz, an easy filter to make that doesn't distort things.
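To put rough numbers on "steep" versus "gentle", the sketch below uses a well-known rule of thumb for FIR filter length, N ≈ (fs / transition width) × (attenuation in dB / 22). The band edges are taken from the post as written, the 96 dB stopband target is an assumption, and the results are order-of-magnitude estimates only.

```python
import math

def estimate_fir_taps(fs, passband_edge, stopband_edge, atten_db):
    """Rough FIR length from the rule of thumb N ~ (fs / transition) * (atten_db / 22)."""
    transition = stopband_edge - passband_edge
    return math.ceil((fs / transition) * (atten_db / 22.0))

atten_db = 96.0  # assumed stopband attenuation, chosen to match 16-bit dynamic range

# Filter running at 44.1 kHz: pass 20 kHz, stop by 22.05 kHz.
steep = estimate_fir_taps(44100.0, 20000.0, 22050.0, atten_db)
# Filter running at 88.2 kHz: pass 20 kHz, stop by 44.1 kHz (the numbers used above).
gentle = estimate_fir_taps(88200.0, 20000.0, 44100.0, atten_db)
print(steep, gentle)
```

Under these assumptions the narrow transition band needs roughly 94 taps and the wide one roughly 16. The absolute numbers matter less than the ratio: the filter length grows in inverse proportion to the width of the transition band.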

This solves the problem of making nice easy low-pass filters, but introduces a new problem - the interpolated samples are not a perfect match to the real waveform, because they have to be approximated. So now you have tons of work around designing these upsampling interpolation algorithms (that are also called "filters", goddammit). All of the "DAC in a chip" DACs have to do these computations using limited silicon. Higher end stuff may have a dedicated CPU for the interpolation - for example Mike Moffat (@baldr) of Schiit worked together with some other impressive academics to create their own unique "closed form" filter that runs in a DSP chip. ("Closed form", as far as I understand it, which may be wrong, is a mathematical term that means that despite upsampling the original wave into a new wave that is an approximation of the original, they can still recover the original digital samples after their filter is done, before moving on to the bit->voltage conversion stage.)

Thus, the only real weakness of the Redbook 44.1 kHz sample rate is that it is so tight to the frequency limit we care about (20 kHz) that it made it hard to build DACs, due to the steep filter requirement. If the standard had been 88.2 kHz, or 96 kHz, we wouldn't have a more accurate digital representation of our music. The waveform in either case can theoretically come out perfect - but it would have benefited us by making the job of building the low-pass filter easy, and thus eliminated the need for tricks like oversampling.

Briefly, on bit depth: In short, bit depth translates into two things - the maximum dynamic range that can be expressed numerically, and noise from the quantization error of the discrete digital bit quantities. 16 bits gives 96 dB of dynamic range and the quantization noise floor is in theory inaudible... but this is arguable. The higher the bit depth, the more dynamic range and the lower the noise floor. Technically, CD audio's 16 bit depth is pretty good - for example it exceeds the technical capabilities of vinyl by a wide margin. So, that said, increasing bit depth _does_ mathematically improve the audio signal, mostly by lowering the noise floor, so there is some basis for arguing that greater bit depth is better.
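The 96 dB figure comes from the ratio between full scale and a single quantization step. A small sketch of that arithmetic follows, along with the usual textbook estimate for a full-scale sine wave against uniformly distributed quantization noise. The function names are made up for this example.

```python
import math

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB: 20 * log10(2 ** bits)."""
    return 20.0 * math.log10(2 ** bits)

def sine_sqnr_db(bits):
    """Textbook SNR of a full-scale sine against uniform quantization noise."""
    return 6.02 * bits + 1.76

for bits in (16, 20, 24):
    print(f"{bits} bits: {dynamic_range_db(bits):.1f} dB range, "
          f"{sine_sqnr_db(bits):.1f} dB sine SQNR")
```

Each extra bit adds about 6 dB, so 16 bits gives about 96.3 dB and 24 bits about 144.5 dB.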

Psalmanazar · Jul 16, 2016

If Sony and Philips had wanted something higher than 44.1 kHz (the highest sample rate compatible with NTSC and PAL VCR tapes), affordable digital audio wouldn't have happened until well into the 90s, as DAT was only invented in 1987 and didn't replace PCM VCR adaptors until the nineties. Then you have another five years or so for CDs to sound good as CDs, and then more time for good DACs to trickle down from the pros to audiophiles to consumers.

schiit · Jul 16, 2016

Careful about "perfect reconstruction" claims. Shannon/Nyquist only applies if there is no quantization error. Or, in more understandable terms, when there's an infinite number of bits.

Huh?

With 16-bit audio, there are only 65,536 discrete levels that can be encoded when sampling. Samples can fall in-between levels, so the stored value is not exactly the same as the original value. This is known as "quantization error." No matter how perfect the filter, you will always have quantization error--this is baked in, permanently, to the digital recording. With quantization error as part of the equation, you can never have perfect reconstruction.

This is why 20/48 (as a digital audio format) would be much more meaningful than, say, 16/384. Encoding with 20 bits dramatically reduces the embedded quantization error of the recording, because 20 bits gives you over 1 million levels to choose from, rather than 65 thousand. Encoding at 16 bits, no matter the rate, does nothing to reduce the embedded error of the recording--all it does is make the filtering easier.

And yes, ideally, you'd want 20/96 or 24/96 to both decrease quantization error and ease the filter requirements...but if you have to choose one, choose more bits.
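The level counts and the size of the rounding error can be checked with a few lines. This is only a sketch: the `quantize` helper and the sample value are hypothetical, and real converters differ in how they scale and clip.

```python
def quantize(x, bits):
    """Round x in [-1.0, 1.0) to the nearest level of a signed `bits`-bit grid."""
    scale = 2 ** (bits - 1)                   # e.g. 32768 for 16 bits
    code = round(x * scale)
    code = max(-scale, min(scale - 1, code))  # clip to the representable codes
    return code / scale

x = 0.123456789  # arbitrary sample value between -1 and 1
for bits in (16, 20, 24):
    levels = 2 ** bits
    err = abs(quantize(x, bits) - x)
    print(f"{bits} bits: {levels} levels, error {err:.3e}")
```

The worst-case error is half a step, so going from 16 to 20 bits shrinks it by a factor of 16, whatever the sample rate.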

lm4der · Jul 16, 2016

schiit said: ↑

Careful about "perfect reconstruction" claims. Shannon/Nyquist only applies if there is no quantization error. Or, in more understandable terms, when there's an infinite number of bits.

Huh?

With 16-bit audio, there are only 65,536 discrete levels that can be encoded when sampling. Samples can fall in-between levels, so the stored value is not exactly the same as the original value. This is known as "quantization error." No matter how perfect the filter, you will always have quantization error--this is baked in, permanently, to the digital recording. With quantization error as part of the equation, you can never have perfect reconstruction.

This is why 20/48 (as a digital audio format) would be much more meaningful than, say, 16/384. Encoding with 20 bits dramatically reduces the embedded quantization error of the recording, because 20 bits gives you over 1 million levels to choose from, rather than 65 thousand. Encoding at 16 bits, no matter the rate, does nothing to reduce the embedded error of the recording--all it does is make the filtering easier.

And yes, ideally, you'd want 20/96 or 24/96 to both decrease quantization error and ease the filter requirements...but if you have to choose one, choose more bits.

Yeah, well said. I sort of minimized the discussion of bit-depth and quantization error above, so thanks for expanding this.

Edit: The thing is, the quantization error is heard as noise, like tape hiss. The more bits, the lower the noise floor. So the real question is, how low do you want the noise floor...

schiit · Jul 16, 2016

Nope, quantization error is not heard as noise. It's an inherent error, not noise.

Noise floor is a different subject. There's no inherent reason a 16-bit DAC couldn't have a 120dB noise floor.

lm4der · Jul 16, 2016

schiit said: ↑

Nope, quantization error is not heard as noise. It's an inherent error, not noise.

Noise floor is a different subject. There's no inherent reason a 16-bit DAC couldn't have a 120dB noise floor.

Very interesting, as I understand quantization error, especially if dithered, to indeed be random noise. But I am no expert on this, I may have that totally wrong.

I think this video is interesting, and discusses this idea - quantization error as noise. I don't know for certain that even this guy has it right:

http://www.xiph.org/video/vid2.shtml

schiit · Jul 16, 2016

Dither is noise. If you add noise at LSB levels, yes, you will have noise at LSB levels. However, if you don't add noise at LSB levels, you will not have noise at LSB levels. What is the noise floor of a 16-bit DAC outputting a recording of absolute silence into an output stage with -120dB SNR? It's -120dB.

I think the misconception comes from modern delta-sigma DACs, which cannot output absolute silence, since they are (a) dependent on the preceding and following samples to approximate the actual numerical output, and (b) typically using internal dither.

Again, quantization error is not noise, nor is it heard as noise--it is embedded error. It is something wrong with the recording.

lm4der · Jul 16, 2016

schiit said: ↑

Dither is noise. If you add noise at LSB levels, yes, you will have noise at LSB levels. However, if you don't add noise at LSB levels, you will not have noise at LSB levels. What is the noise floor of a 16-bit DAC outputting a recording of absolute silence into an output stage with -120dB SNR? It's -120dB.

I think the misconception comes from modern delta-sigma DACs, which cannot output absolute silence, since they are (a) dependent on the preceding and following samples to approximate the actual numerical output, and (b) typically using internal dither.

Again, quantization error is not noise, nor is it heard as noise--it is embedded error. It is something wrong with the recording.

Don't get me wrong, you guys eat and breathe this stuff, so help me understand.

This video is, in terms of my understanding of it, saying the opposite - that Nyquist/Shannon applies independent of bit-depth, and bit-depth related quant error only produces noise. The whole thing is worth watching, but the most relevant section starts at 8:41. http://www.xiph.org/video/vid2.shtml

If you could respond to this guy's version of things I would be indebted and would probably learn something!

schiit · Jul 16, 2016

If the video is making the claim that Nyquist applies independently of bit depth, it is wrong. Mathematically. Period. That's the response.

Rex Aeterna · Jul 16, 2016

I don't understand much of the stuff you guys are talking about, but I just want to point out that some studies have shown that human senses can detect things far above 20 kHz and below 20 Hz. They even showed that people with hearing loss, while not able to hear 20 kHz, were still able to use another form of sense to point out anomalies in the 20 kHz range, and that the actual low frequency limit of the average human is around 10-15 Hz and not 20 Hz.

lm4der · Jul 16, 2016

Wait, how cool is this, I'm arguing with Jason Stoddard?! I shall capitulate to your extensive expertise, sir.

Man, I just love the fact that you engage with us on this site, and take the time to read and reply to stuff here. It is truly a generous thing, and I always value anything you add.

Cheers!

schiit · Jul 16, 2016

Yeah, but I'm the digital idiot around here. Arguing with Mike or Dave...that's a whole different ball game.

(And it's not really an argument, it's just, well, if the math is wrong, the math is wrong. I did get far enough into DSP to get the fundamentals. Or at least I thought I did.)

slowsound · Jul 16, 2016

schiit said: ↑

If the video is making the claim that Nyquist applies independently of bit depth, it is wrong. Mathematically. Period. That's the response.

I'd have to watch the video again, but as I remember it, they don't mention that requirement at all. It goes on to show bandwidth limited square waves and noise floor demonstrations with and without dither, ignoring that limitation of the Nyquist theorem.

ultrabike · Jul 16, 2016

lm4der said: ↑

So mathematically speaking, it has been shown by the Nyquist-Shannon sampling theorem that you can perfectly reconstruct the original signal (music) from samples, as long as your signal is bandwidth limited to half the sampling rate. What does that mean? CD audio, aka the Redbook standard, chose a sampling rate of 44.1 kHz (44,100 samples per second). This means that you can perfectly reconstruct the original music signal as long as you limit the frequency bandwidth to half that, i.e. 22.05 kHz. (Limiting bandwidth is a fancy way of saying that we have to cut off (filter out) any content with frequencies above the limit frequency, 22.05 kHz for CD audio.)

The perfect reconstruction filter is a sinc function in the time domain. Such a filter is impossible to create (because it's infinite in length and extends from the beginning of time, i.e. it is not causal, to the end of time). One can only approximate it.
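A minimal sketch of that approximation: the code below evaluates the sinc interpolation sum over a finite block of samples and compares the result with the true waveform at a point halfway between two samples. The 1 kHz test tone and the 4001-sample record length are arbitrary choices for illustration.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, fs, t):
    """Sinc interpolation evaluated at time t (seconds), using only the given samples."""
    return sum(x_n * sinc(t * fs - n) for n, x_n in enumerate(samples))

fs = 44100.0      # CD sample rate (Hz)
f_tone = 1000.0   # test tone, well below fs / 2
n_samples = 4001  # finite record, so the sum is only an approximation

samples = [math.sin(2 * math.pi * f_tone * n / fs) for n in range(n_samples)]

# Evaluate halfway between two samples near the middle of the record,
# where truncating the infinite sum hurts the least.
t = (n_samples // 2 + 0.5) / fs
estimate = reconstruct(samples, fs, t)
exact = math.sin(2 * math.pi * f_tone * t)
error = abs(estimate - exact)
print(f"exact={exact:.6f} estimate={estimate:.6f} error={error:.2e}")
```

The error is small but not zero, and it shrinks only slowly as more samples are included, which is one way to see why practical filters need many taps or a window function.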

lm4der said: ↑

And that would be true... The only wrench in this plan is that it turns out that it is not so easy to do the bandwidth limiting perfectly with such tight requirements as those imposed by the choice of the 44.1 kHz sampling rate - we want to keep everything up to the human hearing limit of 20 kHz, but then anything above 22.05 kHz must be filtered out. This makes for a very steep filter (called a brickwall low-pass filter), because there is only a small ~2 kHz amount of space from the frequency where we want to hear up to (20 kHz), to where we need to cut the signal off (22.05 kHz). It turns out that building really steep filters of this nature introduces its own problems that distort the signal and therefore affect audio quality.

That's not the only wrench in the plan. There are very powerful brickwall filters actually. They require a shit load of computations, but I believe Schiit's awesome sauce filter is one such filter which approximates the sinc function.

The problem is not necessarily that the filter is steep. It's that in order to create such a filter one either goes the FIR route (like Schiit and others) and uses a fairly large number of computations, or uses a more general class of filters (IIR) which can do steep cheaply but distort the phase.

lm4der said: ↑

So, to try to avoid having to build such a steep filter, the idea of upsampling, or oversampling (nearly synonymous), was introduced. Upsampling DACs are designed such that they upsample the 44.1 kHz signal to a higher sample rate. This is done by inserting extra samples in between each existing sample. These extra samples have to be computed to match the shape of the real waveform - this is called interpolation, and is generally handled by an iterative algorithm that successively refines the computed approximation of the waveform. The more iterations the algorithm performs, the more accurate the interpolated samples will be. The interpolation functions converge to perfectly recreating the waveform at infinity (infinite number of iterations). These iterations are often called "taps" in a hardware implementation (so, Yggdrasil/Gungnir Multibit have 18,000 taps or iterations, BifrostMB 9,000).

Nope. Interpolation does not solve the problem at all. And again, Yggdrasil/Gungnir Multibit uses a steep FIR filter whose drawback is precisely that it's 18,000 taps long, with a group delay of 9000 clock cycles, which may be acceptable in some applications.

lm4der said: ↑

If you upsample the 44.1 kHz signal to double the sample rate, 88.2 kHz, now your filter needs to only be at 44.1 kHz (half of 88.2) for Shannon-Nyquist perfection. And since we know that we actually only care about frequencies up to 20 kHz, now we have lots of room - we can use a low-pass filter with a gentle downward slope between ~20 kHz and ~44.1 kHz, an easy filter to make that doesn't distort things.

There is no Shannon-Nyquist perfection in the real world AFAIK. Nor does it have to be that way unless one is anal retentive or so.

lm4der said: ↑

This solves the problem of making nice easy low-pass filters, but introduces a new problem - the interpolated samples are not a perfect match to the real waveform, because they have to be approximated. So now you have tons of work around designing these upsampling interpolation algorithms (that are also called "filters", goddammit). All of the "DAC in a chip" DACs have to do these computations using limited silicon. Higher end stuff may have a dedicated CPU for the interpolation - for example Mike Moffat (@baldr) of Schiit worked together with some other impressive academics to create their own unique "closed form" filter that runs in a DSP chip. ("Closed form", as far as I understand it, which may be wrong, is a mathematical term that means that despite upsampling the original wave into a new wave that is an approximation of the original, they can still recover the original digital samples after their filter is done, before moving on to the bit->voltage conversion stage.)

Closed form or whatever, all such filters are approximations.

lm4der said: ↑

Thus, the only real weakness of the Redbook 44.1 kHz sample rate is that it is so tight to the frequency limit we care about (20 kHz) that it made it hard to build DACs, due to the steep filter requirement. If the standard had been 88.2 kHz, or 96 kHz, we wouldn't have a more accurate digital representation of our music. The waveform in either case can theoretically come out perfect - but it would have benefited us by making the job of building the low-pass filter easy, and thus eliminated the need for tricks like oversampling.

The waveform cannot come out perfect, because the perfect brick-wall filter does not exist at recording and reproduction time.

All of the above assuming infinite bit-depth.

lm4der said: ↑

Briefly, on bit depth: In short, bit depth translates into two things - the maximum dynamic range that can be expressed numerically, and noise from the quantization error of the discrete digital bit quantities. 16 bits gives 96 dB of dynamic range and the quantization noise floor is in theory inaudible... but this is arguable. The higher the bit depth, the more dynamic range and the lower the noise floor. Technically, CD audio's 16 bit depth is pretty good - for example it exceeds the technical capabilities of vinyl by a wide margin. So, that said, increasing bit depth _does_ mathematically improve the audio signal, mostly by lowering the noise floor, so there is some basis for arguing that greater bit depth is better.

That is arguable (the audibility part), but yes, quantization noise is less of a problem if sufficiently low.

As far as quantization...

Quantization error is noise, but it's not the same as thermal noise and it may have different statistics. It may be heard as white noise, depending. It may not be Gaussian.
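One way to see how the statistics of quantization error depend on the situation is to quantize a tone smaller than half a step, with and without dither. The sketch below is a toy experiment, not a model of any real converter: the 0.4 LSB amplitude, the triangular dither, and the fixed random seed are all assumptions chosen to make the effect easy to see.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

n = 20000
amplitude = 0.4  # in units of one LSB, i.e. smaller than half a quantization step
signal = [amplitude * math.sin(2 * math.pi * 50 * i / n) for i in range(n)]

def tpdf_dither():
    """Triangular-PDF dither spanning +/- 1 LSB (sum of two uniform variables)."""
    return random.uniform(-0.5, 0.5) + random.uniform(-0.5, 0.5)

plain = [round(s) for s in signal]                     # no dither
dithered = [round(s + tpdf_dither()) for s in signal]  # dither added before rounding

def gain(output):
    """Least-squares estimate of how much of `signal` survives in `output`."""
    return sum(o * s for o, s in zip(output, signal)) / sum(s * s for s in signal)

print("undithered gain:", gain(plain))
print("dithered gain:  ", gain(dithered))
```

Without dither every sample rounds to zero, so the error is exactly the negative of the signal: fully deterministic and fully correlated with the input. With dither the tone survives at close to its original level and the error behaves like added random noise.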

slowsound · Jul 16, 2016

ultrabike said: ↑

Nope. Interpolation does not solve the problem at all. And again, Yggdrasil/Gungnir Multibit uses a steep FIR filter whose drawback is precisely that it's 18,000 taps long, with a group delay of 9000 clock cycles, which may be acceptable in some applications.

Sorry, does this mean a delay of ~204 ms?

Psalmanazar · Jul 16, 2016

@schiit I've never heard a recording ruined by quantization errors though; only poor micing, poor mixing, Led Zeppelin, black metal, heavy-handed mastering, and most of all: not having enough, if any good material when entering the studio and the ability to actually play it.

ultrabike · Jul 16, 2016

slowsound said: ↑

Sorry, does this mean a delay of ~204 ms?

If the sampling rate is 96 kHz, then it is 9000 * 1/96000 = 94 ms. For some things this is not a problem. For others it might be. I dunno.

204 ms sounds right if the sampling rate is 44.1 kHz. But I would be surprised if Schiit and other folks ran their FIR at that rate.
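The delay arithmetic can be written out as below. It assumes a symmetric (linear-phase) FIR filter, whose delay is about half its length, and it treats the rate the filter actually runs at as unknown, so both rates discussed here are shown. The function name is made up for this example.

```python
def linear_phase_delay_ms(taps, fs):
    """Delay of a symmetric FIR filter in ms: about taps / 2 samples (strictly (taps - 1) / 2)."""
    return (taps / 2) / fs * 1000.0

for fs in (44100.0, 96000.0):
    print(f"{fs:.0f} Hz: {linear_phase_delay_ms(18000, fs):.1f} ms")
```

That gives about 204.1 ms at 44.1 kHz and 93.8 ms at 96 kHz for an 18,000-tap filter.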

purr1n · Jul 16, 2016

Not acceptable for gaming. Or high quality feature film playback. Although you can always delay the video, and the projector will have some lag too.

ultrabike · Jul 16, 2016

Maybe. Movies may work. Can use lip sync worst case scenario.

lm4der · Jul 17, 2016

ultrabike said: ↑

The perfect reconstruction filter is a sinc function in the time domain. Such a filter is impossible to create (because it's infinite in length and extends from the beginning of time, i.e. it is not causal, to the end of time). One can only approximate it.

That's not the only wrench in the plan. There are very powerful brickwall filters actually. They require a shit load of computations, but I believe Schiit's awesome sauce filter is one such filter which approximates the sinc function.

The problem is not necessarily that the filter is steep. It's that in order to create such a filter one either goes the FIR route (like Schiit and others) and uses a fairly large number of computations, or uses a more general class of filters (IIR) which can do steep cheaply but distort the phase.

Nope. Interpolation does not solve the problem at all. And again, Yggdrasil/Gungnir Multibit uses a steep FIR filter whose drawback is precisely that it's 18,000 taps long, with a group delay of 9000 clock cycles, which may be acceptable in some applications.

There is no Shannon-Nyquist perfection in the real world AFAIK. Nor does it have to be that way unless one is anal retentive or so.

Closed form or whatever, all such filters are approximations.

The waveform cannot come out perfect, because the perfect brick-wall filter does not exist at recording and reproduction time.

All of the above assuming infinite bit-depth.

That is arguable (the audibility part), but yes, quantization noise is less of a problem if sufficiently low.

As far as quantization...

Quantization error is noise, but it's not the same as thermal noise and it may have different statistics. It may be heard as white noise, depending. It may not be Gaussian.

Thanks @ultrabike for jumping in. Your experience is a valuable learning aid.

A lot of feedback seems to be directed at my use of language around the Shannon/Nyquist theorem - I described it as providing for perfect signal reconstruction, but I should have also emphasized that perfect reconstruction is in the theoretical sense, and that in the real world we can't actually run the reconstruction filter (sinc function) on through eternity, so in fact we always have an approximation.

I will update that first post with some of this verbiage.

As per bit depth, that obviously lends another element of inaccuracy to the reconstructed signal. There seems to be a bit of controversy around what the effect of quantization error is (ach hum noise ach hum), at least in my mind.

But as you said, we don't need to "perfectly" reconstruct the wave, as long as our approximations are "good-enough".

However, I wasn't trying to claim that we ever do achieve perfect reconstruction, or that we need to. I was trying to point out that mathematically there is no benefit to higher sample rates if your band limit is 20 kHz. There is no more music in there, despite more samples. Along the way I tried to explain some digital theory, but of course that's where it gets dicey for the amateur audiophool.

I appreciate the feedback, keep it coming! My ultimate goal was to provide a reasonably readable manifesto about hi-res foolery, i.e. the resolution misconception.
