http://sjeng.org/ftp/SACD.pdf
Audio Engineering Society
Convention Paper 5395
Presented at the 110th Convention
2001 May 12-15 Amsterdam, The Netherlands
Why 1-Bit Sigma-Delta Conversion is Unsuitable
for High-Quality Applications
by
Stanley P. Lipshitz and John Vanderkooy
Audio Research Group, University of Waterloo
Waterloo, Ontario N2L 3G1, Canada
ABSTRACT
Single-stage, 1-bit sigma-delta converters are in principle imperfectible. We prove this fact. The reason, simply
stated, is that, when properly dithered, they are in constant overload. Prevention of overload allows only partial
dithering to be performed. The consequence is that distortion, limit cycles, instability, and noise modulation can
never be totally avoided. We demonstrate these effects and, using coherent averaging techniques, are able to display
the consequent profusion of nonlinear artefacts which are usually hidden in the noise floor. Recording, editing,
storage, or conversion systems using single-stage, 1-bit sigma-delta modulators are thus inimical to audio of the
highest quality. In contrast, multi-bit sigma-delta converters, which output linear PCM code, are in principle
infinitely perfectible. (Here, multi-bit refers to at least two bits in the converter.) They can be properly dithered so
as to guarantee the absence of all distortion, limit cycles, and noise modulation. The audio industry is misguided if
it adopts 1-bit sigma-delta conversion as the basis for any high-quality processing, archiving, or distribution format
to replace multi-bit, linear PCM.
...
In light of the above, it is with alarm that we note the adoption of the
single-stage, 1-bit sigma-delta converter architecture as the encoding
standard for a next-generation (and supposedly higher-quality)
consumer digital audio format. We refer, of course, to the Direct
Stream Digital (DSD) encoding which forms the basis of the Super
Audio CD format introduced recently by Philips and Sony (see, for
example, [5] and [6]). The original intention to have the digital audio
data at every stage of the processing (from the original analogue-to-digital
conversion, through all the editing and mastering
operations) stored in the DSD 1-bit format has apparently now
been abandoned. This was a wise decision. The conversion to the
final 1-bit DSD format, however, still represents a required, and quite
unnecessary, degradation of the quality of the audio signal. Every
single 1-bit data conversion entails an inevitable loss of signal quality
in a way which need not occur with multi-bit, linear PCM.
...
4. CONCLUDING REMARKS
Some final comments and speculations can be made:
MASH-type multi-stage converters, using multi-bit
quantization at the first stage, are not subject to the same
criticism as the single-stage 1-bit sigma-delta converter,
provided that their quantizers do not overload.
The repeated 1-bit sigma-delta reconversions entailed by a
misguided desire to store the data in DSD format after each
intermediate processing stage would result in the accumulation
of significantly greater noise and nonlinear artefacts than would
occur with any of the dithered multi-bit systems under
corresponding conditions. This is not a trivial matter, because
each signal processing operation (even a trivial one, such as a
gain change) results in the 1-bit DSD data stream turning into a
multi-bit data stream!
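The remark about gain changes can be illustrated with a minimal sketch. It assumes a 1-bit stream modelled as samples of +1 or -1; the random stream, the gain of 0.75, and the helper names are hypothetical stand-ins chosen for illustration, not actual DSD data or any real editing tool.

```python
import random

def one_bit_stream(n, seed):
    """Hypothetical stand-in for a 1-bit stream: n samples, each +1 or -1."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(n)]

def distinct_levels(samples):
    """Sorted list of the distinct sample values that occur in a stream."""
    return sorted(set(samples))

stream_a = one_bit_stream(1000, seed=0)
stream_b = one_bit_stream(1000, seed=1)

# A gain change: the samples no longer sit on the +1/-1 code points.
gained = [0.75 * s for s in stream_a]

# A mix of two 1-bit streams: three levels, so one bit no longer suffices.
mixed = [a + b for a, b in zip(stream_a, stream_b)]

levels_original = distinct_levels(stream_a)
levels_gained = distinct_levels(gained)
levels_mixed = distinct_levels(mixed)

print(levels_original, levels_gained, levels_mixed)
```

In both cases the result has to be requantized by another 1-bit sigma-delta stage before it can be stored as a 1-bit stream again, which is the reconversion the text objects to.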
Because of the insoluble theoretical problems discussed in this
paper, we are unaware of any way to generate a Super Audio
CD test disc which is both distortion-free and has a constant,
signal-independent noise floor! In the multi-bit domain, this is
easily done using standard dithering methods. Indeed, the
measurement standards for PCM-based audio, developed by the
Audio Engineering Society, mandate the use of TPDF dither.
This is an impossibility for 1-bit digital audio.
The amount of negative feedback used in a 1-bit sigma-delta
modulator striving to straighten its quantizer transfer
characteristic, and simultaneously achieve a signal-to-noise
ratio of 120 dB, far exceeds anything ever used before in high-quality
audio design. Ironically, while a part of the industry
mistakenly espouses low feedback for top quality, what we
have here is the exact opposite touted as being even better!
Since it is the high amount of negative feedback at low
frequencies that reduces the 1-bit distortion products to low
levels in the audio band, it is not unexpected that we find the
distortion products rising at high frequencies, where the
corrective negative feedback has actually turned into positive
feedback!
The high levels of ultrasonic noise and spuriae produced by an
inadequately-filtered 1-bit sigma-delta converter pose a problem for audio amplifiers and loudspeakers, which can
generate nonlinear distortion products in the baseband when
subjected to this type of indignity. One wonders how many of
the perceived differences noted in Super Audio CD listening
comparisons might be due to such nonlinear effects.
Just as it might be true that one can perceive ultrasonic signals
that are correlated with the baseband signal, so too might the
low-level, but correlated, distortion products that we have
shown to exist within the baseband be perceptible, even though
they would normally be thought to be below audibility. Further
research is needed here.
Since the Gerzon/Craven noise-shaping theorem implies
equal areas above and below the unshaped noise floor on a
logarithmic vertical axis, it follows that there will always be a
net increase in the total noise power as a result of noise
shaping, if the theoretical curve is adhered to. Since the total
output power of a 1-bit shaper is constant, it follows that, even
in the absence of signal, the noise PSD of a 1-bit noise shaper
cannot follow the theoretically-prescribed curve. Its total
output noise power must be less than the curve predicts. Then,
if an input signal is added, the noise power at the output must
drop even further, so causing further noise PSD changes.
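The equal-areas property and the resulting net increase in noise power can be checked numerically. The sketch below assumes a hypothetical minimum-phase FIR shaper, (1 - 0.9 z^-1)^2, chosen only for illustration; it is not the Lip7ZP shaper used in the paper. The mean of the response in dB comes out at zero (equal areas on a logarithmic axis), while the mean of the squared magnitude, which is the total noise power gain, exceeds 1.

```python
import cmath
import math

def shaper_response(coeffs, omega):
    """Frequency response of an FIR error filter at angular frequency omega."""
    return sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(coeffs))

# Hypothetical minimum-phase noise-shaping filter: (1 - 0.9 z^-1)^2
coeffs = [1.0, -1.8, 0.81]

n = 4096
omegas = [2.0 * math.pi * k / n for k in range(n)]
mags_sq = [abs(shaper_response(coeffs, w)) ** 2 for w in omegas]

# Average of the shaped noise spectrum in dB, relative to the unshaped floor.
mean_log_db = sum(10.0 * math.log10(m) for m in mags_sq) / n

# Average of the shaped noise spectrum on a linear power scale.
power_gain = sum(mags_sq) / n

print(f"mean of response in dB: {mean_log_db:.6f}")
print(f"total noise power gain: {power_gain:.4f}")
```

For this filter the power gain equals the sum of the squared coefficients, 1 + 1.8^2 + 0.81^2 = 4.8961. A 1-bit shaper, whose total output power is fixed, cannot supply an arbitrary increase of this kind.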
As a general principle, it is undesirable to repeatedly noise-shape
a signal as it progresses through various processing
stages, since the total high-frequency noise power keeps
accumulating. This means that a signal should ideally always
be kept at a wordlength at least as long as that required to
preserve the signal's baseband noise floor, irrespective of the
incoming wordlength. Thus a signal with a 120 dB baseband
signal-to-noise ratio should never be processed and stored with
less than a 24-bit wordlength all the way through to the final
digital-to-analogue conversion, whether it is a shaped 1-bit
DSD signal, a shaped 8-bit signal as envisaged in Example (c)
of Section 3, or an unshaped 20- or 24-bit signal. This
principle should apply to both consumer and professional
digital audio. Noise shaping should only be used when storage
or transmission limitations require data rates to be reduced.
(Parenthetically, it seems perverse of DSD to unnecessarily go
in the opposite direction by actually increasing the data rate!)
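The data-rate remark can be made concrete with a little arithmetic. The sample rates below are assumptions not stated in this excerpt: DSD on Super Audio CD is taken to run at 64 x 44.1 kHz with 1 bit per sample, and the PCM formats are common examples chosen for comparison only.

```python
def bit_rate(sample_rate_hz, bits_per_sample):
    """Raw per-channel data rate in bits per second."""
    return sample_rate_hz * bits_per_sample

# Assumed formats (per channel); none of these figures appear in the excerpt.
dsd_rate = bit_rate(64 * 44_100, 1)      # 1-bit DSD at 64 x 44.1 kHz
cd_rate = bit_rate(44_100, 16)           # 16-bit PCM at 44.1 kHz
pcm24_rate = bit_rate(44_100, 24)        # 24-bit PCM at 44.1 kHz
pcm24_96_rate = bit_rate(96_000, 24)     # 24-bit PCM at 96 kHz

for name, rate in [("DSD", dsd_rate), ("CD", cd_rate),
                   ("24/44.1", pcm24_rate), ("24/96", pcm24_96_rate)]:
    print(f"{name:8s} {rate:>9d} bit/s  ({rate / cd_rate:.2f} x CD)")
```

Under these assumptions the 1-bit stream needs more bits per second than any of the listed multi-bit formats, rather than fewer.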
That the 1-bit converter is quite aberrant can also be gleaned
from the following arguments. The classical model for the
quantization error E of an undithered multi-bit quantizer
postulates that it has a power of $\Delta^2/12$. The error power of a 1-bit
quantizer is $(\Delta/2)^2 = \Delta^2/4$ under no-signal conditions, which is three
times too large. This error power is, moreover, independent of
whether or not the 1-bit quantizer is dithered. [Interestingly, a
TPDF-dithered multi-bit quantizer's error power is also $\Delta^2/4$
($= 3\Delta^2/12$).] We have already noted (in Section 3) that the 1-bit
quantizer's noise power must drop in the presence of any
input signal, since its total output power is absolutely constant,
thus unavoidably causing noise modulation. We now see that,
when outputting any signal, its error power must be less than
the $\Delta^2/4$ error power of a TPDF-dithered multi-bit quantizer. In
addition, its error power spectrum is only approximately white,
while that of the TPDF-dithered multi-bit quantizer is truly
white, as guaranteed by dither theory ([3], [4]). Hence, when
modulated, its noise spectrum would be expected, on average,
to lie below, and parallel to, that of the corresponding TPDF-dithered
multi-bit quantizer. (Were the spectrum of E truly
white for the 1-bit quantizer, the two curves would be exactly
parallel, for the output error is just E as shaped by the linear
filter $\{1 - H\}$.) This is confirmed by what we found in Section
2, and can be clearly exhibited by overlaying the upper curves
of Figs 9 and 10. This is done in Fig. 17, where in each case
the input to the Lip7ZP shaper is two half-full-scale (i.e., $\Delta/8$
amplitude each) sine-waves on FFT bins 32 and 48. Note too
the considerable discrepancy between the shapes of the upper
TPDF-dithered noise curve (which is the intended noise curve) and the lower noise curve, which is that actually delivered by
the undithered 1-bit sigma-delta modulator.
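The error-power figures quoted in this last remark can be checked with a small Monte Carlo sketch. It assumes a mid-tread multi-bit quantizer of step DELTA, a uniformly distributed input for the classical undithered model, and a random sequence of +DELTA/2 and -DELTA/2 values as a stand-in for a 1-bit output; none of this models the sigma-delta loop or the Lip7ZP shaper, and all names are hypothetical.

```python
import math
import random

DELTA = 1.0  # quantizer step size (arbitrary units)

def quantize(v):
    """Mid-tread multi-bit quantizer: round to the nearest multiple of DELTA."""
    return DELTA * math.floor(v / DELTA + 0.5)

def tpdf(rng):
    """TPDF dither: sum of two independent uniform variables, peak +/- DELTA."""
    return (rng.random() - 0.5) * DELTA + (rng.random() - 0.5) * DELTA

def error_power(inputs, dithered, rng):
    """Mean-square total error (output minus input) of the multi-bit quantizer."""
    total = 0.0
    for x in inputs:
        d = tpdf(rng) if dithered else 0.0
        total += (quantize(x + d) - x) ** 2
    return total / len(inputs)

rng = random.Random(1)
n = 200_000

# Classical model: undithered, input spread uniformly over many steps.
spread = [rng.uniform(-8.0, 8.0) * DELTA for _ in range(n)]
undithered = error_power(spread, False, rng)            # close to DELTA^2/12

# Constant inputs: undithered error depends on the input, TPDF-dithered does not.
undithered_offset = error_power([0.3 * DELTA] * n, False, rng)
dithered_zero = error_power([0.0] * n, True, rng)       # close to DELTA^2/4
dithered_offset = error_power([0.3 * DELTA] * n, True, rng)  # close to DELTA^2/4

# 1-bit output: every sample is +/- DELTA/2, so the output power is fixed.
one_bit = [DELTA / 2 if rng.random() < 0.5 else -DELTA / 2 for _ in range(n)]
one_bit_power = sum(y * y for y in one_bit) / n         # exactly DELTA^2/4

print(undithered, undithered_offset, dithered_zero, dithered_offset, one_bit_power)
```

Because every 1-bit output sample has magnitude DELTA/2, the output power is pinned at DELTA^2/4 whatever the input; with zero input that whole power is error, and any signal content in the output must come out of the same fixed budget.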