An Audiophile Mini-Guide to Digital Audio Clocks

Sometimes dealing with technology can be confusing.

In a world that requires a level of computational understanding for everyday communication that was rare just a generation ago, those who have chosen the convenience and sonic quality that digital audio offers for music playback can find themselves struggling not only to make it convenient, but even to understand exactly how it does what it’s doing.

The idea that jumping into computer-based audio is easy, and that it makes CDs and LPs seem archaic in their physical attributes and constant need for attention, is one that many audiophiles and music lovers have embraced with hope, only to be rebuffed by the inherent complexity of the medium.

Let’s be honest: It ain’t all easy. And the Internet and Google can be as confusing as they are helpful in getting the various streamers, servers, DACs, laptops or PCs, cables, routers and power supplies all in a row.

Once it’s all set up and running smoothly, and you are casually browsing through software, easily accessing millions of online albums or thousands of titles on local drives that you have ripped from CD or downloaded from high-res sites, well, that’s the dream. But the reality – getting to that point where everything on the network is communicating properly – can be a whole other story.

Having a digital-audio front end running glitch-free is alone a huge accomplishment for most people diving in and, once achieved, should be acknowledged with a pat on the back. But then there’s the whole part of understanding what it is you’ve done and how it’s all working in harmony to deliver that sonic bliss you crave. How much do you need – or even want – to know?

There’s terminology like ‘bit perfect,’ ‘asynchronous USB,’ ‘DoP,’ ‘aliasing,’ ‘bandwidth,’ ‘quantization,’ ‘PCM,’ ‘DSD,’ and of course ‘jitter’ and ‘word clock,’ just to name a few. In this post it’s mostly the last two that we’re interested in discussing.

There are a lot of differing opinions among DAC manufacturers about digital audio clocks, the types available for use and the methods of their circuit implementation. Jitter and drift are two aspects of timing and playback of digitized analog sine waves in the digital-audio data stream that get a lot of ink, but what causes them to occur? Are we just getting caught up in the minutiae of technology by exploring it, or is it important enough to be worthy of discussion? I’d suggest the latter, as I’m a firm believer that God is in the details. If you know anything about the way our brains learned to translate audible-sensory information in the many tens of thousands of years during our rise from being cowering prey on the savannah to driving an air-conditioned Mercedes-Benz, then you’re aware that our genetic software’s ability to discern spatial and timing cues from the sounds going on around us is paramount and has become ever more finely tuned over time. Clocks control timing, and timing in digital audio, as they say, is everything.

So, in order to discuss this with more expertise, I turned to John Quick of dCS (Data Conversion Systems) out of the UK. dCS is a company with decades of cutting-edge experience in digital audio, and it made sense to ask a company focused on digital audio playback to answer some questions.


RA: Why is a clock important?

JQ: Everything in digital audio has to have a clock – be it your cell phone, your server/renderer, or DAC – because digital data doesn’t move and cannot be processed without one.

RA: There’s talk in this hobby about technology which can result in incorrect information being passed along. When it comes to digital audio and DACs in particular, audiophiles can get wound up about the importance of the clock and its involvement in minimizing jitter or distortion (in the short term) and drift (over longer periods). What is jitter and distortion within this context and just how important is the digital sampling of an analog signal with precision and regularity when it comes to the accuracy of the waveform being reconstructed for playback?  

JQ: In general terms, it’s actually far easier to build a decent-sounding DAC than it is to build, say, a turntable, because of the documentation provided by DAC chip manufacturers with regard to which “supporting cast” of other chipsets is needed and what the PSU requirements are. That said, many DACs have come to market from companies who have no real idea what they are doing, so it’s no surprise misinformation runs rampant.

To get to the crux of your question: jitter is pervasive in every stage of digital capture and playback. By definition, jitter is a deviation from the reference sampling interval and its effects cause misrepresentations of the phase and amplitude of the analog signal the system is meant to represent. These misrepresentations in turn cause various kinds of distortion. Accuracy in capturing and reconstructing the analog waveform is therefore very closely reliant on accuracy of the timing reference and the need for this accuracy becomes even more crucial at higher sampling frequencies.

That said, the fact that some types of clocks are specified for extremely low average jitter figures over extremely long periods of time (years) is not all that relevant if it is not stable during a five-minute track. The fact is, all clocks have some degree of drift over time, and it’s our belief at dCS that the most important thing for audio is to have extremely good short-term stability versus long-term accuracy.  
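
To put a rough number on the idea that jitter is a deviation from the reference sampling interval, here is a minimal Python sketch. It samples a full-scale sine tone at instants that are nudged by random timing errors and measures how far the captured values stray from the ideal ones. The function name `jitter_error_rms`, the Gaussian jitter model and the 1-nanosecond figure are assumptions made purely for illustration; they are not dCS measurements or methods.

```python
import math
import random

def jitter_error_rms(signal_hz, sample_rate_hz, jitter_rms_s, n_samples=20000, seed=0):
    """RMS amplitude error when a unit sine wave is sampled at jittered instants.

    Assumption: jitter is modelled as independent Gaussian timing error per sample.
    """
    rng = random.Random(seed)
    total = 0.0
    for n in range(n_samples):
        t_ideal = n / sample_rate_hz                       # where the clock edge should be
        t_actual = t_ideal + rng.gauss(0.0, jitter_rms_s)  # where it actually lands
        ideal = math.sin(2 * math.pi * signal_hz * t_ideal)
        actual = math.sin(2 * math.pi * signal_hz * t_actual)
        total += (actual - ideal) ** 2
    return math.sqrt(total / n_samples)

# Same clock, same jitter, three different test tones.
for tone_hz in (1_000, 10_000, 20_000):
    err = jitter_error_rms(tone_hz, 44_100, 1e-9)
    print(f"{tone_hz:>6} Hz tone: RMS error {err:.2e} of full scale")
```

With the same amount of jitter, the error grows with the frequency of the tone being sampled, because a faster-moving waveform changes more during any given timing slip. For small timing errors the amplitude error is roughly proportional to both the tone's frequency and the size of the jitter.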

RA: Do clock jitter and drift in a digital-playback system affect the system’s ability to maintain bit-perfect playback? Does ‘bit-perfect’ mean anything other than the integrity of copying or moving a file for playback? Is it unimportant when it comes to discussion of clocks because timing samples in this context don’t matter, and only the value of each bit of information being accurately recovered matters?

JQ: Being bit-perfect literally means maintaining the integrity of digital data as it’s transferred from one place to another. The answer is: Generally, yes, because the right sample at the wrong time is the wrong sample.
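
As a small illustration of bit-perfect in the data-integrity sense, the Python sketch below compares a source block of bytes with what arrived at the other end by hashing both. The helper name `is_bit_perfect` and the sample bytes are hypothetical. Note what the check cannot see: it compares values only and says nothing about when each sample reaches the converter, which is where the clock comes in.

```python
import hashlib

def is_bit_perfect(source: bytes, received: bytes) -> bool:
    """True if the received data is byte-for-byte identical to the source."""
    return hashlib.sha256(source).digest() == hashlib.sha256(received).digest()

original = bytes([0x12, 0x34, 0x56, 0x78])   # stand-in for audio sample data
good_copy = bytes(original)                  # transferred without change
bad_copy = bytes([0x12, 0x34, 0x56, 0x79])   # one bit flipped in the last byte

print(is_bit_perfect(original, good_copy))   # True
print(is_bit_perfect(original, bad_copy))    # False
```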

RA: It’s my understanding that a ‘word clock’ is a relatively low-frequency signal that is used to describe “one cycle per sample period of a square wave signal used for synchronization of digital audio equipment.” One ‘word’ was traditionally made up of 16 bits of information and is used to synchronize the timing of the data transmission containing those bits. What happens as the ‘word’ length increases to 32 or 64 bits, especially in the context of oversampling?

JQ: The data (words) and the timing reference (clock frequency) work in harmony such that in all circumstances (save for Ethernet, which has a different set of rules) the data is moved and processed in time with the clock reference; data doesn’t move without a timing reference and the accuracy of the timing reference largely determines whether the captured or reconstructed/played data is what it should be. The bit depth (word length) multiplied by the sample frequency provides the number of levels a given sample can represent; so, the higher the bit depth and the higher the sample frequency, the more accurately that sample can be placed when reconstructing the analog wave provided the word clock is exact. Higher bit depths and higher sample frequencies provide the possibility of more accurate analog waveform reconstruction, but higher sample frequencies especially put more onus on an accurate and stable clock to not create and compound errors.
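
For readers who like to see the arithmetic, here is a short Python sketch of the figures behind word length and sample rate in standard PCM. In PCM the number of amplitude levels a sample can take is set by the bit depth alone (2 to the power of the bit depth), while bit depth multiplied by sample rate and channel count gives the raw data rate. The word clock ticks once per sample period, so its period is simply one divided by the sample rate. The function name `pcm_figures` and the chosen formats are illustrative assumptions.

```python
def pcm_figures(bit_depth, sample_rate_hz, channels=2):
    """Return (amplitude levels, word clock period in microseconds, raw bit rate)."""
    levels = 2 ** bit_depth                       # values one sample can represent
    period_us = 1e6 / sample_rate_hz              # one word clock cycle per sample
    bit_rate = bit_depth * sample_rate_hz * channels
    return levels, period_us, bit_rate

for bits, fs in ((16, 44_100), (24, 96_000), (24, 192_000)):
    levels, period_us, bit_rate = pcm_figures(bits, fs)
    print(f"{bits}-bit / {fs} Hz: {levels:,} levels, "
          f"word clock period {period_us:.2f} us, "
          f"{bit_rate / 1e6:.3f} Mbit/s stereo")
```

The shrinking word clock period at higher sample rates is one way to see why they put more onus on the clock: the same absolute timing error becomes a larger fraction of each sample interval.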

RA: The (literal) beating heart that determines the timing of a digital-audio clock is a specialized quartz crystal – some are even aged to increase stability. These crystals are used as controlled electronic oscillators, and some are designed to perform best at a constant temperature, like the ‘Oven Controlled Crystal Oscillator’ (OCXO). Some think that an OCXO is a magic bullet of sorts when it comes to choosing the method for ensuring oscillation calibration. Is it? Or is it more about choosing another method – which might not involve the pursuit of a temperature constant – in the long run?

JQ: Actually, I believe an OCXO is a type of crystal oscillator that is calibrated for one temperature, therefore it is not a magic bullet because even small temperature changes to the crystal’s environment will affect its accuracy. We generally use VCXO (voltage controlled crystal oscillators) crystals that can be changed on-the-fly by varying the voltage to ensure stable operation. Our VCXOs are calibrated with software to ensure they operate at exactly the right frequency.

RA: Thank you John Quick.

Comment from Dick James:

...that is ovenized to maintain one temperature, usually 70°C. Without an oven, the VCXO is subjected to the same temperature variations as the rest of the DAC. Most crystals used in VCXOs are what's called an AT cut, which means their frequency variation versus temperature variation looks like a cosine. There are other cuts that are best for an OCXO because the frequency variation is very flat around a specified temperature, but the variation is quite large outside of that specified temperature. Ovens are always better for stability, even if an "oven" is synthesized with DSP to control the voltage for a VCXO so that the VCXO has the stability of an OCXO.

Comment from Ali:

Why not using a Rolex?