*Before I dive into Daubechies wavelets, a confession: at university, Fourier series were the bane of my existence. I could* **do** *them, under duress, but in the same way as I set up the audio for Wrong, But Useful¹: I had a recipe of steps I needed to follow, and no understanding or desire to understand where the steps came from or why they worked. When wavelet analysis showed up during my PhD, my brain’s immune system went - and continues to go - “I* **don’t** *think so, thank you.” So, if this article feels like it’s written more painstakingly than others in the series… that’s why.*

One of the key ideas in data processing is to take a signal - say, a snippet of your favourite song - and to translate it into numbers so it can be stored and/or transferred by computer.

One approach to the problem is to figure out how much of the signal resonates with sine and cosine waves of different frequencies - a *Fourier series* can represent any sensible signal, if you work out enough terms.

Unfortunately, “if you work out enough terms” is doing a lot of heavy lifting there. Because of something known as the *Gibbs phenomenon*, any jumps or sharp corners in the original signal produce overshoots in the Fourier approximation as it tries to match them - overshoots that settle at around 9% of the jump and never die away, however many terms you add. And that’s No Good At All.
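You can watch this happen with a square wave. Here’s a quick sketch in Python (my choice of language, and the function name is mine) that sums more and more terms of the square wave’s Fourier series and looks at the peak near the jump:

```python
import math

def square_partial_sum(x, n_terms):
    # Partial Fourier sum of a square wave jumping between -1 and +1:
    # f(x) ~ (4/pi) * [sin(x) + sin(3x)/3 + sin(5x)/5 + ...]
    return (4 / math.pi) * sum(
        math.sin((2 * k - 1) * x) / (2 * k - 1) for k in range(1, n_terms + 1)
    )

# However many terms we take, the peak just after the jump at x = 0 sits
# at roughly 1.18 - an overshoot of about 9% of the jump height that
# refuses to die down. That's the Gibbs phenomenon.
for n_terms in (50, 500):
    xs = [i * math.pi / 5000 for i in range(1, 500)]
    peak = max(square_partial_sum(x, n_terms) for x in xs)
    print(n_terms, round(peak, 3))
```

Adding ten times as many terms squeezes the overshoot into a narrower spike near the jump, but its height barely budges.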

The other downside is that the Fourier series gives you the important frequencies for the entire sample - so (unless you split the series up into arbitrary-length subseries), you’ve no way of telling the difference between a signal that’s dropping in frequency and one that’s rising. There’s no time information.

With **wavelets**, there is.

A wavelet transformation takes a signal and gives information about its frequency, but localised in time. There are many ways of doing this: different wavelet schemes have different properties.

To explain the properties of the *Daubechies wavelet*, I’d better explain how it works first.

One iteration takes a signal and applies two weighted moving averages to it: one of these, $H$, computes a smoothed value of a few consecutive data points, the other, $G$, computes (roughly speaking) how far from the ideal smoothness the points are. If the data is smooth, this second average gives zero. In other words, $H$ gives smoothness information and $G$ gives detail information.
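To make that concrete, here’s a sketch of one such pass in Python (the function name and the wrap-around-at-the-end convention are my choices; the four coefficient values are the standard D4 ones, which come out of the calculation later in the article):

```python
import math

# The standard four Daubechies (D4) filter coefficients.
ROOT3 = math.sqrt(3)
C = [(1 + ROOT3) / (4 * math.sqrt(2)),
     (3 + ROOT3) / (4 * math.sqrt(2)),
     (3 - ROOT3) / (4 * math.sqrt(2)),
     (1 - ROOT3) / (4 * math.sqrt(2))]

def d4_step(signal):
    """One pass of the smoothing (H) and detail (G) moving averages.

    The signal length must be even; windows wrap around at the end."""
    n = len(signal)
    smooth, detail = [], []
    for i in range(0, n, 2):
        w = [signal[(i + j) % n] for j in range(4)]
        smooth.append(C[0]*w[0] + C[1]*w[1] + C[2]*w[2] + C[3]*w[3])  # H
        detail.append(C[3]*w[0] - C[2]*w[1] + C[1]*w[2] - C[0]*w[3])  # G
    return smooth, detail

# A straight line is as smooth as can be: away from the wrap-around
# window at the end, every detail coefficient vanishes.
smooth, detail = d4_step([0, 1, 2, 3, 4, 5, 6, 7])
print([round(d, 10) for d in detail])
```

The last detail value is non-zero only because its window wraps around from 7 back to 0, which breaks the straight line.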

The process is then applied recursively to the smoothness information: the same weighted averaging process is applied to all of the $H$ results to give another, coarser batch of smoothness and detail information. This is the reason it’s helpful for the input to be a power of 2 in length: this means you can recurse all the way down!
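The whole pyramid, as a self-contained sketch (Python again, with my own naming and a wrap-around convention; it assumes the input length is a power of 2):

```python
import math

# The standard four Daubechies (D4) filter coefficients.
ROOT3 = math.sqrt(3)
C = [(1 + ROOT3) / (4 * math.sqrt(2)),
     (3 + ROOT3) / (4 * math.sqrt(2)),
     (3 - ROOT3) / (4 * math.sqrt(2)),
     (1 - ROOT3) / (4 * math.sqrt(2))]

def d4_transform(signal):
    """Recursive D4 pyramid: filter, keep the details, recurse on the
    smooth half until only two smooth values remain.

    Assumes len(signal) is a power of 2 and at least 4; windows wrap."""
    details = []
    current = list(signal)
    while len(current) >= 4:
        n = len(current)
        smooth, detail = [], []
        for i in range(0, n, 2):
            w = [current[(i + j) % n] for j in range(4)]
            smooth.append(C[0]*w[0] + C[1]*w[1] + C[2]*w[2] + C[3]*w[3])
            detail.append(C[3]*w[0] - C[2]*w[1] + C[1]*w[2] - C[0]*w[3])
        details.append(detail)  # finest level first
        current = smooth        # recurse on the smoothness information
    return current, details

final_smooth, details = d4_transform([4, 6, 10, 12, 8, 6, 5, 5])
print(len(final_smooth), [len(d) for d in details])  # 2 [4, 2]
```

Because each pass is orthonormal, the sum of squares of the output (final smooth values plus every level of detail) matches that of the input - nothing is lost on the way down.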

The Daubechies wavelet has several nice properties:

- Given the smoothness parameters all the way down, you can recreate the original signal simply - the moving average coefficients are chosen so that the inverse calculation is very simple.
- A sufficiently smooth set of signal elements - in particular, one where the first $p$ *moments*² vanish - gives a zero detail element.
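To make “the inverse calculation is very simple” concrete: because the encoding matrix is orthonormal, decoding is just multiplying by its transpose. Here’s a sketch of one forward and inverse pass (Python, with my own function names and a wrap-around convention, using the standard D4 coefficient values):

```python
import math

# The standard four Daubechies (D4) filter coefficients.
ROOT3 = math.sqrt(3)
C = [(1 + ROOT3) / (4 * math.sqrt(2)),
     (3 + ROOT3) / (4 * math.sqrt(2)),
     (3 - ROOT3) / (4 * math.sqrt(2)),
     (1 - ROOT3) / (4 * math.sqrt(2))]
G = [C[3], -C[2], C[1], -C[0]]  # detail filter, built from the same values

def d4_step(signal):
    """Forward pass: smoothing (H) and detail (G) averages; windows wrap."""
    n = len(signal)
    smooth = [sum(C[j] * signal[(i + j) % n] for j in range(4))
              for i in range(0, n, 2)]
    detail = [sum(G[j] * signal[(i + j) % n] for j in range(4))
              for i in range(0, n, 2)]
    return smooth, detail

def d4_unstep(smooth, detail):
    """Inverse pass: apply the transpose of the orthonormal forward matrix."""
    n = 2 * len(smooth)
    signal = [0.0] * n
    for i, (s, d) in enumerate(zip(smooth, detail)):
        for j in range(4):
            signal[(2 * i + j) % n] += s * C[j] + d * G[j]
    return signal

# Round trip: encode, then decode, and the original signal comes back.
original = [3, 1, 4, 1, 5, 9, 2, 6]
print([round(v, 9) for v in d4_unstep(*d4_step(original))])
```

No equation-solving is needed to decode: the same four numbers, rearranged, undo the encoding exactly.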

Under these conditions, it’s possible to compute values for the required weights in the moving averages. For example, if $p=2$, we need four coefficients³: the smoothing average applied to four x-values is $c_0 x_{i} + c_1 x_{i+1} + c_2 x_{i+2} + c_3 x_{i+3}$, and the detail average is $c_3 x_{i} - c_2 x_{i+1} + c_1 x_{i+2} - c_0 x_{i+3}$.

To make the inverse calculation simple⁴, we need:

- $c_0^2 + c_1^2 + c_2^2 + c_3^2 = 1$
- $c_0c_2 + c_1c_3 = 0$

But, if we want data with zeros in the first two moments to give zero details, we also need:

- $c_3 - c_2 + c_1 - c_0 = 0$ and
- $-c_2 + 2c_1 - 3c_0 = 0$.

This gives four equations in four unknowns, which can be solved - in this case, analytically, although larger values of $p$ typically require numerical methods.
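If you fancy checking: the $p=2$ system solves (up to reflections and sign flips) to give $c_0 = \frac{1+\sqrt{3}}{4\sqrt{2}}$, $c_1 = \frac{3+\sqrt{3}}{4\sqrt{2}}$, $c_2 = \frac{3-\sqrt{3}}{4\sqrt{2}}$ and $c_3 = \frac{1-\sqrt{3}}{4\sqrt{2}}$ - it’s a pleasant exercise to verify that all four equations hold.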

Ingrid Daubechies, born in Belgium in 1954, is a professor at Duke University in the USA. She is one of the world’s most cited mathematicians, largely as a result of her work on wavelets.

As a child, rather than count to get herself to sleep, she would calculate powers of 2. I don’t know why that makes me grin, but it’s a detail I can’t resist sharing.

1. available wherever you get your podcasts [↩]
2. The $n$th moment of a function is the expected value of $x^n$ - so the first moment is the mean, the second is related to the variance, and so on. [↩]
3. Generally, $p=k$ requires $2k$ coefficients. [↩]
4. We want the encoding and decoding matrices to be orthonormal, so one is the transpose of the other - for details, I recommend Numerical Methods [↩]