views:

694

answers:

3

The Fast Fourier Transform takes O(N log N) operations, while the Fast Wavelet Transform takes O(N). But what, specifically, does the FWT compute?

Although they are often compared, it seems like the FFT and FWT are apples and oranges. As I understand it, a more appropriate comparison would be the STFT (FFTs of small chunks over time) and the complex Morlet FWT, since they're both time-frequency representations based on complex sinusoids (please correct me if I'm wrong). This is often shown with a diagram like this:

Grids showing how the coefficients of the FFT and WT correspond to the time-frequency plane

(Another example at the bottom here)

The bottom left shows how the STFT is a bunch of FFTs stacked next to each other as time passes (this representation is called a spectrogram), while the bottom right shows the dyadic WT, which has better time resolution at high frequencies and better frequency resolution at low frequencies (this representation is called a scalogram). In this example, N for the STFT is the number of horizontal rows, and a single O(N log N) FFT operation calculates a single column of N coefficients from N samples.

What I don't understand: How many coefficients does a single O(N) FWT operation compute, and where are they located on the time-frequency chart above? Which rectangles get filled in by a single computation?

If we calculate an equal-area block of time-frequency coefficients using both, is the FWT still more efficient than the FFT?

Concrete example using PyWavelets:

In [2]: dwt([1, 0, 0, 0, 0, 0, 0, 0], 'haar')
Out[2]:
(array([ 0.70710678,  0.        ,  0.        ,  0.        ]),
 array([ 0.70710678,  0.        ,  0.        ,  0.        ]))

It creates two sets of 4 coefficients, so it's the same as the number of samples in the original signal. But what's the relationship between these 8 coefficients and the tiles in the diagram?

A: 

Your reference has it:

A sequence of coefficients based on an orthogonal basis of small finite waves, or wavelets.

For more, you may like the DWT page. There it introduces Haar wavelets, Daubechies wavelets and others. It points out how

  • Wavelets have location – the (1,1,–1,–1) wavelet corresponds to “left side” versus “right side”, while the last two wavelets have support on the left side or the right side, and one is a translation of the other.
  • Sinusoidal waves do not have location – they spread across the whole space – but do have phase – the second and third waves are translations of each other, corresponding to being 90° out of phase, like cosine and sine, of which these are discrete versions.

If, instead of discrete wavelets, you want to now about continuous wavelets or complex wavelets, you might start with wavelet series.

Beyond wikipedia, a textbook and a course might do you well.

Ewan Todd
I don't understand this answer. Does it answer my questions? Left side and right side of what? What does this have to do with the time-frequency representation?
endolith
The "left side versus right side" description is an excerpted preview of the DWT page, showing that that page includes a simple example to explain the relative merits of the sinusoidal basis and the Haar wavelets basis. You were asking about the nature of the coefficients in a wavelet transform. It sounded like you were looking for intuition. I thought you might find that example (in its original context) useful.
Ewan Todd
Yes, I've read the Wikipedia articles multiple times before posting this question. I don't know if/what your answer has to do with my question about the time-frequency representation. If it does, could you connect the dots? An FFT of n samples will produce n coefficients, which make up a single column of the STFT spectrogram. Is there a corresponding relationship between the coefficients produced by the WT and the scalogram? If so, what is it? Which of the boxes in the bottom right chart are filled in by a single run through the FWT?
endolith
Almost everything on the Wikipedia pages related to wavelets is currently wrong.
Jon Harrop
+2  A: 

You are correct that the FWT is better thought of as a "cousin" of the STFT, rather than the FT. In fact, the FWT is just a discrete sampling of the CWT (continuous wavelet transform), as the FFT/DFT is a discrete sampling of the Fourier transform. This may seem like a subtle point, but it is relevant when choosing how you discretize the transform.

The CWT and STFT are both redundant analyses of a signal. In other words, you have more "coefficients" (in the discrete case) than you need to fully represent a signal. However, a Fourier transform (or say a wavelet transform using only one scale) integrate a signal from -infinity to +infinity. This is not very useful on real world signals, so we truncate (i.e. window) the transforms to shorter lengths. Windowing of a signal changes the transform -- you multiply by the window in time/space, so in transform space you have the convolution of the transform of the window with the transform of the signal.

In the case of the STFT, the windows are (usually) the same length (non-zero extent) at all time, and are frequency agnostic (you window a 10 Hz signal the same width as a 10 kHz signal). So you get the rectangular grid spectrogram like you have drawn.

The CWT has this windowing built in by the fact that the wavelets get shorter (in time or space) as the scale decreases (like higher frequency). Thus for higher frequencies, the effective window is shorter in duration, and you end up with a scaleogram that looks like what you have drawn for the FWT.

How you discretize the CWT is somewhat up to you, though I think there are minimum samplings in both shift and scale to fully represent a signal. Typically (at least how I've used them), for lowest scale (highest frequency), you will sample at all shift locations (time/space). As you get higher in scale (lower in frequency), you can sample less often. The rationale is that low frequencies don't change that rapidly (think of a cymbal crash vs. a bass guitar -- the cymbal crash has very short transients, whereas the bass guitar would take longer to change). In fact, at the shortest scale (assuming you sample at all shift locations), you have the full representation of a signal (you can reconstruct it using only the coefficients at this scale). I'm not so sure about the rationale of sampling the scale. I've seen this suggested as logarithmic, with (I think) closer spacing between shorter scales. I think this is because the wavelets at longer scales have a broader Fourier transform (therefore they "pick up" more frequencies).

I admit I do not fully understand the FWT. My hunch is that it is actually the minimum sampling in shift/scale, and is not a redundant representation. But then I think you lose the ability to analyze (and mess with) a signal in short time without introducing unwanted artifacts. I will read more about it and, if I learn anything useful, report back. Hopefully others will like to comment.

Patrick
"it is actually the minimum sampling in shift/scale, and is not a redundant representation."Ah! I think you're right, and this would explain why it's always compared to the FFT, which is also a minimal representation.
endolith
The FWT is a critical sampling of the CWT. I'm still trying to understand it better, but I've learned that the STFT and CWT are both Frames. Frame theory is getting beyond me, but one interesting notion is the uncertainty formula, that for the STFT, dw * dt > C (dw is the frequency resolution, and dt is the time resolution). In other words, as you try to better resolve frequency, you lose time resolution. The CWT does not have this limitation. I will keep reading and try and clarify my answer above once I clarify it in my head.
Patrick
From what I understand, CWT has the same limitation, but uses a better trade-off.
endolith
+1  A: 

Consider the Haar wavelet case. The Fast Wavelet Transform recursively subdivides your signal and computes the sum and difference of the two halves each time. The difference is the magnitude of the transform for the current wavelet and the sum is returned for the caller to compute the magnitude of the transform for a dilated wavelet with half the frequency. Thus, the FWT covers the time-frequency plane using the pattern described in the diagram you gave.

Note that the diagram you gave is a bit misleading. What they are really trying to tell you is that you get one sample at the lowest frequency, two samples at double that frequency, four samples at quadruple that frequency and so on. The time-frequency properties of each wavelet are not such that they cover their tile. In practice, each wavelet will cover an infinite area because they have compact support and, therefore, must be completely delocalized in terms of frequency. So you should just think about the centers of those tiles.

Furthermore, the FWT requires a discrete wavelet that must adhere to a far more restrictive admissibility criterion than continuous wavelets for the CWT. Consequently, the time-frequency properties of discrete wavelets are generally awful (e.g. the Daubechies wavelets are either full of sharp features or have changing frequency) and the utility of the time-frequency plane is greatly diminished in the context of the FWT. However, continuous wavelets are used to compute time-frequency representations of signals.

Jon Harrop
Yes, I understand the localization of the coefficients. That's the same as the FFT. When you say "must adhere", what do you mean? Is it only a requirement if you're trying to get a minimal/non-redundant representation of the signal? What if you're just trying to analyze/visualize it? I'll add a more concrete example to the question.
endolith
Adhering to the admissibility criterion ensures that a resolution of the identity exists, i.e. that all signals can be recovered from their wavelet transforms. If you do not adhere to it then you cannot recover a signal from its transform, at which point you must question what exactly it is that you're analysing (does it even reflect any information that was in the signal?!). If you do not require a minimal/non-redundant representation then you could use the more lax admissibility criterion from the CWT (which lets you define more "ideal" wavelets).
Jon Harrop
I think you would find my PhD thesis really useful. I'll put it on-line for you...
Jon Harrop
Did you put it online? :)
endolith
I sure did: http://flyingfrogblog.blogspot.com/2010/02/structural-properties-of-amorphous.html
Jon Harrop