Let us say that I have a WAV file. In this file is a series of sine tones at precise 1-second intervals. I want to use the FFTW library to extract these tones in sequence. Is this particularly hard to do? How would I go about it?

Also, what is the best way to write tones of this kind into a WAV file? I assume I would only need a simple audio library for the output.

My language of choice is C.

+1  A: 

WAV files typically contain linear pulse-code modulated (LPCM) data, which just means a sequence of amplitude values at a fixed sample rate. A RIFF header at the beginning of the file conveys information such as the sample rate and bits per sample (e.g. 8 kHz, signed 16-bit).
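To make the header layout concrete, here is a minimal sketch of a parser for a WAV file that has already been loaded into memory. It assumes a standard RIFF/WAVE layout in which the `fmt ` chunk precedes the `data` chunk; `wav_info` and `wav_parse` are hypothetical names, not part of any library, and the `format` field is reported as-is rather than decoded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type: the "fmt " fields needed to interpret the samples. */
typedef struct {
    uint16_t format;          /* 1 = integer LPCM, 3 = IEEE float */
    uint16_t channels;
    uint32_t sample_rate;     /* e.g. 8000 or 44100 */
    uint16_t bits_per_sample; /* e.g. 16 */
    size_t   data_offset;     /* byte offset of the first sample */
    uint32_t data_size;       /* size of the sample data in bytes */
} wav_info;

/* RIFF integers are little-endian; assemble them byte by byte so host endianness does not matter. */
static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk the RIFF chunk list. Returns 0 on success, -1 if the buffer is not a usable WAV. */
int wav_parse(const unsigned char *buf, size_t len, wav_info *out) {
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;
    int have_fmt = 0;
    size_t pos = 12;                          /* first chunk follows "RIFF" <size> "WAVE" */
    while (pos + 8 <= len) {
        const unsigned char *id = buf + pos;
        uint32_t size = rd32(buf + pos + 4);
        size_t body = pos + 8;
        if (memcmp(id, "data", 4) == 0) {
            if (!have_fmt) return -1;
            out->data_offset = body;
            /* clamp in case the file is truncated */
            out->data_size = (size <= len - body) ? size : (uint32_t)(len - body);
            return 0;
        }
        if (size > len - body) return -1;     /* chunk runs past the end of the buffer */
        if (memcmp(id, "fmt ", 4) == 0 && size >= 16) {
            out->format          = rd16(buf + body);
            out->channels        = rd16(buf + body + 2);
            out->sample_rate     = rd32(buf + body + 4);
            /* bytes 8..13 hold the byte rate and block align */
            out->bits_per_sample = rd16(buf + body + 14);
            have_fmt = 1;
        }
        pos = body + size + (size & 1);       /* chunks are padded to an even length */
    }
    return -1;
}
```

Once `data_offset` and `data_size` are known, the data holds `data_size / (channels * bits_per_sample / 8)` frames of interleaved little-endian samples.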

The format is very simple and you could easily roll your own. However, several libraries, such as libsndfile, are available to speed up the process. Simple DirectMedia Layer (SDL)/SDL_mixer and PortAudio are two nice libraries for playback.

As for feeding the data into FFTW, you would need to buffer 1-second chunks (determine the size from the sample rate, channel count, and bits per sample). Then convert all of the samples to IEEE floating point (i.e. float or double, depending on the FFTW build configuration; libsndfile can do this conversion for you). Next, create another array to hold the frequency-domain output. Finally, create an FFTW plan by passing both buffers to fftw_plan_dft_r2c_1d, and run it by calling fftw_execute with the returned fftw_plan handle.

Judge Maygarden
Not actually the `fftw` version, but whether or not it was compiled with float support, no?
Stephen Canon
True, it is a matter of the build configuration IIRC. I haven't used FFTW in many years. Perhaps "version" is not the most accurate word I could have chosen?
Judge Maygarden
Much of the audio DSP software for Linux (and other platforms) that uses FFTW requires FFTW built with float support. Having spent much time building this stuff from source, I can say that Debian, at least, has packages for the various FFTW build options, and they can all be installed simultaneously. I expect this goes for most other Linux distros too.
James Morris
libsndfile will take care of converting your WAV files to floating-point format automatically; in general it's really quite a breeze to use.
James Morris
+5  A: 

To get the power spectrum of a section of your file:

  • collect N samples, where N is a power of 2; if your sample rate is 44.1 kHz, for example, and you want to sample approximately every second, then go for, say, N = 32768 samples.

  • apply a suitable window function to the samples, e.g. Hann (also called Hanning)

  • pass the windowed samples to an FFT routine; ideally you want a real-to-complex FFT, but if all you have is a complex-to-complex FFT, then pass 0 for all the imaginary input parts

  • calculate the squared magnitude of your FFT output bins (re * re + im * im)

  • (optional) calculate 10 * log10 of each magnitude squared output bin to get a magnitude value in dB

Now that you have your power spectrum, you just need to identify the peak(s), which should be pretty straightforward if you have a reasonable S/N ratio. Note that frequency resolution improves with larger N: for the above example of a 44.1 kHz sample rate and N = 32768, the width of each bin is 44100 / 32768 ≈ 1.35 Hz.
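Peak picking and the bin-to-frequency conversion can be sketched as below, assuming `power` holds the n/2 + 1 squared-magnitude bins of an n-point FFT (linear or dB, since the largest bin is the same either way). `peak_frequency` and `bin_width` are hypothetical helpers.

```c
#include <stddef.h>

/* Frequency in Hz of the strongest bin in power[0..n/2], skipping the DC bin.
   Bin k is centred on k * fs / n Hz, so the result is only as precise as the
   bin spacing fs / n. */
double peak_frequency(const double *power, size_t n, double fs) {
    size_t best = 1;
    for (size_t k = 2; k <= n / 2; k++)
        if (power[k] > power[best]) best = k;
    return (double)best * fs / (double)n;
}

/* Bin spacing (frequency resolution) in Hz for an n-point FFT at sample rate fs. */
double bin_width(size_t n, double fs) { return fs / (double)n; }
```

With fs = 44100 and n = 32768, a 440 Hz tone lands nearest bin 327, which is reported as about 440.08 Hz; the error stays within half of the 1.35 Hz bin spacing.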

Paul R
+1  A: 

You are basically interested in estimating a spectrum, assuming you've already gone past the stage of reading the WAV and converting it into a discrete-time signal.

Among the various methods, the most basic is the periodogram, which amounts to taking a windowed discrete Fourier transform (with an FFT) and keeping its squared magnitude. This corresponds to Paul's answer. You need a window that spans several periods of the lowest frequency you want to detect. Example: if your sinusoids can be as low as 10 Hz (period = 100 ms), you should take a window of 200 ms or 300 ms or so (or more). However, although the periodogram is simple to compute and more than enough if high precision is not required, it has some disadvantages:

The raw periodogram is not a good spectral estimate because of spectral bias and the fact that the variance at a given frequency does not decrease as the number of samples used in the computation increases.

The periodogram can perform better by averaging over several windows, with a judicious choice of their widths (Bartlett's method). There are also many other methods for estimating the spectrum, such as autoregressive (AR) modelling.

Actually, you are not really interested in estimating a full spectrum, but only in locating a single frequency. This can be done by seeking a peak of an estimated spectrum (computed as explained above), but also by more specific, powerful (and complicated) methods such as Pisarenko's method or the MUSIC algorithm. They would probably be overkill in your case.

leonbloy