Hi all,

I want to write a program that records audio data using PortAudio (I have this part done) and then displays the frequency information of the recorded audio (for now, I'd like to display the average frequency of each group of samples as it comes in).

From some research I've done, I know that I need to do an FFT. So I googled for a library to do that, in C, and found FFTW.

However, now I am a little lost. What exactly am I supposed to do with the samples I recorded to extract frequency information from them? What kind of FFT should I use (I assume I need a 1D real-data transform)?

And once I have done the FFT, how do I get the frequency information from the data it gives me?

EDIT: I have now also found the autocorrelation algorithm. Is it better? Simpler?

Thanks a lot in advance, and sorry, I have absolutely no experience in this. I hope it makes at least a little sense.

+3  A: 

To convert your audio samples to a power spectrum:

  • if your audio data is integer data then convert it to floating point
  • pick an FFT size (e.g. N=1024)
  • apply a window function to N samples of your data (e.g. Hanning)
  • use a real-to-complex FFT of size N to generate frequency domain data
  • calculate the magnitude of your complex frequency domain data (magnitude = sqrt(re^2 + im^2))
  • optionally convert magnitude to a log scale (dB) (magnitude_dB = 20*log10(magnitude))
Paul R
PortAudio can record data directly in float format (32-bit floating point), so I would encourage following this suggestion.
Iulian Şerbănoiu
@Iulian: yes, that would probably be the sensible thing to do, assuming you have no other pre-processing that you want to do in the integer domain before you generate the power spectrum.
Paul R
Thanks, just one more, probably stupid question. I've computed everything, but now let's say I want to check whether a 440 Hz frequency (the A above middle C) is present in the signal. How do I do that?
houbysoft
If the sample rate is Fs = 44.1 kHz and the FFT size is N = 1024, then the resolution of your spectrum will be Fs / N = 44100 / 1024 = 43.1 Hz. In other words, each "bin" of your power spectrum is around 43 Hz wide. A 440 Hz component will therefore show up mostly in bin 10 (440 / 43.1 ≈ 10.2). If you need more resolution then you will have to increase N.
Paul R
I see, thanks a lot.
houbysoft