I have a sample held in a buffer from DirectX. It's a sample of a note played on an instrument and captured. How do I analyse the frequency of the sample (as a guitar tuner does)? I believe FFTs are involved, but I have no pointers to any how-tos.

+3  A: 

FFTs (Fast Fourier Transforms) would indeed be involved. An FFT lets you approximate a signal as a sum of simple sine waves of fixed frequencies and varying amplitudes. What you'll essentially be doing is taking a sample, decomposing it into frequency/amplitude pairs, and then taking the frequency that corresponds to the highest amplitude.

Hopefully another SO reader can fill the gaps I'm leaving between the theory and the code!

Daniel Papasian
This approach has serious accuracy problems, especially in a music context. As endolith points out, an FFT gives you the intensity within a range of frequencies; the smaller (and faster) the FFT window, the wider each range. Even worse, the overall range is 0 to 22050 Hz (half the 44100 Hz sampling rate of Red Book audio), while a typical musical note is almost always well below 1000 Hz, so most of the resolution you have is wasted on the higher frequency bands.
MusiGenesis
A: 

Apply a DFT and then derive the fundamental frequency from the results. Googling around for DFT information will give you the information you need -- I'd link you to some, but they differ greatly in expectations of math knowledge.

Good luck.

Cody Brocious
+3  A: 

Guitar tuners don't use FFTs or DFTs. Usually they just count zero crossings. You might not get the fundamental frequency, because some waveforms have more zero crossings than others, but you can usually get a multiple of the fundamental frequency that way. That's enough to get the note, although you might be one or more octaves off.

Low-pass filtering before counting zero crossings can usually get rid of the excess zero crossings. Tuning the low-pass filter requires some knowledge of the frequency range you want to detect, though.
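
As a rough illustration of the idea above, here is a minimal Python sketch. It is not taken from any real tuner: the one-pole filter, the 120 Hz cutoff, the function names and the synthetic test signal (a 110 Hz tone with a strong high harmonic) are all assumptions made for the example.

```python
import math

def one_pole_lowpass(samples, sample_rate, cutoff_hz):
    """First-order (RC-style) low-pass filter; an assumed, very simple design."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)          # smoothing coefficient
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)        # move the output a fraction of the way to the input
        out.append(y)
    return out

def zero_crossing_frequency(samples, sample_rate):
    """Estimate frequency from the average spacing of rising zero crossings."""
    crossings = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0.0 <= b:
            # Linear interpolation gives a sub-sample crossing position.
            crossings.append(i - 1 + (-a) / (b - a))
    if len(crossings) < 2:
        return None
    periods = len(crossings) - 1
    return sample_rate * periods / (crossings[-1] - crossings[0])

# Hypothetical test signal: 110 Hz fundamental plus a strong 21st harmonic,
# which adds extra zero crossings to every cycle.
sample_rate = 44100
raw = [math.sin(2 * math.pi * 110 * n / sample_rate)
       + 0.4 * math.sin(2 * math.pi * 2310 * n / sample_rate)
       for n in range(sample_rate // 4)]

raw_estimate = zero_crossing_frequency(raw, sample_rate)
filtered_estimate = zero_crossing_frequency(
    one_pole_lowpass(raw, sample_rate, 120.0), sample_rate)
```

On this signal the unfiltered count lands well above the fundamental, while the filtered one should come out close to 110 Hz. A real instrument signal would probably need a steeper filter and a cutoff chosen for the expected note range.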

Mendelt
I doubt they just count zero crossings. There are many zero crossings per cycle in a typical guitar waveform. http://www.flickr.com/photos/56868697@N00/4180888094/ They probably do a simple autocorrelation.
endolith
Extra zero crossings don't really matter for a simple tuner. Remember that a tuner doesn't need the exact frequency of the fundamental. It needs to know the note. By counting more zero crossings per cycle it might lock on to a higher octave, but a Cb will still be a Cb and two cents too high will still be two cents too high. Autocorrelation is great for more advanced processing but it's overkill for a tuner.
Mendelt
+2  A: 

A little more specifically:

If you start with the raw PCM in an input array, what you basically have is a graph of wave amplitude vs. time. Doing an FFT will transform that into a frequency histogram for frequencies from 0 to 1/2 the input sampling rate. The value of each entry in the result array will be the 'strength' of the corresponding sub-frequency.

So to find the root frequency given an input array of size N sampled at S samples/second:

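FFT(N, input, output);              // assumes output[i] holds the magnitude of bin i
max = max_i = 0;
for (i = 1; i < N/2; i++)           // skip DC; bins above N/2 mirror the lower half
  if (output[i] > max) { max = output[i]; max_i = i; }
root = (double)S * max_i / N;       // bin i of an N-point FFT is centred on i*S/N Hz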
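
For anyone who wants something runnable, here is a small Python sketch of the same peak-picking idea. It uses a direct O(N^2) DFT from the standard library instead of a real FFT routine, so it is only sensible for short buffers. The function names are made up for the example, and it assumes the usual convention that bin k of an N-point transform sits at k*S/N Hz.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitudes of bins 0..N/2, computed with a direct (slow) DFT."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags

def strongest_bin_frequency(samples, sample_rate):
    """Centre frequency of the strongest non-DC bin."""
    mags = dft_magnitudes(samples)
    peak = max(range(1, len(mags)), key=lambda k: mags[k])  # skip bin 0 (DC)
    return peak * sample_rate / len(samples)
```

With 512 samples at 8000 Hz the bins are 15.625 Hz apart, so a 255 Hz tone is reported as the nearest bin centre rather than its true frequency. That is the resolution limit other answers in this thread talk about.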
AShelly
+5  A: 

There are also other algorithms that are time-based, not frequency based. Autocorrelation is a relatively simple algorithm for pitch detection. Reference: http://cnx.org/content/m11714/latest/

I have written readable C# implementations of autocorrelation and other algorithms. Check out http://code.google.com/p/yaalp/.

http://code.google.com/p/yaalp/source/browse/#svn/trunk/csaudio/WaveAudio/WaveAudio Lists the files, and PitchDetection.cs is the one you want.

(The project is GPL; so understand the terms if you use the code).
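
To give a feel for the technique, here is a minimal Python sketch of time-domain autocorrelation pitch detection. It is not the code from the linked project; the function name, the 60 to 1000 Hz search range and the normalisation by overlap length are assumptions for the example.

```python
import math

def autocorrelation_pitch(samples, sample_rate, min_hz=60.0, max_hz=1000.0):
    """Estimate pitch as the lag, within a plausible range, at which the
    signal correlates most strongly with a shifted copy of itself."""
    n = len(samples)
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(int(sample_rate / min_hz), n - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        overlap = n - lag
        score = sum(samples[i] * samples[i + lag] for i in range(overlap))
        score /= overlap            # normalise so longer lags are not penalised
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None
```

Because the lag is a whole number of samples, the result is quantised (at 8000 Hz, neighbouring lags near 110 Hz are about 1.5 Hz apart). The search range also matters: if it extends to twice the true period, that lag correlates just as well and the estimate can drop an octave.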

I want to vote this answer up several times. FFTs are a terrible solution to this problem; it's like driving a tank to the grocery store. Sure, it's cool, but it's really not the best way. Autocorrelation is generally considered THE RIGHT solution to this problem; viz. Autotune/Melodyne/Singstar/you name it. Zero-crossing-based solutions are ONLY APPLICABLE when you have SPECIFIC knowledge of the harmonic behaviour. Autocorrelation can be implemented very efficiently too.
Dave Gamble
Whoa, what? I disagree. A frequency-domain solution is absolutely not a terrible solution, at all. Harmonic product spectrum or cepstral methods are easy to implement and reasonably robust. Autocorrelation is not *generally* considered the RIGHT solution by any means; it is one of several valid solutions. True, though: zero crossing is not a reliable indicator of pitch.
Steve
Autocorrelation is usually more computationally intensive than FFTs. We often use FFTs to do autocorrelations, in fact, because it's faster. Using naive autocorrelation when you could do it with FFTs is like driving a tank through the wall of the grocery store instead of using the front door.
endolith
+3  A: 

The FFT can help you figure out where the frequency is, but it can't tell you exactly what the frequency is. Each point in the FFT is a "bin" of frequencies, so if there's a peak in your FFT, all you know is that the frequency you want is somewhere within that bin, or range of frequencies.

If you want it really accurate, you need a long FFT with a high resolution and lots of bins (= lots of memory and lots of computation). You can also guess the true peak from a low-resolution FFT using quadratic interpolation on the log-scaled spectrum, which works surprisingly well.
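
Here is a rough Python sketch of the quadratic-interpolation trick: fit a parabola through the log magnitudes of the peak bin and its two neighbours, and take the vertex as the refined peak. The Hann window, the slow direct DFT and the function name are assumptions for this example, not something any particular library prescribes.

```python
import cmath
import math

def interpolated_peak_frequency(samples, sample_rate):
    """Refine the strongest DFT bin with a parabola fitted to log magnitudes."""
    n = len(samples)
    # Assumed Hann window, to reduce leakage before the transform.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * t / n))
                for t, x in enumerate(samples)]
    mags = []
    for k in range(n // 2 + 1):     # direct DFT; fine for short buffers only
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(windowed))
        mags.append(abs(acc))
    # Strongest bin, excluding the two end bins so both neighbours exist.
    k = max(range(1, len(mags) - 1), key=lambda i: mags[i])
    eps = 1e-12                     # avoids log(0) on silent bins
    a = math.log(mags[k - 1] + eps)
    b = math.log(mags[k] + eps)
    c = math.log(mags[k + 1] + eps)
    denom = a - 2 * b + c
    offset = 0.5 * (a - c) / denom if denom != 0 else 0.0   # vertex, in bins
    return (k + offset) * sample_rate / n
```

With 512 samples at 8000 Hz the bins are 15.625 Hz wide, yet for a clean 255 Hz sine this should land within a fraction of a hertz of the true value. Real notes with several harmonics still need the right peak to be picked first.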

If computational cost is most important, you can try to get the signal into a form in which you can count zero crossings, and then the more you count, the more accurate your measurement.

None of these will work if the fundamental is missing, though. :)

I've outlined a few different algorithms here, and the interpolated FFT is usually the most accurate (though this only works when the fundamental is the strongest harmonic - otherwise you need to be smarter about finding it), with zero-crossings a close second (though this only works for waveforms with one crossing per cycle). Neither of these conditions is typical.

Keep in mind that the partials above the fundamental frequency are not perfect harmonics in many instruments, like piano or guitar. Each partial is actually a little bit out of tune, or inharmonic. So the higher-frequency peaks in the FFT will not be exactly on the integer multiples of the fundamental, and the wave shape will change slightly from one cycle to the next, which throws off autocorrelation.

To get a really accurate frequency reading, I'd say to use the autocorrelation to guess the fundamental, then find the true peak using quadratic interpolation. (You can do the autocorrelation in the frequency domain to save CPU cycles.) There are a lot of gotchas, and the right method to use really depends on your application.

endolith
+1  A: 

Retrieving fundamental frequencies from a PCM audio signal is a difficult task, and there is a lot that could be said about it...

Anyway, time-based methods are usually not suitable for polyphonic signals, because a complex wave given by the sum of harmonic components from multiple fundamental frequencies has a zero-crossing rate that depends only on the lowest frequency component... In the frequency domain, the FFT is also not the most suitable method, since the frequency spacing between notes follows an exponential scale, not a linear one. This means that the constant frequency resolution of the FFT may be insufficient to resolve lower-frequency notes if the analysis window in the time domain is not large enough.
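
To put numbers on that last point, here is a small Python sketch. It assumes twelve-tone equal temperament (adjacent notes differ by a factor of 2^(1/12)) and uses "bin width no larger than the distance to the next semitone" as a rough criterion; both function names are made up for the example.

```python
import math

def semitone_spacing_hz(freq_hz):
    """Distance in Hz from a note to the next semitone up (equal temperament)."""
    return freq_hz * (2 ** (1 / 12) - 1)

def min_fft_size(freq_hz, sample_rate):
    """Smallest power-of-two N whose bin width S/N does not exceed the
    semitone spacing at freq_hz."""
    needed = sample_rate / semitone_spacing_hz(freq_hz)
    return 1 << math.ceil(math.log2(needed))
```

At 44100 Hz, a guitar's low E (about 82.41 Hz) is less than 5 Hz from the next semitone, so `min_fft_size(82.41, 44100)` gives 16384 samples (roughly 0.37 s of audio), whereas A at 440 Hz needs only 2048. Telling a note from one a few cents off needs far finer resolution than this.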

A more suitable method would be a constant-Q transform, which is a DFT applied after repeated low-pass filtering and decimation by 2 (i.e. halving the sampling frequency at each step), in order to obtain subbands with different frequency resolutions. This also makes the DFT calculation more efficient. The trouble is that the time resolution also varies, becoming coarser for the lower subbands...

Finally, if we are trying to estimate the fundamental frequency of a single note, FFT/DFT methods are OK. Things change in a polyphonic context, in which partials of different sounds overlap and sum or cancel depending on their phase difference, so a single spectral peak could belong to the harmonic content of several different notes. Correlation doesn't give good results in this case...