I want to detect not the pitch, but the pitch class of a sung note.

So, whether it is C4 or C5 is not important: they must both be detected as C.

Imagine the 12 semitones arranged on a clock face, with the needle pointing to the pitch class. That's what I'm after! Ideally, I would like to be able to tell whether the sung note is spot-on or slightly off.
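To make the clock-face idea concrete, here is a minimal sketch in Python (chosen for brevity) of mapping an already-estimated frequency to a pitch class plus a deviation in cents. It assumes 12-tone equal temperament with A4 = 440 Hz; the function name `pitch_class` and the `a4_hz` parameter are illustrative only.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz, a4_hz=440.0):
    """Return (note name, cents off) for a frequency, ignoring the octave.

    Assumes equal temperament tuned to a4_hz (an assumption, not from the post).
    """
    # Fractional distance from A4 in semitones.
    semitones_from_a4 = 12.0 * math.log2(freq_hz / a4_hz)
    nearest = round(semitones_from_a4)
    # Deviation from the nearest semitone, in cents (roughly -50 to +50).
    cents_off = 100.0 * (semitones_from_a4 - nearest)
    # A is pitch class 9 when C is 0; the modulo discards the octave.
    index = (nearest + 9) % 12
    return NOTE_NAMES[index], cents_off
```

With this, C4 (about 261.63 Hz) and C5 (about 523.25 Hz) both come out as C, and 445 Hz comes out as A, about 20 cents sharp.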

This is not a duplicate of previously asked questions, as it introduces the constraints that:

  1. the sound source is a single human voice, hopefully with negligible background interference (although I may need to deal with this)

  2. the octave is not important, only the pitch class

I am contemplating first smoothing the microphone input signal, something like

ySmoothedNew = ySmoothedLast * 0.9 + newY * 0.1; ySmoothedLast = ySmoothedNew;

then finding the zero crossings. Of course, I expect each period of the wave to contain several crossings, but provided each period contains the same number of crossings, it shouldn't be that hard to figure out the periodicity.
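A rough sketch of this idea, in Python for brevity. The function names are illustrative, and `zero_crossing_frequency` makes the simplifying assumption of exactly one upward crossing per period, which strong harmonics in a real voice can easily violate.

```python
import math

def smooth(samples, alpha=0.1):
    """One-pole low-pass filter: y[n] = (1 - alpha) * y[n-1] + alpha * x[n]."""
    out = []
    last = 0.0
    for x in samples:
        last = last * (1.0 - alpha) + x * alpha
        out.append(last)
    return out

def zero_crossing_frequency(samples, sample_rate):
    """Estimate frequency in Hz from upward zero crossings.

    Assumes one upward crossing per period; returns None if fewer
    than two crossings are found.
    """
    crossings = []
    for i in range(1, len(samples)):
        if samples[i - 1] < 0.0 <= samples[i]:
            # Linear interpolation gives a sub-sample crossing position.
            frac = -samples[i - 1] / (samples[i] - samples[i - 1])
            crossings.append(i - 1 + frac)
    if len(crossings) < 2:
        return None
    # Average period (in samples) between the first and last crossing.
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period
```

On a clean sine wave this recovers the frequency closely; on a signal with several crossings per period it would report a multiple of the true frequency unless the extra crossings are accounted for.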

But I feel sure I'm reinventing the wheel. Before I get sunk in a mass of floats, can anyone help steer me in a sensible direction?

PS I will be very grateful if anyone can point me to some simple iPhone wrapper code that exposes the microphone byte stream.

A: 

Perform a Discrete Fourier Transform on samples from your input waveform, then sum values that correspond to equivalent notes in different octaves. Take the largest value as the dominant frequency.
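As a minimal sketch of that folding step (Python for brevity, with a naive DFT rather than an FFT; the function name, the Hann window, and the 80 to 1000 Hz search range are all assumptions for illustration):

```python
import cmath
import math

def chroma_from_dft(samples, sample_rate, f_min=80.0, f_max=1000.0):
    """Sum DFT power into 12 pitch-class bins (0 = C ... 11 = B).

    Assumes equal temperament with A4 = 440 Hz. Only DFT bins between
    f_min and f_max are evaluated, which keeps the naive DFT affordable.
    """
    n = len(samples)
    # Hann window to reduce spectral leakage between neighbouring bins.
    windowed = [s * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(samples)]
    chroma = [0.0] * 12
    k_lo = max(1, math.ceil(f_min * n / sample_rate))
    k_hi = min(n // 2, math.floor(f_max * n / sample_rate))
    for k in range(k_lo, k_hi + 1):
        # Naive DFT for bin k only.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        freq = k * sample_rate / n
        # Map the bin's centre frequency to its nearest pitch class.
        semitones_from_a4 = 12.0 * math.log2(freq / 440.0)
        pc = (round(semitones_from_a4) + 9) % 12
        chroma[pc] += abs(acc) ** 2
    return chroma
```

The index of the largest entry is then taken as the dominant pitch class. Note that the bin spacing (`sample_rate / n`) needs to be finer than a semitone at the lowest note of interest, so longer blocks are needed for low voices.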

You can likely find some existing DFT code in Objective-C that suits your needs.

Nick Johnson
This doesn't work for human voices, especially male voices, as many frequency values are more likely to be overtones of a completely different "note" than to belong to the obvious choice from a frequency-to-note table.
hotpaw2
A: 

Putting up information as I find it...

Pitch detection algorithm on Wikipedia is a good place to start. It lists a few methods that fail for determining octave, which is okay for my purpose.

A good explanation of autocorrelation can be found here (why can't Wikipedia put things simply like that??).

Ohmu
+3  A: 

Pitch is a human psycho-perceptual phenomenon. Peak frequency content is not the same as either pitch or pitch class. FFT and DFT methods will not directly provide pitch, only frequency. Neither will zero crossing measurements work well for human voice sources. Try AMDF (average magnitude difference function), ASDF (average squared difference function), autocorrelation or cepstral methods. There are also plenty of academic papers on the subject of pitch estimation.
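As one illustration of the autocorrelation approach, here is a minimal sketch in Python (for brevity). The function name and the 80 to 1000 Hz search range are assumptions, and real voice input would likely need extra steps such as peak interpolation and a voicing check.

```python
import math

def autocorrelation_pitch(samples, sample_rate, f_min=80.0, f_max=1000.0):
    """Estimate the fundamental frequency (Hz) from the strongest
    autocorrelation lag within [f_min, f_max]. Returns None if no
    lag has a positive score.
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove any DC offset
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_lag, best_score = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised sum: fewer overlapping terms at longer lags,
        # which gently favours the first period over its multiples.
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        return None
    return sample_rate / best_lag
```

Because the method looks for the repetition period of the whole waveform, it can still find the fundamental when the second harmonic is much stronger than the fundamental itself, a case where picking the largest spectral peak would give the wrong answer.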

There is another long list of pitch estimation algorithms here.

Edited addition: Apple's SpeakHere and aurioTouch sample apps (available from their iOS dev center) contain example source code for getting PCM sample blocks from the iPhone's mic.

hotpaw2