I'd like to write a simple program (preferably in C#) that listens while I sing a pitch into a microphone and identifies which musical note that pitch corresponds to.


Thank you very much for your prompt responses. To clarify:

I'd like a (preferably .NET) library that identifies the notes I sing. Such a library should:

  1. Identify the note I sing (a note from the chromatic scale).
  2. Tell me how far off I am from the closest note.

I intend to use such a library to sing one note at a time.

A: 

I don't know any specific details, but you would measure the frequency and then map it to a pitch. For example, if it's 440 Hz, then it's an A. By measuring the frequency you could also tell how far out of tune it is.
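As a minimal sketch of that mapping (in Python rather than C#, and assuming 12-tone equal temperament with A4 = 440 Hz; `frequency_to_note` is a made-up name), the nearest note and the offset in cents (hundredths of a semitone) both fall out of a base-2 logarithm:

```python
import math

# Chromatic note names, indexed so that MIDI note number % 12 == 0 is C.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_note(freq_hz, a4_hz=440.0):
    """Return (nearest note name, offset in cents) for a frequency in Hz.

    Assumption: 12-tone equal temperament tuned to A4 = a4_hz.
    A positive offset means the pitch is sharp of the named note.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Fractional number of semitones above (negative: below) A4.
    semitones = 12.0 * math.log2(freq_hz / a4_hz)
    nearest = round(semitones)
    cents = 100.0 * (semitones - nearest)
    midi = 69 + nearest  # MIDI note 69 is A4
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, cents


print(frequency_to_note(452.0))  # A4, about +46.6 cents
```

Rounding to the nearest semitone covers point 1 of the question, and the cents value covers point 2.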

Chad
A:

Have you seen this question? Fast Fourier Transform in C#

It sounds like this might be able to help you.

Theresa
A:

You would usually do a Fourier transform on the input, then identify the most prominent frequency. This might not be the whole story though, since any nonsynthetic sound source produces a number of frequencies (they make up what is described as "tone colour"). Anyway, it can be done efficiently; there are real-time autotuners (you didn't believe that pop starlet could really sing, did you?).

Svante
That won't work if one of the harmonics is larger than the fundamental. This is more common than you might think. Trumpet spectrum: http://www.eng.cam.ac.uk/DesignOffice/mdp/electric_web/AC/02284.jpg
endolith
endolith, that's what I meant by "not the whole story".
Svante
A:

The crucial piece of this problem is the Fast Fourier Transform. This algorithm turns a waveform (your sung note) into a frequency distribution. Once you've computed the FFT you identify the fundamental frequency (usually the frequency with the highest amplitude in the FFT, but this depends somewhat on your microphone's frequency response curve and exactly what type of sound your mic is listening to).
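A bare-bones sketch of that pipeline, assuming the block length is a power of two and using a textbook recursive radix-2 FFT (not any particular library). It implements only the naive "strongest bin" rule, so it inherits the harmonic problem raised in the comments:

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def peak_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest FFT bin (the naive 'largest peak' rule)."""
    n = len(samples)
    # Assumed choice: a Hann window to reduce leakage between neighbouring bins.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(samples)]
    spectrum = fft([complex(v) for v in windowed])
    # For real input only bins below n/2 are unique; skip bin 0 (DC offset).
    k = max(range(1, n // 2), key=lambda i: abs(spectrum[i]))
    return k * sample_rate / n
```

The result is quantized to multiples of `sample_rate / n`, which is the granularity issue other answers bring up.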

Once you've found the fundamental frequency, you need to look up that frequency in a list that maps frequencies to notes. Here you'll need to deal with the in-betweens (if the fundamental frequency of your sung note is 452 Hz, which note does that actually correspond to, A or A#?).
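For the 452 Hz example, assuming 12-tone equal temperament tuned to A4 = 440 Hz, the in-between case is settled on a logarithmic scale, where n is the signed distance from A4 in semitones:

$$ n = 12\log_2\frac{f}{440\ \text{Hz}}, \qquad n_{452} = 12\log_2\frac{452}{440} \approx 0.466 $$

$$ \operatorname{round}(0.466) = 0 \;\Rightarrow\; \text{A4}, \qquad \text{offset} \approx 100 \times 0.466 \approx +46.6\ \text{cents} $$

$$ \text{A/A\# boundary: } 440 \cdot 2^{1/24} \approx 452.9\ \text{Hz} $$

So under that tuning, 452 Hz is an A4 sung a little under half a semitone sharp.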

This guy on CodeProject has an example of FFT in C#. I'm sure there are others out there...

Jason Punyon
Not all pitch detection algorithms are based on frequency analysis. Some are based on time-domain analysis (though it's true that even time-domain methods, e.g. autocorrelation, frequently use the FFT for performance reasons): http://en.wikipedia.org/wiki/Pitch_detection_algorithm#Time-domain_approaches
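A minimal time-domain sketch of the autocorrelation idea, assuming a mono float signal several periods long and an assumed search range of 80-1000 Hz (the function name and range are illustrative). It returns the lag with the highest autocorrelation, so its resolution is limited to whole-sample lags:

```python
import math


def autocorrelation_pitch(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate pitch (Hz) as sample_rate / (lag with the highest autocorrelation)."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove any DC offset
    # Assumed search range: only lags corresponding to fmin..fmax are tried.
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Plain (biased) sum over the overlap: fewer terms at longer lags,
        # which slightly favours the true period over its multiples.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag
```

At 8 kHz, a 440 Hz tone has a period of about 18.2 samples, so this returns 8000 / 18 ≈ 444 Hz; interpolating around the best lag is the usual next step.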
leonbloy
A:

You'll want to capture your raw input, accumulate some samples, and then do an FFT on them. The FFT will convert your samples from time domain to frequency domain, so what it produces is a bit like a histogram of how much energy the signal contained at various frequencies.

Getting from that to "the" frequency may be a bit difficult though -- a human voice is not going to just contain a single, clean frequency of sound. Instead you'll normally have energy at a pretty fair number of different frequencies. What you'll typically do is start from about the lowest voice range, and work your way up, looking for the first (lowest) frequency at which the energy is significantly higher than the background noise.
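A rough sketch of that bottom-up scan, with several assumptions of my own: a Hann-windowed plain DFT, the median bin magnitude as the "background noise" estimate, and an extra relative threshold (10% of the strongest peak) so that window sidelobes on a clean signal are not mistaken for a note. Names and thresholds are illustrative, not tuned:

```python
import cmath
import math
import statistics


def hann_dft_magnitudes(samples):
    """Magnitudes of bins 0..n/2 of a Hann-windowed plain O(n^2) DFT."""
    n = len(samples)
    w = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, s in enumerate(samples)]
    return [abs(sum(w[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
            for k in range(n // 2 + 1)]


def first_significant_frequency(samples, sample_rate, fmin=80.0,
                                noise_factor=10.0, min_fraction=0.1):
    """Lowest spectral peak above fmin that clearly stands out, or None."""
    n = len(samples)
    mags = hann_dft_magnitudes(samples)
    noise_floor = statistics.median(mags[1:])  # assumed crude noise estimate
    # Assumed thresholds: well above the noise floor AND not a tiny sidelobe.
    threshold = max(noise_factor * noise_floor, min_fraction * max(mags[1:]))
    start = max(1, math.ceil(fmin * n / sample_rate))  # skip bins below the voice range
    for k in range(start, len(mags) - 1):
        is_local_peak = mags[k] >= mags[k - 1] and mags[k] >= mags[k + 1]
        if is_local_peak and mags[k] > threshold:
            return k * sample_rate / n
    return None
```

With a 220 Hz fundamental and a second harmonic twice as loud, this returns the bin near 220 Hz, where a plain "largest peak" search would report 440 Hz.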

Jerry Coffin
A:

Performing a Fourier transform will give you values for each frequency found in the sample. The more prominent the frequency, the higher the value. If you look for the largest value, you'll find your root frequency but overtones will also be present.

If you're looking for a specific frequency, the Goertzel algorithm can be very effective.
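A sketch of the standard Goertzel recurrence, which computes the power of a single DFT bin in one pass over the samples. The function name is illustrative, and the target frequency is rounded to the nearest bin:

```python
import math


def goertzel_power(samples, sample_rate, target_hz):
    """Squared magnitude of the DFT bin nearest target_hz, via Goertzel."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin (rounding is an assumption)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        # Second-order IIR filter tuned to bin k.
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Equals |X[k]|^2 of the full DFT at bin k.
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
```

Checking a dozen candidate note frequencies means a dozen O(n) passes, which can beat a full FFT when only a few frequencies matter.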

Carl
A:

You have to do an FFT of the sample and then analyze that. The two things that will complicate your analysis are:

  1. Overtones. If you sing/play the A at 440 Hz (A4), you will also get a tone at A5 (880 Hz), one at E6 (1320 Hz), etc. Depending on the relative intensities at these frequencies, the tone could be perceived as an A4, A5 or E6, and determining the tone is not simply a matter of finding where the most intensity is; the human ear is more complicated than that. You could, however, guess reasonably well that it will be perceived as an A.

  2. Granularity. Your FFT will have a granularity that depends only on the duration of the sample, not on the sampling frequency: the bins are 1/T Hz apart for a sample of T seconds, so you need a one-second sample to get a granularity of 1 Hz, which is still a little coarse. One way around this is to take the three frequencies around each spike, fit a second-degree polynomial through them, and then find the maximum of that polynomial. I have read a paper claiming that using the phase is more accurate than the amplitude for this, but I don't remember where, so I can't cite it.
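A sketch of the "fit a parabola through the three bins around the peak" trick. Two choices here are my assumptions rather than something stated above: the parabola is fitted to log magnitudes (which tends to work well with a Hann window), and a plain O(n²) DFT is used for brevity:

```python
import cmath
import math


def hann_magnitudes(samples):
    """Magnitudes of bins 0..n/2 of a Hann-windowed plain DFT."""
    n = len(samples)
    w = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, s in enumerate(samples)]
    return [abs(sum(w[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
            for k in range(n // 2 + 1)]


def refined_peak_frequency(samples, sample_rate):
    """Strongest bin, refined by parabolic interpolation (sub-bin precision)."""
    mags = hann_magnitudes(samples)
    n = len(samples)
    k = max(range(1, len(mags) - 1), key=lambda i: mags[i])
    # Assumption: fit to log-magnitudes; the tiny constant guards against log(0).
    a, b, c = (math.log(mags[j] + 1e-12) for j in (k - 1, k, k + 1))
    denom = a - 2 * b + c
    # Vertex of the parabola through (-1, a), (0, b), (1, c).
    delta = 0.0 if denom == 0 else 0.5 * (a - c) / denom
    return (k + delta) * sample_rate / n
```

With 512 samples at 8 kHz the bins are 15.625 Hz apart, so a 440 Hz tone lands in the 437.5 Hz bin; the interpolated estimate comes out within a fraction of a hertz of 440.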

erikkallen
"perceived as an A4, A5 or E6" The harmonics are not all octaves.
endolith
@endolith: What do you mean?
erikkallen
OK, now I see it. It was a typo. Fixed
erikkallen
A: 

I think you want this question: Real Time Pitch Detection Using FFT

Dean J
A:

Pretty much every answer says to do an FFT. I've written this program myself, and I found that the FFT was good at roughly identifying the strongest frequency, but that there was some "smearing" out as a result -- it's not always easy to precisely identify tiny variations from the target pitch using an FFT, particularly if the sample is short.

Erik Kallen's approach seems reasonable, but there are other approaches. What I found worked fairly well was using a combination of FFT and a simple "zero crossing" detection algorithm to narrow in on the exact frequency of the signal.

That is, count the number of times the signal crosses the zero line in a given interval, fit that to the rough frequency "bucket" produced by the FFT, and you can get a quite precise result.
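One possible reading of this FFT + zero-crossing combination, as a sketch: count upward zero crossings (interpolated between samples), convert that count to a frequency, and keep it only if it falls within one FFT bin of the FFT's rough estimate. The function names and the "within one bin" rule are my assumptions, and zero crossings are only reliable on a fairly clean signal, since strong harmonics or noise add extra crossings:

```python
def zero_crossing_frequency(samples, sample_rate):
    """Frequency from upward zero crossings, located to sub-sample precision."""
    crossings = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0 <= b:
            # Linear interpolation of where the signal crosses zero between i-1 and i.
            crossings.append(i - 1 + a / (a - b))
    if len(crossings) < 2:
        return None
    periods = len(crossings) - 1
    return periods * sample_rate / (crossings[-1] - crossings[0])


def refine_with_zero_crossings(fft_estimate_hz, bin_width_hz, samples, sample_rate):
    """Assumed rule: trust the zero-crossing value only if it agrees with the FFT bucket."""
    zc = zero_crossing_frequency(samples, sample_rate)
    if zc is not None and abs(zc - fft_estimate_hz) <= bin_width_hz:
        return zc
    return fft_estimate_hz
```

For example, if the FFT says 437.5 Hz with 15.625 Hz bins and the zero crossings say about 440 Hz, the function returns the zero-crossing value.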

Eric Lippert
A:

I've done pitch detection in the past, and the simple solution of "take an FFT and look at the peak" doesn't work at all for speech. I had fairly good luck using cepstral analysis. A lot of useful papers can be found in Lawrence Rabiner's publications. I recommend starting with "A comparative performance study of several pitch detection algorithms".
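A compact sketch of the real cepstrum (the inverse transform of the log magnitude spectrum), assuming a power-of-two block length and a search range of 80-1000 Hz; the period shows up as a peak at a "quefrency", i.e. a lag in samples. This is a generic textbook version, not the specific method from the papers mentioned:

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def cepstral_pitch(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Pitch (Hz) from the highest real-cepstrum peak in an assumed lag range."""
    n = len(samples)
    windowed = [complex(s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)))
                for i, s in enumerate(samples)]
    # Log magnitude spectrum; the small constant (an assumption) avoids log(0).
    log_mag = [complex(math.log(abs(v) + 1e-9)) for v in fft(windowed)]
    # Inverse DFT of a real, symmetric sequence = real part of the forward DFT / n.
    cepstrum = [v.real / n for v in fft(log_mag)]
    lo = int(sample_rate / fmax)
    hi = min(int(sample_rate / fmin), n // 2)
    quefrency = max(range(lo, hi + 1), key=lambda q: cepstrum[q])
    return sample_rate / quefrency
```

Like autocorrelation, the cepstrum also shows peaks at multiples of the period, so real code usually needs some guard against octave errors.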

Just as a warning, it probably took me around 30-40 hours of work to get to the point where I could send a wav file into my program and have it spit out a sane number. I was also more interested in the fundamental frequency of a speaker's voice. I'm sure dealing with music will add many more wrinkles.

Dan Hook
A:

If you just want the result, i.e. to use the software, there is a program called SingAndSee that does just this. It costs about £25.

kpollock
Although the ideal thing would be to have just the library that does the pitch detection, your suggestion is very close to what I want. Thank you.
Antoni
kpollock, I just got SingAndSee. It's just great. It's simple, useful, and straight to the point. Thank you once again.
Antoni
glad my suggestion was useful. I just wish it had an "output to midi" option....
kpollock
The only slight "gotcha" is that, although the detection itself seems precise enough, if you use the 'piano' keys at the top, the resulting note is a trifle sharp, at least in my experience when feeding the output from the speaker back into SingAndSee via a mike. I'd suggest using a real, tuned instrument for reference pitches.
kpollock
A:

Since you're dealing with a monophonic source, most of the pitches you detect with an FFT should be harmonically related, but you're not guaranteed that the fundamental is the strongest. In fact, for many instruments and some voice registers it probably won't be. It should, however, be the lowest of the harmonically related pitches detected (those at integer multiples of the fundamental).
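A small sketch of that rule, assuming you already have a list of peak frequencies from the spectrum: take the lowest peak for which at least half of the higher peaks sit near whole-number multiples. The function name and the tolerance (0.1 of a harmonic number) are arbitrary illustrative choices:

```python
def lowest_harmonic_base(peaks_hz, tolerance=0.1):
    """Return the lowest peak that explains most higher peaks as harmonics, or None."""
    peaks = sorted(peaks_hz)
    for i, candidate in enumerate(peaks):
        higher = peaks[i + 1:]
        if not higher:
            break
        ratios = [p / candidate for p in higher]
        # Assumed rule: a peak "matches" if its ratio is within `tolerance`
        # of a whole number >= 2.
        hits = sum(1 for r in ratios if round(r) >= 2 and abs(r - round(r)) <= tolerance)
        if hits * 2 >= len(ratios):
            return candidate
    return None
```

With peaks at 60, 220, 440 and 660 Hz (say, mains hum plus a sung note), the 60 Hz peak is rejected because only one of the others lines up with it, and 220 Hz is returned.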

swarfrat
A:

You're looking for a frequency estimation or pitch-detection algorithm. Most people suggest finding the maximum value of the FFT, but this is overly simplistic and doesn't work as well as you might think. If the fundamental is missing (a timpani, for instance), or one of the harmonics is larger than the fundamental (a trumpet, for instance), it won't detect the correct frequency. Trumpet spectrum:

[Image: trumpet spectrum]

Also, you're wasting processor cycles calculating the FFT if you're only looking for a specific frequency. You can use things like the Goertzel algorithm to find tones in a specific frequency band more efficiently.

You really need to find "the first significant frequency" or "the first frequency with strong harmonic components", which is more ambiguous than just finding the maximum.

Autocorrelation or the harmonic product spectrum is better at finding the true fundamental for real instruments, but if the instrument is inharmonic (most are), then the wave shape is changing over time, and I suspect it won't work as well if you try to measure more than a few cycles at a time, which decreases your accuracy.
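A sketch of the harmonic product spectrum, assuming a power-of-two block and a Hann window (both my choices, as are the names and defaults). It multiplies the magnitude spectrum by copies of itself compressed by factors 2, 3, ..., so the bin where several harmonics line up stands out even when the fundamental itself is weak:

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def hps_pitch(samples, sample_rate, num_harmonics=4, fmin=60.0):
    """Estimate pitch (Hz) with a harmonic product spectrum."""
    n = len(samples)
    windowed = [complex(s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)))
                for i, s in enumerate(samples)]
    mags = [abs(v) for v in fft(windowed)[: n // 2]]
    # Only bins whose num_harmonics-th multiple is still below n/2 can be scored.
    upper = len(mags) // num_harmonics
    lower = max(1, int(fmin * n / sample_rate))  # assumed lowest pitch of interest
    best = max(range(lower, upper),
               key=lambda k: math.prod(mags[k * h] for h in range(1, num_harmonics + 1)))
    return best * sample_rate / n
```

In a test signal with a weak 250 Hz fundamental and a dominant 500 Hz second harmonic, the largest FFT peak is at 500 Hz, but the product spectrum peaks at 250 Hz.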

endolith
A: 

To convert the time-domain signal coming from the microphone into the frequency domain, you will need either a Discrete Fourier Transform (DFT) or a Fast Fourier Transform (FFT). The FFT runs faster, but the code is much more complex (a DFT can be done in 5-10 lines of code). Once this is done, you have to map the fundamental frequencies to notes. Unfortunately, there are several mapping schemes depending on which tuning system you are using; the most common is equal temperament. Frequencies here. The Wikipedia article on equal temperament also gives background on it.
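For reference, the direct DFT really is only a few lines. This is the textbook O(n²) definition, which is far slower than an FFT at real-time block sizes:

```python
import cmath
import math


def dft(x):
    """Direct O(n^2) discrete Fourier transform of a sequence of numbers."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

For example, `dft([1, 1, 1, 1])` is approximately `[4, 0, 0, 0]`: a constant signal has energy only in the DC bin.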

When using any Fourier mathematics you need to know how frequencies are handled: ideally, perform anti-aliasing filtering before sampling (and hence before the transform), and watch out for frequency reflection (aliasing) when performing a transform. Due to the Nyquist theorem, you will need to sample the microphone content more than twice as fast as the maximum frequency, i.e. for a maximum frequency of 10 Hz you must sample faster than 20 Hz.
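To see the frequency reflection concretely: a tone above half the sampling rate folds back to a lower apparent frequency. A small helper (the name is illustrative) that computes where a real tone lands after sampling with no anti-aliasing filter:

```python
def aliased_frequency(freq_hz, sample_rate):
    """Apparent frequency of a real sinusoid of freq_hz after sampling at sample_rate."""
    f = freq_hz % sample_rate       # sampling cannot tell f from f + k * sample_rate
    return min(f, sample_rate - f)  # components above sample_rate / 2 fold back down
```

Sampling at 20 Hz, a stray 15 Hz component would show up as 5 Hz, indistinguishable from a genuine 5 Hz tone, which is why the filtering has to happen before sampling.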

Mekboy
A: 

I'm amazed by all the answers here suggesting the use of FFT, given that FFT isn't generally precise enough for pitch detection. It can be, but only with an impractically large FFT window. For example, in order to determine the fundamental with 1/100th of a semi-tone accuracy (which is about what you need for accurate pitch detection) when the fundamental is around concert A (440 Hz), you need an FFT window with 524,288 elements. 1024 is a much more typical FFT size, and the computation time becomes progressively worse the larger the window.
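The arithmetic behind a window that size, assuming a 44.1 kHz sample rate (the answer doesn't say which rate it used): the FFT bin spacing is Δf = f_s/N, and 1/100 of a semitone at 440 Hz is

$$ \delta = 440\,\left(2^{1/1200} - 1\right) \approx 0.254\ \text{Hz} $$

$$ \Delta f \le \delta \;\Rightarrow\; N \ge \frac{44100}{0.254} \approx 1.73\times 10^{5} \;\Rightarrow\; N = 2^{18} = 262{,}144 $$

$$ \Delta f \le \delta/2 \;\Rightarrow\; N \ge 3.47\times 10^{5} \;\Rightarrow\; N = 2^{19} = 524{,}288 \approx 11.9\ \text{s of audio} $$

Asking for bins spaced at half that step is one way to arrive at the 524,288 figure. By comparison, a 1024-point FFT at 44.1 kHz has bins about 43 Hz apart.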

I have to identify the fundamental pitch of WAV files in my software synthesizer (where a "miss" is immediately audible as an out-of-tune instrument) and I've found that autocorrelation does by far the best job. Basically, I iterate through each note in the 12-tone scale over an 8-octave range, compute the frequency and the wavelength of each note, and then perform an autocorrelation using that wavelength as the lag (an autocorrelation is where you measure the correlation between a set of data and the same set of data offset by some lag amount).

The note with the highest autocorrelation score is thus roughly the fundamental pitch. I then "home in" on the true fundamental by iterating from one semi-tone down to one semi-tone up in steps of 1/1000 of a semi-tone, to find the local peak autocorrelation value. This method works very accurately, and more importantly it works for a wide variety of instrument files (strings, guitar, human voices etc.).

This process is extremely slow, however, especially for long WAV files, so it could not be used as is for a realtime application. However, if you used FFT to get a rough estimate of the fundamental, and then used autocorrelation to zero in on the true value (and you were content with being less accurate than 1/1000th of a semi-tone, which is absurdly over-accurate), you would have a method that was both relatively fast and extremely accurate.
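A sketch of the note-grid search described above, with assumptions of my own filled in. Correlation is normalized over the overlapping samples. Fractional lags are handled by Catmull-Rom interpolation between integer-lag correlations, because plain linear interpolation would always peak at a whole-sample lag. The coarse pass prefers the highest-pitched local maximum within 90% of the best score, to sidestep octave-below matches. The names and the C2-C7 default range (narrower than the 8 octaves mentioned, to suit a voice) are illustrative:

```python
import math


def normalized_autocorrelation(x, max_lag):
    """Correlation of x with itself shifted by 0..max_lag samples (values in [-1, 1])."""
    n = len(x)
    r = []
    for lag in range(max_lag + 1):
        a, b = x[: n - lag], x[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        r.append(num / den if den > 0 else 0.0)
    return r


def interpolate(r, lag):
    """Catmull-Rom interpolation of r at a fractional lag (assumed scheme; needs r[j-1..j+2])."""
    j = int(lag)
    t = lag - j
    p0, p1, p2, p3 = r[j - 1], r[j], r[j + 1], r[j + 2]
    return 0.5 * (2 * p1 + (p2 - p0) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
                  + (3 * p1 - p0 - 3 * p2 + p3) * t ** 3)


def note_grid_pitch(samples, sample_rate, low_midi=36, high_midi=96, steps_per_semitone=1000):
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]

    def note_hz(m):
        return 440.0 * 2 ** ((m - 69) / 12)  # equal temperament, A4 = 440 Hz

    max_lag = min(len(x) // 2, int(sample_rate / note_hz(low_midi)) + 3)
    r = normalized_autocorrelation(x, max_lag)

    def score(freq):
        lag = sample_rate / freq
        if lag < 2 or lag > max_lag - 2:  # outside the computed/interpolable range
            return None
        return interpolate(r, lag)

    # Coarse pass: one score per note of the 12-tone scale.
    scored = [(m, score(note_hz(m))) for m in range(low_midi, high_midi + 1)]
    valid = [(m, s) for m, s in scored if s is not None]
    best = max(s for _, s in valid)
    # Assumed octave guard: highest-pitched interior local maximum within 90% of best.
    peaks = [valid[i][0] for i in range(1, len(valid) - 1)
             if valid[i][1] >= valid[i - 1][1] and valid[i][1] >= valid[i + 1][1]
             and valid[i][1] >= 0.9 * best]
    midi = max(peaks) if peaks else max(valid, key=lambda ms: ms[1])[0]
    # Fine pass: +/- one semitone in steps of 1/steps_per_semitone of a semitone.
    base = note_hz(midi)
    fine = [base * 2 ** (s / (12 * steps_per_semitone))
            for s in range(-steps_per_semitone, steps_per_semitone + 1)]
    return max((f for f in fine if score(f) is not None), key=score)
```

Because the fine pass only interpolates precomputed integer-lag correlations, each 1/1000-semitone step costs O(1) rather than a fresh correlation over the whole buffer.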

MusiGenesis
A: 

D3D11 contains an FFT implementation

bobobobo