I'm trying to detect voice in real time from microphone input.

I already receive the input, run an FFT on it, and have the result in dB. I have the frequency-domain data, the time-domain data, and a spectrogram.

How can I get the fundamental frequency? Once I have it, can I say that if the frequency falls between certain values, then what I'm picking up is voice? Is there any other way to do this with what I already have?

Thanks in advance.

+1  A: 

Take the highest peak on the spectrogram that's within the range for voice (say, 400 Hz to 10 kHz). That should give you the fundamental frequency.
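A minimal sketch of that peak pick, assuming you already have one frame's magnitude spectrum as a list indexed by FFT bin. The function name and its parameters are made up for illustration. Note that it only returns the strongest bin in the band, which may turn out to be a harmonic rather than the true fundamental.

```python
import math

def peak_frequency_in_band(magnitudes, sample_rate, fft_size, low_hz, high_hz):
    """Return the frequency (Hz) of the strongest bin between low_hz and high_hz.

    magnitudes: per-bin magnitudes (linear or dB) for bins 0..fft_size/2.
    Returns None when the band contains no usable bin.
    """
    bin_width = sample_rate / fft_size  # Hz covered by one FFT bin
    first = max(1, math.ceil(low_hz / bin_width))  # skip the DC bin
    last = min(len(magnitudes) - 1, math.floor(high_hz / bin_width))
    if first > last:
        return None
    strongest = max(range(first, last + 1), key=lambda k: magnitudes[k])
    return strongest * bin_width
```

With the range suggested above you would call it as `peak_frequency_in_band(mags, sample_rate, fft_size, 400.0, 10000.0)`. The result is quantized to the bin width, so a longer FFT gives finer resolution.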

Alternatively, you may need to integrate a histogram of frequencies. This is because sometimes you have words that start with or contain sibilants ("s" sounds) and fricatives ("f" and "th" sounds), which have fairly high frequencies and a broad spectrum. You don't want to miss the start of speech because it started with something other than a vowel.

Another factor is what else you would pick up besides voice. Is there a lot of background noise? What kind? If there isn't any, then just the presence of sound is enough. If, for example, there's music, then you have a whole different challenge. If you're trying to distinguish between voice and some other sounds, then I'd be tempted to try a neural network approach; it's likely to need that level of complexity.
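For the quiet-background case, "presence of sound" can be as simple as a level check on each time-domain frame. This is only a sketch: it assumes samples are floats scaled to the range -1.0 to 1.0, the function names are hypothetical, and the -40 dB threshold is an arbitrary placeholder you would tune against your own microphone and room.

```python
import math

def frame_level_db(frame):
    """RMS level of a frame in dB relative to full scale (samples in -1.0..1.0)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-12))  # floor avoids log10(0) on silence

def sound_present(frame, threshold_db=-40.0):
    """True when the frame is louder than the (assumed, tunable) threshold."""
    return frame_level_db(frame) > threshold_db
```

In practice you would probably also require a few consecutive loud frames before declaring speech, so that a single click does not trigger it.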

Frank Ames
It is a common misconception, but the fundamental frequency is not necessarily the largest peak in the spectrum. In voice, it depends on how the harmonics line up with the formants, and it can shift around depending on the frequency and quality of the sound.
tom10
+3  A: 

There are many different algorithms for frequency estimation, and the right one to use depends on what you're doing. What kinds of input do you expect? What do you want to do with that input? What kind of processing power do you have?

Detecting the fundamental frequency isn't going to help you identify whether a specific person is talking, if that's what you're trying to do. The frequency of your voice changes constantly. You'd have to make a "fingerprint" of the person's formants, etc.

Simply finding the peak of the FFT isn't going to give you good results for voice. Look into cepstral analysis.
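To make the cepstral suggestion concrete, here is a rough sketch of one common form of it: take the log magnitude of the spectrum, transform that again, and look for a peak at the quefrency (lag) matching the pitch period. It is an illustration rather than a tuned detector. The function names, the 60 to 400 Hz search range, and the Hann window are my own assumptions, and the small pure-Python FFT is only there to keep the example self-contained (it needs a power-of-two frame length; in real code you would use an FFT library).

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def cepstral_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental (Hz) of one frame from its real cepstrum."""
    n = len(frame)
    # Hann window to reduce spectral leakage.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    log_mag = [math.log(abs(c) + 1e-12) for c in fft(windowed)]
    # log_mag is real and symmetric, so a forward FFT divided by n
    # equals the inverse FFT here; that is the real cepstrum.
    cepstrum = [c.real / n for c in fft(log_mag)]
    # Search only the lags that correspond to fmax..fmin.
    q_min = int(sample_rate / fmax)
    q_max = min(int(sample_rate / fmin), n // 2)
    peak = max(range(q_min, q_max + 1), key=lambda q: cepstrum[q])
    return sample_rate / peak
```

The estimate is quantized to whole-sample periods, and it only makes sense on voiced frames. On silence or noise the "peak" is meaningless, so you would want to check the peak height or the frame energy before trusting the number.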

endolith