I need to be able to determine whether two sounds are very similar. The goal is to have a very limited vocabulary (10 or 15 words) of short one- or two-syllable words, then compare a captured sound against them to determine whether it is one of those words, despite all the usual variability in environment and capture conditions. The idea is that the user can issue a few simple commands by voice instead of using the keyboard or mouse.

Does anyone know the best approach to this? I don't want to do full-blown speech recognition, just something much more limited.

A:

I'd reconsider using a speech recognition library, such as CMU's Sphinx or Microsoft's speech recognizer. Unfortunately, it's not a simple task to do this on your own. A fairly typical approach to what you're trying to do is as follows:

1) Chop the sample into short, overlapping segments (frames of a few tens of milliseconds, typically 20 to 30 ms).

2) Run a Fourier transform on each segment and collect the principal coefficients (in practice, mel-frequency cepstral coefficients, or MFCCs, are the usual choice).

3) Use a Hidden Markov Model to find the most likely sequence of phonemes given your sequence of coefficients.

4) Map the phonemes to words using a pronunciation dictionary (you could look at the Sphinx dictionary as a guide). A vocabulary as small as yours should produce excellent results.
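Steps 1 and 2 might look something like the following plain-Python sketch. It assumes mono audio as a list of float samples, 25 ms frames with a 10 ms hop, and a Hamming window, and it uses log energies in a few equal-width frequency bands as a stand-in for the "principal coefficients". A real front end would more likely compute MFCCs with an FFT library instead of the naive DFT used here. All function names are hypothetical.

```python
import cmath
import math


def frame_signal(samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Step 1: chop the signal into short, overlapping frames.

    Frame and hop lengths are assumed defaults; trailing samples that do not
    fill a whole frame are dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def frame_features(frame, num_bands=12):
    """Step 2: Fourier-transform one frame and keep a few coefficients.

    Returns the log power in `num_bands` equal-width bands below the Nyquist
    frequency; leftover top bins that don't fill a whole band are ignored.
    """
    n = len(frame)
    # Hamming window reduces spectral leakage at the frame edges
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    # Naive O(n^2) DFT for bins 0 .. n/2 - 1; use an FFT library in practice
    power = []
    for k in range(n // 2):
        bin_k = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        power.append(abs(bin_k) ** 2)
    band_size = len(power) // num_bands
    # Small offset keeps log() defined for silent frames
    return [math.log(sum(power[b * band_size:(b + 1) * band_size]) + 1e-10)
            for b in range(num_bands)]


def extract_features(samples, sample_rate, num_bands=12):
    """Turn a whole utterance into a sequence of per-frame feature vectors."""
    return [frame_features(f, num_bands)
            for f in frame_signal(samples, sample_rate)]
```

For example, 800 samples of a 1 kHz sine at 8 kHz give 8 frames of 12 values each, and in every frame the largest value falls in the band that contains 1 kHz.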
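Steps 3 and 4 can be illustrated with a toy Viterbi decoder followed by a dictionary lookup. To keep the sketch small, it assumes each frame's coefficients have already been quantized to one of a few discrete codebook symbols (0, 1, 2), and the phoneme states, probabilities, and two-word dictionary are invented for illustration. A real recognizer learns these parameters from training recordings and usually models the continuous coefficient vectors directly.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Step 3: most likely hidden-state (phoneme) path for the observations.

    Works in log space so long utterances don't underflow to zero.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[s]: log-probability of the best path ending in state s
    best = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    paths = {s: [s] for s in states}
    for obs in observations[1:]:
        new_best, new_paths = {}, {}
        for s in states:
            # Best predecessor for state s at this time step
            prev = max(states, key=lambda p: best[p] + log(trans_p[p][s]))
            new_best[s] = best[prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            new_paths[s] = paths[prev] + [s]
        best, paths = new_best, new_paths
    return paths[max(states, key=lambda s: best[s])]


def collapse(path):
    """Merge consecutive repeats, since one phoneme usually spans many frames."""
    return [p for i, p in enumerate(path) if i == 0 or p != path[i - 1]]


def lookup_word(phonemes, dictionary):
    """Step 4: map a phoneme sequence to a vocabulary word, or None."""
    for word, pronunciation in dictionary.items():
        if pronunciation == phonemes:
            return word
    return None


# Hypothetical toy model: observations are codebook symbols 0, 1, 2
STATES = ["G", "N", "OW"]
START_P = {"G": 0.5, "N": 0.5, "OW": 0.0}
TRANS_P = {
    "G": {"G": 0.6, "N": 0.0, "OW": 0.4},
    "N": {"G": 0.0, "N": 0.6, "OW": 0.4},
    "OW": {"G": 0.0, "N": 0.0, "OW": 1.0},
}
EMIT_P = {
    "G": {0: 0.8, 1: 0.1, 2: 0.1},
    "N": {0: 0.1, 1: 0.8, 2: 0.1},
    "OW": {0: 0.1, 1: 0.1, 2: 0.8},
}
DICTIONARY = {"go": ["G", "OW"], "no": ["N", "OW"]}

path = viterbi([0, 0, 2, 2, 2], STATES, START_P, TRANS_P, EMIT_P)
print(path, lookup_word(collapse(path), DICTIONARY))  # ['G', 'G', 'OW', 'OW', 'OW'] go
```

Collapsing repeated states before the lookup matters because a single phoneme normally occupies several consecutive frames.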


If you want to simplify this somewhat, you might try taking the coefficients at specific timesteps and feeding them into an SVM or a neural network. I haven't tried this myself, but I bet you could get reasonable results with some tuning.
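The simplified idea could be sketched as below: sample a fixed number of frames from each utterance so every recording maps to a vector of the same length, then train a classifier on those vectors. As an assumption made to keep the example dependency-free, the classifier is a multiclass perceptron (the simplest single-layer neural network) rather than an SVM; the function and class names and the toy training data are hypothetical. The input is a list of per-frame coefficient vectors, such as the Fourier coefficients from step 2.

```python
def fixed_timesteps(features, num_steps=10):
    """Sample `num_steps` frames at evenly spaced positions and flatten them,
    so utterances of any length give vectors of the same size.
    Assumes at least one frame and num_steps >= 2."""
    n = len(features)
    picks = [round(i * (n - 1) / (num_steps - 1)) for i in range(num_steps)]
    return [c for i in picks for c in features[i]]


class Perceptron:
    """Multiclass perceptron: a single-layer neural network, used here as a
    simple stand-in for an SVM or a larger network."""

    def __init__(self, labels, dim):
        # One weight vector per word, with the bias stored in the last slot
        self.weights = {label: [0.0] * (dim + 1) for label in labels}

    def score(self, label, x):
        w = self.weights[label]
        return w[-1] + sum(wi * xi for wi, xi in zip(w, x))

    def predict(self, x):
        return max(self.weights, key=lambda label: self.score(label, x))

    def fit(self, X, y, epochs=20):
        for _ in range(epochs):
            for x, target in zip(X, y):
                guess = self.predict(x)
                if guess != target:
                    # Pull the correct word's weights toward x, push the wrong one away
                    for label, sign in ((target, 1.0), (guess, -1.0)):
                        w = self.weights[label]
                        for i, xi in enumerate(x):
                            w[i] += sign * xi
                        w[-1] += sign
        return self


# Toy usage with made-up 2-coefficient frames
X = [fixed_timesteps([[1.0, 0.0]] * 5, 4), fixed_timesteps([[0.0, 1.0]] * 7, 4)]
clf = Perceptron(["go", "stop"], dim=8).fit(X, ["go", "stop"])
print(clf.predict(fixed_timesteps([[0.9, 0.1]] * 9, 4)))  # go
```

In practice you would record several examples of each command word under varied conditions and train on all of them. A perceptron is only guaranteed to converge when the classes are linearly separable, which is one reason an SVM or a multi-layer network may do better on real audio.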

tbischel