Hello, I want to use either Sphinx4 or the HTK toolkit to build a speech recognition application that aims to estimate one's age from voice. I understand, to a large extent, the statistical models involved in speech recognition. I am interested in Mel-frequency cepstral coefficients and Gaussian mixture models because these two are better suited to my problem domain. Do I have to use neural networks and feed in training data from the vectors derived from the Sphinx classifiers? I am not quite sure where to start with Sphinx or the HTK toolkit. I am new to Sphinx and speech recognition, and my application is only a prototype.

Can anyone please offer some form of guidance in this regard? Kind regards.

+1  A: 

Usually, the first place to start for something like this is to look for prior related work from the academic community. In Minematsu et al. 2002, they used Gaussian mixture models (GMMs) over mel-frequency cepstral coefficients to distinguish between old and young speakers.

Presumably, if you have access to training data with both old and young speakers, you should be able to do the same. Even if you'd like to try another classifier back-end such as neural networks, it would probably be good to start with GMMs since you know that they should work for your task and they'll give you something to compare with whatever other classifiers you'd like to try to use.
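To make the GMM baseline concrete, here is a minimal sketch of the decision step only: one GMM per age class, with an utterance assigned to the class whose model gives its feature frames the highest total log-likelihood. Everything specific in it is an assumption for illustration: the two-dimensional "MFCC" vectors, the diagonal covariances, the hand-picked parameters in `MODELS`, and the function names. In a real system the frames would come from an MFCC front end and the GMM parameters would be trained with EM on labelled speech.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of all frames under one GMM.

    gmm is a list of (weight, mean, var) mixture components.
    """
    total = 0.0
    for x in frames:
        comp = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(comp)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total


def classify(frames, models):
    """Return the label whose GMM scores the frames highest."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, models[label]))


# Hypothetical, hand-picked models (NOT trained on real speech).
MODELS = {
    "young": [(0.6, [1.0, 0.0], [1.0, 1.0]), (0.4, [2.0, 1.0], [0.5, 0.5])],
    "old": [(0.5, [-1.0, 0.0], [1.0, 1.0]), (0.5, [-2.0, -1.0], [0.5, 0.5])],
}

if __name__ == "__main__":
    utterance = [[1.2, 0.1], [1.8, 0.9], [0.9, -0.2]]  # made-up feature frames
    print(classify(utterance, MODELS))
```

Summing per-frame log-likelihoods treats the frames as independent, which is the usual simplification when a GMM is used as a bag-of-frames classifier; any other back-end you try later can be compared against this score-and-pick rule.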

If you're just doing this for fun or as a research project, I would recommend using HTK, since I like how modular it is. However, if this is being done for something commercial, you should probably go with Sphinx, since it can be redistributed under a BSD-like license.

dmcer
Hi dmcer, thank you for the pointers. It is for a research project and I am trying to build a prototype, so I am looking at HTK, but that will require me to learn C programming. This is why I was looking at Sphinx4, because it's written in Java. I am a novice in speech applications: I know the probability concepts (conditional probability, Bayes' rule, and the various distributions), but I don't know how to start with HTK or Sphinx and the tools that they offer. Can you point me to any resource that explains how the tools are used? I downloaded HTK, but it's confusing me quite a bit.
Binaryrespawn
@Binaryrespawn - if you want to use Sphinx, have you taken a look at the demo code it's packaged with, http://cmusphinx.sourceforge.net/sphinx4/#demos ?
dmcer
I did import the source files and built them using Ant. I am trying to run the demos now; however, I think the mic on the laptop is not reaching the demos, so when prompted to speak, the demo cannot receive my speech. Any suggestions for getting a mic talking to Sphinx4 and, by extension, the demos? I am using Windows XP on a Dell M6400 laptop with mic and cam.
Binaryrespawn
I'm not sure why Java isn't able to use the mic on your system. You could submit another question to stackoverflow just about the Java microphone issue.
dmcer
OK, I got that sorted out, thank God. I was able to run the demos and have a little interactive session with Sphinx. Now it's time to actually do some work with Sphinx4. Thank you so much for your assistance thus far.
Binaryrespawn
A: 

Hi Binaryrespawn. I am currently looking into accent recognition using MFCCs and was thinking of using Sphinx 4. I was wondering whether you were successful in using MFCCs for age classification, and whether you have any pointers on how to go about this research.

Mike
A: 

Hi Mike, I decided not to go with Sphinx 4 because it's based on hidden Markov models, which are primarily used for sequential analysis such as speech recognition, and even multimodal inputs to an interface based on the input sequence. Instead I went with a program called Praat, which is for speech processing and synthesis. There is also a "plugin", if you like, called "Akustyk", which is used to analyse vowels and so on. Maybe that direction will be of value to you, I'm not sure.

You can then use MATLAB and the pattern recognition toolbox to implement your neural networks, GMMs, or whatever approach you wish to pursue.

Hope it was helpful.

Binaryrespawn