+2  A: 

Mostly Java: http://cmusphinx.sourceforge.net/html/cmusphinx.php

Amit
After working with it, it's actually quite horrible. It barely recognizes anything, and it's not like I have a horrid accent or anything. Training it seems even more of a problem, and unless you're willing to shell out for some third-party database, you're sitting at the bottom of the heap.
I haven't had any practical experience with it.
Amit
+1  A: 

I have been looking for the same thing for a few days now. So far I have found Sphinx4 and FreeTTS. Both are Java implementations, and Sphinx seems to be updated rather frequently, unlike FreeTTS. The only problem is that Sphinx has trouble understanding me in an office environment, and I need a solution for a warehouse environment.

+1  A: 

Sphinx is by far the best option available if you are on a budget. However, it makes a huge difference which models you use, how you tune them, and how you tune your audio source. Absolutely everything has to match, otherwise it just won't work. Given the problem you described, I'd be willing to bet a substantial sum that you've got your models mixed up and your mic is not correctly calibrated. Also, if you have an accent it probably will not work. This is not an issue with the decoder but with the acoustic models: if no one with a voice or accent similar to yours was included in the training data, you'll get poor results.

That said, have you looked at their open source models page?

http://www.speech.cs.cmu.edu/sphinx/models/

Depending on what you are trying to do, you should be able to obtain about 90% accuracy on free speech with the 16kHz WSJ models and the gigaword LMs NVP. I caution, however, that ASR is a massive undertaking and hasn't yet reached commodity status.
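
On the "everything has to match" point, one cheap sanity check is to confirm that a recording's format agrees with what the acoustic model expects before blaming the decoder. Below is a minimal sketch using only the standard javax.sound.sampled API. The 16 kHz rate comes from the models mentioned above; the 16-bit, mono, signed PCM requirements and the class name AudioFormatCheck are my assumptions for illustration, so check the documentation of the model you actually use.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper, not part of Sphinx: compares an audio format
// against the format we ASSUME a 16 kHz acoustic model was trained on.
public class AudioFormatCheck {

    // 16 kHz is taken from the "16kHz WSJ models" mentioned in the answer.
    static final float EXPECTED_SAMPLE_RATE = 16000f;
    // Assumptions: 16-bit samples, one channel (mono), signed PCM.
    static final int EXPECTED_SAMPLE_BITS = 16;
    static final int EXPECTED_CHANNELS = 1;

    // Returns true only if every property matches the assumed model format.
    static boolean matchesModel(AudioFormat format) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && format.getSampleRate() == EXPECTED_SAMPLE_RATE
                && format.getSampleSizeInBits() == EXPECTED_SAMPLE_BITS
                && format.getChannels() == EXPECTED_CHANNELS;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java AudioFormatCheck <file.wav>");
            return;
        }
        // Read only the header; the audio data itself is not decoded here.
        AudioFormat format =
                AudioSystem.getAudioFileFormat(new File(args[0])).getFormat();
        System.out.println("Detected format: " + format);
        System.out.println(matchesModel(format)
                ? "Format matches the assumed model format."
                : "Format mismatch: resample or re-record before decoding.");
    }
}
```

Run it against a sample recording from your mic. A mismatch here (for example 44.1 kHz stereo audio fed to a 16 kHz model) would be one possible explanation for poor recognition, though it says nothing about accent or calibration problems.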

blackkettle
I think I came to that realization; it still has a long road to go. Whether I have an accent or not is subjective :D but likely. I've recently stopped using Ubuntu and jumped onto the Windows bandwagon. When I continue with this, I think I will be able to use Microsoft's engine, which has worked reasonably well in the past. But in the end... I think the technology has far to go, and I think I'll be dropping that part completely for 10 years :)
Microsoft's engine also used to be based on Sphinx. Now I think they perhaps rely more heavily on HTK, another open source speech recognition system. Your accent is not a subjective issue from the point of view of an ASR system. The results will depend heavily on how well the characteristics of your voice match those of the voices in the training data. Differences which may seem trivial to you, for example a Canadian versus an American accent, may have a very significant impact on the ASR quality. These days most systems rely on the same algorithms; the difference is the data.
blackkettle
A: 

Sphinx is the best. I don't remember where I got it, but you can find it with a Google search. My group finished a small Java program that recognizes digits.

Kiet Tran
A: 

Hi, you can download vPass (voice password) from http://www.basic-signalprocessing.com.

For voice to text (vText), I can send the vText.jar file to your email. Please notify [email protected]

The components are designed for Java and .NET. The recognition period is 5 seconds. vPass is well tested; vText is not, as it is still new, which is why it is not packaged yet.

regards, Andreas

Andreas