views:

3500

answers:

5

I am planning to write an application that converts speech to text on Linux. Are there any existing interfaces that I could extend? Or is there already such an application for Linux? Any input on this?

EDIT: The application that I am planning to write should be able to convert every word that we speak to text, not just Yes/No.

+5  A: 

Well, this is quite an undertaking, and since you haven't said what technology you want to use, here are some links:

Good luck. With more detail, we may be able to provide better answers. For example, there's a big difference between "yes/no" call center-style recognition vs. even partial natural language understanding.

Dave Ray
+2  A: 

Dave's suggestions are a great start. Sphinx is very nifty.

I just want to add that you should be as probabilistic as possible. As a one-time linguist and even earlier one-time phonology buff, I can confidently say: don't get caught up in linguistic models. Let's not forget the oft-misattributed "every time I fire a linguist, my accuracy goes up". It's really about the model and its ability to account for noise and variation rather than anything a liberal arts major from MIT has to say.
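To make "be probabilistic" a bit more concrete, here is a toy sketch of how a recognizer might pick between candidate transcripts: score each one by the log of an acoustic likelihood plus the log of a language-model prior, and keep the best. The candidate phrases, the numbers, and the names `ACOUSTIC`, `PRIOR`, and `decode` are all made up for illustration; a real system would get these scores from trained models, not from hand-written tables.

```python
import math

# Hypothetical acoustic scores P(audio | words) for three candidate transcripts.
# All numbers are invented for illustration only.
ACOUSTIC = {
    "recognize speech": 0.40,
    "wreck a nice beach": 0.45,
    "recognise peach": 0.15,
}

# Hypothetical language-model priors P(words), also invented.
PRIOR = {
    "recognize speech": 0.60,
    "wreck a nice beach": 0.05,
    "recognise peach": 0.01,
}


def decode(acoustic, prior):
    """Return the candidate maximizing log P(audio | words) + log P(words)."""
    def score(words):
        return math.log(acoustic[words]) + math.log(prior[words])
    return max(acoustic, key=score)


best = decode(ACOUSTIC, PRIOR)
acoustic_only = max(ACOUSTIC, key=ACOUSTIC.get)
print(best)
print(acoustic_only)
```

With these invented numbers the acoustic score alone prefers the wrong phrase, and the prior is what pulls the decision back. That is the point about noise and variation: no single cue is trusted on its own.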

A good book to pick up would be Jurafsky and Martin's "Speech and Language Processing". It has some very useful applications of computational models for the task. Harvey Sussman's work on linear correlates in the F2 slopes for a variety of vowels (starting with barn owls and working its way towards humans) seems like it would be a nice thing to implement in a model one of these days.

Robert Elwell
+1  A: 

Sphinx is your best bet on Linux. I have tried Sphinx II and Sphinx III. There are some open source language and acoustic models available that can be used with each of them. The performance is not production-level at all, but it is good enough for prototyping or a demo. For production, you'll need to develop your own language and acoustic models.
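For a rough idea of what a "language model" means here, below is a minimal sketch of a bigram model with add-one smoothing, trained on a three-sentence toy corpus. This is not Sphinx's model format or API; the corpus and the helper names `train_bigram` and `log_prob` are assumptions for illustration, and a usable model would need far more text from your target domain.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        # Pad with start/end markers so sentence boundaries are modeled too.
        tokens = ["<s>"] + s.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under add-one smoothed bigrams."""
    vocab = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen word pairs from getting zero probability.
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
    return total


# Toy corpus, invented for illustration.
corpus = ["open the file", "close the file", "open the window"]
uni, bi = train_bigram(corpus)
seen = log_prob("open the file", uni, bi)
odd = log_prob("file the open", uni, bi)
print(seen > odd)
```

A word order that appears in the training text scores higher than a scrambled one, which is the kind of preference a recognizer leans on when the audio is ambiguous.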

braindead
A: 

Julius is also a good option for Linux.

Latrokles
A: 

I have developed Speech to Text and Text to Speech using Windows XP and Windows SAPI 5.X. It works very well. I don't know anything about doing this on Linux, but I would be interested in using it as well, since I am developing a product and want it to be based on Linux.

Ben