I want to know about the various techniques for speech recognition and text-to-speech conversion. Please also point me to any resources on the subject, such as links, tutorials, or ebooks.

Which technique is the most efficient way to achieve this?

+2  A: 

Since you mentioned MS -

You should just look at the Microsoft Speech site. It contains many resources for dealing with speech, including TTS and speech recognition.

Reed Copsey
+5  A: 

I'm going to answer the part about speech recognition (since I don't know much about text-to-speech):

The book "Statistical Methods for Speech Recognition" is a classic that explains the mathematical foundations of statistical speech recognition. It was written by Frederick Jelinek, the founder of that area.

The most important concept you have to know is Hidden Markov Models. People have been using them in speech recognition for decades. A more recent approach uses Conditional Random Fields; see the paper (PDF) and the associated software toolkit SCARF.
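To make the Hidden Markov Model idea a bit more concrete, here is a minimal sketch of Viterbi decoding, the standard dynamic-programming way to find the most likely hidden state sequence for a series of observations. The two states ("sil", "speech"), the "low"/"high" energy observations, and all probabilities are made-up toy values for illustration only. A real recognizer works on acoustic feature vectors with far larger models, so treat this as a sketch rather than how any particular toolkit does it.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, log_probability) for a discrete HMM."""

    def log(p):
        # Log space avoids numeric underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    # back[t][s]: predecessor of s on that best path.
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Pick the best final state, then follow the back-pointers to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model (hypothetical numbers): silence vs. speech, observed as frame energy.
STATES = ["sil", "speech"]
START_P = {"sil": 0.8, "speech": 0.2}
TRANS_P = {
    "sil": {"sil": 0.7, "speech": 0.3},
    "speech": {"sil": 0.2, "speech": 0.8},
}
EMIT_P = {
    "sil": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}

if __name__ == "__main__":
    path, logp = viterbi(["low", "high", "high", "low"], STATES, START_P, TRANS_P, EMIT_P)
    print(path, math.exp(logp))
```

With these toy numbers, the observation sequence low, high, high, low decodes to sil, speech, speech, sil. In actual speech recognition the hidden states are typically sub-phone units and the emission probabilities come from an acoustic model, but the decoding idea is the same.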

It is fairly hard to write your own speech recognizer. It's an active research area with several scientific conferences, e.g. ASRU, Interspeech, ICASSP.

+2  A: 

If you're looking for some actual code, check out Sphinx, an open source speech recognition project from CMU. It's not written in C++, but if you're interested in algorithms, it implements a bunch of stuff you can learn from. (I'd like to echo @dehmann's point, too: read up on hidden Markov models.)

ojrac
+3  A: 

Both are very wide areas. About recognition: in this schema you will find how to build a basic automatic speech recognition system. It isn't by any means close to the state of the art, but it is achievable and it works. If you want to do something more advanced, read about cepstral coefficients and Hidden Markov Models. Have a look at HTK, a widely used toolkit for Hidden Markov Models.

About text-to-speech: I'd have a look at Festival.
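For a rough feel of what "cepstral coefficients" are, here is a small sketch that computes the real cepstrum of one audio frame: the inverse DFT of the log magnitude spectrum. This is a simplification of what recognizers normally use. MFCCs, for example, also apply a mel filterbank and use a DCT instead of an inverse DFT. The function names and the `floor` parameter are my own choices for illustration, and the naive O(n^2) DFT is only there to keep the example dependency-free.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)); fine for short frames."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def real_cepstrum(frame, floor=1e-12):
    """Real cepstrum of a frame: inverse DFT of the log magnitude spectrum.

    `floor` (an assumed safeguard) keeps log() away from zero magnitudes.
    """
    n = len(frame)
    log_magnitude = [math.log(max(abs(value), floor)) for value in dft(frame)]
    # Inverse DFT; for a real input frame the result is real, so keep the real part.
    return [
        sum(log_magnitude[m] * cmath.exp(2j * math.pi * m * k / n) for m in range(n)).real / n
        for k in range(n)
    ]


if __name__ == "__main__":
    # A short synthetic frame: two sinusoids, 32 samples.
    frame = [
        math.sin(2 * math.pi * 3 * i / 32) + 0.5 * math.sin(2 * math.pi * 7 * i / 32)
        for i in range(32)
    ]
    coefficients = real_cepstrum(frame)
    # Recognizers usually keep only the first dozen or so coefficients.
    print(coefficients[:13])
```

The low-order coefficients describe the overall spectral envelope, which is why a feature vector built from the first few of them per frame is a common input to an HMM-based recognizer.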

nacmartin
+2  A: 

There are multiple Sphinx projects. The main active ones are pocketsphinx and sphinx4.

Sphinx4 is written in Java. It is better for desktop and web applications.

Pocketsphinx is written in C. It is better for embedded devices. There are iPhone/Android apps that use it.

Sounds like you want pocketsphinx. Try out this tutorial: http://www.speech.cs.cmu.edu/sphinx/tutorial.html

A better place to ask pocketsphinx/sphinx4 questions is CMU's SourceForge forum.

Also, you should provide more info, such as what you intend to make.

As for books, the bible of speech recognition is "Spoken Language Processing".

ecret
Are there any instructions for how to run PocketSphinx on Android? (See this question: http://stackoverflow.com/questions/2920870/pocket-sphinx-on-android)
gregm
A: 

If you are curious about what to do with your fancy speech recognition, you should read "Voice Interaction Design" by Randy Allen Harris.

It provides some great advice about when to use voice and how to use it in an application.

gregm