I need to write an application that uses a speech recognition engine (either the built-in Vista one or a third-party one) to display a word or phrase and recognise when the user reads it (or an approximation of it). I also need to be able to switch quickly between languages without changing the language of the operating system.

The users will be using the system for very short periods. The application needs to work without the requirement of first training the recognition engine to the users' voices.

It would also be fantastic if this could work on Windows XP or lesser versions of Windows Vista.

Optionally, the system needs to be able to read information on the screen back to the user, in the user's selected language. I can work around this specification using pre-recorded voice-overs, but the preferred method would be to use a text-to-speech engine.

Can anyone recommend something for me?

+3  A: 

If the engine is what you're asking about then I've found (beware, I'm just listing, I haven't tried any of them):

Lumenvox engine

You also have the SAPI SDK from Microsoft itself. I've only tried it for text-to-speech, but according to its description:

The SDK also includes freely distributable text-to-speech (TTS) engines (in U.S. English and Simplified Chinese) and speech recognition (SR) engines (in U.S. English, Simplified Chinese, and Japanese).

Jorge Córdoba
The Lumenvox engine looks like it might do the trick! I'm going to have to play a bit with it to be certain. Also need to discuss pricing with the managers. Thanks Jorge!
RichieACC
+1  A: 

Dragon Naturally Speaking SDK might be worth looking at. This project looked interesting.

Haven't got to play with either of them though.

itsmatt
A: 

Text-to-speech is available with the Speech API. Personally, I'd probably require Vista and use the managed interfaces in System.Speech.Recognition and System.Speech.Synthesis, but P/Invoke into the unmanaged APIs should be possible if you really need XP support.

Mark Brackett
+1  A: 

Be warned that you're not going to get good results if you don't require training first. Speech recognition is a statistical application of phonetics, a field which is pretty frank about the fact that there's so much variation in the signal that it's almost a miracle anyone can understand what anyone else says. An off-the-shelf speech recognition engine will most likely tend towards a more general accent of English, but will fail miserably for anything even slightly different.

That's why training is so important. We can do well by overfitting with ease, especially if we reduce the problem space. But creating an extensible machine learning solution? Therein always lies the rub.

That being said, consider Sphinx-4. It's an off-the-shelf solution written in Java, available at http://cmusphinx.sourceforge.net/sphinx4/

Robert Elwell
+18  A: 

A similar question was asked on Joel on Software a while back. You can use the System.Speech.Recognition namespace to do this...with some limitations. Add System.Speech (should be in the GAC) to your project. Here's some sample code for a WinForms app:

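using System;
using System.Speech.Recognition;
using System.Windows.Forms;

public partial class Form1 : Form
{
  // Shared recognizer: uses the Windows desktop speech recognition engine.
  SpeechRecognizer rec = new SpeechRecognizer();

  public Form1()
  {
    InitializeComponent();
    rec.SpeechRecognized += rec_SpeechRecognized;
  }

  // Show the recognized phrase in the label.
  void rec_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
  {
    lblLetter.Text = e.Result.Text;
  }

  // Must be wired to the form's Load event (normally done in the designer).
  void Form1_Load(object sender, EventArgs e)
  {
    // Build a grammar that accepts exactly one of the listed phrases.
    var c = new Choices();
    for (var i = 0; i <= 100; i++)
      c.Add(i.ToString());
    var gb = new GrammarBuilder(c);
    var g = new Grammar(gb);
    rec.LoadGrammar(g);
    rec.Enabled = true;
  }
}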

This recognizes the numbers from 0 to 100 and displays the resulting number on the form. You'll need a form with a label called lblLetter on it.

System.Speech only works with a pre-defined list of words or phrases; it's not exactly NaturallySpeaking, either in versatility or in recognition quality. But you don't have to train it to the user's voice, and if you only have a few different things the user can say, it works reasonably well. And it's free! (if you have Visual Studio)

It won't work well if you use very short phrases; I made a program for my kid to say letters of the alphabet and see them on-screen, but it doesn't do that well since many of the letters sound alike (especially from the mouth of a four-year-old).

As for more flexible options...well, there's the aforementioned NaturallySpeaking, which has an SDK. But you have to contact sales to get any sort of access to it, and no pricing is listed, so it comes across as one of those "How much does it cost? Well, how much have you got?" kind of things. There doesn't seem to be a "download and play around with it" option. :(

As for text-to-speech, System.Speech.Synthesis does this. It's even easier than the speech recognition. I wrote a small program to let me type, hit Enter, and read the text aloud. My four-year-old gets mesmerized by it. :) ("Daddy, I wanna tawk to da wobot.")
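A type-and-speak program along those lines might look roughly like the sketch below. This is not the original program, only a minimal console version built on SpeechSynthesizer; the class name SpeakDemo and the "en-US" culture are assumptions for illustration, and a voice for the requested culture has to be installed for the hint to have any effect.

```csharp
using System;
using System.Globalization;
using System.Speech.Synthesis;

class SpeakDemo  // hypothetical name, for illustration only
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            synth.SetOutputToDefaultAudioDevice();

            // List the voices installed on this machine and their languages.
            foreach (InstalledVoice v in synth.GetInstalledVoices())
                Console.WriteLine("{0} ({1})", v.VoiceInfo.Name, v.VoiceInfo.Culture);

            // Ask for a voice in a given language (assumption: en-US).
            // This is only a hint; the synthesizer picks the closest match it has.
            synth.SelectVoiceByHints(VoiceGender.NotSet, VoiceAge.NotSet, 0,
                                     new CultureInfo("en-US"));

            // Read each typed line aloud; an empty line (or end of input) quits.
            string line;
            while (!string.IsNullOrEmpty(line = Console.ReadLine()))
                synth.Speak(line);
        }
    }
}
```

Add a reference to System.Speech to build it. Speak blocks until the text has been read; SpeakAsync is the non-blocking alternative if the UI needs to stay responsive.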

Kyralessa
How would I adapt this code to recognise 1 - 100 in French or German without needing to change the OS display language?
RichieACC
The only language possible is the one of your OS. I just read it from MSDN.
Daok
I think your comment that "only works with a pre-defined list of words or phrases" is not true. The desktop recognizer in Vista and later includes a dictation grammar that you can load. See http://msdn.microsoft.com/en-us/library/system.speech.recognition.dictationgrammar.aspx
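For reference, loading the dictation grammar could look something like the sketch below. It assumes a machine with a desktop recognizer installed and a working microphone; the class name DictationDemo is made up for the example.

```csharp
using System;
using System.Speech.Recognition;

class DictationDemo  // hypothetical name, for illustration only
{
    static void Main()
    {
        // In-process engine using the default installed recognizer.
        using (var engine = new SpeechRecognitionEngine())
        {
            // Free-text dictation instead of a fixed list of phrases.
            engine.LoadGrammar(new DictationGrammar());
            engine.SetInputToDefaultAudioDevice();

            Console.WriteLine("Say something...");

            // One synchronous recognition; the result is null if nothing
            // was recognized before the engine's timeouts expired.
            RecognitionResult result = engine.Recognize();
            Console.WriteLine(result != null ? result.Text : "(nothing recognized)");
        }
    }
}
```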
Michael Levy
+6  A: 

[Note: I was the development lead for the managed speech recognition API in .NET 3.0]

System.Speech is part of .NET 3.0, so it is available on both Vista and XP. In Vista you have the added benefit of having a speech recognition engine pre-installed by the OS. On XP your choices are: use the SAPI 5.1 SDK with a very old engine (which might work well enough for your command and control scenario), or install Office 2003, which installs a newer version of the recognizer. There are a few SAPI 5 compliant speech recognition engines available as well.

If you need to switch languages, you will want to use the System.Speech.Recognition.SpeechRecognitionEngine class which allows you to choose the SR engine for the language you need to support. Note that engines are defined by a set of languages they support (they might be using the same binary, only swapping data files to support additional languages).
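As a rough sketch of what that could look like: the code below lists the installed recognizers, then creates an in-process engine for one culture and loads a small grammar in that language. The "fr-FR" culture, the three French phrases, and the class name are assumptions for illustration; the constructor throws if no recognizer for the requested culture is installed, so check the list first.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

class MultiLanguageDemo  // hypothetical name, for illustration only
{
    static void Main()
    {
        // Show which recognizers (and therefore languages) are installed.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
            Console.WriteLine("{0}: {1}", info.Culture, info.Description);

        // Assumption: a French recognizer is installed on this machine.
        var culture = new CultureInfo("fr-FR");

        using (var engine = new SpeechRecognitionEngine(culture))
        {
            // The grammar's culture has to match the engine's culture.
            var builder = new GrammarBuilder();
            builder.Culture = culture;
            builder.Append(new Choices("un", "deux", "trois"));
            engine.LoadGrammar(new Grammar(builder));

            engine.SetInputToDefaultAudioDevice();
            engine.SpeechRecognized +=
                (sender, e) => Console.WriteLine("Heard: " + e.Result.Text);

            // Keep recognizing until the user presses Enter.
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Listening; press Enter to quit.");
            Console.ReadLine();
            engine.RecognizeAsyncCancel();
        }
    }
}
```

Switching languages at run time would then come down to disposing one engine and creating another for the new culture, independent of the OS display language, as long as a recognizer for that language is present.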

Comment if you need to know more.

Philipp

Philipp Schmid
Philipp, if I want to use the engine and train it to learn and recognize spoken Croatian, as a way of transcribing various speakers, is that possible, and if so, where do I start?
Daniel Mošmondor
There are two parts to a speech recognizer: the acoustic models and the language model. You can use the Vista Dictation Resource Kit (or something like that) to build a dictation language model that references Croatian words. There are currently no tools to train the acoustic models, which you would want to do if there are sounds in Croatian that are not present in English (or whatever existing SR language you are using). You can specify custom pronunciations for your Croatian words to improve your recognition accuracy.
Philipp Schmid
A: 

Try Microsoft Speech Server, which I think is now part of Office Communications Server 2007. It contains SR/TTS engines, a C# API, and tools that integrate with Visual Studio.

dbkk
+5  A: 

I found that the code example posted by Kyralessa on Oct 22nd didn't work for me, but a slightly revised version did. When adding strings to the Choices object, use full-text English words, not numerals. It seems the MS speech recognition engine can't recognize numerals by themselves.

I have marked these modifications with some commenting added to the previous example.

public partial class Form1 : Form
{
  SpeechRecognizer rec = new SpeechRecognizer();

  public Form1()
  {
    InitializeComponent();
    rec.SpeechRecognized += rec_SpeechRecognized;
  }

  void rec_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
  {
    lblLetter.Text = e.Result.Text;
  }

  void Form1_Load(object sender, EventArgs e)
  {
    var c = new Choices();

    // Doesn't work; you must add English words to Choices to
    // populate the grammar.
    //
    //for (var i = 0; i <= 100; i++)
    //  c.Add(i.ToString());

    c.Add("one");
    c.Add("two");
    c.Add("three");
    c.Add("four");
    // etc...

    var gb = new GrammarBuilder(c);
    var g = new Grammar(gb);
    rec.LoadGrammar(g);
    rec.Enabled = true;
  }
}
Rob Segal
Not sure why it didn't work for you; the code I posted came directly from a program I wrote and used, and it worked for me. Perhaps it's related to the culture settings on your system?
Kyralessa
Could be. I didn't look into this in much depth.
Rob Segal
+1  A: 

There are many open source projects available on www.codeproject.com. Try that.

A: 

This is the article from MSDN magazine that first discussed using the System.Speech APIs for Vista. Some of it is out of date because the API changed between beta (when the article was written) and the release of Vista, but this is still one of the best resources I've found and covers a good intro to the System.Speech namespace. See http://msdn.microsoft.com/en-us/magazine/cc163663.aspx

Michael Levy