I'm using the C# .NET library System.Speech to implement my ASR app. (BTW, I've seen a post that mentioned SpeechLib.dll, which seems to be a more basic, lower-level implementation of SAPI. Are they the same?) Our main goal is a client/server ASR system: record the user's voice on the client, transfer the whole audio stream to the server over the internet, and have the server run the recognition and return the result to the client.

I've already written a similar app that uses the local mic as the voice input, and it performs pretty well.

My original app:


SpeechRecognitionEngine sr = new SpeechRecognitionEngine();

sr.SetInputToDefaultDevice();

sr.RecognizeAsync();

With the mic as input, the recognition accuracy is pretty good.

Here's the problem. For the new task I have to set the recognition input to a WAV file (or an audio stream received over a TCP/IP socket connection). So I simply changed my code to this:


SpeechRecognitionEngine sr = new SpeechRecognitionEngine();

sr.SetInputToWaveFile(@"D:\input.wav");

sr.RecognizeAsync();

The results turned out to be unsatisfactory. I pre-recorded some speech snippets into several separate WAV files, using the same phrases and the same grammar as the mic-input app, and set these files as the ASR input. However, speech is detected in only some of the files (the SpeechDetected event fires), and very few files are recognized correctly (the SpeechRecognized event fires).

Despite the poor accuracy, some files are recognized correctly, which suggests my code has no logic error. But I assume I'm missing some step before using the file, such as setting up some parameters of the recognizer.

So I'm here to ask for help: does anyone know the reason for the poor accuracy with WAV file input?

Thanks!!!!

A: 

SpeechLib.dll is the COM interop library for the native COM interface (SAPI). SpeechRecognitionEngine is the friendly .NET class wrapper for it. They both access the exact same recognition engine.

There's probably some kind of problem with your recording. Usually a volume issue, like clipping (too loud) or too much noise (too soft). Get some basic diagnostics by implementing the AudioSignalProblemOccurred event.
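As a rough sketch of that diagnostic, assuming the file path from the question and a `DictationGrammar` as a stand-in for the real grammar (both are placeholders, not the asker's actual setup), the handler could simply log what the recognizer reports:

```csharp
using System;
using System.Speech.Recognition;
using System.Threading;

class WavDiagnostics
{
    static void Main()
    {
        using (var sr = new SpeechRecognitionEngine())
        using (var done = new ManualResetEvent(false))
        {
            // Assumption: DictationGrammar stands in for the grammar used by the mic-input app.
            sr.LoadGrammar(new DictationGrammar());

            // Log every signal problem (TooLoud, TooSoft, TooNoisy, ...) with its position in the audio.
            sr.AudioSignalProblemOccurred += (s, e) =>
                Console.WriteLine("Audio problem: {0} at {1}, level {2}",
                    e.AudioSignalProblem, e.AudioPosition, e.AudioLevel);

            sr.SpeechRecognized += (s, e) =>
                Console.WriteLine("Recognized: {0} (confidence {1:F2})",
                    e.Result.Text, e.Result.Confidence);

            sr.SpeechRecognitionRejected += (s, e) =>
                Console.WriteLine("Rejected");

            // Signal the main thread once the recognizer has finished with the file.
            sr.RecognizeCompleted += (s, e) => done.Set();

            sr.SetInputToWaveFile(@"D:\input.wav");
            sr.RecognizeAsync(RecognizeMode.Multiple);
            done.WaitOne();
        }
    }
}
```

Running this against a file that fails and one that succeeds should show whether the failures line up with a particular `AudioSignalProblem` value.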

Hans Passant
Thanks a lot! I've tried this, and the AudioSignalProblemOccurred handler did fire when a file could not be recognized. However, I then changed back to the mic-input ASR and also watched the AudioSignalProblemOccurred (ASPO) event there. I found that because my voice through the mic is continuous, an ASPO is sometimes raised at the beginning of my phrase, yet after the phrase has been spoken completely the recognizer can still understand it. So I think that's the reason the mic input has high accuracy. So I want to know: how can I modify my WAV-input ASR to act like the mic input, which can re-adjust
JXITC
Actually, I solved it by changing the sample rate from 22000 Hz (the default set by my recording app) to 16000 Hz (the default setting for the recognizer). The accuracy is back to normal! Haha.
JXITC
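A minimal sketch of how the sample rate might be checked before recognition, and how the format could be declared explicitly for the socket-stream case mentioned in the question. It assumes a canonical PCM WAV header in which the "fmt " chunk directly follows the RIFF/WAVE preamble, so the sample rate sits at byte offset 24; files with extra chunks before "fmt " would need real chunk parsing. `ReadSampleRate` and `AttachRawPcm` are hypothetical helper names, and 16-bit mono is an assumption about the recording.

```csharp
using System;
using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Recognition;

class SampleRateCheck
{
    // Hypothetical helper: reads the sample rate from a canonical WAV header.
    // Assumes "fmt " is the first chunk, which puts the field at offset 24.
    static int ReadSampleRate(string path)
    {
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            reader.BaseStream.Seek(24, SeekOrigin.Begin);
            return reader.ReadInt32(); // stored little-endian, as BinaryReader expects
        }
    }

    // Hypothetical helper: feeds headerless PCM (e.g. from a network stream)
    // to the recognizer with an explicitly declared format.
    // Assumption: the client sends 16 kHz, 16-bit, mono audio.
    static void AttachRawPcm(SpeechRecognitionEngine sr, Stream rawPcm)
    {
        var format = new SpeechAudioFormatInfo(
            16000, AudioBitsPerSample.Sixteen, AudioChannel.Mono);
        sr.SetInputToAudioStream(rawPcm, format);
    }

    static void Main()
    {
        int rate = ReadSampleRate(@"D:\input.wav");
        Console.WriteLine("Sample rate: {0} Hz", rate);
        if (rate != 16000)
        {
            Console.WriteLine("Consider re-recording or resampling to 16000 Hz.");
        }
    }
}
```

For audio that still carries a WAV header, `SetInputToWaveFile` or `SetInputToWaveStream` reads the format from the header, so `AttachRawPcm` only applies to raw sample data.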