I'm using the .NET System.Speech library from C# to implement my ASR app. (BTW, I've seen a post that mentioned SpeechLib.dll, which seems to be a more basic, lower-level interface to SAPI. Are they the same thing?) Our main goal is a client/server ASR system: the client records the user's voice and sends the whole audio stream to the server over the internet, and the server runs the recognition and returns the result to the client.
I've already written a similar app that uses the local mic as the voice input, and it performs pretty well.
My original app:
// grammar loading omitted here
SpeechRecognitionEngine sr = new SpeechRecognitionEngine();
sr.SetInputToDefaultAudioDevice();
sr.RecognizeAsync();
With the mic as input, the recognition accuracy is pretty good.
Here's the problem. For the new task, I have to set the recognition input to a WAV file (or an audio stream received over a TCP/IP socket connection). So I simply changed my code to:
SpeechRecognitionEngine sr = new SpeechRecognitionEngine();
sr.SetInputToWaveFile(@"D:\input.wav");
sr.RecognizeAsync();
The results turned out to be unsatisfactory. I pre-recorded several wave snippets into separate files, speaking the same phrases and using the same grammar as in the mic-input app, and set these files as the ASR input. However, speech is detected in only some of the files (the SpeechDetected event fires), and very few files are recognized correctly (the SpeechRecognized event fires).
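A minimal sketch of this kind of wave-file setup, with the relevant events logged, might look like the following. It is not my exact code: the DictationGrammar is only a placeholder assumption standing in for the real grammar, and the class name is made up for illustration. It needs a reference to System.Speech and runs on Windows only.

```csharp
using System;
using System.Speech.Recognition;
using System.Threading;

class WaveFileRecognitionSketch
{
    static void Main()
    {
        using (var sr = new SpeechRecognitionEngine())
        using (var done = new ManualResetEvent(false))
        {
            // Placeholder grammar (assumption); substitute the grammar
            // used by the mic-input app.
            sr.LoadGrammar(new DictationGrammar());

            // Fires when the engine thinks it has found speech in the audio.
            sr.SpeechDetected += (s, e) =>
                Console.WriteLine("Speech detected at " + e.AudioPosition);

            // Fires when a phrase matches the grammar with enough confidence.
            sr.SpeechRecognized += (s, e) =>
                Console.WriteLine("Recognized: " + e.Result.Text +
                                  " (confidence " + e.Result.Confidence + ")");

            // Fires when speech was found but no confident match was made.
            sr.SpeechRecognitionRejected += (s, e) =>
                Console.WriteLine("Rejected");

            // Fires once the asynchronous operation has finished.
            sr.RecognizeCompleted += (s, e) => done.Set();

            sr.SetInputToWaveFile(@"D:\input.wav");

            // Keep recognizing until the end of the file is reached.
            sr.RecognizeAsync(RecognizeMode.Multiple);
            done.WaitOne();
        }
    }
}
```

Logging SpeechRecognitionRejected next to the other two events makes it possible to tell files where no speech was detected at all from files where speech was detected but not matched.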
Despite the poor accuracy, some files are recognized correctly, which suggests that my code has no logic error. I suspect I'm missing some step before starting recognition, such as setting some parameters on the recognizer.
So I'm here to ask for help: does anyone know why the accuracy is so poor with wave-file input?
Thanks!!!!