Using Microsoft's SAPI 5.3 Speech API on Vista, how do you programmatically do acoustic model training of a RecoProfile? More concretely, if you have a text file, and an audio file of a user speaking that text, what sequence of SAPI calls would you make to train the user's profile using that text and audio?

+1  A: 

More information about this problem, which I still haven't solved: you call ISpRecognizer2::SetTrainingState(TRUE, TRUE) at "the beginning" and ISpRecognizer2::SetTrainingState(FALSE, TRUE) at "the end." But it is still unclear exactly when those calls have to happen relative to other actions.

For example, you have to make various calls to set up a grammar with the text that matches your audio, and other calls to hook up the audio, and other calls to various objects to say "you're good to go now." But what are the interdependencies -- what has to happen before what else? And if you're using an audio file instead of the system microphone for input, does that make the relative timing less forgiving, because the recognizer isn't going to keep sitting there listening until the speaker gets it right?

markab
+1  A: 

Hello, I am trying to train my speech engine by passing in a wav file, using the following code on ASP.NET 3.5 and the Vista OS.

Code:

    recoClass = new SpeechLib.SpInprocRecognizerClass();
    recoCxt = recoClass.CreateRecoContext();
    recoClass.SetTrainingState(1, 1);
    recoCxt.SetAdaptationData("the technology that allows us to control computers via speech is still very new");
    input = new SpeechLib.SpFileStreamClass();
    input.Open("C:\\Sample1.wav", SpeechLib.SpeechStreamFileMode.SSFMOpenForRead, false);
    recoClass.AudioInputStream = input;
    recoCxt.State = SpeechLib.SpeechRecoContextState.SRCS_Enabled;
    recoClass.SetTrainingState(0, 1);
    Console.ReadKey();

If anyone solved the problem, please post your solution here...

Thanks in advance...

+1  A: 

Can anyone tell me how to add my own training files to the default SAPI training?

I want to add my own training scripts, which are written in English but carry meaning in another language, such as proper nouns (names, places). I want to add such scripts, and I also want to add my custom words, in SAPI 5.1.

I have successfully loaded an xml file without exception, but SAPI isn't recognizing my custom words.

Is there any support in SAPI for the Arabic language, i.e., can SAPI recognize Arabic words and display them in English as words that are meaningless in English?

+5  A: 

Implementing SAPI training is relatively hard, and the documentation doesn’t really tell you what you need to know.

ISpRecognizer2::SetTrainingState switches the recognizer into or out of training mode.

When you go into training mode, all that really happens is that the recognizer gives the user a lot more leeway about recognitions. So if you’re trying to recognize a phrase, the engine will be a lot less strict about the recognition.

The engine doesn’t really do any adaptation until you leave training mode, and you have set the fAdaptFromTrainingData flag.
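In other words, the bracketing calls might look like this (a minimal sketch; assumes pReco is an already-initialized ISpRecognizer*, and error handling is omitted):

```cpp
#include <sapi.h>

ISpRecognizer2 *pReco2 = NULL;
HRESULT hr = pReco->QueryInterface(__uuidof(ISpRecognizer2), (void **)&pReco2);

// Enter training mode; the second argument is fAdaptFromTrainingData.
hr = pReco2->SetTrainingState(TRUE, TRUE);

// ... run the training recognitions and store labeled audio here ...

// Leaving training mode with fAdaptFromTrainingData == TRUE is what
// actually triggers the adaptation pass over the stored training audio.
hr = pReco2->SetTrainingState(FALSE, TRUE);
pReco2->Release();
```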

When the engine adapts, it scans the training audio stored under the profile data. It’s the training code’s responsibility to put new audio files where the engine can find them for adaptation.

These files also have to be labeled, so that the engine knows what was said.

So how do you do this? You need to use three lesser-known SAPI APIs. In particular, you need to get the profile token using ISpRecognizer::GetObjectToken, and ISpObjectToken::GetStorageFileName to properly locate the file.

Finally, you also need to use ISpTranscript to generate properly labeled audio files.

To put it all together, you need to do the following (pseudo-code):

Create an inproc recognizer & bind the appropriate audio input.

Ensure that you’re retaining the audio for your recognitions; you’ll need it later.

Create a grammar containing the text to train.

Set the grammar’s state to pause the recognizer when a recognition occurs. (This helps with training from an audio file, as well.)

When a recognition occurs:

Get the recognized text and the retained audio.

Create a stream object using CoCreateInstance(CLSID_SpStream).

Create a training audio file using ISpRecognizer::GetObjectToken and ISpObjectToken::GetStorageFileName, and bind it to the stream (using ISpStream::BindToFile).

Copy the retained audio into the stream object.

QI the stream object for the ISpTranscript interface, and use ISpTranscript::AppendTranscript to add the recognized text to the stream.

Update the grammar for the next utterance, resume the recognizer, and repeat until you’re out of training text.
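The per-utterance part of that loop might look roughly like this in C++ (a sketch only, not tested against a live SAPI install; the clsidCaller, value name, and file-name specifier passed to GetStorageFileName are assumptions, as is the folder flag; the engine only cares that labeled wav files end up under the profile's storage area, and error handling is omitted for brevity):

```cpp
#include <atlbase.h>   // CComPtr
#include <sapi.h>
#include <shlobj.h>    // CSIDL_* values for GetStorageFileName

// Sketch: write one recognition's retained audio as a labeled training file.
HRESULT WriteTrainingAudio(ISpRecognizer *pReco,    // inproc recognizer
                           ISpRecoResult *pResult,  // recognition w/ retained audio
                           LPCWSTR pszText)         // the text that was spoken
{
    HRESULT hr = S_OK;

    // Locate the profile token and reserve a uniquely named file under it.
    CComPtr<ISpObjectToken> cpProfile;
    hr = pReco->GetObjectToken(&cpProfile);

    LPWSTR pszFile = NULL;
    hr = cpProfile->GetStorageFileName(CLSID_SpStream,    // assumption
                                       L"TrainingAudio",  // assumption
                                       L"TrainUtt*.wav",  // assumption
                                       CSIDL_FLAG_CREATE | CSIDL_LOCAL_APPDATA,
                                       &pszFile);

    // Pull the retained audio out of the recognition result.
    CComPtr<ISpStreamFormat> cpAudio;
    hr = pResult->GetAudio(0, 0, &cpAudio);

    GUID guidFormat;
    WAVEFORMATEX *pWfx = NULL;
    hr = cpAudio->GetFormat(&guidFormat, &pWfx);

    // Bind a new SpStream to the file, in the retained audio's format.
    CComPtr<ISpStream> cpStream;
    hr = cpStream.CoCreateInstance(CLSID_SpStream);
    hr = cpStream->BindToFile(pszFile, SPFM_CREATE_ALWAYS, &guidFormat, pWfx, 0);

    // Copy the retained audio into the file.
    BYTE buf[4096];
    ULONG cbRead = 0;
    do
    {
        hr = cpAudio->Read(buf, sizeof(buf), &cbRead);
        if (SUCCEEDED(hr) && cbRead > 0)
            hr = cpStream->Write(buf, cbRead, NULL);
    } while (SUCCEEDED(hr) && cbRead == sizeof(buf));

    // Label the audio so the engine knows what was said.
    CComPtr<ISpTranscript> cpTranscript;
    hr = cpStream.QueryInterface(&cpTranscript);
    hr = cpTranscript->AppendTranscript(pszText);
    hr = cpStream->Close();

    ::CoTaskMemFree(pWfx);
    ::CoTaskMemFree(pszFile);
    return hr;
}
```

A function like this would be called from your recognition handler for each utterance, between the SetTrainingState(TRUE, TRUE) and SetTrainingState(FALSE, TRUE) calls.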

Eric Brown
A: 

Hi Markab, just wondering whether you ever got anywhere with this particular issue. I have tried working through the suggestions on this page but couldn't come up with a successful solution. Many thanks.

tkm
A: 

I'm happy to paypal $50 if someone could post a solution here.

daveodonoghue