Transcribing WMA/MP3 audio in an automated fashion?

SAPI can certainly do what you want. Start with an in-proc recognizer, connect up your audio as a file stream (you'll probably need to transcode your WMA files to a WAV stream, as SAPI only takes WAV input, but you can do the transcoding on the fly), set dictation mode, and off you go.
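For the on-the-fly transcoding step, here is a minimal sketch of one way to do it. It assumes ffmpeg is installed and on the PATH, which is my choice for illustration and not something SAPI requires; any decoder that can emit WAV would do. The helper names `build_transcode_command` and `open_wav_stream` are hypothetical, and the 16 kHz mono 16-bit format is an assumed starting point, not a SAPI-mandated setting.

```python
import subprocess


def build_transcode_command(source_path, sample_rate=16000):
    """Build an ffmpeg argument list that decodes source_path (WMA, MP3, ...)
    to mono 16-bit PCM WAV written to standard output.

    The sample rate is an assumption; adjust it to whatever your
    recognizer handles best.
    """
    return [
        "ffmpeg",
        "-loglevel", "error",    # keep stderr quiet unless something fails
        "-i", source_path,       # input file; ffmpeg detects the format
        "-ac", "1",              # downmix to a single channel
        "-ar", str(sample_rate), # resample
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        "-f", "wav",             # WAV container
        "pipe:1",                # write to stdout instead of a file
    ]


def open_wav_stream(source_path, sample_rate=16000):
    """Start ffmpeg and return the process; WAV bytes arrive on .stdout."""
    return subprocess.Popen(
        build_transcode_command(source_path, sample_rate),
        stdout=subprocess.PIPE,
    )
```

Because a pipe is not seekable, ffmpeg cannot go back and fill in the final length fields of the WAV header. If whatever consumes the stream is strict about those fields, replacing `pipe:1` with a temporary file path is the safer fallback.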

Now the disappointing bit. You probably won't get terribly good results; in fact, I suspect that unless you're very lucky, you'll probably get total garbage.

There are several problems:

Dictation really only works well once the SR engine has been trained. If you're lucky (like me), you can get OK results, but if the speaker has an accent, training is a must.
Training only works well for a single voice. If you've got multiple speakers in a single audio file, it's not going to work well.
The audio model for dictation (and Speech Recognition in general) assumes that you're using a close-talk microphone (i.e., a microphone right next to your face, to minimize noise pickup). If your WMA files have extra noise, accuracy will go down dramatically.

I would actually suggest Dragon NaturallySpeaking Professional; its makers have spent the time and money to make transcription work. I haven't used it myself, so I don't know how well it would work in your situation.

SAPI covers both recognition and synthesis, so it's certainly possible that it can be used. I'm not familiar with it, though, so I can't say if Windows actually provides access to built-in recognition for English even on a non-English OS. It might still provide enough to get you started, though.

Michael Madsen 2009-09-30 18:42:14

Oh, I didn't know that. I only remembered the version in XP; now that you mention it, Vista does have this recognition feature.

Femaref 2009-09-30 21:43:03

I did a bit of research on Dragon NaturallySpeaking, and the transcription tool assumes that it's taking its input from a voice recorder or similar device, so it has a similar set of problems (it requires training, assumes a single voice, and assumes the microphone is close to the speaker).

Eric Brown 2009-11-10 23:16:17

That is true, but the Dragon engine has been used successfully for "audio mining" before. If you need an accurate transcript, you will be disappointed. If you want to find keywords or phrases in a reasonable-quality audio source (like TV, not a phone conference recording), it works. This was a number of years ago, but I'm sure it hasn't gotten worse.

Mike Elkins 2009-11-19 16:55:50
