Hi guys,

Is it possible to build an application using the .NET speech recognition classes, pass it a WAV file, and have it produce a text representation of the audio? For example, this is what I'm trying to do:

We have a QA department at my office that has to listen to hundreds of calls a day, which is practically impossible, and there aren't enough people listening to keep up. What I want to do is have the audio file uploaded to our server and have the server parse it and create a transcript. It doesn't matter if it's not perfect; I just need a base, because skimming a couple of dozen lines of text is easier than listening to a 2-hour recording.

With a saved transcript I can implement full-text search in the database and also run checks against the transcript to see whether someone is saying something that's a misrepresentation.

So, is it possible to create an application using the .NET speech recognition classes, pass a WAV file to it, and have it spit out a rough transcript?

I've only dug around the Speech classes on MSDN briefly while thinking up the idea, so I don't know enough yet to tell whether this can be done.

If possible, I would appreciate any examples in C#. Topic 1055347 is similar to my question, and links were provided there, but the most specific one is in C++. I'm not a C++ developer, nor have I ever gone to school for programming; I'm entirely self-taught in C#, so I would like to stay in the language that I know.

Thanks in advance!

A: 

With areas like speech recognition, you are likely to find either a standalone EXE or an API in C/C++.

For the links in the other topic, you can use a tool like the P/Invoke Interop Assistant to generate C# code. The generated C# code acts as a wrapper around the unmanaged DLL, so you can call it from C#.

This is likely to be the best way to get the functionality you are looking for.
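To give a feel for what such a generated wrapper looks like, here is a minimal sketch. It does not wrap any speech library; it uses the real Win32 function `waveInGetNumDevs` from `winmm.dll` only because its signature is trivial. The class and method names (`NativeAudio`, `InputDeviceCount`) are made up for illustration, and a real speech DLL would need many more declarations (structs, callbacks, and so on).

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeAudio
{
    // Managed declaration of the unmanaged function:
    //   UINT waveInGetNumDevs(void);
    // This is the kind of signature an interop tool generates for you.
    [DllImport("winmm.dll")]
    private static extern uint waveInGetNumDevs();

    // Thin C# wrapper so the rest of the code never touches the extern directly.
    // (Hypothetical helper name, not part of any library.)
    public static int InputDeviceCount()
    {
        return (int)waveInGetNumDevs();
    }
}

static class Program
{
    static void Main()
    {
        // Windows only: winmm.dll ships with the OS.
        Console.WriteLine("Audio input devices: " + NativeAudio.InputDeviceCount());
    }
}
```

The point is only the shape: a `[DllImport]` declaration per native function, plus an ordinary C# method on top of it.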

Glenn Condron
A: 

Yes.

I did such an application a few years ago on the Tablet PC; you can read about it at http://web.archive.org/web/20060615192119/www.devx.com/TabletPC/Article/30761 (At the time, I spoke of using Interop to access the libraries, but I believe that the programming model has remained the same, just with a managed wrapper.)

At the time, the results were very poor, but for your use case they may be better than nothing.
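As a rough illustration of the managed route, here is a minimal sketch using `System.Speech.Recognition` (part of the .NET Framework on Windows, in `System.Speech.dll`). It assumes a desktop speech recognizer is installed on the machine and that the WAV file is in a format the engine accepts; `WavTranscriber` and `Transcribe` are hypothetical names, and this is not the code from the article above.

```csharp
using System;
using System.Speech.Recognition; // add a reference to System.Speech.dll (Windows only)
using System.Text;
using System.Threading;

static class WavTranscriber
{
    // Returns a rough, line-per-phrase transcript of the given WAV file.
    public static string Transcribe(string wavPath)
    {
        var transcript = new StringBuilder();

        using (var done = new ManualResetEvent(false))
        using (var engine = new SpeechRecognitionEngine()) // default installed recognizer
        {
            // Free-form dictation rather than a fixed command grammar.
            engine.LoadGrammar(new DictationGrammar());

            // Read audio from the file instead of the microphone.
            engine.SetInputToWaveFile(wavPath);

            // Collect each recognized phrase as it arrives.
            engine.SpeechRecognized += (sender, e) =>
            {
                lock (transcript) { transcript.AppendLine(e.Result.Text); }
            };

            // Signal the waiting thread once the recognizer stops.
            engine.RecognizeCompleted += (sender, e) => done.Set();

            // Multiple mode keeps recognizing phrase after phrase
            // instead of stopping after the first one.
            engine.RecognizeAsync(RecognizeMode.Multiple);
            done.WaitOne();
        }

        return transcript.ToString();
    }

    static void Main(string[] args)
    {
        // Usage: WavTranscriber.exe call.wav
        Console.Write(Transcribe(args[0]));
    }
}
```

Expect the output to be noisy on phone recordings; treat it as text to skim or search, not as an accurate record.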

Larry OBrien
+2  A: 

This sounds like you've got a call-center type of application. Microsoft Speech Server has an SR engine optimized for telephony (8000 Hz sample rate), which will generate much better recognitions than the desktop SR engine. However, the engine isn't really designed for transcription (although it can do it), and the transcriptions definitely need to be reviewed before any further processing occurs. Microsoft Exchange Unified Communications uses the SR engine to generate transcripts of voice mail, and while it's better than nothing, it often generates amusing nonsense.

Eric Brown
A: 

How about routing the calls to Google Voice? I'm sure there are similar services. I have been amazed at its accuracy so far, plus you can click and listen to the recording if required. Google Voice will forward voice calls to SMS or email.

UPDATE: On rereading, since you are recording calls, this may not work; I use it for the voice messages people leave.

kenny