I have an audio stream and I would like to extract the individual spoken words from it. For example, given audio.wav, I would get 001.wav, 002.wav, 003.wav, etc., where each XXX.wav contains one word.

I am looking for a library or program that does this. The platform does not matter, but I would prefer an open-source solution.

Thank you in advance for your help.

+1  A: 

Nuance, the company that makes Dragon NaturallySpeaking, has a number of Software Development Kits.

The Audio Mining kit seems to match your requirements:

Dragon NaturallySpeaking SDK AudioMining is a speaker-independent speech recognition toolkit that enables the indexing of 100% of the speech information within audio files. The technology uses highly accurate speech recognition to turn audio files into XML text with timestamp information. This can be integrated with standard text-search products to enable rapid access to specific audio content.

Going from raw speech to speech plus metadata (a transcript with a timestamp for each word) is far and away the hardest part to get right. Once you have the timestamps, cutting the words out as individual audio files is much more straightforward.
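
To give an idea of how simple that second step is, here is a minimal sketch in Python using only the standard library `wave` module. It assumes you have already obtained a start and end time in seconds for each word from whatever recognizer you use. The function name `split_wav_by_timestamps` and the `spans` format are my own invention for illustration, not part of any SDK, and it only handles uncompressed PCM WAV input.

```python
import wave


def split_wav_by_timestamps(src_path, spans, out_pattern="{:03d}.wav"):
    """Write one WAV file per (start_sec, end_sec) span and return the paths.

    `spans` is a hypothetical input: word boundaries in seconds, as they
    might be derived from a recognizer's timestamp output.
    """
    paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for i, (start, end) in enumerate(spans, start=1):
            # Convert seconds to frame indices, clamped to the file length.
            first = min(total, max(0, int(start * rate)))
            last = min(total, max(first, int(end * rate)))
            src.setpos(first)
            frames = src.readframes(last - first)
            path = out_pattern.format(i)
            with wave.open(path, "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            paths.append(path)
    return paths
```

Calling `split_wav_by_timestamps("audio.wav", [(0.00, 0.42), (0.55, 0.97)])` would write 001.wav and 002.wav to the current directory. Those two time spans are made up for the example; real ones have to come from the recognition step.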

mattbh