A little left field, but I'm trying to train a speech recognition program and the guidelines suggest that I attempt to speak clearly but naturally. I notice, however, that when one speaks naturally each word tends to drift into the next, resulting in a rather ambiguous boundary between the words.

On the one hand, speaking in a more stilted manner would seem to help the computer recognise the phonemes, but on the other, training it that way would seem to make it less likely to understand more natural speech.

Anyone knowledgeable in the field out there who can suggest which of the two approaches is more effective?

Thanks

+1  A: 

Continuous-speech recognition is a different and more difficult problem than "discrete dictation" (the problem that an IBM Research team, of which I was a very junior member, cracked about a quarter century ago;-). If "discrete" speech is acceptable for the given application, it's sure to give you higher recognition rates (it will never confuse "recognize speech" with "wreck a nice beach";-). If it's absolutely not acceptable, however, then you should not use it (by definition of "absolutely" and "not acceptable";-).

Alex Martelli
Interesting article: http://robertfortner.posterous.com/the-unrecognized-death-of-speech-recognition
TrueWill
François