What's the rationale (in term of
usability) behind it versus a keyboard
or keypad?
Usability is a very broad term. If I were to attempt to enter my address with a touch pad, it wouldn't be considered very usable. Some argue that using a speech engine with an overall success rate of 70-80% isn't very usable either. As indicated in other posts, hands free input can be much easier for those on a mobile phone. However, using words versus numeric input can actually be less intuitive than a touch tone phone if the topic is somewhat foreign to the caller. A caller hearing terms and phrases that aren't very familiar can't remember them in the 10-30 seconds of the prompt but they can hover over the best sounding choice with their finger or remember the order of choices.
What reasons would you have
to invest in this development?
This is an odd question. Usually the decision to use speech or not in an IVR environment is not driven from the development view of the world. Unless you have a specific requirement that really requires speech, you are almost always reducing overall success rates. Speech is usually a factor of corporate image ... or having the latest technological toy.
I guess it's a plus when you're in your car without holding your phone
but is it worth the additional waiting time?
Speech recognition latencies aren't very high these days when using modern ASRs. In most cases, input is handled in parallel with speech and time between end of speech recognition is .5 to 1s. Be aware that many IVRs then need to perform data look-ups after some inputs and this can appear as a slower system. Normal inputs pushing beyond 1s is usually the sign of an under-powered deployment.
It may not have been under-powered when original implemented, but through tuning efforts, you make a lot of performance versus accuracy decisions. To get that next .1%, resources can be pushed beyond what they should be at peak.
Also, reliability is better than it was, definitely,
but sometime it feels more like an toy someone decided
to plugged into the system so it can feel futuristic.
In general, yes. On the reliability note, you need to really look at the overall numbers to get a sense of the system. It is a battle of statistics where the individual isn't very important (unless they hold the title of VP or above). Through optimization of the input (shifting prompting), resource usage and other speech reco tuning parameters you attempt to maximize accuracy. For basic natural language responses, you can get in the upper 90s. However, your overall success rate is much lower. Imagine 5 prompts all at 98% (in reality, you tend to have a bunch 99 and then a few mid 90s or slightly below): .98 * .98 * .98 * .98 * .98 = 90%. That means 1 out of 10 failing. That is before caller confusion and business rules. DTMF input is usually very near 100%, even after several inputs.
Any experience designing IVR or software that
used (or chose not to) speech recognition?
Yes. But, I suspect that really isn't the question you want. As someone on the technology side, this is usually not your decision and you have limited influence on it. If you are really looking for the pros/cons of speech:
Pros:
- Cool/hip (note, speech alone isn't sufficient. You need a great VUI and voice talent)
- Good for a highly mobile crowd that shuns ear pieces. The future is supposed to be blending speech with tactile input. Maybe. It probably won't come from the IVR side of the market.
- Good for tasks that can't be done with DTMF. Note, many of these problems tend to have low success rates in speech as well. Cost (versus humans) is usually the driving factor not usability. Dropping a call into a voicemail box for things like address change can be very cost effective.
Cons:
- Expensive to development, deploy and maintain. Adding new choices can have a significant impact on success rates if you aren't careful. Always monitor the impact of change.
- Is often deployed inappropriately. For example, just say your numeric menu choice. This is nearly often a case of we want speech coolness, but can't afford what it really takes to achieve speech coolness.
- Success rates will be lower and therefore call center costs will be higher.
- Failures tend to focus on specific prompts and individual callers. A caller that regularly experiences problems with your system will be very unhappy with you.
- Callers get angry when they aren't understood. Is your goal to identify a subset of your customer base and really get them angry ?