Low-sample-rate audio can be used for speech-to-text (s2t) algorithms. The problems I can see:
- Getting the audio to the server and processing it (via Flash, Java, or something similar)
- Having the client poll for the required action
- Making this scalable on the backend
- Dealing with the wide range of languages, dialects, inflections and accents you will encounter on the Internet
- Ensuring it is unobtrusive and that a usable fallback is available
- Dealing with complaints from usability people
There are plenty more, I'm sure, but other than that, go for it.
Where would we be without people saying "we are going to the moon" and then doing it? Go for it...
If it hasn't already been done, then whether you fail or succeed, you will probably learn something cool.