You need the equivalent of a browser that knows how to process VoiceXML.
I work in the telecom industry, so that usually means software that connects to the public phone network, either through one of the old-style telephony connections or via VoIP. There are many commercial solutions in this area and some open source ones.
There are some other implementations, like Opera and some research initiatives in the accessibility area, but I haven't seen them gain much ground.
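To make the "browser" analogy a bit more concrete, here is a rough Python sketch of the very first thing such a platform does: read a VoiceXML page and find what it is supposed to say. The sample page, the `list_prompts` helper and the `greeting` form id are made up for illustration. A real VoiceXML browser does far more (call control, grammars, ASR/TTS), so treat this only as a picture of the input it works from.

```python
import xml.etree.ElementTree as ET

# Namespace used by VoiceXML 2.x documents.
VXML_NS = "http://www.w3.org/2001/vxml"

# A minimal, hypothetical VoiceXML page: one form that plays one prompt.
DOCUMENT = """\
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <block>
      <prompt>Hello from a VoiceXML page.</prompt>
    </block>
  </form>
</vxml>
"""


def list_prompts(vxml_text):
    """Return the text of every <prompt> element, in document order."""
    root = ET.fromstring(vxml_text)
    prompts = root.iter(f"{{{VXML_NS}}}prompt")
    # Join nested text and collapse whitespace so each prompt is one line.
    return [" ".join("".join(p.itertext()).split()) for p in prompts]


if __name__ == "__main__":
    for text in list_prompts(DOCUMENT):
        print(text)
```

A real platform would hand each of those strings to a text-to-speech engine and play the audio into the call instead of printing it.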
I wouldn't look at VoiceXML as the easiest way to approach speech recognition. That said, there are no really easy ways, and not many free or open source solutions either. The easiest path on a Microsoft platform would be to look at Microsoft's SAPI layer and the free, minimal ASR they provide. On the Linux side, check out CMU Sphinx.