Dave's suggestions are a great start. Sphinx is very nifty.
I just want to add that you should be as probabilistic as possible. As a one-time linguist and, even earlier, a one-time phonology buff, I can confidently say: don't get caught up in linguistic models. Let's not forget the oft-misattributed "every time I fire a linguist my accuracy goes up". It's really about the model and its ability to account for noise and variation, rather than anything a liberal arts major from MIT has to say.
A good book to pick up would be Jurafsky and Martin's "Speech and Language Processing". It covers some very useful applications of computational models to the task. Harvey Sussman's work on linear correlates in the F2 slopes for a variety of vowels (starting with barn owls and working its way up to humans) also seems like it would be a nice thing to implement in a model one of these days.
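For what it's worth, here is a rough sketch of what the first step of that might look like. As I understand Sussman's locus equations, the idea is a straight-line fit of F2 at vowel onset against F2 at the vowel midpoint, done per consonant across many vowel contexts, with the slope and intercept serving as the "linear correlate". The function name `fit_locus_equation` and the numbers below are made up for illustration, not taken from his papers, and real formant measurements would be noisy.

```python
def fit_locus_equation(f2_vowel, f2_onset):
    """Ordinary least-squares fit of f2_onset = slope * f2_vowel + intercept.

    Both arguments are sequences of F2 values in Hz, one pair per token.
    This is a plain linear regression, used here as a stand-in for a
    locus-equation fit (an assumption about how one might set it up).
    """
    n = len(f2_vowel)
    if n < 2 or n != len(f2_onset):
        raise ValueError("need two equal-length sequences with at least 2 points")

    mean_x = sum(f2_vowel) / n
    mean_y = sum(f2_onset) / n

    # Sum of squared deviations in x, and of cross deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in f2_vowel)
    if sxx == 0:
        raise ValueError("f2_vowel values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(f2_vowel, f2_onset))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, noise-free values in Hz (illustrative only, not measured data).
f2_vowel = [800.0, 1200.0, 1600.0, 2000.0, 2400.0]
f2_onset = [0.5 * x + 900.0 for x in f2_vowel]

slope, intercept = fit_locus_equation(f2_vowel, f2_onset)
print(f"slope={slope:.3f}, intercept={intercept:.1f} Hz")
```

In keeping with the "be probabilistic" advice, you would probably not use the fitted slope and intercept as hard rules, but as features (or as the mean of a noise model) inside whatever statistical recognizer you end up with.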