Creating a new voice for a text-to-speech engine is a complex process. It is not just a matter of getting a voice artist to record audio and creating a voice from that. A lot of work goes into it: segmenting the audio into phonemes, building the voice data, building the dictionary, and getting the prosody and the audio joining/synthesizing rules correct.
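To give a feel for why the dictionary and the segmented audio both matter, here is a minimal sketch of the lookup side of a voice. Everything in it is an assumption for illustration: the dictionary entries are ARPAbet-style transcriptions I made up for the example, the function names are hypothetical, and pairing adjacent phonemes into diphones is just one common way of naming the audio units to be joined, not what any particular engine does.

```python
# Toy pronunciation dictionary. The entries are illustrative ARPAbet-style
# transcriptions, not taken from any real engine's lexicon.
TOY_DICTIONARY = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def text_to_phonemes(text, dictionary=TOY_DICTIONARY):
    """Look up each word and fail loudly on words missing from the dictionary."""
    phonemes = []
    for word in text.lower().split():
        if word not in dictionary:
            raise KeyError(f"no pronunciation for {word!r}")
        phonemes.extend(dictionary[word])
    return phonemes


def phonemes_to_diphones(phonemes):
    """Pair adjacent phonemes, padding with silence ('_') at both ends.

    Each resulting name stands for one recorded audio unit that would have
    to be cut out of the voice artist's recordings (assumed unit scheme).
    """
    padded = ["_"] + list(phonemes) + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]
```

Even in this toy form, every word needs a dictionary entry and every unit name needs a matching piece of segmented audio, and none of it touches prosody yet.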
For a voice engine like the Microsoft Text-to-Speech engine, you are also facing the problem that the voice format is proprietary and so you cannot create new voices in that format. You are also limited by the capabilities of the engine.
Your best options at the moment are:
- switching to the eSpeak text-to-speech engine and using espeakedit to create your own voice (contacting the developer for help with this). This engine uses a synthesis method that makes it sound similar to the Microsoft voices and to the voice Stephen Hawking uses, but its voices are very clear and the pronunciation is on the whole very good;
- using a different text-to-speech engine, like Cepstral, that uses voice recordings (these tend to sound more human-like, but I have found that the prosody is not very good, which ruins the resulting audio);
- using the service from Cepstral to create a voice specific to your needs (which is likely to be expensive).
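If you go the eSpeak route, trying out a voice from a script is straightforward. The sketch below only builds the command line for the espeak program, using its -v (voice), -s (speed in words per minute) and -w (write to WAV file) options. The function name and its defaults are my own, and it assumes espeak is installed and on your PATH.

```python
def build_espeak_command(text, voice="en", speed_wpm=150, wav_path=None):
    """Return the argument list for an espeak call (hypothetical helper).

    voice     -- name passed to -v, for example a voice you have added
    speed_wpm -- speaking rate passed to -s, in words per minute
    wav_path  -- if given, passed to -w so audio goes to a file instead
    """
    cmd = ["espeak", "-v", voice, "-s", str(speed_wpm)]
    if wav_path is not None:
        cmd += ["-w", wav_path]
    cmd.append(text)
    return cmd
```

To actually speak the text, pass the result to subprocess.run(..., check=True) from the standard library.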
I am looking at using the audio data from librivox.org to generate text-to-speech voices. I am likely 3-4 years away from having anything close to functional, though.