I am using System.Speech.Synthesis.SpeechSynthesizer to convert text to speech. Due to Microsoft's anemic documentation (see my link; there are no remarks or code examples), I'm having trouble making heads or tails of the difference between two methods:
SetOutputToAudioStream and SetOutputToWaveStream.
Here's what I have deduced:
SetOutputToAudioStream takes a stream and a SpeechAudioFormatInfo instance that defines the format of the audio (samples per second, bits per sample, audio channels, etc.) and writes the synthesized speech to the stream.
SetOutputToWaveStream takes just a stream and writes a 16-bit, mono, 22 kHz, PCM wave file to the stream. There is no way to pass in a SpeechAudioFormatInfo.
My problem is that SetOutputToAudioStream doesn't write a valid wave file to the stream. For example, I get an InvalidOperationException ("The wave header is corrupt") when passing the stream to System.Media.SoundPlayer. If I write the stream to disk and attempt to play it with WMP I get a "Windows Media Player cannot play the file..." error, yet the stream written by SetOutputToWaveStream plays properly in both. My theory is that SetOutputToAudioStream is not writing a (valid) header.
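A quick way to test that theory is to peek at the first four bytes of the stream: a valid RIFF/WAVE file starts with the ASCII characters "RIFF", while raw PCM is just sample data. Something along these lines (the helper name is mine):

static bool LooksLikeWaveFile(Stream s)
{
    // A RIFF/WAVE file begins with the ASCII bytes "RIFF"; the output of
    // SetOutputToAudioStream appears to lack any such header.
    s.Position = 0;
    var magic = new byte[4];
    return s.Read(magic, 0, 4) == 4
        && System.Text.Encoding.ASCII.GetString(magic) == "RIFF";
}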
Strangely, the naming conventions for the SetOutputTo* methods are inconsistent: SetOutputToWaveFile takes a SpeechAudioFormatInfo while SetOutputToWaveStream does not.
I need to be able to write an 8 kHz, 16-bit, mono wave file to a stream, something that neither SetOutputToAudioStream nor SetOutputToWaveStream lets me do. Does anybody have insight into SpeechSynthesizer and these two methods?
For reference, here's some code:
Stream ret = new MemoryStream();
using (SpeechSynthesizer synth = new SpeechSynthesizer())
{
    synth.SelectVoice(voiceName);
    // Produces a playable wave file, but always 22 kHz, 16-bit, mono:
    synth.SetOutputToWaveStream(ret);
    // Lets me specify the format I need, but the resulting stream won't play:
    //synth.SetOutputToAudioStream(ret, new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
    synth.Speak(textToSpeak);
}
Solution:
Many thanks to @Hans Passant; here is the gist of what I'm using now:
Stream ret = new MemoryStream();
using (SpeechSynthesizer synth = new SpeechSynthesizer())
{
    // Call the internal SetOutputStream method via reflection; unlike the public
    // SetOutputToAudioStream, it accepts a SpeechAudioFormatInfo and still writes a wave header.
    var mi = synth.GetType().GetMethod("SetOutputStream", BindingFlags.Instance | BindingFlags.NonPublic);
    var fmt = new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono);
    mi.Invoke(synth, new object[] { ret, fmt, true, true });
    synth.SelectVoice(voiceName);
    synth.Speak(textToSpeak);
}
return ret;
For my rough testing it works great. Though using reflection is a bit icky, it's better than writing the file to disk and opening a stream.
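If the internal method ever changes, a reflection-free fallback would be to let SetOutputToAudioStream write the raw 8 kHz, 16-bit, mono PCM into a MemoryStream and prepend a standard 44-byte RIFF header yourself. Here's an untested sketch of the idea:

// Untested sketch: wrap raw PCM bytes in a canonical 44-byte PCM wave header.
static byte[] AddWaveHeader(byte[] pcm, int sampleRate, short bitsPerSample, short channels)
{
    short blockAlign = (short)(channels * (bitsPerSample / 8));
    int byteRate = sampleRate * blockAlign;

    var ms = new MemoryStream();
    var w = new BinaryWriter(ms);
    w.Write(System.Text.Encoding.ASCII.GetBytes("RIFF"));
    w.Write(36 + pcm.Length);                            // RIFF chunk size
    w.Write(System.Text.Encoding.ASCII.GetBytes("WAVE"));
    w.Write(System.Text.Encoding.ASCII.GetBytes("fmt "));
    w.Write(16);                                         // fmt chunk size (PCM)
    w.Write((short)1);                                   // format tag: 1 = PCM
    w.Write(channels);
    w.Write(sampleRate);
    w.Write(byteRate);
    w.Write(blockAlign);
    w.Write(bitsPerSample);
    w.Write(System.Text.Encoding.ASCII.GetBytes("data"));
    w.Write(pcm.Length);
    w.Write(pcm);
    w.Flush();
    return ms.ToArray();
}

The raw bytes would come from the MemoryStream that SetOutputToAudioStream wrote to, e.g. AddWaveHeader(raw.ToArray(), 8000, 16, 1).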