A: 

One answer for all three questions: look at the SSML specification: http://www.w3.org/TR/speech-synthesis/

For example, to specify emphasis, use the emphasis element:

<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
         xml:lang="en-US">
  That is a <emphasis> big </emphasis> car!
  That is a <emphasis level="strong"> huge </emphasis>
  bank account!
</speak>
Android Eve
Thanks, but I don't understand how to use your example in this context. Based (loosely) on your suggestion, I tried calling mTts.speak("ha ha ha <emphasis>ha</emphasis> ha ha", TextToSpeech.QUEUE_ADD, null), but the emphasis tags were read aloud in the output. Are you saying that passing an XML document string as the first argument to TextToSpeech.speak() will cause the TTS engine to read the XML body and control the reading with the parsed tags?
gregS
A: 

JW answered my question at the tts-for-android group:

Hi Greg,

The Pico engine recognizes the <phoneme> tag with the XSAMPA alphabet.

There are no easy rules to derive a certain pronunciation from the orthography, but you can use intuitive spellings and trial and error. Capitalization and hyphens will introduce more problems than they solve. Using different spellings and introducing extra word boundaries (spaces) can work.

The emphasis tag and the exclamation mark will not change the synthesis result. Use , , and commands instead.


Some examples of the proper syntax for specifying the pronunciation using the SSML phoneme tag are in these tests of TextToSpeech.

Even with these simple test SSML documents, warning messages are posted to logcat saying that the SSML document is not well-formed. I opened an issue in the Android issue tracker about these seemingly incorrect logcat messages.
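
One way to check, independently of logcat, whether an SSML string is well-formed is to run it through a standard XML parser before handing it to the engine. The sketch below assumes the javax.xml.parsers API that ships with the JDK (and is also available on Android). The names SsmlCheck and isWellFormed are made up for illustration, and passing this check says nothing about what the Pico engine itself accepts.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the Android TTS API.
final class SsmlCheck {
    private SsmlCheck() {}

    /** Returns true if the string parses as a well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and stays silent otherwise,
            // so nothing is printed to stderr for malformed input.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Thrown for any well-formedness violation.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If such a check returns true for a string and logcat still reports a malformed document, that would suggest the warning comes from the engine rather than from the markup.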


The syntax for specifying an x-SAMPA sequence to SVOX Pico is:

            // The X-SAMPA sequence goes in the ph attribute of an empty phoneme element.
            String text = "<speak xml:lang=\"en-US\"> <phoneme alphabet=\"xsampa\" ph=\"d_ZIn\"/>.</speak>";
            mTts.speak(text, TextToSpeech.QUEUE_ADD, null);
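
When trying many X-SAMPA sequences, it may be easier to build the SSML string with a small helper than to write the escaped quotes by hand each time. The following is a minimal sketch in plain Java; the class PhonemeSsml and its methods are hypothetical names, not part of the Android API. It also escapes XML-special characters in the attribute value, since X-SAMPA uses symbols such as `"`, `<` and `&`. Whether the Pico engine decodes such entities is an assumption not confirmed by anything in this thread.

```java
// Hypothetical helper for building phoneme SSML strings.
final class PhonemeSsml {
    private PhonemeSsml() {}

    /** Escapes characters that are special inside a double-quoted XML attribute. */
    static String escapeAttribute(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Wraps one X-SAMPA sequence in a speak document for the given language. */
    static String phoneme(String lang, String xsampa) {
        return "<speak xml:lang=\"" + escapeAttribute(lang) + "\">"
                + "<phoneme alphabet=\"xsampa\" ph=\"" + escapeAttribute(xsampa) + "\"/>"
                + "</speak>";
    }
}
```

The result would then be passed to the engine in the same way as the hand-written string, for example mTts.speak(PhonemeSsml.phoneme("en-US", "d_ZIn"), TextToSpeech.QUEUE_ADD, null).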

Although more examples would be helpful, a good reference for x-SAMPA is at http://en.wikipedia.org/wiki/Xsampa. If I compile a couple dozen examples, I'll post them to that Wikipedia page.

gregS