Hi, I'm trying to develop an online application where the user writes some text and the software sings it back to the user.

I can currently generate an audio file of the words spoken by the computer using espeak, but I have no idea how to make it sound like a song or how to add rhythm to it.

I'm able to change the pitch and tempo using rubberband, but that's as far as I've gotten.

Does anyone have a clue how to make this happen?

A: 

If you want to use rubberband to change duration and pitch, then I think the hard part is going to be mapping the phonemes/syllables in the text to the corresponding audio ranges in the speech synthesis output, for which I have no simple suggestion. (Ideally you'd get inside the speech synthesiser so that it provides you with the mapping from phonemes to audio locations.)

A simpler alternative might be to try Speech Synthesis Markup Language (SSML). Its "prosody" element has "pitch" and "duration" attributes that can specify pitch absolutely in Hz and duration in seconds. It also has a "volume" attribute for controlling dynamics.

Given this, you could try to convert the text into an SSML document and mark up words/syllables/phonemes with pitch, duration and volume attributes.
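As a rough sketch of that idea (not something I've tested against a particular engine), the Python below wraps each word in a "prosody" element with an absolute pitch in Hz and a duration in seconds. The function name `to_ssml` and the (word, Hz, seconds) tuple format are made up for illustration, and whether a given synthesiser actually honours per-word pitch and duration is something you'd have to check.

```python
from xml.sax.saxutils import escape


def to_ssml(notes):
    """Build an SSML document from (word, pitch_hz, duration_s) tuples.

    `notes` is a hypothetical input format chosen for this sketch.
    """
    parts = [
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    ]
    for word, pitch_hz, duration_s in notes:
        # SSML prosody: pitch as "<number>Hz", duration as "<number>s".
        parts.append(
            f'  <prosody pitch="{pitch_hz:g}Hz" duration="{duration_s:g}s">'
            f"{escape(word)}</prosody>"
        )
    parts.append("</speak>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Three words on C4, D4, E4, half a second each.
    melody = [("hel", 261.63, 0.5), ("lo", 293.66, 0.5), ("world", 329.63, 0.5)]
    print(to_ssml(melody))
```

The resulting document can then be handed to whichever SSML-capable engine you use; how closely the output follows the requested pitch will depend on that engine.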

mdma
Which SSML engine(s) can process markup at that level?
Jim Rush
A: 

I've ended up using Festival's singing mode. It sounds reasonably good, although it only works with English voices.
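For anyone trying the same route, here is a minimal sketch of generating the input file from Python. It assumes the XML layout shown in Festival's documentation for singing mode (a SINGING root with a BPM attribute, and PITCH/DURATION elements around each word), so check it against your installed version. The function name `to_festival_song` and the (word, note, beats) tuple format are made up for this example.

```python
from xml.sax.saxutils import escape


def to_festival_song(notes, bpm=60):
    """Build Festival singing-mode XML from (word, note_name, beats) tuples.

    The element and attribute names follow the example in Festival's
    documentation as I understand it; treat them as an assumption.
    """
    lines = [
        '<?xml version="1.0"?>',
        '<!DOCTYPE SINGING PUBLIC "-//SINGING//DTD SINGING mark up//EN" '
        '"Singing.v0_1.dtd" []>',
        f'<SINGING BPM="{bpm}">',
    ]
    for word, note, beats in notes:
        # One word per note: pitch as a note name (e.g. "G3"), length in beats.
        lines.append(
            f'<PITCH NOTE="{note}"><DURATION BEATS="{beats:g}">'
            f"{escape(word)}</DURATION></PITCH>"
        )
    lines.append("</SINGING>")
    return "\n".join(lines)


if __name__ == "__main__":
    song = [("doe", "G3", 0.3), ("ray", "A3", 0.3), ("me", "B3", 0.3)]
    print(to_festival_song(song, bpm=30))
```

If I remember the documentation correctly, a file written this way can be rendered with something like `text2wave -mode singing song.xml -o song.wav`.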

Ofir