I am working on improving Festival on Emacs. I need better control of Festival when it reads a sentence. Basically, I need two things:

  1. Show what word is being read.
  2. Change the speed (and maybe pitch) of what is being read.

Ideally, Festival would output some data structure that links an offset/length pair (usually the start and length of a word) to an output WAV file (or even to a location within a WAV file). I could then use something like mplayer to build a playlist, detect when the next word starts playing, and look up where that word sits in the buffer.

I'm also hoping there's some simple command to change the speed of what is being read. However, mplayer can do that for me, so it's not a big deal if I can get #1 working.

+1  A: 

See the manual here, especially the part about the "text2wave" script. I'm unclear whether this is a separate executable or just a Scheme script that you will have to call. In either case, it looks like it should give you some inspiration for how to do this. It appears that you could send a whole buffer to this command to generate a .wav file, which you could then control via mplayer. Of course, this would mean you wouldn't know which sentence was currently playing. To get around that, you could output each sentence as its own .wav file, then queue them up in mplayer (or call mplayer repeatedly). If text2wave is an executable, I'm not sure it's available on Windows, but you should be able to accomplish the same thing with a Scheme script for Festival.
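As a rough illustration of the per-sentence idea, here is a minimal Python sketch (the real thing would presumably live in Emacs Lisp). It splits the buffer text into sentences, records each sentence's offset and length, and builds the text2wave and mplayer command lines. The `Segment` class, the function names, and the `sentence-NNN.wav` file naming are all made up for this example, and the sentence splitter is deliberately naive (abbreviations such as "e.g." will confuse it). The only text2wave option assumed is `-o` for the output file.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    start: int   # 0-based offset of the sentence in the buffer text
    length: int  # number of characters in the sentence
    text: str    # the sentence itself
    wav: str     # hypothetical name of the .wav file for this sentence


# Naive sentence pattern: text ending in '.', '!' or '?', or the trailing remainder.
_SENTENCE = re.compile(r"[^.!?\s][^.!?]*(?:[.!?]+|$)")


def split_sentences(text, prefix="sentence"):
    """Return one Segment per sentence, keeping its position in the text."""
    segments = []
    for i, match in enumerate(_SENTENCE.finditer(text)):
        sentence = match.group()
        segments.append(
            Segment(match.start(), len(sentence), sentence, f"{prefix}-{i:03d}.wav")
        )
    return segments


def text2wave_command(text_file, wav_file):
    """Command line that asks text2wave to render text_file into wav_file."""
    return ["text2wave", text_file, "-o", wav_file]


def mplayer_command(segments):
    """Command line that plays the generated files in order."""
    return ["mplayer"] + [segment.wav for segment in segments]
```

Each sentence would be written to its own text file and rendered with the command from `text2wave_command` (for example via `subprocess.run`). While mplayer works through the list, the `start` and `length` of the current segment say which region of the buffer to highlight. Note that Emacs buffer positions are 1-based, so the offsets need `+ 1` on the Emacs side.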

Edit: text2wave is indeed a script, but you should be able to run it easily by calling festival with the script as an argument (path/to/festival --script text2wave). I don't know if the Windows binaries include this, but it should be available either from the main Festival site or in a *nix distro (it's definitely in Ubuntu).
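A hedged sketch of that invocation, again in Python for illustration: it only builds the argument list, so nothing here requires Festival to be installed. The function name and the example paths are hypothetical. The optional speed change assumes that text2wave's `-eval` option accepts a Scheme expression and that the voice in use honours Festival's `Duration_Stretch` parameter (values above 1 slow the speech down, values below 1 speed it up).

```python
def festival_script_command(festival, script, text_file, wav_file, stretch=None):
    """Build a command that runs the text2wave script through festival itself.

    festival  -- path to the festival binary
    script    -- path to the text2wave script
    stretch   -- optional Duration_Stretch factor (assumption: honoured by the voice)
    """
    command = [festival, "--script", script, text_file, "-o", wav_file]
    if stretch is not None:
        # Evaluated by text2wave before synthesis to change the speaking rate.
        command += ["-eval", f"(Parameter.set 'Duration_Stretch {stretch})"]
    return command


# Example with placeholder paths; pass the result to subprocess.run to execute it.
example = festival_script_command(
    "path/to/festival", "text2wave", "sentence-000.txt", "sentence-000.wav", stretch=1.25
)
```

Changing the rate at synthesis time like this means regenerating the .wav files whenever the speed changes; adjusting playback speed in mplayer instead (its `-speed` option) avoids that, at the cost of altering the audio after the fact.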

Matthew Talbert
I said Scheme, but I believe the language used is actually Lisp. It should be familiar to an Emacs user :)
Matthew Talbert