Hello everybody,

I am working on text-to-speech, transforming text into mp3 audio files, using Python 2.5.

I use pyTTS as the Python text-to-speech module to transform text into audio .wav files (pyTTS cannot encode to mp3 directly). After that, I encode these wav files to mp3 with the lame command-line encoder.
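For reference, the two-step pipeline looks roughly like this (simplified sketch; the file names are just placeholders):

    # simplified sketch of my current two-step pipeline
    import pyTTS
    import subprocess

    tts = pyTTS.Create()                                    # SAPI engine via pyTTS
    tts.SpeakToWave("speech.wav", "some text to speak")     # text -> .wav
    subprocess.call(["lame", "speech.wav", "speech.mp3"])   # .wav -> .mp3 with lame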

Now, the problem: I would like to insert, at a particular point of an mp3 audio file (between two words), an external sound file (like a warning sound) or, if possible, a generated warning sound.

My questions are:

1) I have seen that pyTTS can save the audio stream either to a file or to a memory stream, using two functions:

tts.SpeakToWave(file, text) or tts.SpeakToMemory(text)

Using the tts.SpeakToMemory(text) function together with PyMedia, I have been able to save an mp3 directly, but the mp3 file (when played back) sounds incomprehensible, like Donald Duck! :-) Here is a snippet of the code:

    import pymedia.audio.acodec as acodec

    # tts is the pyTTS engine and p.Text the text to speak (created elsewhere)
    params = {'id': acodec.getCodecID('mp3'), 'bitrate': 128000,
              'sample_rate': 44100, 'ext': 'mp3', 'channels': 2}

    m = tts.SpeakToMemory(p.Text)
    soundBytes = m.GetData()

    enc = acodec.Encoder(params)
    frames = enc.encode(soundBytes)

    f = open("test.mp3", 'wb')
    for frame in frames:
        f.write(frame)
    f.close()

I cannot understand where the problem is. If this approach worked correctly, it would let me skip the wav file transformation step.

2) As a second problem, I need to concatenate the mp3 audio file (obtained from the text-to-speech module) with a particular warning sound.

Obviously, it would be great if I could concatenate the audio memory stream of the text (after the text-to-speech step) with the stream of a warning sound, before encoding the whole memory stream into a single mp3 file.

I have also seen that the tkSnack library can concatenate audio, but it cannot write mp3 files.

I hope I have been clear. :-)

Many thanks for your answers to my questions.

Giulio

+1  A: 

I don't think PyTTS produces default PCM data (i.e. 44100 Hz, stereo, 16-bit). You should check the format like this:

memStream = tts.SpeakToMemory("some text")
format = memStream.Format.GetWaveFormatEx()

...and hand it over correctly to acodec. For that, you can use the attributes format.Channels, format.BitsPerSample and format.SamplesPerSec.
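Something along these lines should do it (untested sketch, reusing the params dictionary from your snippet; I'm assuming the engine delivers 16-bit samples, which is what the encoder expects here):

    # untested sketch: build the acodec params from the actual TTS output format
    import pymedia.audio.acodec as acodec

    memStream = tts.SpeakToMemory(p.Text)
    fmt = memStream.Format.GetWaveFormatEx()

    params = {'id': acodec.getCodecID('mp3'),
              'bitrate': 128000,
              'sample_rate': fmt.SamplesPerSec,   # the engine's real sample rate
              'channels': fmt.Channels,           # and its real channel count
              'ext': 'mp3'}
    # also check fmt.BitsPerSample: the raw data fed to the encoder
    # should be 16-bit PCM

    enc = acodec.Encoder(params)
    f = open("test.mp3", 'wb')
    for frame in enc.encode(memStream.GetData()):
        f.write(frame)
    f.close()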

As to your second question, if the sounds are in the same format, you should be able to simply pass them all to enc.encode, one after another.
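A minimal sketch of that idea (speechBytes and warningBytes are placeholder names for raw PCM buffers in the same format):

    # untested sketch: feed several raw buffers to the same encoder in sequence
    enc = acodec.Encoder(params)
    out = open("combined.mp3", 'wb')
    for raw in (speechBytes, warningBytes):   # placeholder buffer names
        for frame in enc.encode(raw):
            out.write(frame)
    out.close()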

AndiDog
A: 

Hi - I can't provide a definitive answer here, sorry, but here are some things to try. I'd look at the documentation of the pymedia module to check whether there are any quality settings you can configure.

The other thing is that, unlike wave or raw audio, you won't be able to simply concatenate mp3-encoded audio: whatever solution you reach, you will have to concatenate/mix your sounds while they are uncompressed (unencoded), and only afterwards generate the mp3-encoded audio.
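For example, with .wav files you could do the concatenation with the standard wave module and encode the result once (sketch; it assumes all inputs share the same sample rate, channel count and sample width, and the file names are placeholders):

    # sketch: join uncompressed .wav files, then encode once with lame
    import wave
    import subprocess

    inputs = ["speech_part1.wav", "warning.wav", "speech_part2.wav"]

    out = wave.open("combined.wav", 'wb')
    for i, name in enumerate(inputs):
        w = wave.open(name, 'rb')
        if i == 0:
            out.setparams(w.getparams())      # copy the format of the first file
        out.writeframes(w.readframes(w.getnframes()))
        w.close()
    out.close()

    subprocess.call(["lame", "combined.wav", "combined.mp3"])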

Also, sometimes we just have the feeling that recording a file to disk and reconverting it, instead of doing it in "one step", is awkward - while in practice, the software often does exactly that behind the scenes, even if we don't specify a file ourselves. If you are on a Unix-like system, you can always create a FIFO special file (with the mkfifo command) and send your .wav data there for encoding in a separate process (using lame): to your programs it will look like you are using an intermediate file, but you actually won't be.
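A rough sketch of that idea (Unix only; the path is hypothetical, and whether SpeakToWave and lame are happy with a FIFO instead of a regular file is something you would have to try):

    # sketch: let lame read from a named pipe while the wav data is written to it
    import os
    import subprocess

    fifo = "/tmp/tts_pipe.wav"                                # hypothetical path
    os.mkfifo(fifo)
    encoder = subprocess.Popen(["lame", fifo, "output.mp3"])  # reader side

    tts.SpeakToWave(fifo, "some text")                        # writer side
    encoder.wait()
    os.remove(fifo)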

jsbueno