tags:

views:

87

answers:

1

I'm working on an openGL project that involves a speaking cartoon face. My hope is to play the speech (encoded as mp3s) and animate its mouth using the audio data. I've never really worked with audio before so I'm not sure where to start, but some googling led me to believe my first step would be converting the mp3 to pcm.

I don't really anticipate the need for any Fourier transforms, though that could be nice. The mouth really just needs to move around when there's audio (I was thinking of basing it on volume).

Any tips on to implement something like this or pointers to resources would be much appreciated. Thanks!

-S

+2  A: 

Whatever you do, you're going to need to decode the MP3s into PCM data first. There are a number of third-party libraries that can do this for you. Then, you'll need to analyze the PCM data and do some signal processing on it.

Automatically generating realistic lipsync data from audio is a very hard problem, and you're wise to not try to tackle it. I like your idea of simply basing it on the volume. One way you could compute the current volume is to use a rolling window of some size (e.g. 1/16 second), and compute the average power in the sound wave over that window. That is, at frame T, you compute the average power over frames [T-N, T], where N is the number of frames in your window.

Thanks to Parseval's theorem, we can easily compute the power in a wave without having to take the Fourier transform or anything complicated -- the average power is just the sum of the squares of the PCM values in the window, divided by the number of frames in the window. Then, you can convert the power into a decibel rating by dividing it by some base power (which can be 1 for simplicity), taking the logarithm, and multiplying by 10.

Adam Rosenfield