Where can I learn how to work with audio data formats? | ansaurus

Q:

Where can I learn how to work with audio data formats?

I'm working on an OpenGL project that involves a speaking cartoon face. My hope is to play the speech (encoded as MP3s) and animate its mouth using the audio data. I've never really worked with audio before so I'm not sure where to start, but some googling led me to believe my first step would be converting the MP3 to PCM.

I don't really anticipate the need for any Fourier transforms, though that could be nice. The mouth really just needs to move around when there's audio (I was thinking of basing it on volume).

Any tips on how to implement something like this or pointers to resources would be much appreciated. Thanks!

-S

+2 A:

Whatever you do, you're going to need to decode the MP3s into PCM data first. There are a number of third-party libraries that can do this for you. Then, you'll need to analyze the PCM data and do some signal processing on it.

Automatically generating realistic lipsync data from audio is a very hard problem, and you're wise to not try to tackle it. I like your idea of simply basing it on the volume. One way you could compute the current volume is to use a rolling window of some size (e.g. 1/16 of a second), and compute the average power in the sound wave over that window. That is, at frame T, you compute the average power over frames [T-N+1, T], where N is the number of frames in your window.
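A minimal sketch of the rolling-window idea, in Python for brevity. The class name RollingPower and its push method are made up for illustration, and it assumes you already have decoded mono PCM samples arriving one at a time. It keeps a running sum of squares so each new frame costs constant time instead of re-summing the whole window.

```python
from collections import deque


class RollingPower:
    """Mean power of the most recent `window` PCM samples (hypothetical helper)."""

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.samples = deque()
        # Running sum of squared samples. With integer PCM this stays exact;
        # with float samples it can drift slightly over a very long stream.
        self.sum_sq = 0

    def push(self, sample):
        """Add one sample and return the mean power over the current window."""
        self.samples.append(sample)
        self.sum_sq += sample * sample
        if len(self.samples) > self.window:
            # Drop the oldest sample so only the last `window` samples count.
            old = self.samples.popleft()
            self.sum_sq -= old * old
        # Until the window fills up, average over the samples seen so far.
        return self.sum_sq / len(self.samples)
```

As a rough sizing guide, if your audio happens to be sampled at 44.1 kHz (an assumption, check your files), a 1/16 second window is about 2756 frames.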

Thanks to Parseval's theorem, we can compute the power in a wave directly from the samples, without having to take the Fourier transform or anything complicated: the average power is just the sum of the squares of the PCM values in the window, divided by the number of frames in the window. Then, you can convert the power into a decibel value by dividing it by some reference power (which can be 1 for simplicity), taking the base-10 logarithm, and multiplying by 10. Note that a completely silent window has zero power, whose logarithm is undefined, so you'll want to clamp the result to some minimum value.
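Here is a rough sketch of the decibel conversion, again in Python. The function names, the -100 dB floor, and the quiet/loud thresholds are all assumptions picked for illustration; the thresholds presume samples normalized to the range [-1, 1], so you would need to tune them for your own audio. The second function shows one possible way to turn the decibel value into a 0-to-1 "mouth openness" factor.

```python
import math


def power_to_db(power, ref_power=1.0, floor_db=-100.0):
    """Convert mean power to decibels relative to ref_power.

    Zero power (silence) has no finite logarithm, so the result is
    clamped to floor_db (an arbitrary choice for this sketch).
    """
    if power <= 0.0:
        return floor_db
    return max(floor_db, 10.0 * math.log10(power / ref_power))


def mouth_openness(db, quiet_db=-50.0, loud_db=-10.0):
    """Map a decibel level linearly onto [0, 1].

    quiet_db and loud_db are hypothetical thresholds: at or below
    quiet_db the mouth is closed, at or above loud_db it is fully open.
    """
    t = (db - quiet_db) / (loud_db - quiet_db)
    return min(1.0, max(0.0, t))
```

Each animation frame, you would feed the current window's average power through power_to_db and use the resulting openness value to drive the mouth.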

Adam Rosenfield 2009-04-01 05:08:16
