Hi, I'm writing a file compressor utility in C++ and I want it to support PCM WAV files. I want to keep the audio in PCM encoding and just convert it to a lower sample rate, and from stereo to mono if applicable, to yield a smaller file.

I understand the WAV file header, but I have no experience with or knowledge of how the actual sound data works. So my question is: would it be relatively easy to programmatically manipulate the "data" sub-chunk of a WAV file to convert it to another sample rate and change the number of channels, or would I be much better off using an existing library for it? If it is easy, how would it be done? Thanks in advance.

A: 

I don't think there's really any need to reinvent the wheel (unless you want to do it for your own learning). For instance you can try to use libsnd

nico
Generally, I'm one for minimalism in my programs if possible, and it would be nice to know a bit of the format. To me it seems like changing the sample rate and channels would be relatively easy to do, but of course I can be wrong.
kaykun
Your program won't be so "minimal" once you roll your own audio processing code. If you use a library, chances are you already have it installed anyway, and it takes only a few lines of code to call.
Longpoke
+4  A: 

PCM merely means that the value of the original signal is sampled at equidistant points in time.

For stereo, there are two sequences of these values, one per channel. To convert them to mono, you merely take the sample-by-sample average of the two sequences.
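
As a rough sketch of that averaging, assuming the "data" sub-chunk has already been decoded into signed 16-bit samples interleaved as left, right, left, right (the function name `stereoToMono` is made up for illustration, and 8-bit WAV data, which is unsigned, would need different handling):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Downmix interleaved 16-bit stereo PCM (L0 R0 L1 R1 ...) to mono by
// averaging each left/right pair. Assumes signed 16-bit samples.
std::vector<std::int16_t> stereoToMono(const std::vector<std::int16_t>& interleaved)
{
    assert(interleaved.size() % 2 == 0); // expect whole stereo frames only

    std::vector<std::int16_t> mono;
    mono.reserve(interleaved.size() / 2);

    for (std::size_t i = 0; i + 1 < interleaved.size(); i += 2) {
        // Sum in a wider type so the addition cannot overflow 16 bits.
        const std::int32_t sum = std::int32_t{interleaved[i]}
                               + std::int32_t{interleaved[i + 1]};
        mono.push_back(static_cast<std::int16_t>(sum / 2));
    }
    return mono;
}
```

After the downmix, the header fields that depend on the channel count (number of channels, byte rate, block align, and the data chunk size) would have to be updated to match.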

Resampling the signal at a lower sampling rate is a little bit more tricky: you have to filter the high frequencies out of the signal first, so as to prevent aliasing (spurious low-frequency components) from being created.

avakar
+1  A: 

I agree with avakar and nico, but I'd like to add a little more explanation. Lowering the sample rate of PCM audio is not trivial unless two things are true:

  1. Your signal only contains significant frequencies lower than 1/2 the new sampling rate (the new Nyquist frequency). In this case you do not need an anti-aliasing filter.

  2. You are downsampling by an integer factor. In this case, downsampling by N just requires keeping every Nth sample and dropping the rest.

If these are true, you can just drop samples at a regular interval to downsample. However, they are both probably not true if you're dealing with anything other than a synthetic signal.
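
For that special case only, the sample dropping could look something like the sketch below. It assumes mono 16-bit samples already in memory, and the name `keepEveryNth` is just illustrative. It does no filtering at all, so it is only safe when both conditions above hold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive decimation: keep samples 0, N, 2N, ... and drop the rest.
// No anti-aliasing filter is applied here.
std::vector<std::int16_t> keepEveryNth(const std::vector<std::int16_t>& mono,
                                       std::size_t factor)
{
    assert(factor > 0);

    std::vector<std::int16_t> out;
    out.reserve((mono.size() + factor - 1) / factor);

    for (std::size_t i = 0; i < mono.size(); i += factor) {
        out.push_back(mono[i]);
    }
    return out;
}
```

For example, going from 44100 Hz to 22050 Hz would be a factor of 2, and the sample rate field in the header would then be set to 22050.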

To address problem one, you will have to run the audio samples through a low-pass filter before dropping any of them, to make sure the resulting signal only contains frequency content below 1/2 the new sampling rate. If this is not done, the high frequencies cannot be represented at the new rate and will alias back into the frequencies that can be represented, causing major distortion. Check out the critical frequency section of this Wikipedia article for an explanation of aliasing. Specifically, see figure 7, which shows 3 different signals that are indistinguishable from their samples alone because the sampling rate is too low.
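
To give an idea of what such a filter involves, here is a minimal sketch of one common approach, a windowed-sinc FIR low-pass filter with a Hamming window. This is my own illustration rather than anything a particular library does: the names `designLowPass` and `applyFir` are made up, samples are assumed to have been converted to `double`, and the cutoff is given as a fraction of the current sample rate.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Design a windowed-sinc low-pass FIR filter (Hamming window).
// cutoff is in cycles per sample (cutoff frequency / sample rate),
// so it must lie strictly between 0 and 0.5. taps must be odd.
std::vector<double> designLowPass(double cutoff, std::size_t taps)
{
    assert(cutoff > 0.0 && cutoff < 0.5);
    assert(taps >= 3 && taps % 2 == 1);

    const double pi = std::acos(-1.0);
    const double mid = static_cast<double>(taps - 1) / 2.0;
    std::vector<double> h(taps);
    double sum = 0.0;

    for (std::size_t n = 0; n < taps; ++n) {
        const double x = static_cast<double>(n) - mid;
        // Ideal low-pass impulse response, with the x == 0 limit handled.
        const double sinc = (x == 0.0)
            ? 2.0 * cutoff
            : std::sin(2.0 * pi * cutoff * x) / (pi * x);
        const double window = 0.54 - 0.46 * std::cos(2.0 * pi * n / (taps - 1));
        h[n] = sinc * window;
        sum += h[n];
    }
    for (double& c : h) c /= sum; // normalize to unity gain at DC
    return h;
}

// Convolve x with h, centered so the output is not delayed.
// Samples outside the input are treated as zero.
std::vector<double> applyFir(const std::vector<double>& x,
                             const std::vector<double>& h)
{
    const std::size_t half = h.size() / 2;
    std::vector<double> y(x.size(), 0.0);

    for (std::size_t i = 0; i < x.size(); ++i) {
        for (std::size_t k = 0; k < h.size(); ++k) {
            if (i + half >= k && i + half - k < x.size()) {
                y[i] += h[k] * x[i + half - k];
            }
        }
    }
    return y;
}
```

When halving the sample rate, the cutoff would need to sit somewhat below 0.25 cycles per sample, because a real filter has a transition band rather than a sharp edge. Choosing the cutoff and the number of taps well is exactly the part that takes some signal processing background.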

Addressing problem two can be done in multiple ways. Sometimes it is performed in two steps: upsampling by one integer factor followed by downsampling by another, thereby achieving a rational change in the sampling rate. It may also be done using interpolation or other techniques. Basically, the problem that must be solved is that the samples of the new signal do not line up in time with the samples of the original signal.
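
The simplest interpolation approach is linear interpolation between neighbouring samples, sketched below. Treat it as an illustration of the "samples do not line up" problem rather than a recommendation: the name `resampleLinear` is made up, samples are assumed to be `double`, and linear interpolation is a crude method that adds some distortion of its own. When downsampling, the signal would still need to be low-pass filtered first.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Resample by linear interpolation. Each output sample is placed at a
// fractional position in the input and computed from its two neighbours.
std::vector<double> resampleLinear(const std::vector<double>& in,
                                   double inRate, double outRate)
{
    assert(inRate > 0.0 && outRate > 0.0);
    if (in.empty()) return {};

    // Number of input samples advanced per output sample.
    const double step = inRate / outRate;
    const std::size_t outLen =
        static_cast<std::size_t>(std::floor((in.size() - 1) / step)) + 1;

    std::vector<double> out(outLen);
    for (std::size_t j = 0; j < outLen; ++j) {
        const double pos = j * step;
        const std::size_t i0 = static_cast<std::size_t>(pos);
        if (i0 >= in.size() - 1) {
            out[j] = in.back(); // at or past the last input sample
            continue;
        }
        const double frac = pos - static_cast<double>(i0);
        out[j] = in[i0] * (1.0 - frac) + in[i0 + 1] * frac;
    }
    return out;
}
```

With input samples 0, 3, 6, 9 at a rate of 3 and an output rate of 2, the output positions are 0, 1.5 and 3.0 in input samples, so the middle output falls halfway between 3 and 6.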

As you can see, resampling audio can be quite involved, so I would take nico's advice and use an existing library. Getting the filter step right will require you to learn a lot about signal processing and frequency analysis. You won't have to be an expert, but it will take some time.

Jason