
I have two raw sound streams that I need to add together. For the purposes of this question, we can assume they have the same sample rate and bit depth (say, 16-bit samples at a 44.1 kHz sample rate).

Obviously, if I just add them together, I will overflow and underflow my 16-bit range. If I add them together and divide by two, then the volume of each is halved, which isn't correct sonically: if two people are speaking in a room, their voices don't become quieter by half, and a microphone can pick them both up without hitting the limiter.

  • So what's the correct method to add these sounds together in my software mixer?
  • Am I wrong and the correct method is to lower the volume of each by half?
  • Do I need to add a compressor/limiter or some other processing stage to get the volume and mixing effect I'm trying for?
+2  A: 

If you need to do this right, I would suggest looking at open source software mixer implementations, at least for the theory.

Some links:

Audacity

GStreamer

Actually, you should probably be using a library.

krusty.ar
A: 

I'd say just add them together. If you're overflowing your 16-bit PCM range, then the sounds you're using are already very loud to begin with, and you should attenuate them. If that would make them too quiet on their own, look for another way to increase the overall output volume, such as an OS setting or the volume knob on your speakers.

Adam Rosenfield
+2  A: 

I think that, so long as the streams are uncorrelated, you shouldn't have much to worry about: their peaks will rarely line up, so you should be able to get by with clipping. If you're really concerned about distortion at the clip points, a soft limiter would probably work OK.
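One simple stand-in for a "soft limiter" is a static soft-knee clipper. The sketch below is an assumption, not code from this answer: the name `soft_clip`, the `threshold` parameter and the rational curve u / (1 + u) are illustrative choices, and unlike a true limiter it has no attack or release time. Samples are floats where +/-1.0 is full scale.

```c
/* Hypothetical static soft clipper. x is a mixed sample where +/-1.0 is
   full scale; threshold must satisfy 0 < threshold < 1. Below the
   threshold the signal passes unchanged; above it the overshoot u is
   squeezed by u / (1 + u), which joins the linear part with matching
   slope and approaches full scale without the abrupt corner of hard
   clipping. */
float soft_clip(float x, float threshold)
{
    float mag = x < 0.0f ? -x : x;
    if (mag <= threshold)
        return x;                                /* linear region */

    float headroom = 1.0f - threshold;           /* space left above the knee */
    float u = (mag - threshold) / headroom;      /* normalized overshoot, > 0 */
    float shaped = threshold + headroom * (u / (1.0f + u));
    return x < 0.0f ? -shaped : shaped;          /* restore the sign */
}
```

With a threshold of 0.8, a mixed sample of 0.5 passes through unchanged, while an overshoot to 2.0 comes out at about 0.97 instead of being flattened at 1.0.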

Tony Arkles
+11  A: 

You should add them together, but clip the result to the allowable range to prevent over/underflow.

In the event of clipping, you will introduce distortion into the audio, but that's unavoidable. You can use your clipping code to "detect" this condition and report it to the user/operator (the equivalent of the red 'clip' light on a mixer).
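A minimal sketch of summing with clipping while counting the clipped samples, assuming both streams are arrays of `int16_t`; the function name `mix_clip_s16` is hypothetical. The sum is formed in `int32_t`, which can hold any sum of two 16-bit samples, so the range check happens before anything can wrap.

```c
#include <stddef.h>
#include <stdint.h>

/* Mix two 16-bit buffers by summing in 32 bits and clipping to the
   int16_t range. Returns how many output samples were clipped. */
size_t mix_clip_s16(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    size_t clipped = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];  /* cannot overflow int32_t */
        if (sum > INT16_MAX) {
            sum = INT16_MAX;
            clipped++;
        } else if (sum < INT16_MIN) {
            sum = INT16_MIN;
            clipped++;
        }
        out[i] = (int16_t)sum;
    }
    return clipped;
}
```

A non-zero return value is the software counterpart of the mixer's clip light.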

You could implement a more "proper" compressor/limiter, but without knowing your exact application, it's hard to say if it would be worth it.
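If a limiter does turn out to be worth it, the simplest form cuts the gain instantly when a sample would exceed a ceiling and lets the gain recover gradually afterwards. This is a hypothetical sketch (the `limiter` struct and `limiter_process` are invented names) working on float samples where +/-1.0 is full scale. It has no lookahead, so the gain drop lands on the loud sample itself, and it is not a full compressor with threshold and ratio controls.

```c
#include <stddef.h>

/* Hypothetical state for a minimal peak limiter. */
typedef struct {
    float ceiling;  /* largest allowed output magnitude, e.g. 1.0f = full scale */
    float release;  /* per-sample recovery factor in [0, 1); closer to 1 = slower */
    float gain;     /* current gain; start at 1.0f */
} limiter;

/* Process a buffer of float samples in place. */
void limiter_process(limiter *lim, float *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float x = buf[i];
        float mag = x < 0.0f ? -x : x;

        /* Release: move the gain part of the way back toward 1.0. */
        lim->gain = 1.0f - (1.0f - lim->gain) * lim->release;

        /* Attack (instantaneous): if this sample would exceed the ceiling,
           drop the gain so it lands on the ceiling, up to float rounding. */
        if (mag * lim->gain > lim->ceiling)
            lim->gain = lim->ceiling / mag;

        buf[i] = x * lim->gain;
    }
}
```

The `release` factor sets how quickly the gain returns to 1.0 after a peak: a value such as 0.9999 recovers over thousands of samples, while a value of 0 resets the gain every sample and behaves like plain clipping.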

If you're doing lots of audio processing, you might want to represent your audio levels as floating-point values, and only go back to the 16-bit space at the end of the process. High-end digital audio systems often work this way.

Roddy
+2  A: 

You're right about adding them together. You could always scan the sum of the two files for peaks and scale the entire file down if they exceed some threshold (or if the average of a peak and its neighbouring samples exceeds a threshold).
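A sketch of the peak-scan idea for two 16-bit buffers, assuming the threshold is simply the 16-bit limit (a lower threshold would work the same way); `mix_peak_scale_s16` is a hypothetical name. The first pass finds the largest magnitude of the 32-bit sum, and the second applies one gain to every sample so that the peak lands at `INT16_MAX`.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum two 16-bit buffers and, only if the sum's peak exceeds the int16_t
   range, scale the whole result down by a single factor. */
void mix_peak_scale_s16(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    /* Pass 1: find the largest magnitude of the 32-bit sum. */
    int32_t peak = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t s = (int32_t)a[i] + b[i];
        int32_t mag = s < 0 ? -s : s;
        if (mag > peak)
            peak = mag;
    }

    /* Pass 2: apply the same gain to every sample. Since |s| <= peak,
       s * INT16_MAX / peak always fits in int16_t. */
    for (size_t i = 0; i < n; i++) {
        int32_t s = (int32_t)a[i] + b[i];
        if (peak > INT16_MAX)
            s = (int32_t)((int64_t)s * INT16_MAX / peak);
        out[i] = (int16_t)s;
    }
}
```

Because the gain applies to the whole buffer, this needs the entire mix up front (files rather than live streams), and one loud moment makes everything else quieter too.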

Jon Smock
+6  A:

"Quieter by half" isn't quite correct. Halving the sample values makes the sound about 6 dB quieter; because the ear's response is roughly logarithmic, that is certainly noticeable, but not disastrous.

You might want to compromise by multiplying by 0.75. That will make it about 2.5 dB quieter, but will reduce the chance of overflow and also reduce the distortion when it does happen.
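For reference, multiplying samples by an amplitude factor g changes the level by 20 log10 g decibels, which gives these values for the factors discussed here:

$$
20\log_{10} 0.5 \approx -6.02\ \mathrm{dB}, \qquad
20\log_{10} 0.75 \approx -2.50\ \mathrm{dB}, \qquad
20\log_{10} \tfrac{1}{\sqrt{2}} \approx -3.01\ \mathrm{dB}
$$

So a cut of exactly 3 dB corresponds to a factor of about 0.707.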

Mark Ransom
+3  A: 

Most audio mixing applications will do their mixing with floating point numbers (32 bit is plenty good enough for mixing a small number of streams). Translate the 16 bit samples into floating point numbers with the range -1.0 to 1.0 representing full scale in the 16 bit world. Then sum the samples together - you now have plenty of headroom. Finally, if you end up with any samples whose value goes over full scale, you can either attenuate the whole signal or use hard limiting (clipping values to 1.0).

This will give much better sounding results than adding 16 bit samples together and letting them overflow. Here's a very simple code example showing how you might sum two 16 bit samples together:

short sample1 = ...;
short sample2 = ...;
// convert to floating point, -1.0 to 1.0 = 16-bit full scale
float samplef1 = sample1 / 32768.0f;
float samplef2 = sample2 / 32768.0f;
float mixed = samplef1 + samplef2;
// reduce the volume a bit:
mixed *= 0.8f;
// hard clipping
if (mixed > 1.0f) mixed = 1.0f;
if (mixed < -1.0f) mixed = -1.0f;
// scale by 32767 (not 32768) so that +1.0 maps to SHRT_MAX instead of overflowing
short outputSample = (short)(mixed * 32767.0f);
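That snippet handles one pair of samples. A complete, compilable version over whole buffers might look like the following sketch; `mix_float_s16` and its `gain` parameter are hypothetical names, with `gain` playing the role of the 0.8 volume reduction.

```c
#include <stddef.h>
#include <stdint.h>

/* Mix two 16-bit buffers through a float stage: convert, sum, apply gain,
   hard-clip to full scale, convert back. */
void mix_float_s16(const int16_t *a, const int16_t *b, int16_t *out,
                   size_t n, float gain)
{
    for (size_t i = 0; i < n; i++) {
        /* map 16-bit samples to the -1.0 .. +1.0 range and sum them */
        float mixed = (a[i] / 32768.0f + b[i] / 32768.0f) * gain;

        /* hard clipping to full scale */
        if (mixed > 1.0f) mixed = 1.0f;
        if (mixed < -1.0f) mixed = -1.0f;

        /* scale by 32767 so +1.0 maps to INT16_MAX rather than overflowing */
        out[i] = (int16_t)(mixed * 32767.0f);
    }
}
```

Keeping the intermediate value in float means any further processing stages can run with plenty of headroom before the single conversion back to 16 bits.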
Mark Heath
+5  A: 

There is an article about mixing here. I'd be interested to know what others think about this.

Ben Dyer
+1 That article seems to be better than the selected answer
ajs410
+1  A: 

Convert the samples to floating-point values ranging from -1.0 to +1.0, then:

out = (s1 + s2) - (s1 * s2);
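A direct transcription of the formula, as a sketch for checking its behaviour on a few inputs; `mix_product` is a hypothetical name, and the inputs are assumed to be already converted to the -1.0 to +1.0 range.

```c
/* out = (s1 + s2) - (s1 * s2), exactly as written, for samples that have
   already been converted to floating point in the -1.0 .. +1.0 range. */
float mix_product(float s1, float s2)
{
    return (s1 + s2) - (s1 * s2);
}
```

Working through some inputs: 0.5 and 0.5 give 0.75, and 1.0 and -1.0 give 1.0, but -1.0 and -1.0 give -3.0, which is outside the -1.0 to +1.0 range. Applied to signed samples exactly as written, the formula therefore does not keep negative sums in range.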
anthroid
I'm going to have to puzzle that one out, I guess. It seems like it might be appropriate, but if the inputs are 1 and -1, the result is 1. I'm not sure I want to break out Laplace for this, but if you have any references with more information on why or how this works, I'd appreciate a head start.
Adam Davis
This answer might come from http://www.vttoth.com/digimix.htm .
Gauthier