I have a bunch of different audio recordings in WAV format (all different instruments and pitches), and I want to "normalize" them so that they all sound approximately the same volume when played.

I've tried measuring the average sample magnitude (the sum of all absolute values divided by the number of samples), but normalizing by this measurement doesn't work very well. I think this method isn't working because it doesn't take into account the frequency of the sounds, and I know that higher-frequency recordings sound louder than lower-frequency sounds of the same amplitude.

Does anyone know a good method for measuring the loudness of a sound?

+9  A: 

Root Mean Square (RMS) is often used to estimate the loudness of sound files. It reflects the average power of the signal rather than its instantaneous peaks, which matters because a sound that is very loud might not be perceived that way if it is very short. Also remember that power increases as the square of amplitude.
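As a rough illustration of RMS-based normalization, here is a minimal Python sketch. It assumes the WAV data has already been decoded into a list of floats in the range -1.0 to 1.0. The function names and the `target_rms` default are made up for this example and do not come from any particular library.

```python
import math

def rms(samples):
    """Root mean square of a sequence of float samples (assumed in [-1.0, 1.0])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalize_to_rms(samples, target_rms=0.1):
    """Scale samples so their RMS matches target_rms (an arbitrary example value)."""
    current = rms(samples)
    if current == 0.0:
        return list(samples)  # pure silence: there is nothing to scale
    gain = target_rms / current
    # Hard-clip anything the gain pushes outside the valid sample range.
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```

Note that if the gain drives samples into clipping, the resulting RMS ends up below the target, so a conservative target level is the safer choice.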

The audio geeks at Hydrogen Audio know a ton about this stuff...check out their free Replay Gain software. You may not need to do any programming at all.

EDIT: Included comment feedback on power vs. amplitude.

PeterAllenWebb
Worked like a charm, thanks. My undergraduate degree was in Physics, so I'm kind of embarrassed that I didn't remember this. I had done something really stupid before, like multiplying n samples and taking the nth root (which is the geometric mean), thinking that's what root mean square was. Thanks for saving me from myself.
MusiGenesis
You might want to pay attention to the fact that not all frequencies are perceived the same by the listener. A given RMS level at very low frequencies will generally be perceived as much quieter than the same RMS level in the mid-range, where hearing is most sensitive.
sthg
Loudness perception is indeed frequency dependent, and follows the equal loudness contours (http://en.wikipedia.org/wiki/Loudness).
Emile Vrijdags
Correction: Power increases *as the square* of amplitude, not exponentially (i.e. P=kA^2). Otherwise, RMS is indeed the right way to measure average loudness.
Noldorin
+3  A: 

I'm not an expert on audio, but to add to the previous answer: first decide what you consider the "shortest amount of time for peak power". Then convert the wave to raw floating point, split it into consecutive chunks of that length, and compute the RMS of each chunk. The maximum of those values is your highest peak power.

A: 

I might be way off here, but if you have WavePad you can load in multiple files and adjust the volumes a little so they are all the same. Also, if certain sections of a file are louder, you can select such a section and lower the volume for just that section.

EDIT: And sorry, it's not really a "method" for measuring volume, but if you just need to make them all the same this should work fine.

+2  A: 

To add to PeterAllenWebb's response:

Before you calculate the RMS, you should "center" your samples first, that is, remove any DC offset. (Think of a 5-minute .wav where every sample sits at the maximum positive amplitude: it is silent, yet its RMS is as high as it can be.) The best way to do that is to apply a high-pass filter at a subsonic frequency.
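To make the centering step concrete, here is a small Python sketch of two ways to do it: subtracting the mean, and a one-pole DC-blocking high-pass filter. Both function names and the coefficient `r=0.995` are illustrative assumptions. With that value the cutoff should land in the low tens of Hz at a 44.1 kHz sample rate, but tune it for your material.

```python
import math

def rms(samples):
    """Root mean square of a sequence of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def remove_mean(samples):
    """Subtract the average value: the crudest form of centering."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]

def dc_block(samples, r=0.995):
    """One-pole DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1].

    r is an assumed example coefficient; values closer to 1.0 give a
    lower cutoff frequency but a longer settling time.
    """
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

For the constant full-scale signal described above, `rms` of the raw data is 1.0, while after `remove_mean` it is 0.0. The filter version also tracks offsets that drift slowly over time, which a single mean subtraction cannot do.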

That would still not take into account the frequencies that humans are most sensitive to. To do that, you could use A-weighting. There's a page where you can calculate it online: http://www.diracdelta.co.uk/science/source/a/w/aweighting/source.html

The code seems to be here: http://www.diracdelta.co.uk/science/source/a/w/aweighting/multicalc.js
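For reference, the A-weighting curve can also be evaluated directly. The Python sketch below uses the commonly published analog A-weighting formula (constants 20.6, 107.7, 737.9 and 12194 Hz, plus a 2.00 dB offset so that the gain is about 0 dB at 1 kHz). It is not a port of the linked script, and the function names are made up for this example.

```python
import math

def a_weighting_db(freq_hz):
    """Approximate A-weighting gain in dB at freq_hz (standard analog formula)."""
    if freq_hz <= 0:
        return float("-inf")  # DC is fully rejected by the curve
    f2 = freq_hz * freq_hz
    num = (12194.0 ** 2) * f2 * f2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # The +2.00 dB offset normalizes the curve to roughly 0 dB at 1 kHz.
    return 20.0 * math.log10(num / den) + 2.00

def a_weighting_gain(freq_hz):
    """Same curve expressed as a linear amplitude multiplier."""
    db = a_weighting_db(freq_hz)
    return 0.0 if db == float("-inf") else 10.0 ** (db / 20.0)
```

One way to use it is to scale each frequency bin of a spectrum by `a_weighting_gain` before summing the power. Applying it to a time-domain signal instead requires designing a filter that follows this curve.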

Wouter van Nifterick
I'm finding that normalizing by RMS works a lot better than normalizing by peak value in terms of getting sounds at the same pitch to be roughly equal in volume, but the RMS measurement seems relatively insensitive to pitch, so it's not doing what I want (which is to lower the volume for high-pitched sounds). Webb's Wikipedia link showed the frequency response curve for human hearing, but thank you especially for the link to the formula - it's going into code tonight.
MusiGenesis
+1  A: 

To reiterate what some other people have said, use RMS value to estimate the "loudness" of a passage of sound.

But, if you're dealing with impulsive sounds like plucking or drum hits, you'd want to do a sliding RMS value and pick out only the peak RMS value. Measure 100 ms of the sound, slide the window, measure again, etc. and then normalize according to the largest value you find.
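The sliding measurement described above could look something like this in Python. It is a sketch that assumes the samples are already floats in the range -1.0 to 1.0. The function name, the 50 ms hop size, and the parameter names are assumptions chosen for illustration; only the 100 ms window comes from the description.

```python
import math

def peak_windowed_rms(samples, sample_rate, window_s=0.1, hop_s=0.05):
    """Largest RMS value found over sliding windows of window_s seconds.

    hop_s controls how far the window slides each step (assumed 50 ms here).
    """
    window = max(1, int(round(window_s * sample_rate)))
    hop = max(1, int(round(hop_s * sample_rate)))
    best = 0.0
    # If the signal is shorter than one window, measure it as a single chunk.
    last_start = max(1, len(samples) - window + 1)
    for start in range(0, last_start, hop):
        chunk = samples[start:start + window]
        if not chunk:
            break
        value = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        best = max(best, value)
    return best
```

For a short burst in a mostly silent file, this returns the level of the burst itself, whereas a whole-file RMS would be dragged down by the silence around it.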

Definitely remove any DC value before doing the RMS, and A-weighting will make it more like how we hear. Here's code for A-weighting in MATLAB/Octave and Python.

endolith