The typical FFT for audio looks pretty similar to this, with most of the action happening on the far left side:

http://www.flight404.com/blog/images/fft.jpg

He multiplied it by a partial sine wave to get it to the bottom, but the article isn't very specific about this step. It also seems like a "good enough" modification of the dataset rather than one based on some underlying property. I understand that human hearing is more sensitive to higher frequencies, so most music has amplified bass and attenuated treble so that both sound roughly equally strong to us.

My question is what modification needs to be done to the FFT to compensate for this standard falloff?

for (var i = 0; i < fft.length; i++) {
     // note: Math.log(1) is 0, so bin 0 is zeroed out
     fft[i] = fft[i] * Math.log(i + 1); // does, eh, ok but the high
                                        // end is still not really "loud"
                                        // enough
}

EDIT ::

http://en.wikipedia.org/wiki/Equal-loudness_contour

I came across this article. I think it might be the direction to head in, but there might still be some property of an FFT that needs to be counteracted.

+1  A: 

So you are trying to raise the level of the high-end frequencies? Sounds like a high-pass filter with a minimum multiplier might work, so that you don't attenuate the low-frequency signals too much. Pick up a good book on filter design, and maybe monkey around with this applet.

Rob Elsner
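A rough sketch of the "high-pass with a minimum multiplier" idea from the answer above, applied directly to the FFT magnitudes rather than as a real filter. The function name emphasizeHighs, the linear ramp, and the minGain/maxGain parameters are assumptions made for illustration; the answer does not specify a curve shape.

```javascript
// Hypothetical helper: scale each bin by a gain that ramps linearly
// from minGain at the lowest bin to maxGain at the highest bin.
// minGain > 0 keeps the bass from being silenced entirely.
function emphasizeHighs(fft, minGain, maxGain) {
    var out = new Array(fft.length);
    var last = Math.max(fft.length - 1, 1); // avoid dividing by zero
    for (var i = 0; i < fft.length; i++) {
        var t = i / last; // 0 at the lowest bin, 1 at the highest
        out[i] = fft[i] * (minGain + (maxGain - minGain) * t);
    }
    return out;
}
```

For example, emphasizeHighs(fft, 0.5, 2) halves the lowest bin and doubles the highest. Any other monotonic ramp could be substituted for the linear one.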
That might be what ends up happening, but there should be some principle at work here.
el3ment
A: 

In the old days of the first samplers (this is before MOTU Boost, people :) it wasn't done with an FFT but with simple normalisation (Fairlight or Roland did it first, I think), applied to the original or resulting time-domain signal (if you are doing beat slicing, Recycle-style). Can't you do that? Or only go for the FFT after you have compensated for it?

Otherwise it seems like a two-phase procedure. I'd personally leave the FFT as is for the task.

rama-jka toti
I'm not sure exactly what you mean, but I am trying to compensate for the difference in beat strengths: bass beats are two to three times larger than the hi-hat beats. I could have a different threshold value for each sub-band of the FFT, but if there is an underlying principle as to why the highs are less powerful than the lows, then I could apply it to the FFT and the resulting beat magnitudes would be corrected.
el3ment
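The per-sub-band threshold mentioned in the comment above could be sketched as follows: each band is compared against its own recent average, so the difference in scale between bass and hi-hat bands cancels out without any frequency weighting. The function name detectBeats, the history layout, and the threshold factor are all assumptions for illustration, not something taken from the thread.

```javascript
// Hypothetical sketch. current[b] is the magnitude of band b in the
// newest frame; history[b] is an array of that band's recent magnitudes.
// A band "fires" when it exceeds threshold times its own average.
function detectBeats(current, history, threshold) {
    var beats = [];
    for (var b = 0; b < current.length; b++) {
        var sum = 0;
        for (var k = 0; k < history[b].length; k++) {
            sum += history[b][k];
        }
        var avg = history[b].length > 0 ? sum / history[b].length : 0;
        beats.push(avg > 0 && current[b] > threshold * avg);
    }
    return beats;
}
```

With a threshold of 1.5, a bass band averaging 10 needs to exceed 15 to fire, while a hi-hat band averaging 1 only needs to exceed 1.5.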
Hmm, I'm not sure myself what I'd do or what has been attempted, and it doesn't sound easy. I believe Audacity had a go at it and halted with some similar issues; the sources and wiki are there, so they might help. The only way I see you doing this is with a number of normalisation and FFT passes, per slice or set of slices, which is what Recycle always seemed to do, taking quite a while. Another, but expensive and time-consuming, route would be to 'reverse' what other packages do. There are plenty of new ones around; I am quite behind, but the NI guys were at the top of the game last time I checked.
rama-jka toti
Or give this a shot to see whether it already does it: http://freecycle.redsteamrecords.com/features/
rama-jka toti
+2  A: 

I think the equal loudness contour is exactly the right direction. However, its shape depends on the absolute pressure level. In other words the sensitivity curve of our hearing changes with sound pressure.

There is no "correct normalization" if you have no information about absolute levels. Whether this is a problem depends on what you want to do with the data.

The loudness contour is standardized in ISO 226 but this document is not freely available for download. It should be in a decent university library though. Here is another source for loudness contours

Ludwig Weinzierl
I am only transforming audio files, is there anything we can assume about the absolute levels knowing this?
el3ment
I'm afraid not. The sensitivity of the microphone, the amplification level, the scaling in the recording software, and so on: each is a factor that gets multiplied in during the recording process. I'd say it depends on the problem you are trying to solve whether you really need to consider this effect. For a start you could use the blue curve from the Wikipedia article on equal-loudness contours.
Ludwig Weinzierl
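If tabulated contour values are not at hand, one closed-form stand-in is the standard A-weighting curve, which roughly follows the inverse of the 40-phon equal-loudness contour. To be clear, this is a swapped-in approximation and not the ISO 226 data itself; the function name aWeightingDb is chosen here for illustration.

```javascript
// A-weighting in dB for a frequency f in Hz (standard analog formula).
// Returns about 0 dB at 1 kHz and increasingly negative values toward
// the bass. At f = 0 the result is -Infinity (the bin is fully removed).
function aWeightingDb(f) {
    var f2 = f * f;
    var num = 12194 * 12194 * f2 * f2;
    var den = (f2 + 20.6 * 20.6) *
              Math.sqrt((f2 + 107.7 * 107.7) * (f2 + 737.9 * 737.9)) *
              (f2 + 12194 * 12194);
    // 20*log10(ratio), plus 2 dB so that the curve reads 0 dB at 1 kHz
    return 20 * Math.log(num / den) / Math.LN10 + 2.0;
}
```

Since the curve is in dB, it has to be converted to a linear factor before multiplying FFT magnitudes, for instance with Math.pow(10, aWeightingDb(f) / 20).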
+1  A: 

First, are you sure you want to do this? It makes sense to compensate for some things, like the microphone response not being flat, but not for human perception. People are used to hearing sounds with the spectral content that the sounds have in the real world, not along perceptual equal-loudness curves. If you play a sound that you've modified in the way you suggest, it would sound strange. Maybe some people like the music to have enhanced low frequencies, but this is a matter of taste, not psychophysics.

Or maybe you are compensating for some other reason, for example, taking into account the poorer sensitivity to lower frequencies might enhance a compression algorithm. Is this the idea?

If you do want to normalize by the equal-loudness curves, note that most of the curves and equations are in terms of sound pressure level (SPL). SPL is proportional to the log of the square of the waveform amplitude, so when you work with the FFTs, it's probably easiest to work with their square (the power spectra). (Or, of course, you could compensate in other ways by, say, multiplying by sqrt(log(i+1)) in your equation above, assuming that the log was an approximation of the inverse equal-loudness curve.)

tom10
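To make the power-versus-amplitude point in the answer above concrete, here is a small sketch. It assumes the FFT values are linear magnitudes with an arbitrary reference level, so the dB figures are relative rather than true SPL; the names powerDb and weightMagnitude are hypothetical.

```javascript
// Relative level in dB of one FFT magnitude, computed from its square
// (the power). 10*log10(m*m) is the same quantity as 20*log10(m).
function powerDb(magnitude) {
    var power = magnitude * magnitude;
    return 10 * Math.log(power) / Math.LN10;
}

// A weight w defined for power corresponds to sqrt(w) on the magnitude,
// because (m * sqrt(w))^2 equals m^2 * w.
function weightMagnitude(magnitude, powerWeight) {
    return magnitude * Math.sqrt(powerWeight);
}
```

This is why a square root appears when a curve meant for power is applied to un-squared FFT values.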
Indeed, I'm only analyzing the FFT for beat information, so the compensation is only to make discovery easier and has nothing to do with the audio that gets played (which comes from the original file). So, if we had a function called equalLoudnessCurve(frequency), which returned the value, say, 80 for a frequency of 10 Hz, we could modify the code above to be fft[i] = fft[i] * sqrt(equalLoudnessCurve(frequencyFromIndex(i))); which would attenuate the bass (but not silence it) and amplify the treble?
el3ment
Yes, I think this equation is correct, although the equalLoudnessCurve would need to be in real numbers, not in dB (most of the graphs use dB). Still, now that you've clarified your problem more, I don't think this equal-loudness thing is the right approach. For example, consider a periodic violin note establishing the beat. 1) This isn't going to have a peak in the power spectrum at the beat frequency. 2) It's too complicated for what you want to get out of it; if you want to enhance low freq, just use any function that looks reasonable, there's nothing special about equal loudness.
tom10
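The dB-versus-real-numbers caveat in the last comment can be sketched like this. The helper names follow the ones used in the comments, but their bodies are assumptions: bin i is taken to sit at i * sampleRate / fftSize, and weightingDb stands for any hypothetical function that returns a gain in dB for a frequency in Hz.

```javascript
// Centre frequency in Hz of FFT bin i (assumed bin layout).
function frequencyFromIndex(i, sampleRate, fftSize) {
    return i * sampleRate / fftSize;
}

// Convert a gain in dB to a linear factor for amplitudes (divide by 20;
// dividing by 10 instead would give the factor for power).
function dbToAmplitudeGain(db) {
    return Math.pow(10, db / 20);
}

// Multiply every bin by the linear gain of a caller-supplied dB curve.
function applyWeighting(fft, sampleRate, fftSize, weightingDb) {
    var out = new Array(fft.length);
    for (var i = 0; i < fft.length; i++) {
        var f = frequencyFromIndex(i, sampleRate, fftSize);
        out[i] = fft[i] * dbToAmplitudeGain(weightingDb(f));
    }
    return out;
}
```

Because the amplitude factor 10^(dB/20) is already the square root of the power factor 10^(dB/10), no extra sqrt is needed when the conversion is done this way.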