
If your wav file has two channels, then the length of sound_info would be 2 * sample_rate * duration (in seconds). The channel data alternate, so if you have slurped all the values into a 1-dimensional array, data, then the values associated with one channel would be data[::2], and the other would be data[1::2].
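A minimal sketch of that de-interleaving, assuming NumPy and that the raw samples are already in a 1-D array (for 16-bit PCM, something like `numpy.frombuffer(frames, dtype=numpy.int16)` on the bytes returned by the standard `wave` module's `readframes`). The helper names `split_channels` and `split_channels_reshape` are hypothetical:

```python
import numpy as np

def split_channels(data):
    """Split interleaved stereo samples [L0, R0, L1, R1, ...] into two arrays."""
    data = np.asarray(data)
    left = data[::2]    # even indices: first channel
    right = data[1::2]  # odd indices: second channel
    return left, right

def split_channels_reshape(data):
    """Same result via reshape: one row per frame, one column per channel."""
    frames = np.asarray(data).reshape(-1, 2)
    return frames[:, 0], frames[:, 1]

interleaved = np.array([10, -10, 20, -20, 30, -30])
left, right = split_channels(interleaved)
print(left, right)  # [10 20 30] [-10 -20 -30]
```

The reshape version makes the frame structure explicit and extends to any channel count with `reshape(-1, n_channels)`.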


Roughly speaking, smooth functions can be represented as sums of sine and cosine waves (with various amplitudes and frequencies).

The FFT (Fast Fourier Transform) is an efficient algorithm for computing the discrete Fourier transform, which relates the sampled function to the coefficients (amplitudes) of those sine and cosine waves. That is, there is a one-to-one mapping between the samples on the one hand and the sequence of coefficients on the other.
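To illustrate the one-to-one mapping, here is a sketch using NumPy's real-input FFT (`numpy.fft.rfft`) and its inverse (`numpy.fft.irfft`); the sample rate and the two test tones are arbitrary assumptions. Building a signal from a sine and a cosine, transforming it, and transforming back returns the original samples:

```python
import numpy as np

sample_rate = 8000                          # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate    # one second of time stamps

# A smooth signal built from a sine wave and a cosine wave
signal = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.cos(2 * np.pi * 880 * t)

coeffs = np.fft.rfft(signal)                      # time domain -> coefficients
recovered = np.fft.irfft(coeffs, n=len(signal))   # coefficients -> time domain

print(np.allclose(signal, recovered))  # True: nothing is lost in either direction
```

Because the signal spans exactly one second, coefficient index k corresponds to k Hz, and `2 * abs(coeffs[440]) / len(signal)` comes out at about 1.0, the amplitude of the 440 Hz sine.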

If a sound sample consists mainly of one note, its FFT will have one coefficient which is very big (in absolute value), and the others will be very small. That coefficient corresponds to a particular sine wave, with a particular frequency. That's the frequency of the note.
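A sketch of that idea for a single pure tone, assuming NumPy; `dominant_frequency` is a hypothetical helper, and real recordings add overtones and noise that this naive peak-picking ignores. `numpy.fft.rfftfreq` maps each coefficient index to its frequency in Hz:

```python
import numpy as np

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the largest-magnitude FFT coefficient."""
    samples = np.asarray(samples, dtype=float)
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    spectrum[0] = 0.0  # ignore the 0 Hz (DC offset) coefficient
    return freqs[np.argmax(spectrum)]

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate   # one second of samples
note = np.sin(2 * np.pi * 440.0 * t)       # a pure 440 Hz tone (A4)
print(round(dominant_frequency(note, sample_rate), 1))  # 440.0
```

The frequency resolution is `sample_rate / len(samples)` Hz, so a one-second clip resolves 1 Hz steps.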

unutbu
@unutbu Thanks for your awesome reply! :) Would you know why sound_info is sequential though?
RadiantHex
+1: Musical sounds have overtones. A lot of them: they're integer multiples of the fundamental frequency. Further, real instruments include a great deal of noise as well as time-shifted signals (i.e., Doppler shifts) that make recognizing the fundamental challenging.
S.Lott
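A small sketch of why overtones complicate things, assuming NumPy and a made-up tone whose second harmonic is louder than its fundamental (real instruments vary, so the amplitudes here are purely illustrative). The largest FFT peak lands on the overtone rather than the fundamental:

```python
import numpy as np

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate   # one second, so bin k is k Hz
fundamental = 220.0

# Hypothetical tone: harmonics at 1x, 2x and 3x the fundamental,
# with the 2nd harmonic deliberately louder than the fundamental.
amplitudes = {1: 0.4, 2: 1.0, 3: 0.3}
tone = sum(a * np.sin(2 * np.pi * k * fundamental * t) for k, a in amplitudes.items())

spectrum = np.abs(np.fft.rfft(tone))
freqs = np.fft.rfftfreq(len(tone), d=1.0 / sample_rate)

# Frequencies whose magnitude is at least 10% of the largest peak
peaks = freqs[spectrum > 0.1 * spectrum.max()]
dominant = freqs[np.argmax(spectrum)]

print(np.round(peaks))    # [220. 440. 660.]
print(f"{dominant:.0f}")  # 440: an overtone, not the 220 Hz fundamental
```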
@S.Lott thanks for that. Is there not a way of getting a list of frequencies for each sample? Or is each sample limited to only one frequency value? :|
RadiantHex
@RadiantHex: What do you think the FFT gives you? It transforms time-domain samples into frequency domain. Please read up on FFT more carefully.
S.Lott
@S.Lott: So I would have to split the samples into time groups in order to obtain the energy value of each frequency changing over time?
RadiantHex
@RadiantHex: you might want to check out my answer to http://stackoverflow.com/questions/2648151/python-frequency-detection/2649540#2649540. It might help with the frequency detection anyway. Also, if you are really looking to get the frequencies at specific times, then you should look into the short-time Fourier transform.
Justin Peel
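For frequencies at specific times, a short-time Fourier transform slides a window along the signal and takes an FFT of each slice. Below is a minimal NumPy sketch that reports the strongest frequency in each frame; `stft_peaks`, the Hann window, `frame_len=2048` and `hop=1024` are illustrative choices, not tuned values. SciPy also provides `scipy.signal.stft` if you prefer a library routine:

```python
import numpy as np

def stft_peaks(samples, sample_rate, frame_len=2048, hop=1024):
    """Minimal short-time Fourier transform: (start time in s, peak Hz) per frame."""
    samples = np.asarray(samples, dtype=float)
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    results = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))
        results.append((start / sample_rate, freqs[np.argmax(spectrum)]))
    return results

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate          # one second per note
melody = np.concatenate([np.sin(2 * np.pi * 440 * t),   # 440 Hz for 1 s
                         np.sin(2 * np.pi * 660 * t)])  # then 660 Hz for 1 s

for time_s, freq in stft_peaks(melody, sample_rate)[::4]:
    print(f"{time_s:5.2f} s  {freq:7.1f} Hz")
```

The reported frequencies are quantized to multiples of `sample_rate / frame_len` (about 3.9 Hz here), so expect values near 440 Hz and 660 Hz rather than exact ones.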
@RadiantHex: Yes. You transform a time-domain sample into frequency domain data. Too big a time domain and you have multiple pitches. Too small a time domain and you may not have a complete fundamental. Also, random time slices are useless; you have to find a "beat" if you want to find "music" (i.e., melody). Please read up on FFT more carefully.
S.Lott
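The window-size trade-off can be put in numbers: an N-sample window at sample rate fs spans N/fs seconds and gives FFT bins spaced fs/N Hz apart. A quick sketch, assuming 44.1 kHz audio (`window_tradeoff` is a hypothetical helper):

```python
def window_tradeoff(n, sample_rate=44100):
    """Return (window length in ms, FFT bin spacing in Hz) for an n-sample window."""
    duration_ms = 1000.0 * n / sample_rate  # time covered by one FFT window
    resolution_hz = sample_rate / n         # spacing between adjacent FFT bins
    return duration_ms, resolution_hz

for n in (256, 1024, 4096, 16384):
    ms, hz = window_tradeoff(n)
    print(f"N={n:6d}  window={ms:6.1f} ms  bin spacing={hz:7.2f} Hz")
```

For scale, A2 (110 Hz) and the next semitone up (about 116.5 Hz) are roughly 6.5 Hz apart, so any window whose bin spacing is wider than that cannot separate them, while a 16384-sample window (over a third of a second) may span several notes in a fast passage.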
@S.Lott: thanks for sharing that, any idea what a good place to read up on music theory is? =)
RadiantHex
@RadiantHex: Is google broken? Did you read http://stackoverflow.com/questions/2648151/python-frequency-detection/2649540#2649540?
S.Lott