If your WAV file has two channels, then the length of sound_info would be 2*sample rate*duration (seconds). The channel data alternate, so if you have slurped all the values into a 1-dimensional array, data, then the values associated with one channel would be data[::2], and the other would be data[1::2].
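As a minimal sketch of that slicing, assuming the samples are already in a 1-dimensional NumPy array (the helper name split_channels and the synthetic values are made up for illustration, not taken from your code):

```python
import numpy as np

def split_channels(data):
    """Split interleaved two-channel samples into two separate arrays."""
    data = np.asarray(data)
    if data.size % 2 != 0:
        raise ValueError("interleaved two-channel data must have an even length")
    # Even indices belong to one channel, odd indices to the other.
    return data[::2], data[1::2]

# Synthetic interleaved samples: c0[0], c1[0], c0[1], c1[1], ...
interleaved = np.array([10, -10, 20, -20, 30, -30], dtype=np.int16)
ch0, ch1 = split_channels(interleaved)
print(ch0)  # [10 20 30]
print(ch1)  # [-10 -20 -30]
```

Each returned array then has length sample rate*duration, i.e. half the length of the interleaved array.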
Roughly speaking, smooth functions can be represented as sums of sine and cosine waves (with various amplitudes and frequencies).
The FFT (Fast Fourier Transform) is an efficient algorithm for computing the discrete Fourier transform, which relates a sampled function to the coefficients (amplitudes) of those sine and cosine waves. That is, there is a one-to-one mapping between the sequence of samples on the one hand and the sequence of coefficients on the other.
If a sound sample consists mainly of one note, its FFT will have one coefficient which is very big (in absolute value), and the others will be very small. (For real-valued input, the full FFT also contains a mirror-image copy of that peak at the corresponding negative frequency.) That coefficient corresponds to a particular sine wave, with a particular frequency. That's the frequency of the note.
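Here is a rough sketch of finding that frequency with NumPy, under some assumptions: the input is a single channel of real-valued samples, the function name dominant_frequency is hypothetical, and the test signal is a synthetic 440 Hz sine rather than real audio. It uses np.fft.rfft, which returns only the non-negative frequencies for real input, and zeroes the first bin so a constant offset in the signal is not mistaken for the note (that last step is my own choice, not something the FFT requires).

```python
import numpy as np

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the largest-magnitude FFT coefficient."""
    samples = np.asarray(samples, dtype=float)
    spectrum = np.fft.rfft(samples)                      # complex coefficients
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    magnitudes = np.abs(spectrum)                        # "size" of each coefficient
    magnitudes[0] = 0.0                                  # ignore the constant (0 Hz) term
    return freqs[np.argmax(magnitudes)]

# Synthetic test signal: one second of a 440 Hz sine sampled at 8000 Hz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

print(dominant_frequency(tone, sample_rate))  # approximately 440.0
```

The frequency resolution is sample_rate / len(samples), so with a real recording the result is only as precise as the spacing between bins, and a note with strong overtones may peak at a harmonic rather than the fundamental.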