views: 891
answers: 3

I'm trying to determine the "beats per minute" from real-time audio in C#. It's not music I'm detecting, though, just a constant tapping sound. My problem is determining the time between those taps so I can calculate "taps per minute." I have tried using the WaveIn.cs class that's out there, but I don't really understand how it's sampling. I'm not getting a set number of samples per second to analyze. I guess I just don't know how to read in an exact number of samples per second so I can work out the time between two samples.

Any help to get me in the right direction would be greatly appreciated.

+1  A: 
Robert Harvey
A: 

Assuming we're talking about the same WaveIn.cs, the constructor of WaveLib.WaveInRecorder takes a WaveLib.WaveFormat object as a parameter. This allows you to set the audio format, i.e. sample rate, bit depth, etc. Just scan the audio samples for peaks (or however you're detecting "taps") and record the average distance, in samples, between peaks.

Since you know the sample rate of the audio stream (e.g. 44100 samples/second), take your average peak distance (in samples), multiply by 1/(sample rate) to get the time (in seconds) between taps, divide by 60 to get the time (in minutes) between taps, and invert to get the taps/minute.
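
As a rough sketch of that math in C# (the sample rate and the peak positions below are made-up example values; use whatever your WaveLib.WaveFormat actually specifies):

    // Hypothetical example: positions (sample indices) where peaks were detected
    int sampleRate = 44100;                            // from your WaveFormat
    int[] peakPositions = { 0, 22050, 44100, 66150 };

    // Average distance, in samples, between consecutive peaks
    double totalGap = 0;
    for (int i = 1; i < peakPositions.Length; i++)
        totalGap += peakPositions[i] - peakPositions[i - 1];
    double avgGapSamples = totalGap / (peakPositions.Length - 1);

    double secondsPerTap = avgGapSamples / sampleRate; // multiply by 1/(sample rate)
    double tapsPerMinute = 60.0 / secondsPerTap;       // invert and scale to a minute
    Console.WriteLine(tapsPerMinute);                  // prints 120 for these values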

Hope that helps

Donnie DeBoer
+1  A: 

I'm not sure which WaveIn.cs class you're using, but usually with code that records audio, you either A) tell the code to start recording, and then at some later point you tell the code to stop, and you get back an array (usually of type short[]) that comprises the data recorded during this time period; or B) tell the code to start recording with a given buffer size, and as each buffer is filled, the code makes a callback to a method you've defined with a reference to the filled buffer, and this process continues until you tell it to stop recording.

Let's assume that your recording format is 16 bits (aka 2 bytes) per sample, 44100 samples per second, and mono (1 channel). In the case of (A), let's say you start recording and then stop recording exactly 10 seconds later. You will end up with a short[] array that is 441,000 (44,100 x 10) elements in length. I don't know what algorithm you're using to detect "taps", but let's say that you detect taps in this array at element 0, element 22,050, element 44,100, element 66,150 etc. This means you're finding taps every .5 seconds (because 22,050 is half of 44,100 samples per second), which means you have 2 taps per second and thus 120 BPM.

In the case of (B) let's say you start recording with a fixed buffer size of 44,100 samples (aka 1 second). As each buffer comes in, you find taps at element 0 and at element 22,050. By the same logic as above, you'll calculate 120 BPM.
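
Just to illustrate one (deliberately simple) way of finding those tap positions, here is a sketch that scans a 16-bit mono buffer for samples above an amplitude threshold; the Threshold and MinGapSamples values are made-up numbers you'd need to tune for your signal:

    // Hypothetical sketch: return the sample indices of taps in one buffer
    // by looking for samples that exceed an amplitude threshold.
    // Uses List<int> from System.Collections.Generic.
    const short Threshold = 10000;   // made-up value; tune for your tapping sound
    const int MinGapSamples = 4410;  // ignore re-triggers within 0.1 s at 44100 Hz

    static List<int> FindTaps(short[] buffer)
    {
        var taps = new List<int>();
        int i = 0;
        while (i < buffer.Length)
        {
            if (Math.Abs((int)buffer[i]) > Threshold)
            {
                taps.Add(i);            // sample index of this tap in the buffer
                i += MinGapSamples;     // skip ahead so one tap isn't counted twice
            }
            else
            {
                i++;
            }
        }
        return taps;
    }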

Hope this helps. With beat detection in general, it's best to record for a relatively long time and count the beats across a large array of data. Trying to estimate the "instantaneous" tempo is more difficult and prone to error, just as estimating the pitch of a recording is harder to do in real time than with a recording of a full note.

MusiGenesis
So if I do mono, each of those numbers in my array represents one sample for one channel? If I were to do 2 channels, would my array then be 88,200 in size, alternating between channels?
zac
Yes, stereo means you have twice as many samples per second, and the samples are interleaved (left, right, left, right etc.), so elements 0, 2, 4, 6 etc. represent data for the left channel, and elements 1, 3, 5, 7 etc. represent data for the right channel.
MusiGenesis
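
In code, splitting an interleaved stereo buffer into its two channels is just a matter of striding through the array by two. A quick sketch, assuming buffer is the interleaved short[] you recorded:

    // Split interleaved stereo samples (L, R, L, R, ...) into separate channels
    short[] left = new short[buffer.Length / 2];
    short[] right = new short[buffer.Length / 2];
    for (int i = 0; i < left.Length; i++)
    {
        left[i] = buffer[2 * i];        // even indices hold the left channel
        right[i] = buffer[2 * i + 1];   // odd indices hold the right channel
    }
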
Another question: when I do an FFT on this to get amplitude, it returns an array half the size of my original one. From what I have read, this is because it converts the signal into a real and an imaginary part and uses both of those to get the amplitude. What does each value in this array now account for? Two samples?
zac
It depends on what code you're using to do the FFT. For audio DSP, an FFT function usually takes two arrays, 1 for the real and 1 for the imaginary part. Before the FFT, the real array contains the recorded sample values, and the imaginary array is all zeroes. After the transform, the two arrays are the same size as before, but will both now contain different, non-zero values. Whatever code you're using is probably combining these two transformed arrays into a single, half-size array that contains the frequency components. CONTINUED...
MusiGenesis
The values in this array now contain what are usually called frequency "bins", where the number in each bin represents the magnitude of the frequency components in that range. If, for example, your original audio was recorded at 44100 Hz (normal CD audio) and your FFT window is, say, 1000 samples (giving a 500-element magnitude array), then each bin represents the magnitude of a 44.1 Hz slice of your original audio. So the value in element[0] represents the frequency content from 0 to 44.1 Hz, the value in element[1] represents the magnitude from 44.1 to 88.2 Hz, element[2] is 88.2 to 132.3 Hz, and so on.
MusiGenesis
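
To make the bin arithmetic concrete, here's a small sketch (not tied to any particular FFT library) that combines the transformed real and imaginary arrays, re and im, into magnitudes and notes which frequency range each bin covers:

    // Hypothetical sketch: re[] and im[] are the FFT output arrays (double[]),
    // each the same length as the FFT window. Only the first half is useful,
    // because it covers 0 Hz up to the Nyquist frequency (sampleRate / 2).
    double sampleRate = 44100.0;
    int fftSize = re.Length;                  // e.g. a 1000-sample window
    double binWidthHz = sampleRate / fftSize; // 44.1 Hz per bin in that case

    double[] magnitude = new double[fftSize / 2];
    for (int bin = 0; bin < magnitude.Length; bin++)
    {
        // Combine the real and imaginary parts into one magnitude per bin
        magnitude[bin] = Math.Sqrt(re[bin] * re[bin] + im[bin] * im[bin]);
        // This bin covers frequencies from (bin * binWidthHz)
        // up to ((bin + 1) * binWidthHz).
    }
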
Incidentally, this is why FFT is not a good approach for pitch detection. To make the "bins" narrow enough to read a pitch accurately, your FFT window has to be enormous, which means it will be very slow.
MusiGenesis
Go ahead and at least vote my answer up, you ungrateful bastard. :)
MusiGenesis
Wow, you're a genius with this sound analysis stuff. It's making sense to me now. So what am I looking at when I'm analyzing the raw array before doing an FFT on it? Will it still give me an amplitude to detect my tap with? Because I guess I would then need it to get the time between the taps.
zac
For detecting each tap in your recorded audio, you actually don't need to do an FFT at all (I wasn't quite sure why you mentioned it). Here is my answer to an earlier question about note onset detection (which is very similar to what you're trying to do): http://stackoverflow.com/questions/294468/note-onset-detection/294724#294724
MusiGenesis
Also, the term "genius" is a measure of accomplishment, not of aptitude or knowledge, and I haven't accomplished anything of significance. Thanks, though. :)
MusiGenesis