Given two byte arrays of data captured from a microphone, how can I determine which one has more spikes in noise? I would assume there is an algorithm I can apply to the data, but I have no idea where to start.

Getting down to it, I need to be able to determine when a baby is crying vs ambient noise in the room.

If it helps, I am using the Microsoft.Xna.Framework.Audio.Microphone class to capture the sound.

A: 

Louder at what point? The signal's average amplitude will tell you which one is louder on average, but that is kind of a dumb, brute force way to go about it. It may work for you in practice though.

Getting down to it, I need to be able to determine when a baby is crying vs ambient noise in the room.

Ok, so, I'm just throwing out ideas here; I am by no means an expert on audio processing.

If you know your input, i.e., a baby crying (relatively loud with a high pitch) versus ambient noise (relatively quiet), you should be able to analyze the signal in terms of pitch (frequency) and amplitude (loudness). Of course, if someone drops some pots and pans onto the kitchen floor during the recording, that will be tough to tell apart from a cry.

As a first pass I would simply traverse the signal, maintaining a standard deviation of pitch and amplitude throughout, and then set a flag when those deviations jump beyond some threshold that you will have to define. When they come back down you may be able to safely assume that you captured the baby's cry.
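A rough sketch of that first pass, in Python for brevity and covering amplitude only (not pitch). It assumes the samples are already numbers rather than raw bytes. The names `frame_rms` and `flag_loud_frames`, the multiplier `k`, and the `warmup` count are all made up for illustration and would need tuning against real recordings.

```python
import math

def frame_rms(samples, frame_size):
    """Split samples into consecutive full frames and return the RMS of each."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_size]) / frame_size)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]

def flag_loud_frames(rms_values, k=3.0, warmup=5):
    """Flag frames whose RMS is more than k running standard deviations
    above the running mean. k and warmup are illustrative guesses."""
    flags = []
    count, mean, m2 = 0, 0.0, 0.0  # Welford running statistics
    for value in rms_values:
        std = math.sqrt(m2 / count) if count > 1 else 0.0
        is_loud = count >= warmup and value > mean + k * std
        flags.append(is_loud)
        if not is_loud:
            # Only quiet frames update the baseline, so a long cry
            # does not slowly become the new "normal".
            count += 1
            delta = value - mean
            mean += delta / count
            m2 += delta * (value - mean)
    return flags
```

A run of flagged frames followed by unflagged ones would then correspond to the "jump up, come back down" pattern described above.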

Again, just throwing you an idea here. You will have to see how it works in practice with actual data.

Ed Swangren
Updated to be a little more specific about what I need.
Joe
Ok, updated my post with an idea.
Ed Swangren
Yeah, I was thinking something like that would work, but all I have is the raw data. Any ideas on how to get it into a state where I can evaluate it?
Joe
Your raw data would necessarily include all of the information needed to play the audio, i.e., frequency, amplitude, etc. What format is the data in?
Ed Swangren
I have the raw byte data. I need to find some class to feed it into that will be able to extract the freq/amp from it, right?
Joe
I would go with the solution provided by CodeInChaos. He obviously has more experience with this sort of thing than I do.
Ed Swangren
If someone drops pots and pans onto the kitchen floor the baby will be awake anyway.
Albin Sunnanbo
A: 

I agree with @Ed Swangren: it will take a lot of experimenting with sample data from a lot of sources. To me, it sounds like the trick will be to limit, or hopefully eliminate, false positives. In my experience, a crying baby is much louder than the surrounding environment. So keep track of the average measurements (freq/amp/??) of the normal environment, and then classify how well any change matches the characteristics of a crying baby. Those characteristics vary from kid to kid, so you'll probably want a system that 'learns'. Best of luck.

update: you might find this library useful http://naudio.codeplex.com/

kenny
+2  A: 

First use a Fast Fourier Transform to transform the signal into the frequency domain. Then check if the signal in the typical "cry-frequencies" is significantly higher than the other amplitudes.
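A minimal sketch of that check, in plain Python with a small hand-rolled radix-2 FFT so it has no dependencies (a real project would more likely use an FFT library). The function names are hypothetical, and `CRY_BAND_HZ` is a placeholder band, not a measured value; the actual "cry-frequencies" would have to be found from recordings.

```python
import cmath

CRY_BAND_HZ = (300.0, 3000.0)  # placeholder band, an assumption to be tuned

def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def band_energy_ratio(frame, sample_rate, low_hz, high_hz):
    """Fraction of the frame's spectral energy (DC excluded) that lies
    between low_hz and high_hz."""
    spectrum = fft(frame)
    n = len(frame)
    in_band = total = 0.0
    for k in range(1, n // 2):
        power = abs(spectrum[k]) ** 2
        total += power
        if low_hz <= k * sample_rate / n <= high_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0
```

A frame would then count as "cry-like" when `band_energy_ratio(frame, rate, *CRY_BAND_HZ)` is well above what the quiet room normally produces; how far above is something to settle empirically.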

The preprocessor of the speex codec supports noise vs signal detection, but I don't know if you can get it to work with XNA.

Or, if you really want some kind of loudness measure, calculate the sum of squares of the amplitudes at the frequencies you're interested in (for example 50-20000 Hz). If the average of that over the last 30 seconds is significantly higher than the average over the last 10 minutes, or exceeds a certain absolute threshold, sound the alarm.
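The short-window versus long-window comparison could look roughly like this in Python. `LoudnessAlarm` and its parameters are invented for the sketch; it takes one energy value per frame (for example the sum of squared samples, or a band-limited sum from an FFT) and leaves the window lengths and thresholds to the caller.

```python
from collections import deque

class LoudnessAlarm:
    """Compare a short-term average of frame energies with a long-term one.

    short_len and long_len are window lengths in frames (e.g. the frames
    covering 30 seconds and 10 minutes). ratio must be smaller than
    long_len / short_len, because the long window contains the short one.
    """

    def __init__(self, short_len, long_len, ratio=4.0, absolute=None):
        self.short = deque(maxlen=short_len)
        self.long = deque(maxlen=long_len)
        self.ratio = ratio
        self.absolute = absolute  # optional absolute energy threshold

    def update(self, frame_energy):
        """Add one frame's energy and return True if the alarm should sound."""
        self.short.append(frame_energy)
        self.long.append(frame_energy)
        short_avg = sum(self.short) / len(self.short)
        long_avg = sum(self.long) / len(self.long)
        if self.absolute is not None and short_avg > self.absolute:
            return True
        # The relative test waits until the long window is full, so the
        # first few frames cannot trigger it by themselves.
        long_full = len(self.long) == self.long.maxlen
        return long_full and short_avg > self.ratio * long_avg
```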

CodeInChaos
Finding "cry-frequencies" will not catch "baby banging toys against the wall" sounds or other loud sounds that indicate that the baby is awake, but not crying.
Albin Sunnanbo
Added a bit about loud sounds.
CodeInChaos
+1  A: 

You can convert each sample (normalised to the range -1.0 to 1.0) into a decibel value, relative to full scale, by applying the formula

dB = 20 * log-base-10 (abs(sample-value))
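As a small Python sketch of that formula: the function name and the `floor_db` default are assumptions added here, the latter because the logarithm is undefined for a sample of exactly zero.

```python
import math

def sample_to_dbfs(sample, floor_db=-120.0):
    """Convert a normalised sample in [-1.0, 1.0] to decibels relative
    to full scale. Silence is clamped to floor_db (an arbitrary choice)."""
    magnitude = abs(sample)  # the sign carries no loudness information
    if magnitude == 0.0:
        return floor_db
    return max(20.0 * math.log10(magnitude), floor_db)
```

With this, a full-scale sample maps to 0 dB and a half-scale sample to roughly -6 dB, whatever its sign.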

To be honest, so long as you don't mind the occasional false positive, and your microphone is set up OK, you should have no problem telling the difference between a baby crying and ambient background noise, without going through the hassle of doing an FFT.

I'd recommend having a look at the source code for a noise gate, which does pretty much what you are after, with configurable attack times & thresholds.
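To give a feel for the idea, here is a stripped-down Python sketch that only tracks whether a gate would be open or closed. It is not taken from any particular noise gate implementation; a real gate also shapes the audio gain, and `GateDetector`, its parameters, and the frame-based timing are all assumptions for illustration.

```python
class GateDetector:
    """Track the open/closed state of a simple gate.

    The gate opens after attack_frames consecutive frames above
    threshold_db and closes after release_frames consecutive frames
    at or below it.
    """

    def __init__(self, threshold_db, attack_frames, release_frames):
        self.threshold_db = threshold_db
        self.attack_frames = attack_frames
        self.release_frames = release_frames
        self.is_open = False
        self._above = 0  # consecutive frames above the threshold
        self._below = 0  # consecutive frames at or below the threshold

    def update(self, level_db):
        """Feed one frame's level in dB and return the current gate state."""
        if level_db > self.threshold_db:
            self._above += 1
            self._below = 0
        else:
            self._below += 1
            self._above = 0
        if not self.is_open and self._above >= self.attack_frames:
            self.is_open = True
        elif self.is_open and self._below >= self.release_frames:
            self.is_open = False
        return self.is_open
```

The attack count is what filters out a single short bang: one loud frame on its own never opens the gate.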

Mark Heath
This should be fine, as an occasional false positive is acceptable. Given that I only have the raw data (a byte array), how can I achieve what you are talking about?
Joe
Well, you do need to know how many bits per sample there are. It seems likely that it will be 16 (i.e. two bytes per sample). Also, I'm guessing it is a mono recording. So turn every two bytes into a short (use BitConverter), and that is the amplitude of that sample.
Mark Heath
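For reference, the byte-to-sample step from that last comment might look like this in Python (in C#, `BitConverter.ToInt16` plays the role of `struct.unpack` here). It assumes 16-bit signed, mono PCM as guessed above, plus little-endian byte order, which is an extra assumption; `bytes_to_samples` is a made-up name.

```python
import struct

def bytes_to_samples(raw):
    """Interpret raw as 16-bit signed little-endian mono PCM and
    normalise each sample to the range [-1.0, 1.0)."""
    count = len(raw) // 2  # a trailing odd byte, if any, is ignored
    ints = struct.unpack("<%dh" % count, raw[:count * 2])
    return [value / 32768.0 for value in ints]
```

The resulting list of floats is the form that amplitude, decibel, or FFT calculations would normally start from.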