+1  A: 

Maybe you could try to detect the sudden, significant rise in air pressure that should mark a door closing. You could pair that with the waveform and sound-level analysis, and together they might give you a better result.

Szundi
+1  A: 

I would imagine that the frequency and amplitude would also vary significantly from vehicle to vehicle. The best way to determine that would be to take a sample in a Civic and another in a big SUV. Perhaps you could have the user close the door in a "learning" mode to capture the amplitude and frequency signature. Then you could compare incoming sound against that signature in normal use.

You could also consider using Fourier analysis to eliminate background noises that aren't associated with the door close.
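
As a rough illustration of checking one frequency band without a full FFT, here is a minimal sketch using the Goertzel algorithm (a swapped-in technique, not something proposed in this answer). The function name `goertzel_power` and any target frequency you pass in are assumptions; the real signature of a door closing would have to be measured.

```python
import math

def goertzel_power(samples, sample_rate, target_hz):
    """Return the squared magnitude of the DFT bin nearest target_hz."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)   # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of the selected bin
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
```

Comparing the power in a few bands recorded in "learning" mode against the same bands at run time is one possible way to build the signature comparison described above.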

Turnkey
+14  A: 

Looking at the screenshots of the source audio files, one simple way to detect a change in sound level would be to do a numerical integration of the squared samples to find out the "energy" of the wave at a specific time.

A rough algorithm would be:

  1. Divide the samples up into sections
  2. Calculate the energy of each section
  3. Take the ratio of the energies between the previous window and the current window
  4. If the ratio exceeds some threshold, determine that there was a sudden loud noise.

Pseudocode

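samples = load_audio_samples()     // Array containing audio samples
WINDOW_SIZE = 1000                 // Sample window of 1000 samples (example)
THRESHOLD = 10                     // Energy ratio that counts as "sudden" (example)

energy = 0
last_energy = 0

for (i = 0; i + WINDOW_SIZE <= samples.length; i += WINDOW_SIZE):
    // Perform a numerical integration of the current window by adding the
    // square of each sample to a sum. Squaring keeps positive and negative
    // samples from cancelling each other out.
    for (j = 0; j < WINDOW_SIZE; j++):
        energy += samples[i+j] * samples[i+j]

    // Take ratio of energies of last window and current window, and see
    // if there is a big difference in the energies. If so, there is a
    // sudden loud noise. Skip the check while last_energy is 0 (the first
    // window, or total silence) to avoid dividing by zero.
    if (last_energy > 0 and energy / last_energy > THRESHOLD):
        sudden_sound_detected()

    last_energy = energy
    energy = 0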

I should add a disclaimer that I haven't tried this.

This approach should work without having all the samples recorded first. As long as there is a buffer of some length (WINDOW_SIZE in the example), a numerical integration can be performed to calculate the energy of that section of sound. This does mean, however, that there will be a processing delay that depends on WINDOW_SIZE. Determining a good length for a section of sound is another concern.
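
A minimal Python sketch of the streaming idea, untested on real audio. The class name `EnergyJumpDetector` and the default `window_size` and `threshold` values are assumptions for illustration; both would need tuning.

```python
class EnergyJumpDetector:
    """Flags a window whose energy exceeds the previous window's by a ratio."""

    def __init__(self, window_size=100, threshold=10.0):
        self.window_size = window_size
        self.threshold = threshold      # assumed ratio; tune empirically
        self.energy = 0                 # running sum of squares for this window
        self.count = 0                  # samples accumulated so far
        self.last_energy = None         # energy of the previous full window

    def feed(self, sample):
        """Add one sample; return True when a completed window shows a jump."""
        self.energy += sample * sample
        self.count += 1
        if self.count < self.window_size:
            return False
        # max(..., 1) avoids dividing by zero after a perfectly silent window
        jump = (self.last_energy is not None
                and self.energy / max(self.last_energy, 1) > self.threshold)
        self.last_energy = self.energy
        self.energy = 0
        self.count = 0
        return jump
```

Because the sum is accumulated as each sample arrives, this version only keeps running totals rather than the samples themselves. Detection still lags by up to one window.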

How to Split into Sections

In the first audio file, it appears that the duration of the sound of the door closing is 0.25 seconds, so the window used for numerical integration should probably be at most half of that, or even more like a tenth, so the difference between the silence and sudden sound can be noticed, even if the window is overlapping between the silent section and the noise section.

For example, suppose the integration window is 0.5 seconds. If the first window covers 0.25 seconds of silence followed by 0.25 seconds of door closing, and the second window covers 0.25 seconds of door closing followed by 0.25 seconds of silence, the two sections will appear to have the same level of noise, so the sound detection will not trigger. I imagine having a short window would alleviate this problem somewhat.
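
The straddling problem can be reproduced with an idealized signal. In this sketch the 4 kHz rate, the constant amplitudes (1 for quiet, 100 for loud), and the helper names are all assumptions chosen to mirror the example above, not measurements.

```python
def window_energies(samples, window_size):
    """Sum of squared samples for each complete, non-overlapping window."""
    return [sum(x * x for x in samples[i:i + window_size])
            for i in range(0, len(samples) - window_size + 1, window_size)]

def max_energy_ratio(samples, window_size):
    """Largest ratio of a window's energy to the energy of the one before it."""
    e = window_energies(samples, window_size)
    return max(cur / max(prev, 1) for prev, cur in zip(e, e[1:]))

RATE = 4000                                   # assumed sampling rate in Hz
quiet, loud = [1] * (RATE // 4), [100] * (RATE // 4)
signal = quiet + loud + loud + quiet          # 0.25 s quiet, 0.5 s loud, 0.25 s quiet

print(max_energy_ratio(signal, RATE // 2))    # 0.5 s windows   -> 1.0
print(max_energy_ratio(signal, RATE // 40))   # 0.025 s windows -> 10000.0
```

With 0.5-second windows both windows contain the same mix of quiet and loud samples, so the ratio never rises above 1. With 0.025-second windows the jump is obvious.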

However, a window that is too short means that the rise in the sound may not fully fit into one window. There may then appear to be little difference in energy between adjacent sections, which can cause the sound to be missed.

I believe the WINDOW_SIZE and THRESHOLD are both going to have to be determined empirically for the sound which is going to be detected.

For the sake of determining how many samples this algorithm will need to keep in memory, let's say the WINDOW_SIZE is 1/10 of the duration of the door-closing sound, which is about 0.025 seconds. At a sampling rate of 4 kHz, that is 100 samples. Using 16-bit samples, that's 200 bytes, which does not seem like much of a memory requirement.
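
The same arithmetic as a small helper, in case other sampling rates or window lengths need checking. The function name `window_memory` is made up for this sketch.

```python
def window_memory(window_seconds, sample_rate_hz, bytes_per_sample=2):
    """Return (samples, bytes) needed to hold one window of audio."""
    samples = round(window_seconds * sample_rate_hz)
    return samples, samples * bytes_per_sample

print(window_memory(0.025, 4000))   # (100, 200), matching the figures above
print(window_memory(0.025, 8000))   # the same window at twice the sampling rate
```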

Advantages / Disadvantages

The advantage of this method is that processing can be performed with simple integer arithmetic if the source audio is fed in as integers. The catch is, as mentioned already, that real-time processing will have a delay, depending on the size of the section that is integrated.

There are a couple of problems that I can think of with this approach:

  1. If the background noise is too loud, the difference in energy between the background noise and the door closing will be hard to distinguish, and the door closing may go undetected.
  2. Any abrupt noise, such as a clap, could be mistaken for the door closing.

Perhaps combining this with the suggestions in the other answers, such as analyzing the frequency signature of the door closing using Fourier analysis, would make it less prone to error, at the cost of more processing.

It's probably going to take some experimentation before finding a way to solve this problem.

coobird
I really like this approach, so I'll implement it and report my results back here. Rather than having a fixed threshold, I'm going to keep track of the average energy and have an adjustable factor (i.e., a window's energy must be 1.5+ times the average to trigger).
Adam Davis
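
One possible reading of the average-energy idea in the comment above, as a minimal sketch. The exponential moving average, the `smoothing` value, and the class name are assumptions; the comment only specifies an average and an adjustable factor.

```python
class AdaptiveEnergyTrigger:
    """Triggers when a window's energy exceeds factor * running average."""

    def __init__(self, factor=1.5, smoothing=0.1):
        self.factor = factor            # e.g. 1.5 times the average
        self.smoothing = smoothing      # assumed weight given to each new window
        self.average = None             # running average of window energies

    def update(self, window_energy):
        """Feed one window's energy; return True if it should trigger."""
        if self.average is None:
            self.average = float(window_energy)   # first window seeds the average
            return False
        triggered = window_energy > self.factor * self.average
        # Exponential moving average, so the baseline follows slow changes
        self.average += self.smoothing * (window_energy - self.average)
        return triggered
```

A loud event also raises the average here, which briefly makes the trigger less sensitive. Whether that is desirable probably depends on how noisy the car is.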
+5  A: 

You should tap in to the door close switches in the car. Trying to do this with sound analysis is overengineering.

There are a lot of suggestions about different signal processing approaches to take, but really, by the time you learn about detection theory, build an embedded signal processing board, learn the processing architecture for the chip you chose, attempt an algorithm, debug it, and then tune it for the car you want to use it on (and then re-tune and re-debug it for every other car), you will be wishing you had just sticky-taped a reed switch inside the car and hot-glued a magnet to the door.

Not that it's not an interesting problem to solve for the DSP experts, but from the way you're asking this question, it's clear that sound processing isn't the route you want to take. It will just be such a nightmare to make it work right.

Also, the clapper is just a high-pass filter fed into a threshold detector (plus a timer to make sure 2 claps occur quickly enough together).
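
For illustration only, a software sketch of that filter, threshold, and timer chain. This is not how any actual product is built; the first-order filter, the function name, and every parameter value are assumptions.

```python
def detect_double_clap(samples, alpha=0.9, threshold=50.0,
                       min_gap=200, max_gap=2000):
    """Return True if two loud bursts occur min_gap..max_gap samples apart."""
    prev_x = 0.0
    prev_y = 0.0
    last_hit = None
    for n, x in enumerate(samples):
        # First-order high-pass filter: passes fast changes, blocks slow drift
        y = alpha * (prev_y + x - prev_x)
        prev_x, prev_y = x, y
        if abs(y) <= threshold:
            continue
        if last_hit is None or n - last_hit > max_gap:
            last_hit = n        # first clap, or the previous one has expired
        elif n - last_hit >= min_gap:
            return True         # second clap arrived within the time limit
        # Hits closer together than min_gap count as part of the same clap
    return False
```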

James Caccese
In many situations this is the correct answer. If it can be done without wiring the switches then we can avoid installation issues, cost increase, liability issues, and given that this device is meant to be re-used (not permanently installed) wiring is not optimal. It's an option, though.
Adam Davis
In some cases, such as electric sliding or hatchback doors, wiring might be necessary because the sound of closing is not as energetic, so more processing or custom tuning (and thus setup) is required. Another thought was to place a light sensor near a courtesy light - easy to install, and works too.
Adam Davis
I tend to agree, but that didn't stop me from suggesting a way to do it algorithmically.... :) +1
Drew Hall
"from the way you're asking this question, it's clear that sound processing isn't the route you want to take." What about the way I'm asking the question suggests that? I'll gladly modify it if I'm giving a false impression of my intent.
Adam Davis
Sorry if I came off like an ass, but you seem new to signal processing. I don't want to turn you off if you're looking to play or learn, but if you want to make a real solution to door closing detection in a reasonable amount of time, audio signal processing is probably the most difficult route.
James Caccese
Hey, no problem - it's good to make sure that the correct path is being taken. You're right that I'm fairly new - I took a few signal processing classes a few years ago, and I understand the principles. But one has to start somewhere, and while your solution is good, it's not usable for this case.
Adam Davis
If you're worried about wires, use a magnetic sensor that broadcasts a chirp of RF or something from the movement of the door. http://216.71.30.251/faq.html#__4._How_does_the_Lightning_SwitchTM_work_with_no_batteries_
endolith
+3  A: 
ccook
+5  A: 

There is a lot of relevant literature on this problem in the radar world (it's called detection theory).

You might have a look at "cell averaging CFAR" (constant false alarm rate) detection. Wikipedia has a little bit here. Your idea is very similar to this, and it should work! :)
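
A minimal sketch of cell-averaging CFAR applied to a list of window energies. The parameter values are assumptions chosen for illustration; in a real detector the `scale` factor would be picked to achieve the desired false alarm rate.

```python
def ca_cfar(energies, num_train=4, num_guard=1, scale=5.0):
    """Return indices whose energy exceeds scale * the local noise estimate."""
    detections = []
    half = num_train + num_guard
    for i in range(half, len(energies) - half):
        # Training cells on each side, skipping the guard cells next to cell i
        leading = energies[i - half:i - num_guard]
        trailing = energies[i + num_guard + 1:i + half + 1]
        noise = (sum(leading) + sum(trailing)) / (2 * num_train)
        if energies[i] > scale * noise:
            detections.append(i)
    return detections
```

Because the threshold is derived from the neighbouring cells, it rises and falls with the background level. This two-sided form needs a few windows after the cell under test, so it adds some latency; using only the leading cells is a possible variant for real-time use.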

Good luck!

Drew Hall
+1 sounds(heh) like a good application
ccook
+3  A: 

The process for finding distinct spikes in audio signals is called transient detection. Applications like Sony's Acid and Ableton Live use transient detection to find the beats in music for doing beat matching.

The distinct spike you see in the waveform above is called a transient, and there are several good algorithms for detecting it. The paper "Transient detection and classification in energy matters" describes 3 methods for doing this.

Nick Haddad
A: 

On the issue of less frequent sampling, the highest sound frequency which can be captured is half of the sampling rate. Thus, if the car door sound was strongest at 1000 Hz (for example), then a sampling rate below 2000 Hz would lose that sound entirely.
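
A small sketch of what happens to a pure tone at different sampling rates, assuming an ideal sampler with no anti-aliasing filter. Without such a filter the tone is not simply dropped but shows up at a lower, false frequency. The function name is made up for this example.

```python
def apparent_frequency(signal_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling."""
    folded = signal_hz % sample_rate_hz
    # Anything above half the sampling rate folds back below it
    return min(folded, sample_rate_hz - folded)

print(apparent_frequency(1000, 4000))   # 1000: captured correctly
print(apparent_frequency(1000, 1500))   # 500: aliased to a lower frequency
```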

barrowc
Also called the Nyquist rate - the rate at which sampling must be performed to avoid aliasing in a given continuous to discrete conversion http://en.wikipedia.org/wiki/Nyquist_rate .
Adam Davis
A: 

A very simple noise gate would probably do just fine in your situation. Simply wait for the first sample whose amplitude is above a specified threshold value (to avoid triggering with background noise). You would only need to get more complicated than this if you need to distinguish between different types of noise (e.g. a door closing versus a hand clap).
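
A minimal sketch of that kind of gate. The function name is hypothetical, and the threshold would have to be chosen from real recordings.

```python
def first_sample_above(samples, threshold):
    """Return the index of the first sample whose magnitude exceeds threshold."""
    for i, x in enumerate(samples):
        if abs(x) > threshold:      # abs() so negative swings count too
            return i
    return None                     # nothing crossed the threshold
```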

Mark Heath
Yes, I've shown the very simple cases here. The reality is much more harsh - with the radio and heating/AC on high then I get reasonable results, but I also get a few false positives with just RMS peak detection.
Adam Davis