tags:

views:

386

answers:

4

Is it possible with FFT to find an occurrence of a small wav sample inside of a longer wav, if it is known that that exact sample exists somewhere in the wav (but may be mixed with other sounds)?

edit

(after receiving two responses): What if I have a library of all known sounds that can be in the larger WAV and wish to find occurrences of each of them within that WAV? In other words, I know every possible sound that can be mixed into the big wav, and wish to find occurrences of them?

A: 

Not precisely as you have defined it, if it is mixed with other sounds, and here's the reason; consider the effect of a wave mixed precisely with its inverse; the result is flat response. The mixing of waves can have a monotonic function, that is, to effectively mask one wave with another in a way that the first is unretrievable.

That said, there is likely a way of characterizing the "signature" of a wave such that it is likely to be present in a resultant composite wave file, but that signature would depend on the length of the wave file and to some extent what type of combinations were expected to be done upon it.

Your question probably has something to do with determining if samples of one work exist within another, composite, work. In general, yes, FFTs are useful for determining a "signature" for a given wave, and being able to extract that "signature" from another wave; they're good for some things (such as frequency shift; it just shows up as a displacement on the FFT), but not so great for other things (varying frequency modulation, for one; high (or uneven) bandwidth compression of the original signal). To put it another way: FFTs are a good way to detect "naive" use of samples, but a determined resampler can modify the original sample to make it hard to detect via FFT if he knows that that is the detection technique used.

McWafflestix
A: 

If you know the exact nature of the sample (length in bits etc.) then it is very possible. If it alters in any way then you are going to have a lot of work to do first.

Because of the way WAV files are encoded (sequentially by track - so you get the first lot of bits for the first track, then the first lot of bits for the second track, then the second lot of bits from the first track)

This can obviously repeat for as many tracks. If you know the WAV file you are looking for is encoded specifically in one of these tracks then you can isolate each track and perform operations on them.

Obviously if your sample differs by speed, tempo, pitch etc. then it's going to have a different bit signature so you will have to normalise the tracks.

Jamie Lewis
+2  A: 

I assume by exact you don't mean sample value exact. If it were sample-value exact, then it would be a simple matter of searching for the sample values, which is fast and efficient.

If you are looking for bits of sound that contribute, the best approach is to use a mathematical process called "convolution". Basically, take the sample that you are trying to find within the big sample, effectively place it next to the big sample, and correlate. Do this for every sample position. You will from this get a curve that will have distinct spikes in it where the sample is. Its quite computationally intensive, but computers have gotten quite fast, so its feasible.

But - this is assuming that the sample came from the same recording for both cases. Miking a drum sound, even the same drum sound, from two different locations, will not produce very good correlation.

Hope that helps.

Matthias Wandel
That helps a lot. My goal is to transcribe old WAV recordings of a digital piano that I've made into MIDI. That digital piano has a finite set of samples that it plays. If I could grab every possible sample that it can play and use convolution with the old recordings, this sounds like I might be able to do it. One obstacle I can think of is that I've recorded at different volumes. I wonder how much that would affect me? I actually don't care if its absolutely perfect, so long as it is better than most "sound to midi" apps out there which are TERRIBLE. Thanks.
ZomCoder
Weaker volume will result in weaker correlation.A tuned sound like a piano will however produce many spikes, as the note correlates to itself, plus offset by one wavelength. So you will have to only use the peak that is a local maximum among the other peaks.You will have to correlate with every note that you may have played, so it might be a bit slow.
Matthias Wandel
For this task, you would use the cross-correlation and not the convolution. Very similar, but a little different.
tom10
Cross-correlation, not convolution.
endolith
+1  A: 

It depends on exactly what you're trying to find and what you're trying to find it in.

  • If you're looking for a sample that's exactly the same as a chunk of a larger WAV file, bit-for-bit, then you can search for the values directly.
  • If it's exactly the same sound, but not sample-accurate (matching a clip of an MP3 to a WAV of the same song, for instance), you can easily find it using cross-correlation. Cross-correlation can be sped up significantly by using an FFT method instead of a "naive" method that explicitly multiplies and sums the samples.
  • If you're looking for a short sample that's been mixed with other sounds, it might still be possible to use cross-correlation, but it depends if the other sounds affect the match. For digital piano with simple samples and no effects, straight into a digital recorder, this might work.
  • If the sound has been through any type of filtering, polarity reversal, or phase shift, however, this will not work very well, since the wave shapes will be changed. So if the piano was played through speakers and then recorded with microphones, this isn't a viable solution.

What might work better in this case is to create a spectrogram of the recording using the short-time Fourier transform (STFT), and a spectrogram of the thing you're looking for, and then do a time-wise cross-correlation of the two images. The spectrogram is a 2D image of the amplitude of the sounds' spectrums over time, which you can then match. (This is probably a roundabout way of doing something there are more specialized algorithms for, but I don't know what it would be called.) ;)

Can you upload some sound clips somewhere?

endolith