Is it possible to compare two sounds? For example, if the app already ships with a sound file (mp3 or any other format), can it compare that static file against a sound recorded inside the app?
Any comments are welcome.
Regards
This forum thread has a good answer (about three down) - http://www.dsprelated.com/showmessage/103820/1.php.
The trick is to get the decoded audio from the mp3 - if they're just short 'hello' sounds, I'd store them inside the app as a wav instead of decoding them (though I've never used CoreAudio or any of the other frameworks before so mp3 decoding into memory might be easy).
When you've got your reference wav and your recorded wav, follow the steps from the post linked above:
1 Do whatever is necessary to convert the .wav files to their discrete-time signals:
http://www.sonicspot.com/guide/wavefiles.html
2 Time warping might or might not be necessary, depending on the difference between the two sample rates:
http://en.wikipedia.org/wiki/Dynamic_time_warping
3 After time warping, truncate both signals so that their durations are equivalent.
4 Compute the normalized energy spectral density (ESD) from the DFTs of the two signals:
http://en.wikipedia.org/wiki/Power_spectrum
6 Compute the mean squared error (MSE) between the normalized ESDs of the two signals:
http://en.wikipedia.org/wiki/Mean_squared_error
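Not from the linked post, but here is a rough sketch of steps 1, 3, 4 and 6 in plain Python, in case it helps to see them as code. The function names (`read_wav_mono16`, `normalized_esd`, `esd_distance`) are made up for illustration. It assumes 16-bit mono PCM wav files at the same sample rate and skips time warping entirely. The DFT is the naive O(n^2) definition, so for real clips you would want an FFT library instead.

```python
import cmath
import struct
import wave


def read_wav_mono16(path):
    """Step 1: read a 16-bit mono PCM .wav file as floats in [-1, 1)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        frames = wf.readframes(wf.getnframes())
    count = len(frames) // 2
    samples = struct.unpack("<%dh" % count, frames[:2 * count])
    return [s / 32768.0 for s in samples]


def normalized_esd(signal):
    """Step 4: |X[k]|^2 of the DFT, scaled so the bins sum to 1."""
    n = len(signal)
    esd = []
    # A real signal has a symmetric spectrum, so bins 0..n/2 are enough.
    for k in range(n // 2 + 1):
        x_k = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        esd.append(abs(x_k) ** 2)
    total = sum(esd)
    # Dividing by the total energy removes the loudness difference.
    return [e / total for e in esd] if total > 0 else esd


def mse(a, b):
    """Step 6: mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def esd_distance(sig_a, sig_b):
    """Truncate to a common length (step 3), then compare normalized ESDs."""
    n = min(len(sig_a), len(sig_b))
    return mse(normalized_esd(sig_a[:n]), normalized_esd(sig_b[:n]))
```

With this you would call something like `esd_distance(read_wav_mono16("reference.wav"), read_wav_mono16("recorded.wav"))` (file names are placeholders) and treat a smaller result as a closer match.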
The MSE between the normalized ESDs of two signals is a good metric of closeness. If you have, say, 10 .wav files, and 2 of them are nearly the same but the others are not, the two that are close should have a relatively low MSE. Two perfectly identical signals will obviously have an MSE of zero. Ideally, two "equivalent" signals with different time scales (20-second human talking versus 5-second chipmunk), different energies (soft-spoken human versus yelling chipmunk), and different phases (sampling began at a slightly different instant against the continuous-time input) should still have an MSE of zero, but quantization errors inherent in DSP will yield an MSE slightly greater than zero.
You should get two different MSE values, one for your male reference against the recorded track and one for your female reference against the recorded track. The comparison with the lowest MSE is probably the correct gender.
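The final decision is just "pick the reference with the smallest MSE". A minimal sketch, assuming you already have a normalized ESD for the recording and one per reference, all of the same length. `closest_reference` and the labels are hypothetical names, and the numbers in the usage note are made-up toy values, not real measurements.

```python
def mean_squared_error(a, b):
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def closest_reference(recorded_esd, reference_esds):
    """Return (best_label, scores) where scores maps each label to its MSE."""
    scores = {label: mean_squared_error(recorded_esd, esd)
              for label, esd in reference_esds.items()}
    best_label = min(scores, key=scores.get)  # lowest MSE wins
    return best_label, scores
```

For example, `closest_reference([0.6, 0.3, 0.1, 0.0], {"male": [0.7, 0.2, 0.1, 0.0], "female": [0.1, 0.2, 0.3, 0.4]})` picks `"male"` because its MSE is the smaller of the two. It is worth looking at the gap between the two scores as well: if they are nearly equal, the result is not telling you much.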
I confess that I've never tried to do this and it looks very hard - good luck!