views:

783

answers:

6

I want to build something similar to Tunatic or Midomi (try them out if you're not sure what they do) and I'm wondering what algorithms I'd have to use; The idea I have about the workings of such applications is something like this:

  1. have a big database with several songs
  2. for each song in 1. reduce quality / bit-rate (to 64kbps for instance) and calculate the sound "hash"
  3. have the sound / excerpt of the music you want to identify
  4. for the song in 3. reduce quality / bit-rate (again to 64kbps) and calculate sound "hash"
  5. if 4. sound hash is in any of the 2. sound hashes return the matched music

I though of reducing the quality / bit-rate due to the environment noises and encoding differences.

Am I in the right track here? Can anyone provide me any specific documentation or examples? Midori seems to even recognize hum's, that's pretty awesomely impressive! How do they do that?

Do sound hashes exist or is it something I just made up? If they do, how can I calculate them? And more importantly, how can I check if child-hash is in father-hash?

How would I go about building a similar system with Python (maybe a built-in module) or PHP?

Some examples (preferably in Python or PHP) will be greatly appreciated. Thanks in advance!

+6  A: 

I worked on the periphery of a cool framework that implements several Music Information Retrieval techniques. I'm hardly an expert (edit: actually i'm nowhere close to an expert, just to clarify), but I can tell that that the Fast Fourier Transform is used all over the place with this stuff. Fourier analysis is wacky but its application is pretty straight-forward. Basically you can get a lot of information about audio when you analyze it in the frequency domain rather than the time domain. This is what Fourier analysis gives you.

That may be a bit off topic from what you want to do. In any case, there are some cool tools in the project to play with, as well as viewing the sourcecode for the core library itself: http://marsyas.sness.net

darren
+1  A: 

Its been a while since i last did signal processing, but rather than downsampling you should look at frequency-domain representations (eg FFT or DCT). Then you could make a hash of sorts and search for the database song with that sequence in.

Tricky part is making this search fast (maybe some papers on gene search might be of interest). I suspect that iTunes also does some detection of instruments to narrow down the search.

Remy
+2  A: 

MFCC extracted from the music is very useful in finding the timbrel similarity between songs.. this is most often used to find similar songs. As pointed by darren, Marsyas is a tool that can be used to extract MFCC and find similar songs by converting the MFCC in to a single vector representation..

Other than MFCC, Rhythm is also used to find song similarity.. There are few papers presented in the Mirex 2009

that will give you good overview of different algorithms and features that are most helpful in detecting music similarity.

StackUnderflow
+1 for the MFCC
keyboardP
+1  A: 

I did read a paper about the method in which a certain music information retrieval service (no names mentioned) does it - by calculating the Short Time Fourier transform over the sample of audio. The algorithm then picks out 'peaks' in the frequency domain i.e. time positions and frequencies that are particularly high amplitude, and uses the time and frequency of these peaks to generate a hash. Turns out the hash has surprising few collisions between different samples, and also stands up against approx 50% data loss of the peak information.....

Tom Woolfrey
+6  A: 

I do research in music information retrieval (MIR). The seminal paper on music fingerprinting is the one by Haitsma and Kalker around 2002-03. Google should get you it.

I read an early (really early; before 2000) white paper about Shazam's method. At that point, they just basically detected spectrotemporal peaks, and then hashed the peaks. I'm sure that procedure has evolved.

Both of these methods address music similarity at the signal level, i.e., it is robust to environment distortions. I don't think it works well for query-by-humming (QBH). However, that is a different (yet related) problem with different (yet related) solutions, so you can find solutions in the literature. (Too many to name here.)

The ISMIR proceedings are freely available online. You can find valuable stuff there: http://www.ismir.net/

I agree with using an existing library like Marsyas. Depends on what you want. Numpy/Scipy is indispensible here, I think. Simple stuff can be written in Python on your own. Heck, if you need stuff like STFT, MFCC, I can email you code.

Steve
I knew I forgot something: similarity at the note level is often accomplished by computing a chromagram. For each time frame, you compute how much C, C#, D, D#, ..., A#, B is in the signal. Therefore, the chromagram has dimension 12-by-time. Many similarity methods use this.
Steve
@Steve: Thanks for your input. Could you email me the code? I'm having a hard time trying to figure out this, some example code would be useful.
Alix Axel
A: 

Currently I'm developing a music search engine using ActionScript 3. The idea is analyzing the chords first and marking the frames (it's limited to mp3 files at the moment) where the frequency changes drastically (melody changes and ignoring noises). After that I do the same thing to the input sound, and match the results with the inverted files. The matching one determines the matching song.

For Axel's method, I think you shouldn't worry about the query whether it's a singing or just humming, since you don't implement a speech recognition program. But I'm curious about your method which uses hash functions. Could you explain that to me?

gearball