I want to store multiple MP3 files and search them by supplying a fragment of a song, in order to detect which song it is.

I am thinking of storing all the binary content in MySQL. When I want to search for a specific song by content, I will take some middle portion of the song and match it against the binary data in MySQL.

My questions are:

  1. Is this a reasonable way to find songs by their content?
  2. Is it right to store the songs' content in the database or should I use the filesystem?
+3  A: 

This is not going to work. MP3 is a "lossy" format. That means the encoder discards and alters subtle nuances of the music, so different encoders, bitrates or settings produce totally different byte-wise data for the same song.

Also, even in an uncompressed format like WAV, two identical recordings at different volumes will produce different byte data. So it is impossible to compare music by comparing the byte values of the files' contents.

A binary comparison will work only for two exactly identical copies of the same MP3 file. It won't even work once you re-encode the same MP3 file with identical settings.
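To illustrate the volume point, here is a minimal sketch that generates the same tone at two volumes as raw 16-bit PCM and compares the bytes. The helper `sine_pcm` and its parameters are made up for this example; real WAV or MP3 data is more complex, but the effect is the same.

```python
import math
import struct

def sine_pcm(freq_hz, amplitude, n_samples=1000, rate=8000):
    """Return 16-bit little-endian PCM bytes for a sine tone (hypothetical helper)."""
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(n_samples)
    )
    return b"".join(struct.pack("<h", s) for s in samples)

loud = sine_pcm(440, 0.8)
quiet = sine_pcm(440, 0.4)  # same tone, half the volume

same_length = len(loud) == len(quiet)
differing = sum(a != b for a, b in zip(loud, quiet))  # count of byte positions that differ
excerpt_found = loud[200:400] in quiet  # naive "search by content"

print(same_length, differing, excerpt_found)
```

Both byte strings describe the same note, yet an excerpt taken from one cannot be found in the other.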

Comparing music is not a trivial matter. Several approaches exist, but to my knowledge none that can be used in PHP.

If you're lucky, there exists a web service that allows some kind of matching. Expect it to be commercial in some way, though - I doubt we are at the stage where this kind of thing can be used free of charge.

Pekka
Agreed except the "none that can be used in PHP" part. You can always make one yourself :)
Petr Peller
I said *exist* @Petr :)
Pekka
An approach existing does not mean it's already implemented. OK, never mind :)
Petr Peller
Thank you all for your replies. Actually, I am not constrained to PHP; I can use any language for this. I have seen an iPhone app that works like this: you record about 30 seconds of sound with it and it gives you full information about the song. I don't know how they did it, and I don't remember the name of the app.
PankajK
+2  A: 

Is this a reasonable way to find songs by their content?

Only if you can be sure that the part you get as the search criterion will actually be an excerpt from that particular MP3 file... and that is very, very unlikely. If the part can be from a different source (i.e. a different recording of the same song, or just a differently compressed MP3), you'll have to use audio fingerprinting, which is vastly more complicated.

Is it right to store the songs' content in the database or should I use the filesystem?

If you do simple binary matching, there is no point in using a database. If you have a more complex indexing technique (such as audio fingerprints) then using a database can make sense.
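To give a rough feel for what a fingerprint is, here is a toy sketch that reduces each short frame of audio to its strongest frequency bin. This is not how real fingerprinting systems work in detail; the frame size, the naive DFT and all function names are assumptions made for illustration only.

```python
import cmath
import math

FRAME = 64  # samples per analysis frame (arbitrary choice for this sketch)

def peak_bin(frame):
    """Index of the strongest DFT bin in one frame (naive O(n^2) DFT)."""
    n = len(frame)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin

def fingerprint(samples):
    """Sequence of per-frame peak bins; scaling the volume does not change it."""
    return [peak_bin(samples[i:i + FRAME])
            for i in range(0, len(samples) - FRAME + 1, FRAME)]

def tone(bin_index, amplitude):
    """One frame of a sine whose frequency falls exactly on a DFT bin."""
    return [amplitude * math.sin(2 * math.pi * bin_index * i / FRAME)
            for i in range(FRAME)]

melody = [3, 7, 5, 11, 7, 3]  # made-up "song" as a sequence of bins
loud = [s for b in melody for s in tone(b, 0.8)]
quiet = [s for b in melody for s in tone(b, 0.2)]

print(loud == quiet)                            # raw samples differ
print(fingerprint(loud) == fingerprint(quiet))  # fingerprints agree
```

The short integer sequences are what you would index in a database. A usable system also has to cope with noise, misaligned frames and lossy compression, which this sketch ignores.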

Michael Borgwardt
A: 

I don't fully understand what you're trying to do, but if you're going to index an MP3 collection, it's probably a better idea to store a hash (of sufficient length) rather than the actual file.

The problem is that the bytes don't give you any insight into the CONTENT of the file, i.e. the music in it. Even if you cut the metadata from the bytes before comparing (to get rid of noise like changes in spelling/capitalisation of metadata), you only know something about that one unique file. So you could compare two identical files (i.e. exact duplicates) for equality, but you couldn't compare any two random files for similarity.
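A minimal sketch of the hash idea, assuming SHA-256 as the hash and using made-up byte strings in place of real files (the names `content_hash` and `build_index` are hypothetical):

```python
import hashlib

def content_hash(data):
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()

def build_index(files):
    """Map each digest to the list of file names that share it."""
    index = {}
    for name, data in files.items():
        index.setdefault(content_hash(data), []).append(name)
    return index

# Placeholder bytes standing in for real MP3 files.
library = {
    "a.mp3": b"\xff\xfb\x90\x00" + b"audio-frame-bytes",
    "copy_of_a.mp3": b"\xff\xfb\x90\x00" + b"audio-frame-bytes",
    "a_reencoded.mp3": b"\xff\xfb\x90\x00" + b"audio-frame-bytez",
}

index = build_index(library)
duplicates = [names for names in index.values() if len(names) > 1]
print(duplicates)
```

Exact copies collide on the same digest, while a file that differs by a single byte gets an unrelated one. That is why a hash finds duplicates but says nothing about similarity.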

Alan
+1  A: 

As others have pointed out - comparing MP3s by looking at the binary content of files is not going to work.

I wrote something like this in Java whilst at university for my final year project. I'd be more than happy to send you the source code. It dealt in relative similarities - "song X is more similar to song Y than it is to song Z", rather than matches, but it might be a step in the right direction.

And please, whatever you do, don't try and do this in PHP. The algorithm I used needed me to compute (if I remember correctly - I worked on this around 3 years ago) thirty 30x30 matrices for each MP3 it analysed. Each song took around 30 seconds to process into a set of matrices on my clunky old machine (I'm sure my new PC could get the job done significantly quicker). Once I had those matrices for n songs, a second step computed differences between each pair of songs, and a third step reduced those differences down to m-dimensional space. Each of these 3 steps takes a fair amount of horsepower, and PHP definitely isn't the right horse for the job.

What PHP might work for is a frontend - I ended up with a queryable web-app written in Ruby on Rails, where I had a simple backend which stored the co-ordinates of each song in m-dimensional space (I happened to choose m = 6) - given a particular song, or fragment, X, you could then compute songs within a certain "distance" of X.
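The lookup described above can be sketched roughly as follows, assuming plain Euclidean distance and invented 6-dimensional coordinates. The answer does not say which distance measure was used, so treat both the measure and the function names as placeholders.

```python
import math

def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def songs_within(coords, query, radius):
    """Return (distance, name) pairs for songs within `radius` of `query`, nearest first."""
    hits = [(distance(query, point), name) for name, point in coords.items()]
    return sorted((d, name) for d, name in hits if d <= radius)

# Invented coordinates in 6-dimensional space (m = 6).
coords = {
    "song_a": (0, 0, 0, 0, 0, 0),
    "song_b": (1, 0, 0, 0, 0, 0),
    "song_c": (3, 4, 0, 0, 0, 0),
}

query = (0, 0, 0, 0, 0, 0)
nearby = songs_within(coords, query, radius=2)
print(nearby)
```

With only a handful of dimensions per song, this kind of query is cheap enough for a simple web backend; the expensive part is computing the coordinates in the first place.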

NB. I should probably point out that all the code I wrote was basically just a wrapper around libraries others had written - which were by some smart people at a university in Austria - those libraries took two songs and generated the matrices - all I did was compute distances and map distances of lots of songs into m-dimensional space. Wish I was smart enough to have done the first bit too!

Dominic Rodger
A: 

To search songs, you probably want to index their tags and focus on a nice, easy-to-use UI so users can look for them in flexible ways.

As said above, the same song will show different content bytes depending on the encoding.

However, one idea in the direction you're heading, and I'm not sure how feasible it is, would be to index some patterns in each song that may uniquely identify it. For example, what do all Johnny Cash songs have in common? Volume, tone, a combination of them? When you get a portion of content, you could extract that same pattern from it and match. That would be an interesting concept.

Ariel