I have a lot of audio files that are all supposed to be the same song. Some of them are lower quality than the original, and some have been edited so that they no longer match the original song. What I'd like to do is programmatically compare these audio files to the original and see which ones match that song, regardless of quality. A direct byte-for-byte comparison obviously won't work, because the quality of the files varies.
So let's say the song in question is "Viva la Vida" by Coldplay. I have the original, high-quality song, and I have a bunch of copies of it. Some of the copies are full quality, some are lower. I might also have one or two versions that were edited to cut off the end of the song. What I want to do is match the ones that were not edited, regardless of quality. Is there a library that can do this?
I believe this could be done by analyzing the structure of the songs and comparing it to the original, but I know nothing about audio engineering, so that doesn't help me much. All the files are in the same format (MP3). I'm using Python, so if there are bindings for it, that would be fantastic; if not, something for the JVM or even a native library would be fine as well, as long as it runs on Linux and I can figure out how to use it (read: documentation).
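To make the "analyze the structure and compare" idea concrete, here is a rough sketch of the kind of check I have in mind. It is not a real audio fingerprint and certainly not a tested solution: it assumes the MP3s have already been decoded elsewhere into mono lists of float samples at the same sample rate, and every name and threshold in it (`energy_envelope`, `looks_like_same_song`, `length_tolerance`, `min_correlation`) is made up for illustration. It rejects files whose length differs from the original (the cut-off edits) and otherwise compares coarse loudness envelopes, which should survive a quality loss.

```python
import math


def energy_envelope(samples, frame_size=1024):
    """Return the RMS energy of each non-overlapping frame of samples."""
    envelope = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        envelope.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return envelope


def correlation(a, b):
    """Pearson correlation of two sequences, truncated to the shorter one."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    a, b = a[:n], b[:n]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den else 0.0


def looks_like_same_song(original, candidate, frame_size=1024,
                         length_tolerance=0.02, min_correlation=0.9):
    """Heuristic match: similar length and similar loudness envelope.

    Both thresholds are illustrative guesses, not tuned values.
    """
    if not original:
        return False
    # An edit that cuts off the end shows up as a length mismatch.
    length_diff = abs(len(original) - len(candidate)) / len(original)
    if length_diff > length_tolerance:
        return False
    # Lower-quality encodes should keep roughly the same loudness shape.
    score = correlation(
        energy_envelope(original, frame_size),
        energy_envelope(candidate, frame_size),
    )
    return score >= min_correlation
```

Usage would be something like `looks_like_same_song(original_samples, copy_samples)` for each copy. This would not handle copies with extra leading silence or a different sample rate, which is part of why I'm hoping an existing library already solves this properly.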