Hi, this is a two-part question. First, is it possible to access the audio data in an MP3 independently of the ID3 tags? Second, is there any way to do so using available libraries?

I recently consolidated my music collection from 3 computers and ended up with songs which had changed ID3 tags, but the audio data itself was unmodified. Running a search for duplicate files failed because the file changed with the ID3 tag change, but I think it should be possible to identify duplicate files if I just run a deduplication using the audio data for comparison.

I know that it's possible to seek to a particular position past the ID3 header in the file, and directly read the data, but was wondering if there's a library that would expose the audio data so I could just extract the data, run a checksum on it, and store the computed result somewhere, then look for identical checksums. (Also, I'd probably have to use some kind of library when you take into account variable length headers.)
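
For illustration, here is a rough sketch of the "seek past the header" idea in Ruby. It assumes the only tags present are an ID3v2 block at the start of the file (10-byte header with a 4-byte "syncsafe" size, 7 bits per byte) and an ID3v1 block in the last 128 bytes (starting with "TAG"). The names `strip_id3` and `audio_checksum` are made up for this example, and other tag formats (APEv2, Lyrics3) are not handled.

```ruby
require 'digest'

# Hypothetical helper: returns the bytes of an MP3 with a leading ID3v2 tag
# and a trailing ID3v1 tag removed. `data` is the whole file as a string.
def strip_id3(data)
  data = data.b # work on raw bytes
  start = 0

  # ID3v2: "ID3", 2 version bytes, 1 flags byte, 4 syncsafe size bytes.
  if data.bytesize >= 10 && data.byteslice(0, 3) == "ID3".b
    flags = data.getbyte(5)
    size = data.byteslice(6, 4).bytes.inject(0) { |acc, b| (acc << 7) | (b & 0x7f) }
    start = 10 + size
    start += 10 if (flags & 0x10) != 0 # optional 10-byte footer (ID3v2.4)
  end

  # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
  stop = data.bytesize
  if stop - start >= 128 && data.byteslice(stop - 128, 3) == "TAG".b
    stop -= 128
  end

  # byteslice returns nil for out-of-range input (e.g. a corrupt size field).
  data.byteslice(start, stop - start) || "".b
end

# Hypothetical helper: checksum of everything that is not an ID3 tag.
def audio_checksum(path)
  Digest::MD5.hexdigest(strip_id3(File.binread(path)))
end
```

Two files whose `audio_checksum` values match would then be candidates for duplicates, regardless of what their ID3 tags say.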

+1  A: 

Coincidentally I wanted to do something similar the other day.

Here is a Ruby script that I whipped up:

http://code.google.com/p/kodebucket/source/browse/trunk/bin/mp3dump.rb

It dumps mpeg frames to stdout, so one could grab a checksum like so:

# mp3dump.rb file.mp3 | md5sum

Hmm. Kind of what I was looking for, though I have no clue what it's doing. I'll accept it, but I wouldn't mind an explanation of what it's doing. I'm presuming the unless sequence is filtering out the ID3 tags somehow, but can't tell how. A link to whatever doc you used to create this would be awesome. :)
Yeah, it's probably a bit obfuscated; stream of consciousness coding... The gist of it: open an mp3 file; read 4 bytes; if the bytes we've read are a valid mp3 header, read the frame and send it to stdout; otherwise we rewind 3 bytes and try again until we reach the end of the file. I used the following MPEG frame resource: http://www.datavoyage.com/mpgscript/mpeghdr.htm
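
The loop described above could look roughly like the following. This is not the linked mp3dump.rb script, just a minimal sketch of the same idea, and it makes a simplifying assumption: only MPEG-1 Layer III headers are treated as valid (MPEG-2/2.5 and Layers I/II are skipped, as are free-format bitrates). The names `frame_length` and `mpeg_frames` are invented for this example, and it works on a string in memory, so "rewind 3 bytes" becomes "advance 1 byte".

```ruby
require 'digest'

# MPEG-1 Layer III lookup tables, indexed by the header's bit fields.
# nil marks values this sketch rejects (free-format, reserved).
BITRATES_KBPS = [nil, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, nil].freeze
SAMPLE_RATES_HZ = [44_100, 48_000, 32_000, nil].freeze

# Returns the frame length in bytes for a 32-bit header, or nil if the
# header is not a valid MPEG-1 Layer III header.
def frame_length(header)
  return nil unless ((header >> 21) & 0x7ff) == 0x7ff # 11-bit frame sync
  return nil unless ((header >> 19) & 0x3) == 0b11    # MPEG version 1
  return nil unless ((header >> 17) & 0x3) == 0b01    # Layer III
  bitrate = BITRATES_KBPS[(header >> 12) & 0xf]
  rate = SAMPLE_RATES_HZ[(header >> 10) & 0x3]
  return nil unless bitrate && rate
  padding = (header >> 9) & 0x1
  144 * bitrate * 1000 / rate + padding
end

# Scans `data` and returns every frame found, skipping anything else
# (ID3 tags, junk bytes) one byte at a time.
def mpeg_frames(data)
  data = data.b
  frames = []
  pos = 0
  while pos + 4 <= data.bytesize
    len = frame_length(data.byteslice(pos, 4).unpack1("N"))
    if len && pos + len <= data.bytesize
      frames << data.byteslice(pos, len)
      pos += len
    else
      pos += 1 # same effect as reading 4 bytes and rewinding 3
    end
  end
  frames
end
```

A checksum over the frames alone would then be something like `Digest::MD5.hexdigest(mpeg_frames(File.binread("file.mp3")).join)`. Note that tag bytes can occasionally look like a frame header by chance, so a more careful version might check that each frame is followed by another valid header.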