ansaurus

Question

Calculate checksum of audio files without considering the header

Answer 1

A:

Bene, if I were you (and I am working on something very similar to what you want to do), I would hash only the MP3 data block. Extract it to raw data first and write it out to disk, so you know what you are dealing with. Then modify the ID3 tag and hash the data again. If the hash changes, compare the two sets of raw data and find out where they differ. Chances are you are overstepping a boundary somewhere. If I recall correctly, it is the MP3 frame, not the file itself (which may begin with an ID3v2 tag), that starts with a sync word: FF followed by a byte such as FB or FA.
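
A minimal Python sketch of that workflow, assuming the only tags present are an ID3v2 tag at the start of the file and an ID3v1 tag at the end (other containers such as APE tags are not handled). The function names strip_id3, audio_md5 and first_difference are made up for this illustration:

```python
import hashlib

def strip_id3(data: bytes) -> bytes:
    """Return data without a leading ID3v2 tag or a trailing ID3v1 tag."""
    start, end = 0, len(data)
    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4-byte syncsafe size.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)  # only 7 bits per byte are used
        start = 10 + size
        if data[5] & 0x10:  # footer flag: a 10-byte footer follows the tag
            start += 10
    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return data[start:end]

def audio_md5(data: bytes) -> str:
    """MD5 of everything between the tags."""
    return hashlib.md5(strip_id3(data)).hexdigest()

def first_difference(a: bytes, b: bytes):
    """Offset of the first differing byte, or None if a and b are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))
```

Run audio_md5 on the file contents before and after editing the tag. If the two digests differ, first_difference(strip_id3(before), strip_id3(after)) gives the offset where the extracted data starts to diverge, which is the place to look for a boundary mistake.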

I'm interested in your findings, as I'm still writing the code that deals with the fingerprints, etc., and haven't gotten to the actual hashing yet.

LarryF 2008-12-12 18:37:42

Answer 2

A:

I'm trying to do the same thing, but with MD5 instead of SHA1. I first exported audio checksums using mp3tag (www.mp3tag.de/en/), then wrote a Perl script similar to yours to do the same thing. When I then removed all tags from my test file, the audio checksum remained the same.

This is the script:

use strict;
use warnings;
use MPEG::Audio::Frame;
use Digest::MD5;

# Test file and the audio checksum that mp3tag reported for it.
my $file = 'E:\Music\MP3\Russensoul\01 - 5nizza , Soldat (Russensoul - Russensoul).mp3';
my $mp3tag_audio_md5 = lc '2EDFBD62995A46A45CEEC08C1F303486';

my $md5 = Digest::MD5->new;

# Lexical filehandle, opened read-only and switched to binary mode.
open(my $fh, '<', $file) or die "Cannot open $file : $!\n";
binmode $fh;

# Feed the raw bytes of every MPEG audio frame to the digest.
while (my $frame = MPEG::Audio::Frame->read($fh)) {
    $md5->add($frame->asbin);
}
close $fh;

print '$md5->hexdigest  : ', $md5->hexdigest, "\n",
      'mp3tag_audio_md5 : ', $mp3tag_audio_md5, "\n";
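
For anyone without the Perl module at hand, here is a rough Python sketch of the same frame-by-frame idea. It is not a port of MPEG::Audio::Frame: it assumes MPEG-1 Layer III only, ignores free-format streams, and can be fooled by a false sync pattern inside tag data, so its digest is not guaranteed to match the script above or mp3tag. The names frame_length, iter_frames and frames_md5 are invented for this example.

```python
import hashlib

# MPEG-1 Layer III lookup tables, indexed by fields of the frame header.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]

def frame_length(header: bytes):
    """Frame length in bytes, or None if not an MPEG-1 Layer III header."""
    # Sync bits plus version and layer bits: second byte must be FA or FB.
    if len(header) < 4 or header[0] != 0xFF or (header[1] & 0xFE) != 0xFA:
        return None
    bitrate = BITRATES_KBPS[header[2] >> 4]
    rate = SAMPLE_RATES_HZ[(header[2] >> 2) & 0x03]
    if bitrate is None or rate is None:
        return None
    padding = (header[2] >> 1) & 0x01
    return 144 * bitrate * 1000 // rate + padding

def iter_frames(data: bytes):
    """Yield each complete frame (header included), skipping anything else."""
    pos = 0
    while pos + 4 <= len(data):
        length = frame_length(data[pos:pos + 4])
        if length is None or pos + length > len(data):
            pos += 1  # no complete frame here; try to resync at the next byte
            continue
        yield data[pos:pos + length]
        pos += length

def frames_md5(data: bytes) -> str:
    """MD5 over the concatenated frames only."""
    md5 = hashlib.md5()
    for frame in iter_frames(data):
        md5.update(frame)
    return md5.hexdigest()
```

Because only bytes that parse as frames reach the digest, adding or removing ID3 tags should leave frames_md5 unchanged as long as the tag editor does not touch the frames themselves.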

Is it possible that whatever you use to modify your tags sometimes also modifies mp3 headers?

mivk 2009-05-22 12:26:34

Answer 3

+1 A:

If you are looking for stable hashes of the actual music, you might want to look at libOFA. Your current methods will give you different results because the formats can have embedded tags. Also, if you want two different files of the same song to return the same hash, you need to take things like bitrate and sample frequency into account.

libOFA on the other hand can give you a stable hash that can be used between formats and different encodings. Might be what you want?

Tobias R 2009-07-22 11:45:58

Calculate checksum of audio files without considering the header

1) Reading + Hashing all MPEG frames using Perl and MPEG::Audio::Frame

2) Decoding + Hashing all MPEG frames using Python and libmad (pymad)

3) Using mp3cat
