I want to programmatically create a SHA-1 checksum of audio files (MP3, Ogg Vorbis, FLAC).
The requirement is that the checksum should stay stable even if the metadata (e.g. the ID3 tags) changes.
Note: The audio files don't have CRCs.
This is what I have tried so far:
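1) Reading + Hashing all MPEG frames using Perl and MPEG::Audio::Frame
    use Digest::SHA1;
    use MPEG::Audio::Frame;

    # FH is a filehandle already opened (in binary mode) on the MP3 file
    my $sha1 = Digest::SHA1->new;
    while (my $frame = MPEG::Audio::Frame->read(\*FH)) {
        $sha1->add($frame->content());   # undecoded frame content
    }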
2) Decoding + Hashing all MPEG frames using Python and libmad (pymad)
    import hashlib
    import mad

    mf = mad.MadFile(path)
    sha1 = hashlib.sha1()
    while True:
        buf = mf.read()        # decoded PCM data; None at end of stream
        if buf is None:
            break
        sha1.update(buf)
3) Using mp3cat
    mp3cat - - < file.mp3 | sha1sum
However, none of these methods produced a stable checksum: in some cases the checksum changed after retagging the file with Picard.
Are there any libraries that already provide what I want?
I don't care about the programming language…
Update: I debugged the case a bit further. The libmad checksum inconsistency seems to occur when libmad hits decoding errors such as "Huffman data overrun (0x0238)". Since this happens with many of my MP3 files, I'm not sure whether it really indicates a broken file…