How can I create a checksum of only the media data, without the metadata, to get a stable identifier for a media file? Preferably a cross-platform approach using a library that supports many formats, e.g. VLC, FFmpeg or MPlayer.

(The media files would be audio and video in common formats; images would be nice to have too.)

A: 

One possible solution I found seems to be with VLC:

./VLC -I rc snd.mp3 :sout='#std{mux=raw,access=file,dst=-}' vlc://quit | sha1sum
yawniek
This doesn't seem to work for movies, and it doesn't seem to be platform-independent.
yawniek
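The hashing end of that pipeline is the easy part to port. If `sha1sum` is not available (for example on Windows), a minimal Python equivalent could look like the following, assuming the decoder writes its raw output to stdout as in the VLC command above; `sha1_of_stream` is a hypothetical helper name:

```python
import hashlib
from typing import BinaryIO


def sha1_of_stream(stream: BinaryIO, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-1 digest of everything that can be read from `stream`.

    Reads fixed-size chunks, so a long decoded stream never has to fit in memory.
    """
    digest = hashlib.sha1()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # b"" signals end of stream
            break
        digest.update(chunk)
    return digest.hexdigest()
```

A small script that calls `print(sha1_of_stream(sys.stdin.buffer))` can then replace `sha1sum` at the end of the pipe. This does nothing about the decoder side, which is where the portability problem actually lies.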
A:

I don't know of any existing platform-independent software that does this, but I do know a way it could be done in a platform-independent language such as Java.

Essentially, we need to strip all metadata (tags) from the file, demultiplexing video files first. After demuxing and removing the metadata, one could hash what remains and compare it against the hash of another file that has gone through the same process, matching identical files even when their tags differ. Unlike a fingerprint, this would not identify similar songs or movies, only identical media data (imagine you want to keep the 10 different versions or bitrates of a song you've archived, but don't want two identical copies of any of them floating around).
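A minimal sketch of the detag-then-hash step for the simplest case, MP3 files carrying ID3 tags, written in Python rather than Java. It assumes ID3v2 tags at the start of the file (a 10-byte header whose 4-byte size is a "syncsafe" integer with 7 significant bits per byte, counting the tag body and padding but not the header) and an optional fixed 128-byte ID3v1 tag at the end. APEv2, Lyrics3 and ID3v2 tags appended at the end of the file are not handled, and the function names are hypothetical:

```python
import hashlib


def _syncsafe_to_int(b: bytes) -> int:
    """Decode a 4-byte ID3v2 syncsafe integer (7 significant bits per byte)."""
    value = 0
    for byte in b:
        value = (value << 7) | (byte & 0x7F)
    return value


def strip_id3(data: bytes) -> bytes:
    """Remove leading ID3v2 tag(s) and a trailing ID3v1 tag from MP3 bytes.

    Only ID3 is handled; other tag formats are left in place and end up in the hash.
    """
    start = 0
    # ID3v2 header: "ID3" + version (2 bytes) + flags (1 byte) + syncsafe size (4 bytes).
    while data[start:start + 3] == b"ID3" and len(data) - start >= 10:
        flags = data[start + 5]
        size = _syncsafe_to_int(data[start + 6:start + 10])
        footer = 10 if flags & 0x10 else 0  # ID3v2.4 "footer present" flag
        start += 10 + size + footer
    end = len(data)
    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return data[start:end]


def audio_sha1(data: bytes) -> str:
    """SHA-1 of the file contents with ID3 tags removed."""
    return hashlib.sha1(strip_id3(data)).hexdigest()
```

Two copies of the same MP3 that differ only in their ID3 tags then give the same `audio_sha1`. The ID3v1 check is heuristic (audio data could in principle contain `TAG` at that offset), and video containers would still need to be demultiplexed before anything like this applies.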

The most troublesome part is removing the tags: there are many different tag format specifications, and they are not necessarily implemented the same way by different applications, i.e. the same audio file given identical tags by two different applications may not produce identical output files. That alone is not a problem, since only the audio portion ends up in the hash. The concept of an audio-only checksum would only break if popular tagging software changed the binary audio portion of the file or padded the audio in a non-standard way.
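To illustrate how much tag formats differ: MP3 files may also carry an APEv2 tag, which usually sits at the end of the file and is identified by a 32-byte footer rather than a header. The sketch below assumes the footer layout from the APEv2 specification (an `APETAGEX` preamble, then little-endian 32-bit version, tag size, item count and flags, then 8 reserved bytes), where the tag size counts the items plus the footer but not the optional 32-byte header, and flag bit 31 signals that such a header is present. `strip_trailing_apev2` is a hypothetical name, and it only detects a tag that ends exactly at the end of the data, so any ID3v1 trailer has to be removed first:

```python
import struct

APE_PREAMBLE = b"APETAGEX"
HAS_HEADER = 1 << 31  # flag bit: the tag also starts with a 32-byte header


def strip_trailing_apev2(data: bytes) -> bytes:
    """Remove an APEv2 tag whose footer ends exactly at the end of `data`."""
    if len(data) < 32 or data[-32:-24] != APE_PREAMBLE:
        return data  # no APEv2 footer here
    # Footer after the preamble: version, tag size, item count, flags (uint32 LE each), then 8 reserved bytes.
    _version, tag_size, _item_count, flags = struct.unpack("<IIII", data[-24:-8])
    # tag_size covers the items plus the footer, but not the optional header.
    total = tag_size + (32 if flags & HAS_HEADER else 0)
    if total < 32 or total > len(data):
        return data  # implausible size: leave the bytes untouched
    return data[: len(data) - total]
```

Chaining one stripper like this per tag format is what makes the approach tedious, and any format the code does not know about silently ends up in the hash.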

Taking the checksum is trivial, but off the top of my head I'm not aware of any platform-independent libraries that can demux and strip tags from MPEG files. On *nix systems, mpgtx is a great command-line tool that can do the demuxing and tag removal, but that obviously isn't a platform-independent solution.

Maybe someone out there feels ambitious?

Dustin Fineout
This is the way to go. In the meantime I wrote a patch for FFmpeg that calculates SHA-1 hashes instead of the Adler-32 checksum, which essentially does the trick. If anyone would like to help me get it into FFmpeg, that would be great.
yawniek
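FFmpeg's `framecrc` muxer prints an Adler-32 checksum per packet; Adler-32 is a 32-bit checksum designed for error detection, so a cryptographic hash such as SHA-1 is the safer choice for identifying files. The following is a minimal Python sketch of the same per-packet idea, not the patch mentioned above: it assumes the packet payloads have already been read by a demuxer (here simply a list of bytes), and the function names are hypothetical. More recent FFmpeg releases also include `hash` and `framehash` muxers whose `-hash` option selects the algorithm, so it is worth checking the documentation for your version before patching.

```python
import hashlib
import zlib
from typing import Iterable, List, Tuple


def per_packet_hashes(packets: Iterable[bytes]) -> List[Tuple[int, str]]:
    """For each packet payload, pair its Adler-32 (what framecrc reports) with its SHA-1."""
    return [(zlib.adler32(p), hashlib.sha1(p).hexdigest()) for p in packets]


def stream_sha1(packets: Iterable[bytes]) -> str:
    """A single SHA-1 over all packet payloads, in order.

    Each payload is prefixed with its length, so [b"ab", b"c"] and [b"a", b"bc"]
    hash differently even though their concatenations are equal.
    """
    digest = hashlib.sha1()
    for p in packets:
        digest.update(len(p).to_bytes(8, "big"))
        digest.update(p)
    return digest.hexdigest()
```

Hashing compressed packets leaves container metadata out, but the length prefix makes the result sensitive to packet boundaries, so the same audio re-muxed with different packetization would hash differently.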