
I have 6 servers with an aggregated storage capacity of 175 TB, all hosting different sorts of media. A lot of the media is duplicated, with copies stored in different formats. What I need is a library or something similar that I can use to read the tags in the media and decide whether a given file is the best copy available. For example, some of my media is Japanese content for which I have DVD rips and now Blu-ray rips. This content sometimes has "hardsubs", i.e. subtitles that are encoded into the video, and sometimes "softsubs", which are subtitles rendered on top of the raw video when it plays. I would like to be able to find all copies of a given rip and compare them by resolution, by whether or not they have softsubs, and by audio format and quality.
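As a rough illustration of the comparison described above, here is a minimal Python sketch (the same logic translates directly to C#). It assumes some tag reader has already produced a record per file; the `MediaCopy` type, the `AUDIO_RANK` preference table, and the example paths are all hypothetical, and the ranking order (resolution, then softsubs, then audio codec and channel count) is just one reasonable choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MediaCopy:
    """Hypothetical record of what a tag reader might report for one file."""
    path: str
    width: int
    height: int
    has_soft_subs: bool
    audio_codec: str
    audio_channels: int


# Assumed preference order for audio codecs; unknown codecs rank lowest.
AUDIO_RANK = {"truehd": 4, "dts": 3, "flac": 3, "ac3": 2, "aac": 1, "mp3": 0}


def score(copy: MediaCopy) -> tuple:
    """Sort key compared left to right: pixel count, softsubs, audio codec, channels."""
    return (
        copy.width * copy.height,
        copy.has_soft_subs,  # True sorts above False
        AUDIO_RANK.get(copy.audio_codec.lower(), -1),
        copy.audio_channels,
    )


def best_copy(copies: List[MediaCopy]) -> MediaCopy:
    """Return the highest-scoring copy among files known to hold the same content."""
    return max(copies, key=score)


if __name__ == "__main__":
    copies = [
        MediaCopy("dvd/ep01.mkv", 720, 480, True, "ac3", 2),
        MediaCopy("bd/ep01.mkv", 1920, 1080, True, "flac", 2),
        MediaCopy("bd_hardsub/ep01.mp4", 1920, 1080, False, "aac", 2),
    ]
    print(best_copy(copies).path)  # bd/ep01.mkv
```

Using a tuple as the sort key keeps the priority order explicit: a later field only matters when every earlier field ties.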

Therefore, can anyone suggest a library I can incorporate into my program to do this?

EDIT: I forgot to mention that the distribution server mounts the other servers as drives and runs Windows Server, so I will probably code the solution in C#. Also, all the media is for my own legal use; I have so many copies because some of it is in other formats for other players. For example, I have re-encoded some of my Blu-rays to Xvid for my Xbox, since it can't play Blu-ray.

When this is done, I plan to open-source the code, since there doesn't seem to be anything like this already and I'm sure it can help someone else.

A: 

I don't know of any libraries, but when I think about how I'd approach it programmatically, I come up with this:

It is the keyframes that are most likely to be comparable. Keyframes occur at regular intervals, but more importantly, they occur at major scene changes. Those scene changes will be common across the different formats, so the corresponding frames can be compared as still images, and a still-image comparison library may be easier to find.
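To make the still-image idea concrete, here is a minimal sketch using an "average hash", a simple perceptual hash that is swapped in here for illustration and is not named in the answer. It assumes keyframes have already been decoded into 2D lists of grayscale values (0 to 255) by some other tool; `average_hash`, `hamming`, `same_scene`, and the threshold of 5 bits are illustrative choices, not an established API.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixels, rows of 0-255 values (assumed input)


def average_hash(frame: Frame, size: int = 8) -> int:
    """Shrink the frame to size x size cells by block averaging, then emit one
    bit per cell: 1 if the cell is brighter than the mean of all cells."""
    h, w = len(frame), len(frame[0])
    if h < size or w < size:
        raise ValueError("frame must be at least size x size pixels")
    cells = []
    for by in range(size):
        for bx in range(size):
            y0, y1 = by * h // size, (by + 1) * h // size
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def same_scene(f1: Frame, f2: Frame, threshold: int = 5) -> bool:
    """Treat two keyframes as the same scene if their hashes differ in few bits."""
    return hamming(average_hash(f1), average_hash(f2)) <= threshold
```

Because the frame is reduced to a small grid and compared against its own mean, a 720x480 DVD keyframe and a 1920x1080 Blu-ray keyframe of the same scene can produce identical or near-identical hashes despite the resolution and brightness differences. Burned-in subtitles may flip a few bits, which is what the threshold allows for.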

Of course, you'll still have to find something that can read all the different formats, but it's a start, and it's a fun exercise to think about, even if the coding time involved is far beyond my one-person threshold.

Autocracy
Keyframe correlation sounds interesting... but it'd likely be slow... hmmm... I wonder, though...
CookieOfFortune
Although that would work, most media formats have tags with the format information. I'm sure I could write a program to read the tags, but I'm pretty sure there is a library that can do it already. I just can't seem to find one.
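One way to read that tag and stream information, not mentioned in the thread, is FFmpeg's `ffprobe` command-line tool, which can print a file's container and stream metadata as JSON. The sketch below assumes `ffprobe` is installed and on the PATH; `probe` and `summarize` are hypothetical helper names. It skips embedded cover art (video streams flagged `attached_pic`) and counts subtitle streams as softsubs.

```python
import json
import subprocess


def probe(path: str) -> dict:
    """Run ffprobe (assumed installed) and return its JSON output as a dict."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def summarize(info: dict) -> dict:
    """Reduce ffprobe output to the fields used for picking the best copy."""
    streams = info.get("streams", [])
    # Ignore cover art, which ffprobe reports as a video stream.
    video = [s for s in streams
             if s.get("codec_type") == "video"
             and s.get("disposition", {}).get("attached_pic") != 1]
    audio = [s for s in streams if s.get("codec_type") == "audio"]
    subs = [s for s in streams if s.get("codec_type") == "subtitle"]
    v = video[0] if video else {}
    return {
        "width": v.get("width", 0),
        "height": v.get("height", 0),
        "video_codec": v.get("codec_name"),
        "audio_codecs": [a.get("codec_name") for a in audio],
        "soft_sub_tracks": len(subs),
    }
```

Usage would be `summarize(probe(r"Z:\anime\ep01.mkv"))` (hypothetical path). Note that hardsubs are part of the picture itself, so no tag reveals them; a file with zero subtitle streams either has hardsubs or no subtitles at all.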
edude05
Well, if you're trying to identify formats, you should use libmagic; it will classify the file format. As for keyframe correlation, keyframes can be pulled in a single-pass read of the file and stored in an indexed structure. Some of the tricks from ImageMagick (another library, yay!) look very compelling: http://www.imagemagick.org/Usage/compare/
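For a sense of what libmagic does under the hood, here is a toy sketch that classifies a file by its leading "magic" bytes. It is not libmagic's API and covers only a handful of container signatures (Matroska's EBML header, AVI's RIFF header, the MP4/MOV `ftyp` box, and the MPEG program stream pack header used by DVD VOBs); libmagic's database is far larger, so treat this purely as an illustration. `sniff_container` and `sniff_file` are hypothetical names.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container format from the first 12 bytes of a file."""
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "matroska/webm"  # EBML magic number
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "avi"
    if header[4:8] == b"ftyp":
        return "mp4/mov"  # ISO base media: 4-byte box size, then 'ftyp'
    if header[:4] == b"\x00\x00\x01\xba":
        return "mpeg-ps"  # pack start code, e.g. DVD .VOB files
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 12 bytes, so even huge files are cheap to classify."""
    with open(path, "rb") as f:
        return sniff_container(f.read(12))
```

Checking content rather than extensions matters here because re-encoded copies are easily misnamed, and only a few bytes per file are read.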
Autocracy