Image processing with Hadoop

To process specialized file formats (such as video) in Hadoop, you'd have to write a custom InputFormat and RecordReader that understand how to turn a video file into splits (the InputFormat) and then read those splits into values (the RecordReader). This is a non-trivial task and requires some intermediate knowledge of how Hadoop handles the splitting of data. I highly recommend Tom White's book Hadoop: The Definitive Guide (O'Reilly), as well as the videos on http://www.cloudera.com. (Full disclosure: I work for Cloudera.)

Keep in mind that video formats are generally compressed, which complicates things further because InputSplits (created by an InputFormat) are normally just byte offsets into the file. A good starting point is the InputFormat API documentation: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/InputFormat.html
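To make the "byte offsets" point concrete, here is a minimal plain-Java sketch of how a file might be cut into fixed-size ranges. It is only a model and does not use the Hadoop API: the names SplitSketch, ByteRange and computeSplits are made up for illustration, and Hadoop's own split calculation has more details than this.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of byte-offset splitting. NOT the Hadoop API;
// all names here are hypothetical and exist only for illustration.
class SplitSketch {

    /** A split is just a (start, length) byte range; it knows nothing about frames. */
    static final class ByteRange {
        final long start;
        final long length;

        ByteRange(long start, long length) {
            this.start = start;
            this.length = length;
        }

        long end() {
            return start + length;
        }
    }

    /** Cuts a file of fileLength bytes into consecutive ranges of at most splitSize bytes. */
    static List<ByteRange> computeSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<ByteRange> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            // The last range is shorter when the file is not a multiple of splitSize.
            splits.add(new ByteRange(start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed example: a 300 MB video file cut into 128 MB ranges.
        for (ByteRange r : computeSplits(300 * mb, 128 * mb)) {
            System.out.println("bytes " + r.start + " to " + r.end());
        }
    }
}
```

Nothing in this calculation looks at the file's contents, so a boundary can land in the middle of a compressed frame. That is the problem the RecordReader has to deal with.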

To summarize: an InputFormat knows how to generate a list of InputSplit objects that are (usually) between 64MB and 128MB and do NOT respect the notion of frames. The RecordReader is then used to read frames out of an InputSplit to create value objects that the MapReduce job can process. If you want to generate video output, you'll also need to write a custom OutputFormat.
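One common way to handle frames that straddle a split boundary is to say that a frame belongs to the split in which it starts, and to let the reader run past the end of its split to finish that last frame. This is similar in spirit to how text lines that cross a split are usually treated. The sketch below is plain Java, not the Hadoop API. It assumes a hypothetical index of frame start offsets (frameStarts) is available, which a real compressed format may or may not give you cheaply.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the "a frame belongs to the split it starts in" rule.
// NOT the Hadoop API; names and the frame index are assumptions for illustration.
class FrameOwnershipSketch {

    /** Returns the indices of frames whose first byte lies in [splitStart, splitEnd). */
    static List<Integer> framesOwnedBy(long[] frameStarts, long splitStart, long splitEnd) {
        List<Integer> owned = new ArrayList<>();
        for (int i = 0; i < frameStarts.length; i++) {
            if (frameStarts[i] >= splitStart && frameStarts[i] < splitEnd) {
                owned.add(i);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        // Assumed toy file: 200 bytes, five frames starting at these offsets.
        long[] frameStarts = {0L, 40L, 90L, 130L, 170L};

        // Two 100-byte splits. Frame 2 starts at byte 90 and continues to byte 130,
        // so it crosses the boundary but is read only by the first split's reader.
        System.out.println("split [0,100):   " + framesOwnedBy(frameStarts, 0L, 100L));
        System.out.println("split [100,200): " + framesOwnedBy(frameStarts, 100L, 200L));
    }
}
```

With this rule every frame is processed exactly once, at the cost of some readers fetching a few bytes that sit in a neighbouring split.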

Hope this helps.
