I want to process a lot of files in Hadoop -- each file has some header information, followed by a lot of records, each stored in a fixed number of bytes. Any suggestions on that?
A:
I think the best solution is to write a custom InputFormat.
Paolo Capriotti
2009-07-09 15:04:10
A:
In addition to writing a custom FileInputFormat, you will also want to make sure the file is not splittable (override isSplitable to return false), so a single reader processes all of the records inside the file.
phsiao
2009-09-20 17:04:42
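The read loop both answers imply can be sketched in plain Java, independent of Hadoop: skip a fixed-size header, then read fixed-width records until the end of the file. HEADER_SIZE and RECORD_SIZE are hypothetical values chosen for illustration; in a real job this loop would live inside the RecordReader returned by your custom FileInputFormat, with isSplitable overridden to return false.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FixedRecordReader {
    // Hypothetical sizes -- substitute the actual header and record
    // lengths of your file format.
    static final int HEADER_SIZE = 4;
    static final int RECORD_SIZE = 8;

    // Skip the header, then return each fixed-width record as a byte[].
    static List<byte[]> readRecords(byte[] file) throws IOException {
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(file));
        in.skipBytes(HEADER_SIZE);              // header is not a record
        List<byte[]> records = new ArrayList<>();
        byte[] buf = new byte[RECORD_SIZE];
        while (in.available() >= RECORD_SIZE) { // stop cleanly at EOF
            in.readFully(buf);                  // read exactly one record
            records.add(buf.clone());
        }
        return records;
    }
}
```

Because each record occupies a fixed number of bytes, the reader never has to scan for delimiters; keeping the file unsplittable guarantees the header is read exactly once, by the one reader that owns the whole file.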