I want to process a lot of files in Hadoop. Each file has some header information, followed by many records, each stored in a fixed number of bytes. Any suggestions on how to handle this?

+2  A: 

I think the best solution is to write a custom InputFormat.

Paolo Capriotti
A: 

In addition to writing a custom FileInputFormat, you will also want to make sure that the file is not splittable (override isSplitable to return false), so that each mapper processes a whole file and the reader can locate the records inside it.

phsiao
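To tie the two answers together: inside a custom RecordReader, the core parsing logic amounts to skipping the header and then reading fixed-size chunks until the data runs out. Below is a minimal standalone sketch of just that logic, using only the standard library; the 16-byte header and 8-byte record sizes are made-up values for illustration, and a real implementation would wrap this in Hadoop's FileInputFormat/RecordReader API rather than a plain method.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FixedRecordParser {
    // Hypothetical layout: in a real file these sizes would come from
    // the format specification or be read out of the header itself.
    static final int HEADER_SIZE = 16;
    static final int RECORD_SIZE = 8;

    // Skip the header, then read fixed-size records until EOF.
    public static List<byte[]> parse(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        in.skipBytes(HEADER_SIZE);              // header bytes are not record data
        List<byte[]> records = new ArrayList<>();
        byte[] buf = new byte[RECORD_SIZE];
        while (in.available() >= RECORD_SIZE) { // ignore any trailing partial record
            in.readFully(buf);
            records.add(buf.clone());
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Fake file: 16-byte header followed by three 8-byte records.
        byte[] file = new byte[HEADER_SIZE + 3 * RECORD_SIZE];
        for (int i = 0; i < file.length; i++) file[i] = (byte) i;
        List<byte[]> records = parse(file);
        System.out.println(records.size());     // 3
        System.out.println(records.get(0)[0]);  // 16 (first byte after the header)
    }
}
```

In a Hadoop setting, the same loop would live in the RecordReader's nextKeyValue() method, reading from the split's FSDataInputStream instead of a byte array; keeping the file unsplittable (as the second answer notes) guarantees the header is always at offset 0 of the stream the reader sees.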