I want to process a lot of files in Hadoop -- each file has some header information, followed by a lot of records, each stored in a fixed number of bytes. Any suggestions on that?
A:
I think the best solution is to write a custom InputFormat.
Paolo Capriotti
2009-07-09 15:04:10
A:
In addition to writing a custom FileInputFormat, you will also want to make sure the file is not splittable (override isSplitable to return false), so a single reader processes all of the records inside the file.
phsiao
2009-09-20 17:04:42
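The read loop both answers imply can be sketched in plain Java, independent of Hadoop: skip a fixed-size header, then read fixed-width records until the end of the file. HEADER_SIZE and RECORD_SIZE are hypothetical values chosen for illustration; in a real job this loop would live inside the RecordReader returned by your custom FileInputFormat, with isSplitable overridden to return false.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FixedRecordReader {
    // Hypothetical sizes -- substitute the actual header and record
    // lengths of your file format.
    static final int HEADER_SIZE = 4;
    static final int RECORD_SIZE = 8;

    // Skip the header, then return each fixed-width record as a byte[].
    static List<byte[]> readRecords(byte[] file) throws IOException {
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(file));
        in.skipBytes(HEADER_SIZE);              // header is not a record
        List<byte[]> records = new ArrayList<>();
        byte[] buf = new byte[RECORD_SIZE];
        while (in.available() >= RECORD_SIZE) { // stop cleanly at EOF
            in.readFully(buf);                  // read exactly one record
            records.add(buf.clone());
        }
        return records;
    }
}
```

Because each record occupies a fixed number of bytes, the reader never has to scan for delimiters; keeping the file unsplittable guarantees the header is read exactly once, by the one reader that owns the whole file.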