ansaurus

Question

Answer 1

+1 A:

You have to implement your own input format. That also gives you the chance to define your own record reader.

Unfortunately, you also have to define a getSplits() method. In my opinion this is harder than implementing the record reader, because the method has to contain the logic for chunking the input data.
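To give a rough idea of what "chunking" means here, this is a minimal sketch in plain Java. It does not use any Hadoop classes: the class SplitSketch and the helper computeSplits are made up for illustration, and a real getSplits() would return InputSplit objects and would also have to deal with records that straddle a chunk boundary.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only: no Hadoop classes are used here.
class SplitSketch {
    // Hypothetical helper: cut the byte range [0, fileLength) into chunks
    // of at most splitSize bytes. Each entry is {startOffset, length}.
    static List<long[]> computeSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            // The last chunk may be shorter than splitSize.
            long length = Math.min(splitSize, fileLength - start);
            splits.add(new long[] {start, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 10-byte input with 4-byte chunks gives three ranges.
        for (long[] s : computeSplits(10, 4)) {
            System.out.println("start=" + s[0] + " length=" + s[1]);
        }
    }
}
```

The sketch only does the offset arithmetic; deciding where a record really starts and ends is left to the record reader.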

See the following excerpt from "Hadoop: The Definitive Guide" (a great book I would always recommend!):

Here’s the interface:

public interface InputFormat<K, V> {
  InputSplit[] getSplits(JobConf job, int numSplits) throws IOException;
  RecordReader<K, V> getRecordReader(InputSplit split,
                                     JobConf job, 
                                     Reporter reporter) throws IOException;
}

The JobClient calls the getSplits() method, passing the desired number of map tasks as the numSplits argument. This number is treated as a hint, as InputFormat implementations are free to return a different number of splits to the number specified in numSplits. Having calculated the splits, the client sends them to the jobtracker, which uses their storage locations to schedule map tasks to process them on the tasktrackers.

On a tasktracker, the map task passes the split to the getRecordReader() method on InputFormat to obtain a RecordReader for that split. A RecordReader is little more than an iterator over records, and the map task uses one to generate record key-value pairs, which it passes to the map function. A code snippet (based on the code in MapRunner) illustrates the idea:

K key = reader.createKey();
V value = reader.createValue();
while (reader.next(key, value)) {
  mapper.map(key, value, output, reporter);
}
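Applied to the title of this question (several lines of text going into a single map call), the same next(key, value) loop could be fed by a reader that groups lines into one record. The following is only a sketch in plain Java and does not implement Hadoop's RecordReader interface; the class name MultiLineReaderSketch and the linesPerRecord parameter are assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;

// Plain-Java illustration only: it mimics the next(key, value) pattern
// but does not implement Hadoop's RecordReader interface.
class MultiLineReaderSketch {
    private final Iterator<String> lines;
    private final int linesPerRecord; // hypothetical setting: lines grouped per record
    private long recordIndex = 0;

    MultiLineReaderSketch(Iterator<String> lines, int linesPerRecord) {
        if (linesPerRecord <= 0) {
            throw new IllegalArgumentException("linesPerRecord must be positive");
        }
        this.lines = lines;
        this.linesPerRecord = linesPerRecord;
    }

    // Fills key[0] with the record index and value with up to linesPerRecord
    // lines joined by '\n'. Returns false once the input is exhausted.
    boolean next(long[] key, StringBuilder value) {
        value.setLength(0); // reuse the caller's value object, as in the loop above
        int read = 0;
        while (read < linesPerRecord && lines.hasNext()) {
            if (read > 0) {
                value.append('\n');
            }
            value.append(lines.next());
            read++;
        }
        if (read == 0) {
            return false;
        }
        key[0] = recordIndex++;
        return true;
    }

    public static void main(String[] args) {
        Iterator<String> input = Arrays.asList("a", "b", "c", "d", "e").iterator();
        MultiLineReaderSketch reader = new MultiLineReaderSketch(input, 2);
        long[] key = new long[1];
        StringBuilder value = new StringBuilder();
        // Same shape as the MapRunner loop: one call per multi-line record.
        while (reader.next(key, value)) {
            System.out.println(key[0] + " -> " + value.toString().replace("\n", " | "));
        }
    }
}
```

In a real job the grouped value would be handed to mapper.map() instead of being printed, and the final record may hold fewer lines than linesPerRecord.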

Peter Wippermann 2010-04-27 07:42:09

That kinda works. But that really doesn't answer the question. There is an issue with adding new InputFormats under 18.3.

monksy 2010-04-29 09:56:21

OK, I'm sorry. Indeed there is no real question, since I see no question mark :-P So what, more specifically, do you need to know?

Peter Wippermann 2010-04-29 10:04:11


Multiple lines of text to a single map
