views:

28

answers:

1

I have a simple map-reduce program in which my map and reduce primitives look like this

map(K,V) = (Text, OutputAggregator)
reduce(Text, OutputAggregator) = (Text,Text)

The important point is that from my map function I emit an object of type OutputAggregator which is my own class that implements the Writable interface. However, my reduce fails with the following exception. More specifically, the readFieds() function is throwing an exception. Any clue why ? I use hadoop 0.18.3

10/09/19 04:04:59 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/09/19 04:04:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/09/19 04:04:59 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/19 04:04:59 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/19 04:04:59 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/19 04:04:59 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/19 04:04:59 INFO mapred.JobClient: Running job: job_local_0001
10/09/19 04:04:59 INFO mapred.MapTask: numReduceTasks: 1
10/09/19 04:04:59 INFO mapred.MapTask: io.sort.mb = 100
10/09/19 04:04:59 INFO mapred.MapTask: data buffer = 79691776/99614720
10/09/19 04:04:59 INFO mapred.MapTask: record buffer = 262144/327680
Length = 10
10
10/09/19 04:04:59 INFO mapred.MapTask: Starting flush of map output
10/09/19 04:04:59 INFO mapred.MapTask: bufstart = 0; bufend = 231; bufvoid = 99614720
10/09/19 04:04:59 INFO mapred.MapTask: kvstart = 0; kvend = 10; length = 327680
gl_books
10/09/19 04:04:59 WARN mapred.LocalJobRunner: job_local_0001
java.lang.NullPointerException
 at org.myorg.OutputAggregator.readFields(OutputAggregator.java:46)
 at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
 at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
 at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:751)
 at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:691)
 at org.apache.hadoop.mapred.Task$CombineValuesIterator.next(Task.java:770)
 at org.myorg.xxxParallelizer$Reduce.reduce(xxxParallelizer.java:117)
 at org.myorg.xxxParallelizer$Reduce.reduce(xxxParallelizer.java:1)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:904)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:785)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
 at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:157)
java.io.IOException: Job failed!
 at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1113)
 at org.myorg.xxxParallelizer.main(xxxParallelizer.java:145)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
 at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
A: 

When posting a question about custom code: Post the relevant piece of code. So the content of line 46 and a few lines before & after would really help ...:)

However this may help:

THE pitfall when writing your own Writable Class is the fact that Hadoop reuses the actual instance of the class over and over again. Between calls to readFields you do NOT get a shiny new instance.

So at the start of the readFields method you MUST assume the object you are in is filled with "garbage" and must be cleared before continuing.

My suggestion to you is to implement a "clear()" method that fully wipes the current instance and resets it to the state it would be in the moment after it was created and the constructor completed. And of course you call that method as the first thing in your readFields for both the key and the value.

HTH

Niels Basjes