I have a four-machine Hadoop cluster that I've verified works correctly by running the bundled WordCount example locally from the NameNode machine. I'm now starting to write my own MapReduce classes in Java, which I've bundled into a JAR along with the necessary driver class that extends Configured and implements Tool.
I'm trying to run my JAR from my local Windows XP box, which is not part of the cluster. Since I have the Hadoop JARs on my PC, I'm trying to run it as follows:
org.apache.hadoop.util.RunJar map-reduce-test-1.0-SNAPSHOT.jar com.example.mapred.MyDriver -conf hadoop-cluster.xml
When I do the above, I can see it connecting to my cluster, but I get the following exception in the cluster logs (and on my local PC):
Sep 29, 2010 11:08:37 AM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: Task Id : attempt_201009271128_0003_m_000002_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException com.example.mapred.MyMapper
Do I need to copy my JAR over to the NameNode or JobTracker machine so that the tasks can find it? I had assumed RunJar would ship the JAR from my PC to the cluster, but the error suggests that my MapReduce JAR has to be on a cluster machine somewhere.
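For reference, the hadoop-cluster.xml passed with -conf is an ordinary Hadoop configuration file that points the client at the cluster. The fragment below is only a sketch of what such a file typically contains, not my exact file: the property names are the Hadoop 0.20-era ones, and the host names and ports are hypothetical placeholders.

```xml
<?xml version="1.0"?>
<!-- Sketch of a client-side config file; hosts and ports are placeholders. -->
<configuration>
  <!-- Where the client finds HDFS (the NameNode). -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
  <!-- Where the client submits jobs (the JobTracker). -->
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:9001</value>
  </property>
</configuration>
```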