Hi,
I have launched a small cluster of two nodes and noticed that the master stays completely idle while the slave does all the work. I was wondering what is the way to let master run some of the tasks. I understand that for a larger cluster having a dedicated master may be necessary but on a 2-node cluster it seems an overkill.
Thanks for any tips,
Vaclav
Some more details:
The two boxes have 2 CPUs each. The cluster has been set up on Amazon Elastic MapReduce but I am running hadoop from commandline.
The cluster I just tried it on has:
Hadoop 0.18
java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) Server VM (build 11.2-b01, mixed mode)
hadoop jar /home/hadoop/contrib/streaming/hadoop-0.18-streaming.jar \
-jobconf mapred.job.name=map_data \
-file /path/map.pl \
-mapper "map.pl x aaa" \
-reducer NONE \
-input /data/part-* \
-output /data/temp/mapped-data \
-jobconf mapred.output.compress=true
where the input consists of 18 files.