Hi,

I have launched a small cluster of two nodes and noticed that the master stays completely idle while the slave does all the work. What is the way to let the master run some of the tasks too? I understand that a dedicated master may be necessary for a larger cluster, but on a 2-node cluster it seems like overkill.

Thanks for any tips,

Vaclav

Some more details:

The two boxes have 2 CPUs each. The cluster has been set up on Amazon Elastic MapReduce, but I am running hadoop from the command line.

The cluster I just tried it on has:

Hadoop 0.18
java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) Server VM (build 11.2-b01, mixed mode)


hadoop jar /home/hadoop/contrib/streaming/hadoop-0.18-streaming.jar \
    -jobconf mapred.job.name=map_data \
    -file    /path/map.pl \
    -mapper  "map.pl x aaa" \
    -reducer NONE \
    -input   /data/part-* \
    -output  /data/temp/mapped-data \
    -jobconf mapred.output.compress=true

where the input consists of 18 files.

A: 

Actually, the Hadoop master is not the one doing the work (the tasks you run). You can start a datanode and a tasktracker on the same machine the master runs on.
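A minimal sketch of what that looks like, assuming the stock hadoop-daemon.sh script in your Hadoop bin directory (the same script used in the answer below); run this on the master node itself:

```shell
# On the master, start the worker daemons alongside the
# namenode/jobtracker, so the master also stores blocks and runs tasks.
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker
```

After that, the master should show up in the jobtracker web UI as one of the task-running nodes.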

Wojtek
A: 

Steve Loughran on the hadoop-users list suggested that starting a tasktracker on the master would do the trick.

$ bin/hadoop-daemon.sh start tasktracker

It seems to work. You may want to adjust the number of slots for this tasktracker.
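For reference, the slot counts are controlled by two properties you can set in conf/hadoop-site.xml on that node; the values below are only illustrative (with 2 CPUs per box, small values like these are a reasonable starting point):

```xml
<!-- Maximum map tasks this tasktracker runs concurrently (illustrative value) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<!-- Maximum reduce tasks this tasktracker runs concurrently (illustrative value) -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```

The tasktracker reads these at startup, so restart it after changing them.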

Vaclav Petricek
A: 

It may be different for Hadoop 0.18, but you can try adding the IP address of the master to the conf/slaves file, then restarting the cluster.
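A sketch of those steps, assuming the stock start/stop scripts in the Hadoop bin directory; "master-host" is a placeholder for your master's actual hostname or IP:

```shell
# Add the master to the list of worker nodes (placeholder hostname).
echo master-host >> conf/slaves

# Restart so start-all.sh also launches a datanode and tasktracker
# on the master alongside the namenode and jobtracker.
bin/stop-all.sh
bin/start-all.sh
```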

Matthew Hegarty