As far as I understand, Hadoop's architecture treats all machines as equal: any task or job can run on any machine in the cluster.

Is there a way to change this model to tag certain machines as having certain capabilities and then only pick machines that have capabilities required by a job to run that job?

A: 

Figured this one out. Since I am using the FairScheduler, there is an extensibility point that lets me achieve my goal by writing a simple class implementing the LoadManager interface.

According to http://hadoop.apache.org/common/docs/current/fair_scheduler.html, the FairScheduler uses an instance of the class specified in the mapred.fairscheduler.loadmanager config property (CapBasedLoadManager by default). The LoadManager interface provides a convenient method

boolean canLaunchTask(TaskTrackerStatus tracker, JobInProgress job, TaskType type)

which allows me to add custom logic that allows or denies a particular job on a particular task tracker. Problem solved.
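For anyone wanting to try the same thing, here is a rough sketch of the kind of decision logic that could sit behind canLaunchTask. It is deliberately Hadoop-free so it compiles on its own: CapabilityCheck, the host names, and the capability tags are all made up for illustration, and how the host name and the required tags get extracted from the tracker and job arguments is left out. This is not the code I actually deployed.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper: decides whether a host offers everything a job needs.
// A custom LoadManager could delegate its canLaunchTask decision to this.
final class CapabilityCheck {
    // Host name -> capability tags for that machine (illustrative data,
    // assumed to be loaded from configuration in a real setup).
    private final Map<String, Set<String>> hostCapabilities;

    CapabilityCheck(Map<String, Set<String>> hostCapabilities) {
        this.hostCapabilities = hostCapabilities;
    }

    // Returns true only if the host advertises every required capability.
    // Unknown hosts are treated as having no capabilities at all.
    boolean canLaunch(String host, Set<String> required) {
        Set<String> offered = hostCapabilities.getOrDefault(host, Set.of());
        return offered.containsAll(required);
    }
}
```

A job with no required tags can run anywhere under this rule, which keeps the default behaviour for untagged jobs. The custom class would then be named in the mapred.fairscheduler.loadmanager property in place of the default.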

Lesson learned: reading source code is useful.

S.O.