Hi,

I use Hadoop to run MapReduce applications on our cluster. The jobs take around 10 hours to complete daily. I want to know the time taken by each job, the time taken by the longest job, and so on, so that I can optimize those jobs. Is there any plugin or script that does this?

Thank you
Bala

+1  A: 

First, have you looked at the JobTracker UI that comes with Hadoop to track the progress of jobs? You should check all the standard counter statistics each job produces, as well as any custom counters you have added to a job.
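
For illustration, a minimal sketch of a custom counter using the org.apache.hadoop.mapreduce API - the LogMapper class and the MALFORMED_RECORDS counter are made-up names for this example, not anything Hadoop ships with:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper that counts malformed input records alongside
    // the standard counters Hadoop maintains for every job.
    public class LogMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        // Custom counters are grouped by an enum (or a group/name string pair).
        public enum Quality { MALFORMED_RECORDS }

        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length < 2) {
                // Incrementing here makes the count show up in the JobTracker UI
                // next to the built-in counters for the job.
                context.getCounter(Quality.MALFORMED_RECORDS).increment(1);
                return;
            }
            context.write(new Text(fields[0]), ONE);
        }
    }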

An interesting alternative might be to take a look at Cloudera Desktop.

I also found this article from Cloudera useful: 7 tips for improving MapReduce performance

Out of interest, are you optimizing your jobs because they are taking too long?

Binary Nerd
@Binary Nerd There are a few places where our code could be improved, but we want to prioritize them based on what we gain in dollars from making these changes.
Algorist
+1  A: 

Take a look at http://<jobtracker-host>:50030 or http://<jobtracker-host>:50030/jobhistory.jsp (linked at the bottom of that page).

There is an analysis for each job/task/task part (map, sort, reduce). Pretty handy. You could also write your own logs - I just wget all the analysis pages and run them through awk for crude statistics.
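
As a rough stand-in for that wget/awk pipeline, here is a sketch that fetches the job history page and pulls out job ids - the host in the default URL and the regex are assumptions, since the HTML varies between Hadoop versions, so adapt the pattern to whatever your jobhistory.jsp emits:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Crude replacement for wget | awk: download one analysis page and
    // print every token that looks like a Hadoop job id.
    public class JobHistoryScrape {
        public static void main(String[] args) throws Exception {
            // Hypothetical host; pass your real JobTracker URL as the first argument.
            String url = args.length > 0 ? args[0]
                    : "http://jobtracker.example.com:50030/jobhistory.jsp";

            StringBuilder page = new StringBuilder();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream()));
            for (String line; (line = in.readLine()) != null; ) {
                page.append(line).append('\n');
            }
            in.close();

            // Job ids look like job_201001011200_0001; extracting durations
            // would need a pattern matched to your page's layout.
            Matcher m = Pattern.compile("job_\\d+_\\d+").matcher(page);
            while (m.find()) {
                System.out.println(m.group());
            }
        }
    }

From there you can diff the submit/finish timestamps per job, which is essentially what the per-job analysis page computes for you.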

Leonidas