Hi,

In the "syslog" for a MapReduce job flow step, I see the following:

Job Counters
  Launched reduce tasks=4
  Launched map tasks=39

Does the number of launched map tasks include failed tasks?

I am using the NLineInputFormat class as the input format to control the number of map tasks. However, I occasionally get slightly different counts for the exact same input, or depending on the number of instances (10, 15, and 20).

Can anyone tell me why I am seeing a different number of tasks launched?
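For what it's worth, the baseline count NLineInputFormat produces is deterministic: it creates one split (and so one map task) per N lines of each input file. A rough sketch of that arithmetic, with hypothetical file sizes and N:

```python
import math

def expected_map_tasks(lines_per_file, n):
    """Hypothetical helper: NLineInputFormat creates one split per
    n lines of each input file, so the baseline map-task count is
    the sum of ceil(lines / n) over the input files."""
    return sum(math.ceil(lines / n) for lines in lines_per_file)

# e.g. three input files of 100, 250, and 40 lines, with N=10
print(expected_map_tasks([100, 250, 40], 10))  # -> 39
```

Since this number depends only on the input and N, any run-to-run difference in "Launched map tasks" has to come from extra attempts, not from the split calculation.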

A: 

This is, more than likely, speculative execution kicking in. When Hadoop has available resources, it may opt to run two attempts of the same task at the same time. Launched tasks include all tasks launched, regardless of whether they later succeed, fail (due to exceptions), or are killed (due to admin intervention, or speculative execution killing the "slower" attempt once the "faster" one completes).
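If the run-to-run variation bothers you, speculative execution can be turned off. Assuming the older (pre-0.21) property names, that would look something like this in mapred-site.xml (or set per job on the JobConf):

```xml
<!-- Disable speculative attempts for map and reduce tasks -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```

Note this trades away the straggler protection speculation provides, so it's usually only worth doing when task attempts are expensive or have side effects.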

Your total launched tasks, minus failed, minus killed, will probably be the same between runs.

Hope this helps.

Eric Sammer