
I have a Hadoop job with tasks that are expected to run for a significant length of time (a few minutes). However, Hadoop starts speculative execution too soon. I do not want to turn speculative execution off completely, but I want to increase the time Hadoop waits before considering the job's tasks for speculative execution. Is there a config option to control this timeout?

Thanks

A: 

I don't believe the speculative execution time is currently configurable. On the other hand, there's probably no need to adjust it. Speculative execution is meant to bail you out of slow-running tasks (usually caused by degraded hardware). If your cluster has enough spare resources for spec exec to kick in, what's the harm in letting it do so? Note that a few minutes is not considered "significant"; task durations like that are perfectly normal for medium or larger jobs.
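To make the "wait before speculating" idea concrete, here is a toy model of how a scheduler might pick speculation candidates. It is a sketch under stated assumptions, not Hadoop's actual scheduler code: `TaskAttempt`, `speculation_candidates`, and both thresholds (`MIN_RUNTIME_S`, `PROGRESS_GAP`) are hypothetical names with illustrative values. A task becomes a candidate only once it has run for a minimum time and its progress lags the average by some margin; that minimum runtime is the knob the question asks about.

```python
from dataclasses import dataclass

# Illustrative thresholds: assumptions for this model, not values read from Hadoop.
MIN_RUNTIME_S = 60.0   # hypothetical lag before any task may be speculated
PROGRESS_GAP = 0.2     # hypothetical margin by which a task must trail the average


@dataclass
class TaskAttempt:
    task_id: str
    runtime_s: float   # seconds since this attempt started
    progress: float    # fraction complete, 0.0 to 1.0


def speculation_candidates(tasks, min_runtime_s=MIN_RUNTIME_S, gap=PROGRESS_GAP):
    """Return IDs of tasks that have run long enough AND trail the average progress."""
    if not tasks:
        return []
    avg = sum(t.progress for t in tasks) / len(tasks)
    return [
        t.task_id
        for t in tasks
        # Both conditions must hold: old enough, and falling behind.
        if t.runtime_s >= min_runtime_s and avg - t.progress >= gap
    ]


if __name__ == "__main__":
    tasks = [
        TaskAttempt("m_000", runtime_s=240, progress=0.90),
        TaskAttempt("m_001", runtime_s=240, progress=0.85),
        TaskAttempt("m_002", runtime_s=240, progress=0.30),  # straggler
        TaskAttempt("m_003", runtime_s=30, progress=0.05),   # just started
    ]
    print(speculation_candidates(tasks))                     # ['m_002']
    print(speculation_candidates(tasks, min_runtime_s=300))  # []
```

In this model, raising `min_runtime_s` leaves long-but-healthy tasks alone for longer, which is the behavior the question wants; per the answer, that delay was not exposed as a setting at the time.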

It's also worth noting that while mapper spec exec is almost always fine and adds little overhead to the system, reducer spec exec can hurt and should probably be disabled. The rationale: if a mapper is progressing slowly and there are free resources on a node where its input data is local (the normal case), the extra attempt adds no shared overhead. If a reducer is running slowly, starting another attempt of the same task simply doubles the network load, which is normally the most painful part of reducer execution (the shuffle). If the network is what's making the reducer "slow," a second attempt only hurts both attempts.
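If you follow that advice, the usual switches are the per-phase speculation properties. The sketch below assumes Hadoop 2.x or later property names (`mapreduce.map.speculative`, `mapreduce.reduce.speculative`); older MR1 releases use `mapred.map.tasks.speculative.execution` and `mapred.reduce.tasks.speculative.execution` instead. It uses only Python's standard library to emit a `<configuration>` fragment in the `mapred-site.xml` format; `build_configuration` and `SPECULATION_SETTINGS` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hadoop 2.x+ property names (MR1 used mapred.*.tasks.speculative.execution).
SPECULATION_SETTINGS = {
    "mapreduce.map.speculative": "true",      # keep cheap, data-local map speculation
    "mapreduce.reduce.speculative": "false",  # avoid duplicating reducer network traffic
}


def build_configuration(settings):
    """Build a Hadoop-style <configuration> element from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


if __name__ == "__main__":
    root = build_configuration(SPECULATION_SETTINGS)
    ET.indent(root)  # pretty-printing helper, available in Python 3.9+
    print(ET.tostring(root, encoding="unicode"))
```

Setting both values to `false` turns speculation off entirely. In a Java driver, `Job.setMapSpeculativeExecution(boolean)` and `Job.setReduceSpeculativeExecution(boolean)` set the same properties for a single job, and drivers run through `ToolRunner` also accept them as `-D name=value` options.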

If you truly have a use case for adjusting the spec exec time, it might be worth filing a JIRA ticket at http://issues.apache.org.

Hope this helps.

Eric Sammer
This helps. I don't believe my specific use case fits the general philosophy of Hadoop, so it's probably not worth filing a JIRA ticket. I ended up disabling speculative execution in my scenario.
S.O.