Grep seems not to be working for hadoop streaming

For:

    hadoop jar /usr/local/hadoop-0.20.2/contrib/streaming/hadoop-0.20.2-streaming.jar -input /user/root/tmp2/user.data -output /user/root/selected_data -mapper '/bin/grep 1938678460' -reducer 'wc' -jobconf mapred.output.compress=false

I get:

    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:17

Any idea?

I also tried: -mapper 'cat' -reducer '/bin/grep 1938678460' (cat works, grep does not)

I also checked that /bin/grep exists on every machine, and it does.

Does grep not work with streaming, or am I missing something?

+3  A: 

I haven't tried this myself, but grep exits with a non-zero status (1) when it finds no matching lines. If the input handed to a map task doesn't contain the string you grep for, the mapper process exits with that non-zero status and Hadoop treats the task as failed. Maybe something like "/bin/grep 1938678460 || true" works.
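To see the exit-status behaviour outside Hadoop, here is a small sketch that can be run in any POSIX shell. The sample input is made up for illustration and only stands in for the chunk of data one map task would receive. Note that "|| true" is shell syntax; whether the streaming -mapper string is interpreted by a shell is an assumption not confirmed in this thread, so the same idea may need to live in a small wrapper script instead.

```shell
#!/bin/sh
# Made-up sample input standing in for one map task's share of the data.
sample='alice 100
bob 200'

# Pattern is present: grep exits with status 0.
printf '%s\n' "$sample" | grep 'bob' >/dev/null
found_status=$?

# Pattern is absent: grep exits with status 1, which streaming reports
# as "subprocess failed with code 1".
printf '%s\n' "$sample" | grep '1938678460' >/dev/null
missing_status=$?

# Same absent pattern, but "|| true" turns the overall status into 0.
printf '%s\n' "$sample" | { grep '1938678460' || true; } >/dev/null
masked_status=$?

echo "found=$found_status missing=$missing_status masked=$masked_status"
# prints: found=0 missing=1 masked=0
```

One caveat with masking the status this way: it also hides genuine grep errors (status 2, for example an invalid pattern), not just the "no match" case.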

Wouter de Bie
You were right, and this is what actually fixed it: -jobconf stream.non.zero.exit.is.failure=false
Federico
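For reference, combining that setting with the command from the question would look roughly like this. It is a sketch assembled from the two snippets in this thread, not a command posted in this exact form; the paths and search string are the asker's own, and it needs a working Hadoop 0.20.2 installation to run.

```shell
# The question's original job, plus the property from the comment above,
# so that a mapper exiting with status 1 (grep found no match) is not
# treated as a task failure.
hadoop jar /usr/local/hadoop-0.20.2/contrib/streaming/hadoop-0.20.2-streaming.jar \
  -input /user/root/tmp2/user.data \
  -output /user/root/selected_data \
  -mapper '/bin/grep 1938678460' \
  -reducer 'wc' \
  -jobconf mapred.output.compress=false \
  -jobconf stream.non.zero.exit.is.failure=false
```

With this setting every non-zero exit from the mapper or reducer is tolerated, so real failures in those processes will no longer fail the job either.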