Hi,

My program follows an iterative map/reduce approach, and it needs to stop once certain conditions are met. Is there any way I can set a global variable that is distributed across all map/reduce tasks, and then check whether that variable has reached the condition for completion?

Something like this.

while (condition != true) {

            Configuration conf = getConf();
            Job job = new Job(conf, "Dijkstra Graph Search");

            job.setJarByClass(GraphSearch.class);
            job.setMapperClass(DijkstraMap.class);
            job.setReducerClass(DijkstraReduce.class);

            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(Text.class);

}

Where condition is a global variable that is modified during/after each map/reduce execution.

+2  A: 

Each time you run a map-reduce job, you can examine the state of the output, the values contained in the counters, etc., and decide in the driver program that controls the iteration whether you want one more iteration or not. I guess I don't understand where the need for global state comes from in your scenario.
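To make the control flow concrete, here is a minimal sketch in plain Java with no Hadoop classes. `runOnePass` is a hypothetical stand-in for one complete map/reduce job: it relaxes every edge once and returns how many distances improved. In a real job I would expect that number to come from a counter that the tasks increment (for example via `context.getCounter(...)`) and that the driver reads with `job.getCounters()` after `job.waitForCompletion(true)` returns. The class name, the edge layout and the `maxIterations` cap are all assumptions made for illustration.

```java
import java.util.Arrays;

// Plain-Java stand-in for a Hadoop driver loop; all names here are hypothetical.
class IterativeDriverSketch {

    static final int INF = Integer.MAX_VALUE;

    // Stand-in for one map/reduce pass: relax every edge once, using only the
    // distances from the previous pass, and return how many improvements were
    // made. In a real job this value would be a counter reported by the tasks.
    static long runOnePass(int[][] edges, int[] dist) {
        long updated = 0;
        int[] next = Arrays.copyOf(dist, dist.length);
        for (int[] e : edges) {                 // e = {from, to, weight}
            if (dist[e[0]] == INF) {
                continue;                       // source node not reached yet
            }
            int candidate = dist[e[0]] + e[2];
            if (candidate < next[e[1]]) {
                next[e[1]] = candidate;
                updated++;
            }
        }
        System.arraycopy(next, 0, dist, 0, dist.length);
        return updated;
    }

    // The driver owns the loop and the stop condition; no shared global
    // variable is needed because each pass only reports a count back.
    static int[] search(int nodes, int[][] edges, int source, int maxIterations) {
        int[] dist = new int[nodes];
        Arrays.fill(dist, INF);
        dist[source] = 0;
        for (int i = 0; i < maxIterations; i++) {
            long updated = runOnePass(edges, dist);
            if (updated == 0) {
                break;                          // nothing changed: converged
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        int[][] edges = { {0, 1, 4}, {0, 2, 1}, {2, 1, 2}, {1, 3, 5} };
        System.out.println(Arrays.toString(search(4, edges, 0, 10)));
        // prints [0, 3, 1, 8]
    }
}
```

The point of the design is that the tasks never read or write a shared flag. They only report how much work they did, and the driver decides after each pass whether to submit another job.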

More generally -- there are two main ways state is shared between executing nodes (although it should be noted that sharing state is best avoided since it limits scalability).

  1. Write a file to HDFS that other nodes can read (make sure the file gets cleaned up when the job exits, and that speculative execution won't cause weird failures).
  2. Use ZooKeeper to store some data in dedicated ZK tree nodes.
SquareCog
@Squarecog - Could you please explain a bit more about how to use counters? Thanks.
Deepak Konidena
Try this for a brief intro: http://philippeadjiman.com/blog/2010/01/07/hadoop-tutorial-series-issue-3-counters-in-action/
SquareCog