I was trying to find the sum of any given points using Hadoop, but my problem is getting all the values for a given key in a single reducer. It is something like this.

I have this reducer:

public static class Reduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, DoubleWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, DoubleWritable> output, Reporter reporter)
            throws IOException {
        Text word = new Text();

        // First "copy" of the iterator (outer loop)
        Iterator<IntWritable> tr = values;
        IntWritable v;
        while (tr.hasNext()) {
            v = tr.next();

            // Second "copy" of the iterator (inner loop)
            Iterator<IntWritable> td = values;
            while (td.hasNext()) {
                IntWritable u = td.next();
                double sum = u.get() + v.get();
                word.set(u + " + " + v);
                output.collect(word, new DoubleWritable(sum));
            }
        }
    }
}

I was trying to create two copies of the Iterator variable so that I can go through all the values with the second iterator while holding a single value from the first one (the two while loops above), but whenever I run the program the two iterators hold the same value all the time.

I am not sure if this is the right way of doing it. Any help is really appreciated.

Thanks,

Tsegay

A: 

I'm not sure exactly what you're trying to accomplish, but I know this much: the behavior of Hadoop's Iterators is a bit strange. Calling Iterator.next() will always return the SAME EXACT instance of IntWritable, with the contents of that instance replaced with the next value. So holding a reference to the IntWritable across calls to Iterator.next() is almost always a mistake. I believe this behavior is by design to reduce the amount of object creation and GC overhead.
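
To make the reuse behavior concrete, here is a minimal stdlib-only sketch that does not use Hadoop at all. `MutableInt` and `reusingIterator` are hypothetical stand-ins for `IntWritable` and the values iterator, written only to imitate the behavior described above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for IntWritable: a mutable box around an int.
class MutableInt {
    int value;
}

class ReuseDemo {
    // Hypothetical iterator that imitates the reuse described above:
    // every next() call returns the same MutableInt with new contents.
    static Iterator<MutableInt> reusingIterator(int... data) {
        MutableInt shared = new MutableInt();
        return new Iterator<MutableInt>() {
            private int i = 0;

            public boolean hasNext() {
                return i < data.length;
            }

            public MutableInt next() {
                shared.value = data[i++];
                return shared;
            }
        };
    }

    // Keeps the references returned by next() without copying them.
    static List<MutableInt> holdReferences(Iterator<MutableInt> it) {
        List<MutableInt> held = new ArrayList<>();
        while (it.hasNext()) {
            held.add(it.next());
        }
        return held;
    }

    public static void main(String[] args) {
        List<MutableInt> held = holdReferences(reusingIterator(1, 2, 3));
        for (MutableInt m : held) {
            // Every element is the same object, so the last value wins.
            System.out.println(m.value);
        }
    }
}
```

Every entry in `held` points at the same object, so all three entries show the last value written. Note also that assigning an iterator to a second variable, as in `td = values`, does not copy it; both names refer to one iterator.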

One way to get around this is to use WritableUtils.clone() to clone the instance you're trying to preserve across calls to Iterator.next().
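
As a rough sketch of the copy-first idea, the stdlib-only example below buffers a copy of every value before pairing them up. It does not use Hadoop: `IntBox` and `reusingIterator` are hypothetical stand-ins for `IntWritable` and the values iterator. In a real reducer the copy would presumably come from `v.get()` or from cloning the `Writable`. The sketch also assumes all values for one key fit in memory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class PairwiseSums {
    // Hypothetical stand-in for IntWritable.
    static class IntBox {
        int value;
    }

    // Hypothetical iterator that returns the same IntBox on every call.
    static Iterator<IntBox> reusingIterator(int... data) {
        IntBox shared = new IntBox();
        return new Iterator<IntBox>() {
            private int i = 0;

            public boolean hasNext() {
                return i < data.length;
            }

            public IntBox next() {
                shared.value = data[i++];
                return shared;
            }
        };
    }

    // Single pass over the iterator to copy the values out,
    // then nested loops over the buffered copies.
    static List<String> pairSums(Iterator<IntBox> values) {
        List<Integer> buffered = new ArrayList<>();
        while (values.hasNext()) {
            // Copy the primitive; do not keep the reused reference.
            buffered.add(values.next().value);
        }
        List<String> out = new ArrayList<>();
        for (int v : buffered) {
            for (int u : buffered) {
                out.add(u + " + " + v + " = " + (u + v));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : pairSums(reusingIterator(1, 2))) {
            System.out.println(line);
        }
    }
}
```

The iterator is consumed exactly once; the nested loops run over the buffered list, which can be traversed as many times as needed.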

bajafresh4life