In an Apache Hadoop map-reduce program, what are the options for using sets/lists as keys in the output from the mapper?

My initial idea was to use ArrayWritable as the key type, but that is not allowed, as the class does not implement WritableComparable. Do I need to define a custom class, or is there some other set-like class in the Hadoop libraries that can act as a key?

A: 

I thought ArrayWritable implemented Writable, which is a superinterface of WritableComparable.

Did you subclass ArrayWritable? According to the documentation you need to subclass it so that you can set the type of object to be stored by the array. For example:

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;

// Subclass ArrayWritable to fix the element type to Text.
public class TextArrayWritable extends ArrayWritable {

    public TextArrayWritable() {
        super(Text.class);
    }
}

Check out the ArrayWritable javadocs.

Binary Nerd
`ArrayWritable` implements `Writable` but not `WritableComparable`, and apparently the latter is required for a class to be used as a key. I could subclass `ArrayWritable` and add support for the `WritableComparable` interface, but is this necessary?
Jørn Schou-Rode
Ah, sorry, I looked a bit closer. The key type needs to implement WritableComparable because Hadoop must be able to sort the keys during the shuffle. So yes, you could implement the WritableComparable interface, which just requires you to override the compareTo method. Hope this helps.
Binary Nerd
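
Putting the two comments together, a minimal sketch of such a key type might look like the following. This is my own illustration, not code from the Hadoop libraries: the class name and the ordering (element-wise, then by length) are choices I made for the example.

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical key type: a Text array compared element-wise;
// if all shared elements are equal, the shorter array sorts first.
public class TextArrayWritable extends ArrayWritable
        implements WritableComparable<TextArrayWritable> {

    public TextArrayWritable() {
        super(Text.class);
    }

    @Override
    public int compareTo(TextArrayWritable other) {
        Writable[] a = get();
        Writable[] b = other.get();
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            int cmp = ((Text) a[i]).compareTo((Text) b[i]);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}

Note that if you use a custom key type, you will likely also want to override hashCode() (and equals()), since the default HashPartitioner routes keys to reducers based on hashCode().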