In Hadoop you can use the secondary-sort mechanism to sort the values before they are sent to the reducer.
The way this is done in Hadoop is that you append the value to sort by to the key, and then provide custom group and key-compare methods that hook into the sorting system.
So you need a key that consists essentially of both the real key and the value to sort by. To make this perform fast enough, I need a way of creating a composite key that is also easy to decompose into the separate parts needed by the group and key-compare methods.
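To illustrate what I mean, here is a minimal sketch of the pattern in plain Java. The names (CompositeKey, naturalKey, sortValue) are my own; in real Hadoop code the key would implement WritableComparable (adding write()/readFields()), the comparators would extend org.apache.hadoop.io.WritableComparator, and the partition logic would live in a Partitioner subclass.

```java
import java.util.Comparator;

// Sketch of the secondary-sort composite-key pattern (names are made up).
class CompositeKey implements Comparable<CompositeKey> {
    final String naturalKey; // the "real" key: drives partitioning and grouping
    final int sortValue;     // the value the reducer should see in order

    CompositeKey(String naturalKey, int sortValue) {
        this.naturalKey = naturalKey;
        this.sortValue = sortValue;
    }

    // Sort (key-compare) order: natural key first, then the value,
    // so values arrive at the reducer sorted within each natural key.
    @Override
    public int compareTo(CompositeKey other) {
        int cmp = naturalKey.compareTo(other.naturalKey);
        return (cmp != 0) ? cmp : Integer.compare(sortValue, other.sortValue);
    }

    // Grouping comparator: compares only the natural key, so all values
    // for one natural key end up in a single reduce() call.
    static final Comparator<CompositeKey> GROUP_COMPARATOR =
            Comparator.comparing(k -> k.naturalKey);

    // Partitioner logic: hash only the natural key, so all records for
    // one natural key land on the same reducer.
    static int getPartition(CompositeKey key, int numPartitions) {
        return (key.naturalKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

The point is that the sort comparator sees both parts while the grouping comparator and partitioner only see the natural key, which is why the composite key must be cheap to decompose.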
What is the smartest way to do this? Is there an "out-of-the-box" Hadoop class that can assist me here, or do I have to create a separate key class for each map-reduce step?
And how do I do this if the real key is itself a composite of several parts (which are also needed separately by the partitioner)?
What do you guys recommend?
P.S. I wanted to add the tag "secondary-sort" but I don't have enough rep yet to do so.