When writing a MapReduce job (specifically in Hadoop, if relevant), one must define a map() and a reduce() function, both yielding a sequence of key/value pairs. The data types of the key and value are free to be defined by the application.

In the canonical example of word counting, both functions yield pairs of type (string, int), with the key being a word and the value a count of its occurrences. Here - as in every other example I have seen - the output key and value types are consistent between the two functions.
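As a concrete sketch of that canonical case (plain Java with a simulated shuffle step, rather than the actual Hadoop API), both map() and reduce() emit (String, Integer) pairs:

```java
import java.util.*;
import java.util.stream.*;

public class WordCount {
    // map(): emits one (word, 1) pair per word -- type (String, Integer)
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // reduce(): sums the counts for one word -- also (String, Integer),
    // the same types map() emitted
    static Map.Entry<String, Integer> reduce(String key, List<Integer> values) {
        int sum = values.stream().mapToInt(Integer::intValue).sum();
        return Map.entry(key, sum);
    }

    public static void main(String[] args) {
        // simulated shuffle/sort: group mapped values by key
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : map("the cat and the hat")) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            System.out.println(reduce(e.getKey(), e.getValue()));
        }
    }
}
```

In a real Hadoop job the shuffle/sort between the two functions is done by the framework; it is simulated here only to make the example self-contained.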

Must (or should) the types of the key/value pairs yielded by map() and reduce() be the same within any application of MapReduce? If yes: why?

+3  A: 

No. Of course, the types of map's output pairs and reduce's input pairs must be identical, since one is passed into the other.

It's quite possible, however, for a reduce task to output a different type than the map pairs that came in - for instance, if the map task counted words in a document but the reduce task calculated an average word frequency. In that case map would be emitting integers but reduce would be emitting floating-point numbers.
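A minimal sketch of that case, again in plain Java rather than the Hadoop API: assume map() emitted (word, countInOneDocument) pairs with Integer values, and reduce() averages those counts, so its output value type is Double instead. (Hadoop's Java API permits this because the reducer's output key/value types are declared as independent type parameters from its input types.)

```java
import java.util.*;

public class AvgWordFrequency {
    // A hypothetical map() (not shown) would emit (word, countInOneDocument)
    // pairs with Integer values; this reduce() emits (word, meanCount) pairs
    // with Double values -- a different value type than its input.
    static Map.Entry<String, Double> reduce(String key, List<Integer> counts) {
        double mean = counts.stream()
                .mapToInt(Integer::intValue)
                .average()
                .orElse(0.0);
        return Map.entry(key, mean);
    }

    public static void main(String[] args) {
        // "the" appeared 3, 1, and 2 times in three documents
        System.out.println(reduce("the", List.of(3, 1, 2))); // prints "the=2.0"
    }
}
```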

Amber