When writing a MapReduce job (specifically in Hadoop, if relevant), one must define a `map()` and a `reduce()` function, both of which yield a sequence of key/value pairs. The data types of the key and the value are free to be defined by the application.
In the canonical example of word counting, both functions yield pairs of type (string, int), with the key being a word and the value a count of occurrences. Here, as in every other example I have seen, the key and value types output by the two functions are consistent.
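To make the word-count example concrete, here is a minimal in-memory sketch of the map/shuffle/reduce flow (plain Python, not the Hadoop API; the function and variable names are my own). In this example, `map_fn` emits `(str, int)` pairs and `reduce_fn` also emits `(str, int)` pairs, which is exactly the type consistency described above:

```python
from itertools import groupby
from operator import itemgetter

def map_fn(document):
    # Emit a (word, 1) pair of type (str, int) for every word.
    for word in document.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Emit a (word, total) pair, also of type (str, int).
    yield (word, sum(counts))

def run_mapreduce(documents):
    # Shuffle phase: sort all mapped pairs so they can be grouped by key.
    mapped = sorted(
        (pair for doc in documents for pair in map_fn(doc)),
        key=itemgetter(0),
    )
    results = {}
    for word, group in groupby(mapped, key=itemgetter(0)):
        for k, v in reduce_fn(word, (count for _, count in group)):
            results[k] = v
    return results

print(run_mapreduce(["to be or not to be"]))
# {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```

Nothing in this sketch forces the two stages to share types; the matching `(str, int)` pairs are a property of this particular problem, which is what prompts the question below.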
Must (or should) the types of the key/value pairs yielded by `map()` and `reduce()` be the same within any given application of MapReduce? If yes: why?