The main difference would be that MapReduce is apparently patentable. (Couldn't help myself, sorry...)
On a more serious note, the MapReduce paper, as I remember it, describes a methodology for performing calculations in a massively parallelised fashion. This methodology builds upon the `map` / `reduce` construct, which was well known for years before, but goes beyond it into such matters as distributing the data. Also, some constraints are imposed on the structure of the data operated upon and returned by the functions used in the `map`-like and `reduce`-like parts of the computation (the thing about data coming in lists of key/value pairs), so you could say that MapReduce is a massive-parallelism-friendly specialisation of the `map` & `reduce` combination.
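To make that key/value constraint concrete, here's a minimal, purely illustrative sketch in Clojure of the shape such a computation takes. All the names (`map-phase`, `reduce-phase`, `mapreduce-sketch`) are mine, not anything from the paper, and the grouping step merely stands in for the shuffle/distribution machinery the paper actually describes:

```
(require '[clojure.string :as str])

;; Map phase: each input produces a list of [key value] pairs.
(defn map-phase [text]
  (for [word (str/split text #"\s+")]
    [word 1]))

;; Reduce phase: called once per key, with all the values for that key.
(defn reduce-phase [word counts]
  [word (apply + counts)])

;; Single-box stand-in for the framework: run the mappers, group the
;; emitted pairs by key (where the real thing would shuffle data across
;; machines), then run the reducer once per key.
(defn mapreduce-sketch [mapper reducer inputs]
  (->> inputs
       (mapcat mapper)
       (group-by first)
       (map (fn [[k pairs]] (reducer k (map second pairs))))))

;; (mapreduce-sketch map-phase reduce-phase ["the cat" "the dog"])
;; => (["the" 2] ["cat" 1] ["dog" 1])   ; order not guaranteed
```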
As for the Wikipedia comment on the function being mapped in functional programming's `map` / `reduce` construct producing one value per input... Well, sure it does, but there are no constraints at all on the type of said value. In particular, it could be a complex data structure, perhaps a list of things to which you would again apply a `map` / `reduce` transformation. Going back to the "counting words" example, you could very well have a function which, for a given portion of text, produces a data structure mapping words to occurrence counts, `map` that over your documents (or chunks of documents, as the case may be) and `reduce` the results.
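A sketch of exactly that, again in Clojure (the name `word-counts` is my own; the mapper is just `frequencies` over whitespace-split text):

```
(require '[clojure.string :as str])

;; Per-document mapper: text -> {word count, ...}
(defn word-counts [text]
  (frequencies (str/split text #"\s+")))

;; map over the documents, then reduce the per-document maps into
;; one, adding the counts for words that appear in several of them.
(defn count-all [documents]
  (reduce (partial merge-with +) {} (map word-counts documents)))

;; (count-all ["the cat" "the dog"]) => {"the" 2, "cat" 1, "dog" 1}
```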
In fact, that's exactly what happens in this article by Phil Hagelberg. It's a fun and supremely short example of a MapReduce-like word-counting computation implemented in Clojure with `map` and something equivalent to `reduce` (the `(apply + (merge-with ...))` bit -- `merge-with` is implemented in terms of `reduce` in `clojure.core`). The only difference between this and the Wikipedia example is that the objects being counted are URLs instead of arbitrary words -- other than that, you've got a word-counting algorithm implemented with `map` and `reduce`, MapReduce-style, right there. The reason why it might not fully qualify as an instance of MapReduce is that there's no complex distribution of workloads involved. It's all happening on a single box... albeit on all the CPUs the box provides.
For an in-depth treatment of the `reduce` function -- also known as `fold` -- see Graham Hutton's A tutorial on the universality and expressiveness of fold. It's Haskell-based, but it should be readable even if you don't know the language, as long as you're willing to look up a Haskell thing or two as you go... things like `++` being list concatenation -- no deep Haskell magic.
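If you want a taste of the universality Hutton talks about without switching languages: `map` itself can be written as a fold. A toy Clojure definition (my own, not `clojure.core`'s):

```
;; map expressed as a fold/reduce: the accumulator is the output
;; vector, and each step appends (f x) to it.
(defn map-via-reduce [f coll]
  (reduce (fn [acc x] (conj acc (f x))) [] coll))

;; (map-via-reduce inc [1 2 3]) => [2 3 4]
```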