views: 83
answers: 4

As the title says: I was reading Yet Another Language Geek: Continuation-Passing Style and was wondering whether MapReduce can be categorized as a form of Continuation-Passing Style (CPS).

I am also wondering how CPS can utilise more than one computer to perform a complex computation. Maybe CPS makes it easier to work with the Actor model.

+1  A: 

I wouldn't say so. MapReduce does execute user-defined functions, but these are better known as "callbacks". I think CPS is a very abstract concept that is commonly used to model the behavior of better-known concepts like functions, coroutines, callbacks and loops. It is generally not used directly.

Then again, I may be confusing CPS with continuations themselves. I'm not an expert on either one.

Qwertie
Monadic programming, such as is done all the time in Haskell and sometimes in other languages such as OCaml, is really CPS. So it is used directly, just under a different name and frequently with sugar.
Michael E
+1  A: 

I would say they're opposites. MapReduce obviously lends itself to distribution, where the Map step can do its subtasks independently. With CPS, you write a recursive function where each call is waiting on a smaller case to get back.

I think CPS is one of the programming techniques that Guy Steele describes as things we need to outgrow and unlearn, in his talk on The Future of Parallel: What's a Programmer to do?

Nathan Hughes
Actually, Guy doesn't seem to mention CPS in that talk, as far as I can see.
RD1
+1  A: 

Both CPS and MapReduce make use of higher order functions. This means that both involve functions that take functions as arguments.

In the case of CPS, you pass a function (called a continuation) as an argument that says what to do with the result. Typically (but not always) the continuation is used once. It's a function that specifies how the whole of the rest of the computation should continue. That also makes it a serial kind of thing: typically you have one thread of execution, and the continuation specifies how it's going to continue.
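
For a concrete picture, here is a tiny CPS sketch in Python (the function names are invented for this illustration): instead of returning, each function hands its result to the continuation it was given, and the nested lambdas spell out the whole of the rest of the computation.

def add_cps(x, y, k):
    # rather than returning x + y, pass the result to the continuation k
    k(x + y)

def square_cps(x, k):
    k(x * x)

def pythagoras_cps(x, y, k):
    # each step names explicitly what happens next
    square_cps(x, lambda x2:
        square_cps(y, lambda y2:
            add_cps(x2, y2, k)))

pythagoras_cps(3, 4, print)  # prints 25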

In the case of MapReduce you're providing function arguments that are used multiple times. These argument functions don't really represent the whole of the rest of the computation, but just little building blocks that are used over and over again. The "over and over" bit can often be distributed over multiple machines making this a parallel kind of thing.
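
As a rough single-machine analogy (only a sketch with made-up data; real MapReduce frameworks also shuffle and sort between the phases), the argument functions are small building blocks applied over and over to the data:

from functools import reduce

words = ["spam", "egg", "spam", "spam"]

# the map-side building block is applied once per input element
pairs = map(lambda word: (word, 1), words)

# the reduce-side building block is applied over and over to combine counts
spam_count = reduce(lambda total, pair: total + pair[1],
                    (p for p in pairs if p[0] == "spam"), 0)

print(spam_count)  # 3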

So you're right to see a commonality. But one isn't really an example of the other.

+1  A: 

Map-reduce is an implementation. The coding interface which lets you use that implementation could use continuations; it's really a matter of how the framework and job control are abstracted. Consider declarative interfaces for Hadoop such as Pig, or declarative languages in general such as SQL; the machinery below the interface may be implemented in many ways.

For example, here's an abstracted Python map-reduce implementation:

def mapper(input_tuples):
    "Return a generator of items with qualifying keys, keyed by item.key"
    # we are seeing a partition of input_tuples
    return ((item.key, item) for (key, item) in input_tuples if key > 1)

def reducer(input_tuples):
    "Return a generator of items with qualifying keys"
    # we are seeing a partition of input_tuples
    return (item for (key, item) in input_tuples if key != 'foo')

def run_mapreduce(input_tuples):
    # partitioning is magically run across boxes
    mapper_inputs = partition(input_tuples)
    # each mapper is magically run on a separate box
    mapper_outputs = (mapper(input) for input in mapper_inputs)
    # partitioning and sorting is magically run across boxes
    reducer_inputs = partition(
        sort(pair for output in mapper_outputs for pair in output))
    # each reducer is magically run on a separate box
    reducer_outputs = (reducer(input) for input in reducer_inputs)
    return reducer_outputs

And here's the same implementation using coroutines, with even more magical abstraction hidden away:

def mapper_reducer(input_tuples):
    # we are seeing a partition of input_tuples
    # yield mapper output to the caller, get reducer input back
    reducer_input = yield (
        (item.key, item) for (key, item) in input_tuples if key > 1)
    # we are seeing a partition of reducer_input tuples again, but the
    # caller of this continuation has partitioned and sorted
    # yield reducer output to the caller
    yield (item for (key, item) in reducer_input if key != 'foo')
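
A driver for this coroutine (hypothetical; sort here is the same magical placeholder used above) might look roughly like:

def drive(input_partition):
    coroutine = mapper_reducer(input_partition)
    # run to the first yield and collect the mapper output
    mapper_output = next(coroutine)
    # the framework would magically partition and sort across boxes here
    reducer_input = sort(mapper_output)
    # resume the coroutine with the reducer input; get back the reducer output
    return coroutine.send(reducer_input)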
Karl Anderson