Efficient MapReduce when dealing with streams of queries to the same dataset | ansaurus

Q:

Efficient MapReduce when dealing with streams of queries to the same dataset

Hi, I have a massive, static dataset and a function f to apply to it.

The computation has the form reduce(map(f, dataset)), so I would use the MapReduce skeleton. However, I don't want to scatter the data at each request (and ideally I want to take advantage of indexing to speed up f). Is there a MapReduce implementation that addresses this general case?
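To make the requirement concrete, here is a minimal single-process sketch of the behaviour being asked for: the dataset is partitioned once, and every later query only ships the functions to the partitions. The class name `ResidentDataset` and its `query` method are hypothetical, invented for illustration; the sketch also assumes the reduce function is associative and that `initial` is its identity element.

```python
from functools import reduce
from typing import Any, Callable, List, Sequence


class ResidentDataset:
    """Hypothetical helper: partitions the data once and keeps it in place."""

    def __init__(self, records: Sequence[Any], num_partitions: int) -> None:
        # The one-time "scatter": afterwards the partitions never move.
        self.partitions: List[List[Any]] = [
            list(records[i::num_partitions]) for i in range(num_partitions)
        ]

    def query(
        self,
        map_fn: Callable[[Any], Any],
        reduce_fn: Callable[[Any, Any], Any],
        initial: Any,
    ) -> Any:
        # Reduce each partition locally, then combine the partial results.
        # Only map_fn and reduce_fn travel with the request, not the data.
        partials = [
            reduce(reduce_fn, (map_fn(r) for r in part), initial)
            for part in self.partitions
        ]
        return reduce(reduce_fn, partials, initial)


# A stream of two different queries against the same resident partitions.
dataset = ResidentDataset(range(1, 11), num_partitions=3)
total_of_squares = dataset.query(lambda x: x * x, lambda a, b: a + b, 0)
count_even = dataset.query(lambda x: 1 if x % 2 == 0 else 0, lambda a, b: a + b, 0)
```

In a real cluster each partition would live on a different machine; the point of the sketch is only that the constructor runs once while `query` runs per request.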

I've taken a look at IterativeMapReduce, and it may do the job, but it seems to address a slightly different case, and the code isn't available yet.

A:

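Hadoop's MapReduce (like all the other MapReduce skeletons inspired by Google's) doesn't scatter the data on every request. The input stays in the distributed file system, and map tasks are scheduled on or near the nodes that already hold the data blocks.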
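The reason is data locality: in Hadoop the input normally already sits in HDFS blocks, and the framework tries to run each map task on a node that holds a replica of its block. The toy sketch below illustrates that idea only; it is not Hadoop's actual scheduler, and the function name `assign_map_tasks`, the block names, and the node names are all made up for the example.

```python
from typing import Dict, List


def assign_map_tasks(
    block_replicas: Dict[str, List[str]],
    free_slots: Dict[str, int],
) -> Dict[str, str]:
    """Assign each block's map task to a node, preferring replica holders."""
    slots = dict(free_slots)  # work on a copy, leave the caller's dict alone
    assignment: Dict[str, str] = {}
    for block, holders in block_replicas.items():
        # First choice: a node that already stores the block and has a free slot.
        local = [node for node in holders if slots.get(node, 0) > 0]
        if local:
            node = local[0]
        else:
            # Fallback: any free node; only in this case is the block read remotely.
            remote = [n for n, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError("no free map slots left")
            node = remote[0]
        slots[node] -= 1
        assignment[block] = node
    return assignment


# Hypothetical cluster: three blocks, each replicated on two of three nodes.
block_replicas = {"b0": ["n1", "n2"], "b1": ["n2", "n3"], "b2": ["n1", "n3"]}
assignment = assign_map_tasks(block_replicas, {"n1": 1, "n2": 1, "n3": 1})
```

Here every task ends up on a node that already stores its block, so a new query moves code to the data rather than data to the code.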

akappa 2010-02-07 05:40:07
