Is there an implementation of MapReduce/Hadoop on Azure?
I'm not sure there's an out-of-the-box solution in Azure, but AWS might have what you're looking for (in beta): http://aws.amazon.com/elasticmapreduce/
Microsoft Research has DryadLINQ, a powerful engine for distributing LINQ expressions across a cluster. I hope they port it to Azure!
Automatic parallelization: from sequential declarative code the DryadLINQ compiler generates highly parallel query plans spanning large computer clusters. For exploiting multi-core parallelism on each machine DryadLINQ relies on the PLINQ parallelization framework.
It has an implementation of MapReduce like this:
    public static IQueryable<Rs> MapReduce<Ts, Ms, K, Rs>(
        this IQueryable<Ts> source,
        Expression<Func<Ts, IEnumerable<Ms>>> mapper,
        Expression<Func<Ms, K>> keySelector,
        Expression<Func<IGrouping<K, Ms>, IEnumerable<Rs>>> reducer)
    {
        // Map: each input record produces zero or more intermediate records.
        IQueryable<Ms> mapped = source.SelectMany(mapper);
        // Shuffle: group the intermediate records by key.
        IQueryable<IGrouping<K, Ms>> groups = mapped.GroupBy(keySelector);
        // Reduce: each group produces zero or more result records.
        return groups.SelectMany(reducer);
    }
Amazingly simple implementation! With the power of DryadLINQ, I don't see why you need to be constrained to MapReduce - you can simply create the exact LINQ query that returns the information you're looking for.
NOTE: this is my approximation of their implementation - the PDF does not contain the exact method signature or implementation
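To make the shape of that operator concrete, here is a minimal word-count sketch. It runs against plain LINQ to Objects through `AsQueryable()`, not against DryadLINQ, so it only illustrates the map/group/reduce structure and says nothing about how a cluster would execute it. The `MapReduceSketch` class and the `CountWords` helper are names I made up for illustration, and the `MapReduce` method is the same approximation as above, repeated so the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class MapReduceSketch
{
    // Same approximation as above (not the exact DryadLINQ signature).
    public static IQueryable<Rs> MapReduce<Ts, Ms, K, Rs>(
        this IQueryable<Ts> source,
        Expression<Func<Ts, IEnumerable<Ms>>> mapper,
        Expression<Func<Ms, K>> keySelector,
        Expression<Func<IGrouping<K, Ms>, IEnumerable<Rs>>> reducer)
    {
        return source.SelectMany(mapper).GroupBy(keySelector).SelectMany(reducer);
    }

    // Hypothetical helper: counts how often each word occurs in the given lines.
    public static Dictionary<string, int> CountWords(IEnumerable<string> lines)
    {
        return lines.AsQueryable()
            .MapReduce(
                // Map: split each line into words.
                line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries),
                // Key: group by the word itself.
                word => word,
                // Reduce: emit one (word, count) pair per group.
                g => new[] { new KeyValuePair<string, int>(g.Key, g.Count()) })
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }

    public static void Main()
    {
        var counts = CountWords(new[] { "the quick fox", "the lazy dog" });
        foreach (var pair in counts)
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

The same result could be written as a plain `SelectMany`/`GroupBy`/`Select` query without the helper, which is really the point made above: once you have LINQ, MapReduce is just one query shape among many.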
Just to share what I know on this: we have implemented MapReduce on the Amazon cloud platform using cloud services such as queues (Amazon SQS), tables (SimpleDB), and cloud storage (S3). The project is called Cloud MapReduce, and it is open source. The open source version supports only the Amazon cloud, but it is fairly easy to port to Azure, since Azure has equivalents of SQS, SimpleDB, and S3. Avanade has already ported Cloud MapReduce to Windows Azure, but unfortunately that source code is not open; you will have to contact Avanade to see how to use it. I would be happy to connect if anyone is interested. My contact info is at the Cloud MapReduce site.