I'm thinking about developing a framework to simplify running distributed computations in the .NET cloud environment of Windows Azure.
Azure currently (and most likely still by the time of its release) is completely unsuited to simply running distributed queries in the cloud (details). By "simple" I mean something like DryadLINQ, where you can write a query:
    var results = from c in collection
                  where IsLegal(c.Key)
                  select new
                  {
                      Key = Hash(c.Key),
                      Result = RunModel(c.Value)
                  };
and have it executed remotely over multiple machines in the cluster, with no deployment, storage or configuration to bother with.
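To make the shape of the query concrete, here is a rough local sketch of it. It is not DryadLINQ: it uses PLINQ's `AsParallel()`, which only spreads the work over the cores of one machine, as a stand-in for the remote execution I'm after. The bodies of `IsLegal`, `Hash` and `RunModel`, the sample data and the `LocalSketch` class name are made-up placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LocalSketch
{
    // Hypothetical placeholders for the helpers used in the query above.
    public static bool IsLegal(string key) => key.Length > 0;
    public static int Hash(string key) => key.Length;
    public static double RunModel(double value) => value * value;

    public static void Main()
    {
        // Made-up sample input; the real collection would live in the cloud.
        var collection = new Dictionary<string, double>
        {
            ["a"] = 1.0,
            [""] = 2.0,   // filtered out by the placeholder IsLegal
            ["ccc"] = 3.0,
        };

        // AsParallel() partitions the work across local cores only.
        // The framework in question would ship the same filter and
        // projection to multiple machines instead.
        var results = from c in collection.AsParallel()
                      where IsLegal(c.Key)
                      select new
                      {
                          Key = Hash(c.Key),
                          Result = RunModel(c.Value)
                      };

        // PLINQ does not guarantee result order unless AsOrdered() is used.
        foreach (var r in results)
            Console.WriteLine($"{r.Key}: {r.Result}");
    }
}
```

The point is that the query text stays identical to the sequential version; only the source (`collection.AsParallel()`) changes, which is the kind of transparency I'd like to have for cloud execution.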
What resources, papers or open source projects would you recommend for more information on the subject (especially scheduling and DAG optimization)?
So far I've been digging into Hadoop (used by Amazon Elastic MapReduce) and DryadLINQ, plus the obvious Googling.