I'm thinking about developing a framework to simplify running distributed computations in the Windows Azure .NET cloud environment.

Azure is currently (and most likely still will be at release) completely unsuited to running distributed queries in the cloud in a simple way (details). By "simple" I mean something like DryadLINQ, where you can write a query:

var results = from c in collection
  where IsLegal(c.Key)
  select new 
  { 
    Key = Hash(c.Key), 
    Result = RunModel(c.Value)
  };

and have it executed remotely across multiple machines in the cluster, with no deployments, storage or configuration to bother with.

What resources, papers or open source projects would you recommend for more information on the subject (especially scheduling and DAG optimization)?

So far I've been digging into Hadoop (used by Amazon Elastic MapReduce) and DryadLINQ, plus the obvious Googling.

A: 

There are a lot of interesting papers at Google Research.

The MapReduce Paper might be a good place to start if you haven't read it yet.

Dave Webb
Could you advise any?
Rinat Abdullin
I'd start with the MapReduce paper if you haven't read it yet, but I'm no expert in this field, so I wouldn't know where to go next.
Dave Webb