views: 169
answers: 4

I suspect that I will soon exhaust the speed improvements available from threading across multiple cores in a single computer.

What does this .NET desktop programmer need to learn to move a parallel-feasible problem onto multiple computers? My preference is to minimize the total lifecycle programming effort, so minimal differences between on-premises deployment and off-premises deployment would be ideal.

With respect to programmer man-hours, is Linux, LAMP, or some other stack significantly better than C#/.NET on Windows for such an application?

Edit: Some additional information from my own comments below. The compute-intensive part of the problem can be made arbitrarily large, so the overhead of distributing and recombining work is not a worry; it will be only a small percentage of the time you have to wait for a result. This is a one-man development team. Just a suggestion, and I don't know if it is any good or not: how about WCF and XML as the means to distribute the problem in a completely on-premises, Azure-ignorant way, trusting that it will (someday) work on Azure without changes and without the benefits of being Azure-aware? This is just an unresearched idea and I'm hoping somebody has a better one, even if it is not a Windows solution.

Another edit: Digipede has an offering for performance improvements and a paper on the distinction between a cluster and a grid.

http://www.digipede.net/downloads/Digipede_CCS_Whitepaper.pdf

Since my problem is more grid-like than cluster-like and I want to do it cheaply, I'll just try the WCF approach.

A: 

Honestly, I'd say there isn't any difference between stacks. The challenge you will have is in breaking up the work and reconstituting the output of each machine. Microsoft has an HIV research project that does exactly what you want, using .NET technology to "divide and conquer" a large computational problem.
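
For an embarrassingly parallel problem, the split-and-recombine step can be as simple as chunking an index range, solving each chunk independently, and concatenating the results in order. A minimal C# sketch (SolveChunk is a hypothetical stand-in for your compute kernel; each chunk runs as a local task here, but in a distributed setup each chunk would be shipped to a different machine):

    // Split a parallel-feasible job into independent chunks and recombine the
    // results in submission order so the output matches the single-threaded run.
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;

    static class DivideAndConquer
    {
        // Hypothetical compute kernel; replace with the real calculation.
        static double[] SolveChunk(int start, int count)
        {
            return Enumerable.Range(start, count).Select(i => Math.Sqrt(i)).ToArray();
        }

        public static double[] Solve(int totalItems, int chunkSize)
        {
            var chunks = new List<Task<double[]>>();
            for (int start = 0; start < totalItems; start += chunkSize)
            {
                int s = start;
                int count = Math.Min(chunkSize, totalItems - s);
                chunks.Add(Task.Run(() => SolveChunk(s, count)));
            }
            Task.WaitAll(chunks.ToArray());

            // Recombine in submission order.
            return chunks.SelectMany(t => t.Result).ToArray();
        }
    }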

Achilles
Actually, breaking it up and combining the results is easy. It is a parallel-feasible problem. I already had to do that for multi-threading, and the results are identical to the single-threaded case, so there are no bugs.
broiyan
"...so there are no bugs" That is quite a statement.
Adam Robinson
Ok, it feels like there are no bugs.
broiyan
+3  A: 

I would recommend reading up on the CCR (Concurrency and Coordination Runtime) and DSS (Decentralized Software Services) technologies from Microsoft. The CCR is a really nice implementation of parallelism based on sending bits of work to 'ports'. These ports are read by workers (threads), which as a side effect makes really effective use of the available cores.

DSS is an extra layer that makes it easy to use the same concept across multiple machines.

A nice introduction can be read here: Concurrent Affairs

A very nice third-party library, xcoappspace, is available as an alternative implementation of cross-computer communication based on the CCR. I think it is even easier to use than DSS. A nice article to read after you finish the CCR article ;^) xcoappspace

A lot of these concepts were popularized by the Erlang language.
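
The CCR itself ships with Microsoft Robotics Studio, but the core idea it formalizes, posting work items to a port and letting a pool of workers drain it, can be illustrated with nothing but the standard BCL. This is only a conceptual analogue, not the CCR API (BlockingCollection stands in for a port, plain tasks stand in for the dispatcher threads):

    // Conceptual analogue of the CCR port/worker pattern using only the BCL.
    using System;
    using System.Collections.Concurrent;
    using System.Linq;
    using System.Threading.Tasks;

    class PortLikeExample
    {
        static void Main()
        {
            var port = new BlockingCollection<int>();

            // One worker per core, all draining the same "port".
            var workers = Enumerable.Range(0, Environment.ProcessorCount)
                .Select(_ => Task.Run(() =>
                {
                    foreach (var item in port.GetConsumingEnumerable())
                        Console.WriteLine("processed {0}", item);
                }))
                .ToArray();

            for (int i = 0; i < 100; i++)
                port.Add(i);        // "post" work to the port

            port.CompleteAdding();  // signal no more work
            Task.WaitAll(workers);
        }
    }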

Toad
+5  A: 

The main thing to watch out for when moving from multi-threaded to distributed computing is the increased overhead of spooling up jobs on remote machines compared to spooling up another thread on the current machine. The granularity of the work items needs to be large enough to justify the significantly slower communication between nodes: messaging between threads on the same computer is many orders of magnitude faster than messaging between different computers over the network.

Sharing resources is more difficult across machines. Sharing objects in memory is simple between multiple threads in the same process, but it takes some engineering to achieve anything similar across machines. Locks basically don't exist across machines. Look to using a message queue service/server to coordinate work between multiple machines, return results to an aggregator, etc.

You mention "on premises vs off premises". If you are considering off-premises computing resources, be sure to search around for cloud computing or elastic computing service providers. Oddly enough, these are not used in the same breath as parallel programming as often as you'd think. Cloud computing offers you the option to scale your parallelism up to hundreds or thousands of compute nodes that you pay for only while you're actually using them. When your computation is done, or the live source for your data to analyze goes home at the end of the day, you can "lights out" your cloud nodes and stop the billing clock until you start them up again.

Amazon, Google, and Microsoft are three big providers of cloud service (among others), and each has very different characteristics, strengths and weaknesses. I work on Azure stuff at Microsoft. Azure's built-in message queues are pretty slick for running producer/consumer workflows at scale.
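
To make the producer/consumer idea concrete, here is a rough sketch of the pattern on Azure storage queues. Treat it as a shape rather than exact API; it assumes the Microsoft.WindowsAzure.Storage client library, and method names vary between SDK versions:

    // Producer/consumer over an Azure storage queue (sketch, not exact API).
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Queue;

    class QueueSketch
    {
        static void Main()
        {
            var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
            var queue = account.CreateCloudQueueClient().GetQueueReference("work");
            queue.CreateIfNotExists();

            // Producer (master): enqueue self-describing work items.
            queue.AddMessage(new CloudQueueMessage("chunk:0-9999"));

            // Consumer (worker): dequeue, process, then delete. If the worker
            // dies before deleting, the message reappears after its visibility
            // timeout, which gives you at-least-once delivery for free.
            var msg = queue.GetMessage();
            if (msg != null)
            {
                Process(msg.AsString);
                queue.DeleteMessage(msg);
            }
        }

        static void Process(string workItem) { /* compute kernel goes here */ }
    }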

Whether you use LAMP or .NET as your platform is really less about performance questions and more about the tools and skill sets you have within your development team. Deliberately selecting a target platform that is a mismatch with your dev team's skill set is a great way to add a lot of time and retraining costs to your project schedule.

C#/.NET works very well for coding parallel systems compared to C++ or scripting in other environments. Consider language features, debugging tools, and prebuilt libraries and services available to you when evaluating which platform is best suited to your skill set and desired system design.

dthorpe
The problem takes hours to solve and so the overheads you mention are trivial compared to the total elapsed time. The problem can be made arbitrarily large by changing a couple of parameters and I always set the parameters smaller than I would like, unfortunately. The team is just me so I am concerned about minimizing the effort to learn and implement. Porting to another language and stack is something I would consider if parallelism is easier to implement. I am surprised nobody mentioned WCF yet.
broiyan
How would I go parallel across multiple on-premises computers in a way that makes me Azure-ready?
broiyan
I guess I am trying to minimize time and retraining by hoping somebody has a pattern to share. Maybe a WCF pattern that is completely Azure-ignorant will still be portable to Azure. I don't know much about WCF so I'm just throwing it out there.
broiyan
Instead of using an existing message queue service, you could also implement your own work distribution protocol over TCP/IP, using WCF or any other communications framework. Compute nodes running in the Azure cloud can serve or consume TCP/IP connections much like any other machine. You can work out your work distribution protocol now using your on-premises hardware and scale it out to Azure at some point in the future. There's a fair amount of work involved in writing your own producer/consumer work distribution queue to cover guaranteed delivery, timeouts, etc., but it's interesting work. ;>
dthorpe
One way to guarantee delivery and avoid implementing timeouts is to just not trip over the ethernet cable that carries your WCF ;)
broiyan
True, true. But as soon as your comms go out across the public internet, connection failures >
dthorpe
+4  A: 

Creating a compute farm mechanism using WCF would be straightforward IMO. As you're already using C# on Windows, this is a natural progression, compared to switching language or technology stack.

An early step in this process would be to design a mechanism whereby compute workers can advertise their availability to a master machine. Either the master has a priori knowledge of the workers, or (better) the workers have a consistent mechanism to 'locate' the master, e.g. at a well-known domain. Putting the master at, say, www.all-your-cycles-belong-to-us.org would allow you to have a WCF service accepting incoming offers of compute time. If your delegation mechanism can tune itself according to the number of workers, all the better.

Defining your service, data, and fault contracts between the master and the workers may take some experimentation to achieve the best balance of programming elegance, computational throughput, and flexibility/future-proofing.
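
As a concrete starting point, one possible shape for the master's contract is sketched below. All the names (IComputeMaster, WorkItem, WorkResult) are illustrative assumptions rather than a prescribed design; note the sequence number for re-assembly and the capacity hint, which anticipate points 3 and 5 in the list that follows:

    // Illustrative sketch of a master/worker WCF contract; names are hypothetical.
    using System;
    using System.Runtime.Serialization;
    using System.ServiceModel;

    [ServiceContract]
    public interface IComputeMaster
    {
        // Worker announces itself and hints at how much load it can take on.
        [OperationContract]
        Guid RegisterWorker(int logicalCores);

        // Worker pulls a chunk of work; returns null when nothing is pending.
        [OperationContract]
        WorkItem GetWork(Guid workerId);

        // Worker pushes a finished result back for the master to re-assemble.
        [OperationContract]
        void SubmitResult(Guid workerId, WorkResult result);
    }

    [DataContract]
    public class WorkItem
    {
        [DataMember] public int SequenceNumber { get; set; }
        [DataMember] public byte[] Payload { get; set; }   // problem-specific parameters
    }

    [DataContract]
    public class WorkResult
    {
        [DataMember] public int SequenceNumber { get; set; }
        [DataMember] public byte[] Payload { get; set; }   // problem-specific output
    }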

From experience, the kinds of challenges with this (and other) approaches are:

  1. Worker goes quiet.

    Whether it is due to network issues, being 'busy' for long periods, or actual downtime is hard to tell until communication with the master can be re-established. In my day job, we have thousands of machines that 'call home' periodically, and going a whole hour without calling home is considered 'down'. Should you set another worker off doing the same work, or wait an arbitrary amount of time for the original to complete? Only you know your algorithm, but a blend of both approaches may help. (A sketch of one lease-and-reissue approach follows this list.)

  2. Abusing the workers.

    If your computational problem is genuinely difficult, you could flat-line the CPU on all the workers. Would this be acceptable? If you're renting the CPU cycles, then yes. If you're scavenging spare cycles from idle machines (a la SETI), then no.

  3. Results arrive out of order.

    Can your result set be re-assembled in the correct order by the master if different workers finish at different times?

  4. Code versioning.

    If you fix the code, how do you get it sent out to all the workers to ensure that they have the right version? There are lots of options for solving this problem, but it's worth thinking it through sooner rather than later.

  5. Dissimilar workers.

    Having a top-of-the-line multi-CPU worker participating in your compute farm alongside lowly single-core, single-CPU machines would yield bizarre behaviour if you didn't know that the workers had different specs. Adapting your WCF interfaces to allow a worker to hint at how much load it can take on may be worth some attention.
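
On point 1, one common pattern is to lease each work item with a deadline and re-issue it to another worker if the result has not come back in time. The sketch below assumes the WorkItem type from the earlier contract sketch and ignores some races for brevity; it is one possible approach, not the only one:

    // Lease-and-reissue tracking for work items on the master (sketch).
    using System;
    using System.Collections.Concurrent;

    public class Lease
    {
        public WorkItem Item;
        public DateTime Deadline;
    }

    public class WorkTracker
    {
        readonly ConcurrentQueue<WorkItem> _pending = new ConcurrentQueue<WorkItem>();
        readonly ConcurrentDictionary<int, Lease> _leased = new ConcurrentDictionary<int, Lease>();
        readonly TimeSpan _leaseLength = TimeSpan.FromHours(1);  // "an hour without calling home is down"

        public void Add(WorkItem item)
        {
            _pending.Enqueue(item);
        }

        // Called from the master's GetWork: prefer re-issuing expired leases
        // before handing out fresh work.
        public WorkItem Dispense()
        {
            foreach (var kv in _leased)
            {
                if (kv.Value.Deadline < DateTime.UtcNow)
                {
                    kv.Value.Deadline = DateTime.UtcNow + _leaseLength;
                    return kv.Value.Item;
                }
            }

            WorkItem item;
            if (_pending.TryDequeue(out item))
            {
                _leased[item.SequenceNumber] = new Lease { Item = item, Deadline = DateTime.UtcNow + _leaseLength };
                return item;
            }
            return null;
        }

        // Called from the master's SubmitResult.
        public void Complete(int sequenceNumber)
        {
            Lease ignored;
            _leased.TryRemove(sequenceNumber, out ignored);
        }
    }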

JBRWilkinson
Thanks. These are all valid concerns which, fortunately, I will not need to worry about in the early stages: there will be only a small number of computers, they will all be under my desk and dedicated to the problem, and I will be careful not to trip over the ethernet cable until I have time to build in robustness.
broiyan