I have many unused computers at home. What would be the easiest way for me to utilize them to parallelize my C# program with little or no code changes?

The task I'm trying to do involves looping through a large number of English sentences. The dataset can easily be broken into smaller chunks and processed on different machines concurrently.

+1  A: 

That's probably not possible.

How to parallelize a program depends entirely on what your program does and how it is written. It usually requires extensive code changes and increases the complexity of your program manyfold.

The usual way to easily increase concurrency in a program is to take a task that is repeated many times and write a function that splits that task into chunks and sends them to different cores to process.

WalloWizard
+7  A: 

… with little or no code changes?

Difficult. Basically, look into WCF as a way to communicate between various instances of the program across the network. Depending on the algorithm, the structure might have to be changed drastically, or not at all. In any case, you have to find a way to separate the problem into parts that act independently of each other. Then you have to devise a way of distributing these parts between different instances and collecting the resulting data.

PLINQ offers a great way to parallelize your program without big changes, but it only works within one process, across different threads, and only if the algorithm lends itself to parallelization. In general, some manual refactoring is necessary.
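As a rough illustration of the PLINQ point, here is a minimal sketch that assumes each sentence can be processed independently. `CountWords` is a hypothetical stand-in for the real per-sentence work; `AsParallel` and `AsOrdered` are the standard `System.Linq` extension methods. It only uses the cores of one machine, so it does not help with the other computers.

```csharp
using System;
using System.Linq;

public static class PlinqSketch
{
    // Hypothetical stand-in for the real per-sentence processing.
    public static int CountWords(string sentence) =>
        sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;

    // Processes every sentence, letting PLINQ spread the work over the local cores.
    public static int[] ProcessAll(string[] sentences) =>
        sentences
            .AsParallel()        // run the query on multiple threads of this process
            .AsOrdered()         // keep results in the same order as the input
            .Select(CountWords)
            .ToArray();
}
```

Compared with a sequential LINQ query, the only change is the `AsParallel()` call (plus `AsOrdered()` if the result order matters).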

Konrad Rudolph
+1  A: 

You need to run your application on a distributed system. Search for "distributed computation Windows" or "grid computing C#".

CodeForNothing
+2  A: 

The answer depends on the nature of the work your application will be doing. Different types of work have different possible parallelization solutions. For some types there is no possible/feasible way to parallelize.

The easiest scenario I can think of is an application whose work can easily be broken into discrete job chunks. If this is the case, then you simply design your application to work on a single job chunk. Provide your application with the ability to accept new jobs and deliver the finished jobs. Then, build a job scheduler on top of it. This scheduler can be part of the same application (configure one machine to be the scheduler and the rest as clients), or a separate application.

There are other things to consider: How will the machines communicate (files, network connections)? Does the application need to report, or be queried about, the percentage of the job completed? Do you need to be able to force the application to stop processing the current job?

If you need a more detailed answer, edit your question and include details about the application, the problem the application solves, the expected number of jobs, etc. Then the community will come up with more specific answers.
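The scheduler-and-workers design described above could be sketched roughly as follows. This is an in-process sketch only: `JobScheduler`, `TryGetNextJob`, and `DeliverResult` are hypothetical names, and in a real setup these two operations would be exposed to the other machines over whatever communication channel you choose (files, network connections).

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public sealed class JobScheduler
{
    // Job chunks that no worker has picked up yet.
    private readonly ConcurrentQueue<string[]> _pending = new ConcurrentQueue<string[]>();
    // Finished results delivered back by the workers.
    private readonly ConcurrentBag<string> _results = new ConcurrentBag<string>();

    public JobScheduler(IEnumerable<string[]> chunks)
    {
        foreach (var chunk in chunks) _pending.Enqueue(chunk);
    }

    // A worker asks for the next chunk; returns false when no work is left.
    public bool TryGetNextJob(out string[] chunk) => _pending.TryDequeue(out chunk);

    // A worker hands back the result for a chunk it finished.
    public void DeliverResult(string result) => _results.Add(result);

    public int DeliveredCount => _results.Count;
}
```

Each worker would loop on `TryGetNextJob`, process the chunk with the existing single-chunk code, and call `DeliverResult`, until no jobs remain.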

vmarquez
+1  A: 

Is each sentence processed independently, or are they somehow combined? If your processing operates on a single sentence at a time, you don't need to change your code at all. Just execute the same code on each of your machines and divide the data (your list of sentences) between them. You can do this either by installing a portion of the data on each machine, or by sharing the database and assigning a different chunk to each machine.
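One way to divide the data without touching the processing code is to give every machine an index and let it compute its own slice. The sketch below assumes the sentences fit in a list and that each machine is started with its own `machineIndex` (0-based) and the total `machineCount`; both parameter names and the `ChunkFor` helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkSketch
{
    // Returns the contiguous slice of sentences that the given machine should process.
    public static List<string> ChunkFor(IReadOnlyList<string> sentences, int machineIndex, int machineCount)
    {
        if (machineCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(machineCount));
        if (machineIndex < 0 || machineIndex >= machineCount)
            throw new ArgumentOutOfRangeException(nameof(machineIndex));

        // Integer arithmetic spreads any remainder over the slices,
        // so every sentence lands in exactly one slice.
        int start = (int)((long)sentences.Count * machineIndex / machineCount);
        int end = (int)((long)sentences.Count * (machineIndex + 1) / machineCount);
        return sentences.Skip(start).Take(end - start).ToList();
    }
}
```

Each machine then runs the unchanged program on `ChunkFor(allSentences, myIndex, machineCount)`.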

If you want to change your code slightly to facilitate parallelism, share the entire database and have the code "mark" each sentence as it's processed, then look for the next unmarked sentence to process. This will give you a gentle introduction to the concept of thread safety -- techniques that ensure one processor doesn't adversely interfere with another.
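The "mark, then look for the next unmarked sentence" idea only works if marking is atomic, so that two workers can never claim the same sentence. The sketch below shows that with threads inside one process, using `Interlocked.CompareExchange`; it is a stand-in, not the shared-database version, where the claim would instead be an atomic update on the database side. `ClaimTable` and `ClaimNext` are hypothetical names.

```csharp
using System.Threading;

public sealed class ClaimTable
{
    // One flag per sentence: 0 = unmarked, 1 = marked as taken.
    private readonly int[] _claimed;

    public ClaimTable(int sentenceCount)
    {
        _claimed = new int[sentenceCount];
    }

    // Atomically marks the first unmarked sentence and returns its index,
    // or -1 when every sentence has already been claimed.
    public int ClaimNext()
    {
        for (int i = 0; i < _claimed.Length; i++)
        {
            // Only one caller can flip a given flag from 0 to 1.
            if (Interlocked.CompareExchange(ref _claimed[i], 1, 0) == 0)
                return i;
        }
        return -1;
    }
}
```

Scanning from the start on every call is simple but linear in the number of sentences; it is fine for a sketch and easy to replace with a smarter lookup.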

As always, the more details you can provide about your specific application, the better the SO community can tailor our answers to your purpose.

Good luck -- this sounds like an interesting project!

Adam Liss
A: 

Before investing in parallelizing your program, why not just try breaking the dataset down into pieces, manually running your program on each computer, and collating the outputs by hand? If that works, then try automating it with scripts and write a program to collate the outputs.
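The collating program can be very small. The sketch below assumes every machine writes its output to a text file and that all of those files have been copied into one directory; the `part-*.txt` naming pattern and the `Collator` class are hypothetical, and the output file should be given a name that does not match the pattern.

```csharp
using System;
using System.IO;
using System.Linq;

public static class Collator
{
    // Concatenates all part files in partsDirectory (ordered by file name)
    // into outputPath and returns the number of lines written.
    public static int Collate(string partsDirectory, string outputPath, string pattern = "part-*.txt")
    {
        var partFiles = Directory.GetFiles(partsDirectory, pattern)
                                 .OrderBy(path => path, StringComparer.Ordinal)
                                 .ToArray();
        int lineCount = 0;
        using (var writer = new StreamWriter(outputPath))
        {
            foreach (var part in partFiles)
            {
                foreach (var line in File.ReadLines(part))
                {
                    writer.WriteLine(line);
                    lineCount++;
                }
            }
        }
        return lineCount;
    }
}
```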

tvanfosson
+2  A: 

Dryad (Microsoft's variation of MapReduce) addresses exactly this problem (parallelizing .NET programs across multiple PCs). It's in the research stage right now. Too bad there are no CTPs yet :-(

Mauricio Scheffer
A: 

There are several software solutions that allow you to use commodity hardware. One is Appistry. I work at Appistry, and we have built numerous solutions that run C# applications across hundreds of machines.

A few useful links: http://www.appistry.com/resource-library/index.html

You can download the product for free here: http://www.appistry.com/developers/

Hope this helps -Brett

Brett McCann
A: 

You might want to look at Flow-Based Programming (FBP), which has a Java and a C# implementation. Most approaches to this problem involve trying to take a conventional single-threaded program and figure out which parts can run in parallel. FBP takes a different approach: the application is designed from the start in terms of multiple "black-box" components running asynchronously (think of a manufacturing assembly line). Since a conventional single-threaded program acts like a single component in the FBP environment, it is very easy to extend an existing application. In fact, pieces of an existing app can often be broken off and turned into separate components, provided they can run asynchronously with the rest of the app (i.e. not subroutines). Someone called this "turning an iceberg into ice cubes".

Paul Morrison