views:

109

answers:

4

I'm not sure if this kind of question is appropriate, but it's been suggested I ask here, so here goes.

For a subject at university this semester, our assignment is to take some existing code and parallelize it. The assignment is fairly open-ended, but open source is really the only way we are going to get existing code. I could write some code myself and then parallelize it, but existing code (ideally something where I could make a genuine contribution) would be best, to avoid doubling my workload for little benefit.

I was hoping to use C# and make use of the new Task Parallel Library, but I'm struggling to find C# open source projects that are computationally expensive enough to benefit from parallelization (and don't already have it).

Does anyone have suggestions of where to look? Or is C# just not going to have enough of that kind of thing as open source (should I perhaps try C++)?

+4  A: 

I don't know if they already use parallel tasks, but image manipulation programs such as Paint.NET or Pinta are good candidates.

Oded
Unfortunately you will have to find an older version of Paint.NET, because the newest version is closed-source.
Timwi
@Timwi - I wasn't aware they went closed source :(
Oded
Thanks Oded. I did want to give Paint.NET a go, but noticed it was closed source; Pinta might be worth a shot, though! I'll take a look.
RodH257
Looks like Pinta uses a bit of the Paint.NET source anyway, so it should work well. Thanks! :)
RodH257
+1  A: 

Check out ALGLIB, especially the open source C# edition. It contains a lot of matrix and array manipulations that are nicely suited to the TPL.

Paul Sasik
A: 

The Bouncy Castle project implements several encryption algorithms in C# and Java. Perhaps some of them are not as parallelized as they could be.

Justin Dearing
+1  A: 

I don't know the scope of this project (whether it's just a weekly assignment or your final project), but a process that benefits from parallelization does not have to be "embarrassingly parallel" as the article Hans linked describes. A problem will benefit from being parallelized if:

  1. The solution to the problem can be expressed as the "sum" of a repetitive series of smaller operations,
  2. The smaller operations have minimal effect on each other, and
  3. The scale of the problem is sufficient to make the benefits of parallelization greater than the loss due to the added overhead of creating and supervising multiple worker processes.
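A minimal sketch of those three criteria, using PLINQ (which sits alongside the TPL in .NET 4); the problem and the size are made up purely for illustration:

```csharp
using System;
using System.Linq;

class CriteriaSketch
{
    // Each term i*i is independent of the others (criterion 2), the answer
    // is literally the sum of those small operations (criterion 1), and n
    // is large enough that the partitioning overhead pays off (criterion 3).
    public static long SumOfSquares(int n) =>
        Enumerable.Range(0, n)
                  .AsParallel()
                  .Sum(i => (long)i * i);

    static void Main() => Console.WriteLine(SumOfSquares(100_000));
}
```

Note that `Sum` does the "sum of smaller operations" combining step for you; with raw `Parallel.For` you would manage per-thread partial sums yourself.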

Examples of problems that are often solved linearly, but can benefit from parallelization include:

  • Sorting. Divide-and-conquer algorithms like merge sort (and even quicksort's recursive calls) split into independent sub-problems that can run concurrently; inherently sequential algorithms like insertion sort do not.
  • Searching. Binary search cannot be parallelized, but if you're searching unordered data, such as scanning a document for one or more occurrences of words, a linear search can use "divide and conquer" to split the data across workers.
  • Data transformation workflows. Open a file, read its raw data, carve it up into domain fields, turn those domain fields into true domain objects, validate them, and persist them. Each data file is often totally independent of all others, and the process of transformation (which is everything between reading the file and persisting it) is often a bottleneck that benefits from having more processors thrown at it.
  • Constraint satisfaction problems. Given a set of business rules defining relationships and constraints over multiple variables in a problem space, find an assignment of those variables that meets all constraints, or determine that none exists. Common applications include transportation route scheduling and business process optimization. This is an evolving sector of computational algorithms, of relatively high academic interest, so you may find published public-domain code of a basic CSP algorithm you can multithread. It may be described as embarrassingly parallel, since the best-known approach is "intelligent brute force", but each candidate solution can be evaluated independently of the others and so handed to a worker thread.
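As a sketch of the sorting example above: a merge sort whose two recursive halves run as TPL tasks via `Parallel.Invoke`. The threshold value is an assumption chosen to keep tiny sub-sorts sequential, per criterion 3:

```csharp
using System;
using System.Threading.Tasks;

class ParallelMergeSort
{
    const int SequentialThreshold = 2048; // below this, task overhead outweighs the gain

    public static void Sort(int[] a) => Sort(a, 0, a.Length, new int[a.Length]);

    static void Sort(int[] a, int lo, int hi, int[] scratch)
    {
        if (hi - lo <= 1) return;
        int mid = (lo + hi) / 2;
        if (hi - lo < SequentialThreshold)
        {
            Sort(a, lo, mid, scratch);
            Sort(a, mid, hi, scratch);
        }
        else
        {
            // The two halves touch disjoint array ranges, so they have no
            // effect on each other (criterion 2) and can run as parallel tasks.
            Parallel.Invoke(
                () => Sort(a, lo, mid, scratch),
                () => Sort(a, mid, hi, scratch));
        }
        Merge(a, lo, mid, hi, scratch);
    }

    static void Merge(int[] a, int lo, int mid, int hi, int[] scratch)
    {
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) scratch[k++] = a[i] <= a[j] ? a[i++] : a[j++];
        while (i < mid) scratch[k++] = a[i++];
        while (j < hi) scratch[k++] = a[j++];
        Array.Copy(scratch, lo, a, lo, hi - lo);
    }

    static void Main()
    {
        int[] data = { 5, 3, 8, 1, 9, 2 };
        Sort(data);
        Console.WriteLine(string.Join(" ", data)); // 1 2 3 5 8 9
    }
}
```

The merge step itself is still sequential, which is why the speedup is sublinear; only the independent sub-sorts parallelize cleanly.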

Problems described as "embarrassingly parallel" are generally those sufficiently large in scale, yet atomic and repetitive enough, that parallel processing is the only feasible solution. The Wiki article Hans links to mentions common applications; they usually boil down to applying a relatively simple computation to each element of a very large domain of data.
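The data-transformation workflow above can be sketched with `Parallel.For`; the `Transform` method here is a hypothetical stand-in for the real parse/validate/persist work on each file:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class TransformPipeline
{
    // Hypothetical per-"file" transformation: parse raw text into a domain
    // value (here, just the sum of its comma-separated numbers).
    public static int Transform(string rawFile) =>
        rawFile.Split(',').Select(int.Parse).Sum();

    // Each file is totally independent of the others, so the TPL can hand
    // one to each worker thread; results land in disjoint array slots.
    public static int[] TransformAll(string[] files)
    {
        var results = new int[files.Length];
        Parallel.For(0, files.Length, i => results[i] = Transform(files[i]));
        return results;
    }

    static void Main() =>
        Console.WriteLine(string.Join(" ",
            TransformAll(new[] { "1,2,3", "10,20", "5" }))); // 6 30 5
}
```

Because the inputs are independent, the only shared state is the `results` array, and each iteration writes its own slot, so no locking is needed.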

KeithS