views:

66

answers:

2

Hi,

I am looking for papers or tools about discovering parallelism. More explicitly: given sequential source code, how do you find the sections that could be efficiently parallelized (taking data dependencies into account...)?

The analysis could be static or dynamic. The source code to analyze would be in C or C++.

I already have some leads:

  • TLS (thread-level speculation) techniques
  • Call graph analysis with Valgrind or gprof

What I am looking for is a way to find the sections; I would then write the parallel code by hand, not have it generated automatically.

Tools such as Intel TBB or OpenMP are not at all what I am looking for.

+1  A: 

I asked a similar question here. It appears to me that parallel thinking is not yet sufficiently entrenched in the zeitgeist for a forum like this one to yield much in the way of leads. I'm sure there are people on here who know a ton about it, but there clearly are not many.

I have gone back to primary research about the language and architecture constructs which exist to support parallel processing. Knowing how to partition things has been a valuable step in learning what to partition into parallel processing streams.

There's no question in my mind that we will all have to know about this before too long - we can learn about it as our spare processor cores sit idle while we wait for tasks to finish.

Simon
+1  A: 

You have two fundamental approaches.

1) Determine the data-flow dependencies between chunks of code. Find small chunks that are apparently sequential in the source but have no data dependencies on each other. Insert parallelism calls on those chunks, if the cost of managing the parallelism is estimated to be less than the work the chunks do.
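A minimal sketch of what 1) means in practice (the function names and the use of `std::thread` here are illustrative, not part of any particular tool's output): two chunks that read and write disjoint data have no data-flow dependence between them, so the hand-written parallel version just runs them concurrently.

```cpp
#include <thread>
#include <vector>

// Two "chunks" that a data-flow analysis would find independent:
// each one reads and writes only its own vector.
void scale(std::vector<double>& a, double k) {
    for (auto& x : a) x *= k;
}

void offset(std::vector<double>& b, double d) {
    for (auto& x : b) x += d;
}

// Since neither chunk touches the other's data, the sequential calls
//   scale(a, 2.0); offset(b, 1.0);
// can be replaced by parallel execution. This only pays off if the
// chunks do more work than the cost of creating and joining a thread.
void run_parallel(std::vector<double>& a, std::vector<double>& b) {
    std::thread t(scale, std::ref(a), 2.0);  // first chunk on its own thread
    offset(b, 1.0);                          // second chunk on the calling thread
    t.join();
}
```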

2) When 1) doesn't provide enough, replace the algorithm in use with one that does have identifiable parallelism.

To do 1), you need full C/C++ front ends that can compute data-flow dependencies. You need to be able to estimate the cost of the potential parallel chunks. And you need to be able to transform the code to insert the parallelism primitives.
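To see why a real front end is needed rather than pattern matching, here is the kind of distinction the data-flow analysis has to make (a hypothetical pair of loops, not taken from any tool):

```cpp
#include <cstddef>
#include <vector>

// A chunk the analysis must REJECT for parallelization: each iteration
// reads the value written by the previous one (a loop-carried flow
// dependence), so the iterations cannot run independently.
void prefix_sum(std::vector<double>& a) {
    for (std::size_t i = 1; i < a.size(); ++i)
        a[i] += a[i - 1];   // reads a[i-1], written by the last iteration
}

// A chunk the analysis can ACCEPT: iteration i touches only a[i], so
// there is no dependence between iterations and they can be split
// across threads.
void square_all(std::vector<double>& a) {
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] *= a[i];
}
```

Both loops look superficially identical in structure; only the dependence information separates them, which is why the answer stresses full data-flow analysis.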

For 2), you need to be able to recognize algorithms (e.g., the classic triply-nested loop for a sequential matrix multiply) and apply transforms to replace them with other code.
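For concreteness, this is the classic triply-nested loop the answer refers to, followed by one possible hand-written replacement that exposes the parallelism (row-major n x n matrices; the two-thread split is just for illustration):

```cpp
#include <thread>
#include <vector>

// The classic sequential triply-nested matrix multiply: C = A * B.
void matmul(const std::vector<double>& A, const std::vector<double>& B,
            std::vector<double>& C, int n) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double s = 0.0;
            for (int k = 0; k < n; ++k)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}

// A replacement with identifiable parallelism: each output row depends
// only on row i of A and all of B, so disjoint row ranges can be
// computed concurrently.
void matmul_parallel(const std::vector<double>& A, const std::vector<double>& B,
                     std::vector<double>& C, int n) {
    auto rows = [&](int lo, int hi) {
        for (int i = lo; i < hi; ++i)
            for (int j = 0; j < n; ++j) {
                double s = 0.0;
                for (int k = 0; k < n; ++k)
                    s += A[i * n + k] * B[k * n + j];
                C[i * n + j] = s;
            }
    };
    std::thread t(rows, 0, n / 2);  // first half of the rows
    rows(n / 2, n);                 // second half on the calling thread
    t.join();
}
```

Recognizing that the original loop nest *is* a matrix multiply is what lets a tool (or a human) justify this kind of wholesale replacement.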

Our DMS Software Reengineering Toolkit is a program transformation engine that can be configured to do these tasks. It has full C and C++ front ends; the C front end has complete data-flow analysis and points-to analyzers, and thus can compute data dependencies directly. The C++ front end presently doesn't have all the flow-analysis machinery connected, but that machinery is generic and can be connected with some (considerable) effort.

The program transformation aspect of DMS can be used to modify the compiler data structures constructed by the front ends, and then those structures can be regenerated back into compilable text.

So, the bits and pieces are available. It's still a fair bit of work to implement a specific parallelizer, and even once you have one, there is considerable question as to whether existing C/C++ code parallelizes well.

Ira Baxter