Instead of adding the PLINQ extension method AsParallel() manually, couldn't the compiler figure this out for us auto-magically? And are there cases where you specifically would not want parallelization, even if the code supports it?
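For example, today I have to opt in by hand with something like the following (numbers and doubled are just stand-ins for whatever I'm working with):

using System.Linq;

var numbers = Enumerable.Range(1, 1_000_000);

// AsParallel() is the part I'd like the compiler to insert for me
// whenever it can prove it's safe to do so.
var doubled = numbers.AsParallel()
                     .Select(n => n * 2)
                     .ToList();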
I do research in the area of automatic parallelization. This is a very tricky topic that is the subject of many Ph.D. dissertations. Static code analysis and automatic parallelization have been done with great success for languages such as Fortran, but there are limits to the compiler's ability to analyze inter-regional dependencies. Most people would not be willing to sacrifice the guarantee of code correctness in exchange for potential parallel performance gains, so the compiler has to be quite conservative about where it inserts parallel markers.
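To give a flavor of what that dependence analysis has to prove, here is a contrived C# sketch (not taken from any particular compiler). The first loop is safe to parallelize; the second is not:

using System.Linq;

var a = Enumerable.Range(0, 1_000).Select(i => (double)i).ToArray();

// Easy case: each iteration touches only its own element, so the
// iterations are independent and could run in any order, in parallel.
for (int i = 0; i < a.Length; i++)
    a[i] = a[i] * 2;

// Hard case: iteration i reads the value written by iteration i - 1
// (a loop-carried dependence), so the iterations must run sequentially.
for (int i = 1; i < a.Length; i++)
    a[i] = a[i] + a[i - 1];

Proving that kind of independence gets much harder once pointers, aliasing, and calls into other code are involved, which is where the conservatism comes from.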
Bottom line: yes, a compiler can parallelize code. But a human can often parallelize it better, and having the compiler figure out where to put the markers can be very, very, very tricky. There are dozens of research papers available on the subject, such as the background work for the Mitosis parallelizing compiler or the D-MPI work.
Automatic parallelization is trickier than it might initially appear. It's the "if the code supports it" part that gets you. Consider something like:
using System.Linq;

int counter = 0;

int f(int i)
{
    counter = counter + 1;   // reads and writes shared mutable state
    return i + counter;
}

var result = new[] { 1, 2, 3, 4 }.Select(f).ToList();
If the compiler just decides to parallelize the Select (the map) here, you could get different results on each run of the program. Granted, it is obvious that f doesn't actually support being used this way, because it mutates shared state. But if f is defined in a different assembly, the compiler can't know that it is unsafe to parallelize. It could inspect the assembly containing f at runtime and decide then, but then it becomes a question of whether the inspection and the dynamic parallelization are fast enough not to negate the benefits. And sometimes inspection isn't possible at all: f could be a P/Invoked function that is actually perfectly suited to parallelization, but since the runtime/compiler has no way of knowing that, it has to assume it isn't. This is just the tip of the iceberg.
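To make that concrete, here is a sketch of the same example with the AsParallel() call a hypothetical auto-parallelizer might insert. The shared counter update becomes a data race, so both the values and their order can change from run to run:

using System;
using System.Linq;

int counter = 0;

int f(int i)
{
    counter = counter + 1;   // unsynchronized read-modify-write on shared state
    return i + counter;
}

var result = new[] { 1, 2, 3, 4 }
    .AsParallel()            // the "auto-magic" insertion
    .Select(f)
    .ToList();

Console.WriteLine(string.Join(", ", result));   // output differs between runs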
In short, it is possible, but difficult, so the trade-off between the cost of implementing the magic and the benefits of the magic was likely skewed too far in the wrong direction.
The good news is that compiler researchers say we are really close to having automatically parallelizing compilers. The bad news is that they have been saying that for fifty years.
The problem with impure languages such as C# is usually that there is not enough parallelism the compiler can prove safe: in an impure language, it is very hard for a human, and pretty much impossible for a program, to figure out whether and how two pieces of code interact with each other or with the environment.
In a pure, referentially transparent language, you have the opposite problem: everything is parallelizable, but it usually doesn't make sense, because scheduling a thread takes far longer than simply executing the damn thing.
For example, in a pure, referentially transparent, functional language, if I have something like this:
if a <= b + c
  foo
else
  bar
end
I could fire up five threads and compute a, b, c, foo and bar in parallel, then compute the +, then the <=, and lastly the if, which simply means throwing away the result of either foo or bar, both of which have already been computed. (Note how this depends on being functional: in an impure language, you cannot simply compute both branches of an if and then throw one away. What if both print something? How would you "unprint" that?)
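A much-simplified version of the branch-speculation part can be sketched in C# with tasks, but only under the assumption that the two branches are pure; Foo and Bar here are hypothetical stand-ins for side-effect-free computations:

using System;
using System.Threading.Tasks;

int a = 1, b = 2, c = 3;

int Foo() => 42;   // assumed pure: no I/O, no shared state
int Bar() => 7;    // assumed pure: no I/O, no shared state

// Speculatively start both branches...
var fooTask = Task.Run(Foo);
var barTask = Task.Run(Bar);

// ...then the 'if' just picks one result and discards the other.
int result = a <= b + c ? fooTask.Result : barTask.Result;
Console.WriteLine(result);

In an impure language nothing guarantees that discarding the losing task is harmless, which is exactly the point above.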
But if a, b, c, foo and bar are really cheap, then the overhead of those five threads will be far greater than the computations themselves.
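You can see the same granularity problem with PLINQ itself: for a trivial per-element operation, the parallel query often loses to the sequential one because of partitioning and coordination overhead. A rough, unscientific sketch (timings will vary by machine):

using System;
using System.Diagnostics;
using System.Linq;

var data = Enumerable.Range(0, 1_000).ToArray();

var sw = Stopwatch.StartNew();
var sequential = data.Select(x => x + 1).ToList();
Console.WriteLine($"sequential: {sw.Elapsed}");

sw.Restart();
var parallel = data.AsParallel().Select(x => x + 1).ToList();
Console.WriteLine($"parallel:   {sw.Elapsed}");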