ansaurus

Question

Controlling FPU behavior in an OpenMP program?

Answer 1

A:

The likely cause is the ordering of the floating-point operations. We all rely on our operations being associative and commutative, but the unfortunate truth is that floating-point operations aren't associative, so when they are parallelized the results may vary because the order in which partial results are combined changes from run to run.

Try running your loops backwards and seeing if the result differs.
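
As a minimal sketch of that experiment (the function names sum_forward and sum_backward are made up for illustration, they are not from the question), the same array can be summed in both directions in plain serial C:

```c
#include <assert.h>
#include <stddef.h>

/* Sum the array from the first element to the last. */
double sum_forward(const double *values, size_t n) {
    assert(values != NULL);
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;
}

/* Sum the same array from the last element to the first. */
double sum_backward(const double *values, size_t n) {
    assert(values != NULL);
    double total = 0.0;
    for (size_t i = n; i-- > 0; ) {
        total += values[i];
    }
    return total;
}
```

For the input {1.0, 1e30, -1e30}, the forward sum absorbs 1.0 into 1e30 and returns 0.0, while the backward sum cancels the two large terms first and returns 1.0. No threads are involved, so any difference comes purely from the order of the additions.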

If you do have per-thread needs, OpenMP provides guarantees about which iterations of a loop fall on which threads when you use schedule(static) (the default schedule is implementation-defined), i.e. if your loop runs from 1 to N on a quad core, iterations 1 to N/4 will be run on the same thread.
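
One way to check this on your own compiler is sketched below. It assumes two schedule(static) loops with the same iteration count inside a single parallel region, which is the case where OpenMP promises the same iteration-to-thread assignment. The names record_owners and current_thread are hypothetical helpers, and the code falls back to a single thread when OpenMP is not enabled.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Thread number inside a parallel region, or 0 when built without OpenMP. */
static int current_thread(void) {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Record which thread executed each iteration of two identical static loops. */
void record_owners(int *first, int *second, int n) {
    assert(first != NULL && second != NULL && n >= 0);
    #pragma omp parallel
    {
        #pragma omp for schedule(static)
        for (int i = 0; i < n; ++i) {
            first[i] = current_thread();
        }
        /* Same bounds, same schedule, same parallel region. */
        #pragma omp for schedule(static)
        for (int i = 0; i < n; ++i) {
            second[i] = current_thread();
        }
    }
}
```

If first[i] and second[i] agree for every i, per-thread setup done in the first loop is still in effect for the matching iterations of the second loop.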

-Rick

Rick 2010-02-09 05:45:38

Answer 2

+1 A:

As you pointed out already, double/float operations are not associative or distributive the way real numbers are in math. In particular, mixing very large and very small numbers may lead to noticeable precision errors when you change the order of computation.
The FPU state should be thread-specific, as the state is held in registers and the register state (= context) is specific to a thread.
It is ambiguous to say that spawned threads inherit the master thread's state, because "state" is not clear in this context. If you mean the register state, then they do not.
My suggestion: why don't you simply set the FPU control word in each thread? For example, before spawning the OpenMP threads, i.e. before the parallel-for, store the current FPU control word in a global variable by calling _controlfp(0, 0), which returns the control word without changing it. Then put statements that read the global variable and set the control word inside the parallel-for iteration. Since the global variable is only read, you don't need to worry about any data race.

#include <float.h>  // MSVC: declares _controlfp

// A mask of 0 changes nothing and returns the current control word.
unsigned int saved_control = _controlfp(0, 0);
#pragma omp parallel for (...)
for (int i = 0; i < N; ++i)
{
  // Apply the master thread's control word in this worker thread.
  _controlfp(saved_control, ...);

  ..
}

minjang 2010-02-09 06:22:26

Answer 3

A:

I've concluded that I do not have a problem. The differences in results are due to the order of calculations, not to the FPU state in different threads (we are not changing precision or rounding modes). As for FPU exception masking being different in the worker threads, that is not a concern because if a worker thread performs an operation that would result in an exception, that result (now NaN or Inf, etc.) will eventually "factor in" to the main thread and the exception will be thrown.
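
A minimal sketch of that "factoring in", assuming the master thread checks the combined result explicitly after the parallel loop (parallel_sum and is_finite_result are hypothetical names, not part of the original program):

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Sum the array with an OpenMP reduction; runs serially without OpenMP. */
double parallel_sum(const double *values, int n) {
    assert(values != NULL);
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;
}

/* Returns 1 for an ordinary number, 0 for NaN or +/-Inf. */
int is_finite_result(double x) {
    /* NaN compares unequal to itself; infinities lie outside +/-DBL_MAX. */
    return x == x && x <= DBL_MAX && x >= -DBL_MAX;
}
```

Whichever worker thread adds the NaN, the reduction carries it into the final total, so a single check in the master thread after the loop is enough to detect it.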

In addition, an exception must be caught in the same OpenMP thread that threw it. This means I only want the master thread to be able to throw exceptions anyhow.

2010-02-09 16:12:05
