Hi all,

I have been working with a fast multipole code in Fortran. It is a black box to me, and I have been seeing some strange behavior when compiling it on my Mac.

I am using version 11.1 of the compiler on a MacBook Pro with a 2.5 GHz Intel Core 2 Duo, running Snow Leopard.

The code runs fine when I set the optimization flag to -O0, but fails when I use -O2 or -O3. What is bizarre is that the code runs fine on a Linux box, at least with the default -O2 flag.

Anyone have any ideas on what could be causing the issue? It must be something with vectorization. Any help is greatly appreciated!

Thanks! -Patrick

+1  A: 

At first glance, and without any further information, I jump to the conclusion that your program is unstable; that is, your program produces very different results (failure vs non-failure in some cases) when you tweak the optimisation (which has all sorts of effects on the code that is generated). Some of the tweaks will have an impact on the results of floating-point arithmetic which can easily cause the difference between success and failure for long-running scientific simulations.
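To make the floating-point point concrete, here is a minimal sketch (not taken from the FMM code; the program name and values are made up for illustration). It evaluates the same three-term sum with two different groupings. In exact arithmetic both are equal, but in double precision the small term can be lost when it is added to a large one first, so an optimizer that regroups operations can change the answer.

```fortran
program assoc_demo
  implicit none
  ! Illustrative only: dp is a double-precision kind chosen for this sketch.
  integer, parameter :: dp = selected_real_kind(15, 307)
  real(dp) :: a, b, c, left, right

  a = 1.0e16_dp
  b = -1.0e16_dp
  c = 1.0_dp

  left  = (a + b) + c   ! the large terms cancel first, then c is added
  right = a + (b + c)   ! c is added to a large value before the cancellation

  print *, 'left  = ', left
  print *, 'right = ', right
  print *, 'differ: ', left /= right
end program assoc_demo
```

Whether the two results differ on a given machine depends on the compiler and flags, which is exactly why comparing an -O0 build against an -O2 build of a small case like this can be informative.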

This is a symptom of an underlying 'issue' with the program and I would advise you not to rely on the results of 'successful' runs of the program until you understand it a lot better -- you need to prise open the black box and see what's inside.

At the very least you ought to test the sensitivity of your program to small changes in inputs.

High Performance Mark
You are probably right. The only caveat is that this is a code that has been thoroughly tested. It computes n-body interactions using the FMM, but the results are compared with direct calculation, so it's easy to see if you're getting the correct answer. The output on the Mac for the potential is NaN with optimization turned on, but correct with it turned off. It works fine on the Linux box and on Windows...
Patrick
NaN might be an indication that something is wrong with the precision of the reals. Are there any hardcoded real kinds in the code?
kemiisto
Let's see. The code declares a huge work array of ints, but this is then broken up and passed to various subroutines, where the pieces are declared as real*8. There are a TON of table lookups, so it could be that something isn't getting copied out of those correctly.
Patrick
@Patrick - try checking for missing saves, declare all your types with specific kinds, check for out-of-bounds array accesses, and then look for any weird values ... I doubt that changing the optimization level could by itself be the cause. I have never encountered that as such.
Rook
@Patrick: the Intel Fortran Compiler User Guide should explain what options -O2 engages. Rather than being a single compiler option, it's shorthand for a set of options. Figure out what that set is, then use the options individually to see if you can isolate which one is causing the problem. That should point you in the right direction.
High Performance Mark
+1  A: 

As already said, it is possible that the final result is numerically sensitive and that optimization, which relaxes the arithmetic rules, is causing a numerical instability. Or optimization could be revealing a bug in the program. If the code is doing its own memory management (no longer necessary with Fortran 90/95/2003) with an internal array of ints, something could be going wrong on a different OS. I would investigate further...
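As a rough sketch of what the Fortran 90 alternative to a hand-partitioned work array looks like (the array names, the size, and the `accumulate` routine are hypothetical, not from the FMM code), each array is declared with its real type and allocated directly, so nothing is reinterpreted from an integer buffer:

```fortran
program alloc_demo
  implicit none
  ! Illustrative only: names and sizes are assumptions for this sketch.
  integer, parameter :: dp = selected_real_kind(15, 307)
  integer :: n, ierr
  real(dp), allocatable :: potential(:), charges(:)

  n = 1000   ! hypothetical problem size
  allocate(potential(n), charges(n), stat=ierr)
  if (ierr /= 0) stop 'allocation failed'

  charges = 1.0_dp
  potential = 0.0_dp
  call accumulate(n, charges, potential)
  print *, 'sum of potential = ', sum(potential)

  deallocate(potential, charges)

contains

  ! Dummy arguments are declared with the same type and kind as the actuals,
  ! so the compiler can check the interface.
  subroutine accumulate(m, q, phi)
    integer, intent(in) :: m
    real(dp), intent(in) :: q(m)
    real(dp), intent(inout) :: phi(m)
    phi = phi + q
  end subroutine accumulate

end program alloc_demo
```

Rewriting a legacy code this way is a larger job than flipping a compiler flag, but it removes one class of platform-dependent behavior from the list of suspects.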

I suggest turning on all warning and checking options. If there is a bug and you are lucky they might reveal it or give a clue. At least it is easy to try. Try these options:

-check all -traceback -warn all -fstack-protector
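For a sense of what these checks can catch, here is a deliberately buggy sketch (made up for illustration, not from the FMM code). The loop writes one element past the end of the array. Without runtime checking this kind of error may go unnoticed or corrupt neighboring data; with bounds checking enabled (part of `-check all`), the run should stop with a message, and `-traceback` should point at the offending line.

```fortran
program bounds_demo
  implicit none
  ! Illustrative only: a deliberate off-by-one error.
  real(kind=8) :: table(10)
  integer :: i, n

  n = 11            ! BUG on purpose: one past the declared size
  table = 0.0d0
  do i = 1, n
     table(i) = real(i, kind=8)   ! i = 11 is out of bounds
  end do
  print *, sum(table)
end program bounds_demo
```

Note that arrays passed as assumed-size dummies (declared with `*`) carry no extent information, so bounds checking may not help for them in older code.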

You could also try "-assume protect_parens", which makes ifort respect parentheses when evaluating expressions, as the Fortran standard requires, and see if that makes the problem go away.

Or maybe the program is assuming that memory is pre-initialized to some value. Is that a difference between Linux and Mac? I think that ifort has options to control this. If it is an old Fortran 77 code, it may assume that local variables are preserved across procedure calls, even without the use of "save" in the declarations. There is a compiler option to cause all local variables to act as if "save" were used; see if that makes a difference.
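To illustrate the "save" issue, here is a minimal sketch of a lookup table that is filled on the first call and reused afterwards (the function and table are hypothetical, not from the FMM code). The explicit `save` on `table` is what guarantees that its contents survive between calls. Old code often omits it and works only because the compiler happens to keep locals in static storage.

```fortran
program save_demo
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)
  integer :: i

  do i = 1, 4
     print *, 'table_value(', i, ') = ', table_value(i)
  end do

contains

  ! Illustrative only: builds a small table once, then serves lookups.
  function table_value(idx) result(v)
    integer, intent(in) :: idx
    real(dp) :: v
    logical, save :: first = .true.
    real(dp), save :: table(4)   ! without SAVE, contents are undefined on later calls
    integer :: k

    if (first) then
       do k = 1, 4
          table(k) = real(k, dp)**2
       end do
       first = .false.
    end if
    v = table(idx)
  end function table_value

end program save_demo
```

If removing the `save` from `table` in a sketch like this changes the behavior between -O0 and -O2, the same pattern is worth searching for in the real code.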

M. S. B.