I am using Intel's Fortran compiler to compile a numerical library. The test case provided with it fails inside libc.so.6, but when I attach Intel's debugger (IDB) the application runs to completion. How do I debug a bug that disappears under the debugger? Note that the same bug arises with gfortran.

I am working within OpenSUSE 11.2 x64.

The error is:

forrtl: severe (408): fort: (3): Subscript #1 of the array B has value -534829264 which is less than the lower bound of 1

+2  A: 

The error message is pretty clear to me: you are attempting to access a non-existent element of an array. I suspect that the value -534829264 is either junk from an uninitialised variable used as the array index, or the result of an integer arithmetic overflow. Either way, you should switch on the compiler flag that forces array bounds checking and run some tests. I think the flag for the Intel compiler is -CB, but check the documentation.
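As a minimal sketch of what bounds checking does, the guard below performs the same test by hand. It is not code from FISHPACK: the names `index_guard` and `index_in_bounds` are hypothetical, and the array is assumed to be rank-1 default real. The flags in the comments are the ones the two compilers are understood to accept for run-time bounds checks and tracebacks; verify them against the documentation for your version.

```fortran
! Automatic checking (file name mylib.f90 is a placeholder; verify the flags):
!   ifort    -g -traceback  -check bounds  mylib.f90
!   gfortran -g -fbacktrace -fcheck=bounds mylib.f90
! Manual checking, for when only one suspect access needs guarding:
module index_guard
  implicit none
contains
  ! Return .true. only when i is a valid subscript for the rank-1 array b.
  logical function index_in_bounds(b, i)
    real, intent(in)    :: b(:)   ! assumed-shape, so lbound(b, 1) is 1
    integer, intent(in) :: i
    index_in_bounds = (i >= lbound(b, 1)) .and. (i <= ubound(b, 1))
  end function index_in_bounds
end module index_guard
```

For an array declared `real :: b(10)`, `index_in_bounds(b, -534829264)` returns `.false.`, so a guard like this placed just before the suspect access can print the bad index and its origin instead of crashing.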

As to why the program apparently runs successfully in the debugger, I cannot help much. Perhaps the debugger gives variables default values that the run-time system itself doesn't, or some other factor entirely is responsible.

EDIT:

Doesn't the run-time system tell you which line of code causes the problem? Here are some more things to try in order to diagnose it. Use the compiler to warn you of:

  • use of variables before they are initialised;
  • integer arithmetic overflow (not sure whether the compiler can spot this);
  • any forced conversions from one type to another and from one kind to another within the same type.

Also, check that the default integer size is what you expect it to be and, more important, what the rest of the code expects it to be.
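The integer-size and overflow checks can be sketched as follows. This is an illustration only: the module and procedure names are hypothetical, it assumes the default integer is 32 bits or smaller (so a 64-bit product cannot itself overflow), and it needs a compiler that provides `int64` from `iso_fortran_env`. Options such as `-i8` (ifort) or `-fdefault-integer-8` (gfortran) change the default integer size, which is one way a library and its caller can end up disagreeing.

```fortran
module index_math
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Print the properties of the default integer kind for this build.
  subroutine report_default_integer()
    print *, 'default integer kind: ', kind(0)
    print *, 'bits:                 ', bit_size(0)
    print *, 'largest value:        ', huge(0)
  end subroutine report_default_integer

  ! .true. when m*n is representable as a default integer.
  ! The product is formed in 64 bits so the check itself cannot overflow
  ! (assumption: the default integer is at most 32 bits wide).
  logical function product_fits_default(m, n)
    integer, intent(in) :: m, n
    integer(int64) :: wide
    wide = int(m, int64) * int(n, int64)
    product_fits_default = abs(wide) <= int(huge(0), int64)
  end function product_fits_default
end module index_math
```

With a 32-bit default integer, `product_fits_default(50000, 50000)` is `.false.` because 2,500,000,000 exceeds 2,147,483,647. An index expression of that size computed in default integers is the kind of calculation that can produce a large negative subscript.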

High Performance Mark
Thank you. Any tips on finding where the call is coming from? I'm afraid I have been spoiled by Visual Studio.
ccook
Thank you for the tips, however this is within a released library (FISHPACK). The errors I get have no source information. I get something like:
tblktri 00000000004A6AED Unknown Unknown Unknown
tblktri 00000000004A55F5 Unknown Unknown Unknown
libc.so.6 00007FAD079D6A7D Unknown Unknown Unknown
ccook
If you have reason to believe that Fishpack is well-used and well-debugged, then it's most likely your call to one of the library routines that is causing the error. If you do not have such reason, then you may have to start looking at the Fishpack source files and building your own library from them. There is not much more help I can provide at this stage. I see that tblktri is one of the routines implemented in Fishpack; I'd start by looking at its source code.
High Performance Mark
Thanks Mark. I did just that. I was also using their test case, so all of the code is theirs. I stepped through the code and there seems to be an error in the index calculation in one of the methods. I'm not sure if it's only presenting itself because I am using a different compiler than they do. I emailed the author with what I found. Thanks for the help!
ccook
Are you using FISHPACK or FISHPACK90? With the Fortran 90 version, the compiler is probably able to check that the number and types of the arguments in your calls are correct -- this might catch a mistake.
M. S. B.
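The compile-time argument checking mentioned for the Fortran 90 version can be sketched like this. The module `solver_mod` and routine `scale_vector` are hypothetical stand-ins, not FISHPACK90 routines. The point is only that a procedure reached through `use` has an explicit interface, so the compiler can compare each call against it.

```fortran
module solver_mod
  implicit none
contains
  ! Multiply the first n elements of x by factor, in place.
  subroutine scale_vector(n, x, factor)
    integer, intent(in) :: n
    real, intent(inout) :: x(n)
    real, intent(in)    :: factor
    x = factor * x
  end subroutine scale_vector
end module solver_mod

! A caller that says "use solver_mod" gets its calls checked:
!   call scale_vector(3, x, 2.0)     ! accepted
!   call scale_vector(3.0, x, 2.0)   ! rejected at compile time: argument 1 must be integer
! The same routine compiled as a separate F77-style external procedure would
! typically accept the second call without complaint and misbehave at run time.
```

A mismatched argument passed silently to an external routine is one plausible way for a nonsense value to reach an index calculation, which is why the explicit-interface version is worth trying.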
I tried both. I was getting inaccurate results with the F77 version, and the error in the F90 version.
ccook
+1  A: 

Not an expert in the area, but here are a couple of things to consider:

1) Is the debugger initialising the variable used as the index to zero first, while the non-debug run does not, so that the variable starts with a "junk" value? (I had an old version of Pascal that used to do that.)

2) Are you using threading? If so, is the debugger changing the order of execution so that some prep thread completes in time?

Swanny
Thank you for the points. Fortunately I am not using threading.
ccook