Today I had a strange encounter with gcc. Consider the following code:

float len[ELEM+1];
len[1]=1.0; len[2]=2.0; len[3]=3.0;                                 //length

nod[1][1] = 1;
nod[1][2] = 2;
nod[2][1] = 2;
nod[2][2] = 3;
nod[3][1] = 3;
nod[3][2] = 4;                //CONNECTIVITY


for(i=1;i<nnod;i++)
  for(j=1;j<nfree;j++)
/* blah blah.........*/

And a variation:

float len[ELEM+1];
len[1]=1.0; len[2]=2.0; len[3]=3.0;                                 //length

nod[1][1] = 1;
nod[1][2] = 2;
nod[2][1] = 2;
nod[2][2] = 3;
nod[3][1] = 3;
nod[3][2] = 4;                //CONNECTIVITY

len[1]=1.0; len[2]=2.0;

for(i=1;i<=nnod;i++)
  for(j=1;j<=nfree;j++)
/* blah blah.........*/

The only difference is highlighted in bold. The problem is this: when len is later printed, the first version prints len[1] and len[2] (and uses them in expressions) as 0.0000, while the second version prints and uses the correct values of those variables.

What's wrong? I'm utterly confused. :-o

Note: len is not modified anywhere else.

+8  A: 

You need to show us the definitions for nod. There's a good chance (based on the fact you're starting arrays at 1, not 0) that you're overwriting memory.

For example, if nod is defined as:

int nod[3][2];

the possible array subscripts are 0-2 and 0-1, not 1-3 and 1-2:

nod[0][0]   nod[1][0]   nod[2][0]
nod[0][1]   nod[1][1]   nod[2][1]

If that is the case, your memory is almost certainly being overwritten, in which case all bets are off. You could be corrupting any other piece of data.

If len is placed in memory immediately following nod, this memory overflow would explain why it's being changed. The following diagram will (attempt to) illustrate this. Let's say your nod definition is:

int nod[3][2];

but you attempt to set nod[1-3][1-2] instead of nod[0-2][0-1]:

      +-----------+
+0000 | nod[0][0] |
      +-----------+
+0004 | nod[0][1] |
      +-----------+
+0008 | nod[1][0] |
      +-----------+
+000c | nod[1][1] |
      +-----------+
+0010 | nod[2][0] |
      +-----------+
+0014 | nod[2][1] |
      +-----------+
+0018 |   len[0]  | and nod[3][0], should you be foolish enough to try :-)
      +-----------+
+001c |   len[1]  | and nod[3][1] *
      +-----------+
+0020 |   len[2]  | and nod[3][2] *
      +-----------+

C/C++ will not check regular array bounds for overflow. So, if you attempt to set nod[3][something-or-other], you'll find yourself in trouble very similar to what your question describes.

The bit patterns you're using (3 and 4) equate to IEEE 754 single-precision 4.2x10^-45 and 5.6x10^-45 respectively, so they'd certainly give 0.0000 when printing (since you don't appear to be using a format string which would give you the more precise value).

A good way to test this theory would be to output the len variables immediately before and after setting the relevant nod variables, something like:

printf ("before: len1 = %f, len2 = %f\n", len[1], len[2]);
nod[3][1] = 3;
nod[3][2] = 4;
printf ("after : len1 = %f, len2 = %f\n", len[1], len[2]);

Actual details as to how the variables are laid out in memory may be different to that described above but the theory still holds.

There are two possible solutions if that turns out to be the problem:

  • Use zero-based arrays as C/C++ intended; or
  • Define them with enough space to handle your unusual use, such as int nod[4][3].
paxdiablo
Agreed. I added 1 for every declaration *(as in float len[ELEM+1];)* to account for start value as 1. But the culprit turned out to be in #define of NODE (nod is float nod[NODE+1][3]) itself. It was one less than intended. :( Thanks for your detailed explanation. :)
Akilan
A: 

You certainly have a buffer overwrite. Check your array boundaries. Your example doesn't show any array sizes, so we can't say more than that.

If you're using C++, try replacing your arrays with std::tr1::array or boost::array (they are the same); that will give you some clues.

Klaim
A: 

You can try setting a watchpoint in gdb to see when len is being modified. Make sure to compile your program with debug information (-g) and no optimization (-O0). Then, set a breakpoint near the declaration of len, add a watchpoint with watch len[0], and run. gdb will break whenever len[0] is modified, and it will tell you the new and the old value.

Adam Rosenfield