views:

241

answers:

7

Hi,

I have come across a very odd problem in C that I have never encountered before. I have narrowed it down to the following very simple snippet.

The variables are global and of type:

    int cpd;
    int nPart;

And here is the relevant code snippet which I gradually stripped down to the very minimum needed to produce the issue:

    printf("\ncpd1: %d\n",cpd);

    int p;
    for(p=1;p<=nPart;p++)
    {
        printf("\ncpd2: %d\n",cpd); exit(0);
    }

...The output I get is this:

cpd1: 17

cpd2: 0

How on earth is this possible?! cpd has NOT been reassigned, NO functions have been called... yet it changed? HOW?!?!

This has been driving me slowly insane for quite some time now... ... so any ideas?

thanks for your time, Ben.

EDIT: and when I remove -O2 from the makefile arguments to gcc, BOTH the print statements tell me that cpd = 0!

EDIT: Okay, I just found that a variable that is declared globally once, initialised as 4.0, and then never modified is now apparently 1.51086e-311 ... Something is very wrong somewhere...

EDIT: SOLVED!: I had an array of size 1000 that needed to be over 4000, and writing past its end was corrupting the memory around it. Thing is, this array is NOT accessed anywhere near those print statements; it is accessed in the same function, however, much earlier on (large function!). The discrepancy between the print statements must be some artifact of using -O2, as without -O2 both prints of cpd show the corrupted value. Thank you everyone, I wouldn't have worked this out without your help!

A: 

An uninitialized variable can have any value. Typically in a debug build it will be set to zero, but you can't rely on this.

Martin Beckett
Globals have `static` storage class by default. I guess those are auto initialized.
Mehrdad Afshari
But shouldn't the same value be printed both times anyway?
IVlad
The memory that is dedicated to global variables is set to only 0 bits
anthares
@IVlad: we can't tell. `nPart` is 0 so only one `printf()` should have happened.
Alok
+1 for @anthares and @Mehrdad. Globals are automatically initialized to zero.
Carl Norum
The variables are definitely calculated and assigned before this code snippet is executed. I have confirmed this. The variables are global.
Ben
@IVlad: No, it shouldn't. At least in general case. Uninitialized variables are not guaranteed to hold a stable value (this actually is sometimes observed in practice). In this case the variables are global though, so they are guaranteed to be initialized.
AndreyT
@anthares: Memory for global/static variables is not set to 0 bits. Instead, the variables are guaranteed to be properly zero-initialized. If on some platform the null pointer of some type is physically represented by the 0xFFFFAAAA pattern, then the corresponding global/static pointer variables are automatically set to the 0xFFFFAAAA pattern.
AndreyT
I agree my mistake ...
anthares
+3  A: 

That which is posted could not do that. The only explanation is that something else is changing cpd, or cpd has multiple instances.

wallyk
I am not a C expert, but I know enough C to know that this should be impossible, yet it definitely is happening... which is why this is so confusing.
Ben
There's no magic here. The computer is doing *exactly* what it understands. The goal is to find where you've led it astray.
wallyk
+3  A: 
int main() {
    int cpd = 13;
    int nPart = 17;

    printf("\ncpd1: %d\n",cpd);

    int p;

    for(p=1;p<=nPart;p++) {
        printf("\ncpd2: %d\n",cpd);
    }

    exit(0);
}

This compiles and runs with the expected output for me. Have I incorrectly reproduced your example, or is the lack of a closing brace at the end of your for loop (and the inclusion of the subsequent exit(0)) purposeful?

Edit: assume proper includes.

Xorlev
That's not the same as having the cpd/nPart global - ie. outside the main
Martin Beckett
OP says `cpd` and `nPart` are global.
Carl Norum
Same for me. I tried reproducing this in a stand-alone program, and the issue doesn't occur. I didn't bother including anything beyond the brace, because with the exit(0); there it is irrelevant. A closing brace does exist, and I've tried removing everything else in the loop. The issue still occurs.
Ben
+2  A: 
Alok
I made this same program, and there is indeed no issue. The exit(0) was there to help me narrow down the problem; it is not an intended function of the code! Between the 2 print statements there is the code: "int p; for(p=1;p<=nPart;p++) {" and the problem occurs. I just don't see how this is possible!
Ben
+5  A: 

Only possible reason I can think of is that you have another local int cpd variable declared. As an example, I took your code and slightly modified it to add another int cpd declaration and left it uninitialised:
Note I had to set nPart = 1 so the for loop executed at least once

#include <stdio.h>

int cpd;
int nPart = 1;

int main (int argc, char ** argv)
{
    printf("\ncpd1: %d\n",cpd);
    int cpd;

    int p;

    for(p=1;p<=nPart;p++)
    {
        printf("\ncpd2: %d\n",cpd);
        break;
    }
}

When I ran it, I got the following output:
cpd1: 0

cpd2: 2130567168

As expected, the global variable cpd is 0, the local cpd is uninitialised and can be pretty much any 32 bit value.

zebrabox
+1: I hadn't thought about shadowing.
Alok
I have scoured my program for all instances of cpd, and this is not the case. I also do set the variables to nPart = 40000 and cpd = 17 before this code is reached.
Ben
+1  A: 

If the minimal code snippet you posted does indeed reproduce the problem, then the only explanation here is that your compiler is hopelessly broken and generates meaningless broken code.

However, I strongly suspect that the snippet you posted is not really complete (it is obviously non-compilable) and the problem lies somewhere in the code you omitted.

AndreyT
The snippet is copied exactly as it appears in my code. There indeed is a lot of extra code, but not between those two print statements- that is really all there is. My compiler is gcc from the ubuntu repositories (or does it even come preinstalled?), so I doubt that that is the problem too.
Ben
Well, then the most likely explanation is either someone changing your `cpd` or there's another local `cpd` declared between the prints.
AndreyT
+9  A: 

Stack frame corruption due to buffer overflow is the usual explanation for this. Here's an example:

#include <stdio.h>
#include <string.h>

int main()
{
  int cpd;
  char msg[4];
  cpd = 17;
  printf("%d\n", cpd);
  strcpy(msg, "Oops");
  printf("%d\n", cpd);
  return 0;
}

Output:

17
0

The "msg" string buffer is too short by one character, the string terminator overwrites the value of "cpd".

The best way to find the cause is to use the data breakpoint feature of the debugger. Set a regular breakpoint on the function entry point. Then find the address of the "cpd" variable and set a byte-size data breakpoint on it. The debugger will stop as soon as the cpd value changes.

Beware that this won't necessarily work in optimized code: the "cpd" value might be kept in a register, which is another possible explanation for why its value differs between the two statements.

Hans Passant
Interesting... but is there any way that this could happen without the strcpy between the print statements; just the integer declaration and start of a for-loop that I have?
Ben
+1. I agree that a buffer overrun is usually the cause of these "magic" errors.
Per Ekman
Or a pointer write, effectively the same thing. Look for pointer writes to variables declared adjacent to cpd, they are the most likely, and trap in the debugger with break on value change.
Martin
+1 Ooh good one - didn't think of that.
zebrabox
You don't say what the execution environment is, but in systems without virtual memory (such as on embedded systems) these things can happen if the stack collides with the heap. Then the first printf call could overwrite the cpd variable on the heap during its operation.
Per Ekman
I don't know how this fits in with the above, but when I remove -O2 from the makefile arguments to gcc, BOTH the print statements tell me that cpd = 0!
Ben
Works on little-endian; not so well on big-endian machines. You'd need a bigger number than 17 (0x12345678, perhaps), or a longer string than "Oops" to demonstrate on a SPARC or PPC machine. It also assumes a specific (plausible) stack layout.
Jonathan Leffler
My CPU is an AMD Athlon II X4 630, which is x86_64 and little-endian (I think). This is my best guess for an answer so far, as I've found that another variable that is used once and never modified has also changed value. Thank you for your responses!
Ben
@user272491 - I expanded my post with a debugging tip.
Hans Passant
I <3 you all! I had an array of size 1000 that needed to be over 4000. Thing is, this array is NOT accessed anywhere near those print statements; it *is* accessed in the same function, however, much earlier on. The discrepancy between the print statements must be some artifact of using -O2, as without -O2 both prints of cpd show the corrupted value. Thank you everyone, I wouldn't have worked this out without your help!
Ben