So I have a C program. And I don't think I can post any code snippets due to complexity issues. But I'll outline my error, because it's weird, and see if anyone can give any insights.

I set a pointer to NULL. If, in the same function where I set the pointer to NULL, I printf() the pointer (with "%p"), I get 0x0, and when I print that same pointer a million miles away at the end of my program, I get 0x0. If I remove the printf() and make absolutely no other changes, then when the pointer is printed later, I get 0x1, and other random variables in my structure have incorrect values as well. I'm compiling it with GCC at -O2, but it has the same behavior if I take off optimization, so that's not the problem.

This sounds like a Heisenbug, and I have no idea why it's happening, nor how to fix it. Does anyone who has dealt with something like this in the past have advice on how they approached this kind of problem? I know this may sound kind of vague.

EDIT: Somehow, it works now. Thank you, all of you, for your suggestions.

The debugger told me interesting things - that my variable was getting optimized away. So I rewrote the function so it didn't need the intermediate variable, and now it works with and without the printf(). I have a vague idea of what might have been happening, but I need sleep more than I need to know what was happening.

+8  A: 

Are you using multiple threads? I've often found that the act of printing something out can be enough to effectively suppress a race condition (i.e. not remove the bug, just make it harder to spot).

As for how to diagnose/fix it... can you move the second print earlier and earlier until you can see where it's changing?

Do you always see 0x1 later on when you don't have the printf in there?

One way of avoiding the delay/synchronization of printf would be to copy the pointer value into another variable at the location of the first printf and then print out that value later on - so you can see what the value was at that point, but in a less time-critical spot. Of course, as you've got odd value "corruption" going on, that may not be as reliable as it sounds...
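A minimal sketch of that idea, assuming hypothetical helper names (`snapshot_ptr`, `report_snapshot`) that are not from the original program. The pointer's value is stored as an integer at the suspect spot and only printed much later, so no I/O happens where the first printf() used to be:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers, not from the original program. */
static uintptr_t saved_value;   /* pointer value captured at the suspect spot */
static int saved_valid;         /* nonzero once a value has been captured */

/* Call where the first printf() used to be: no I/O, just a store. */
void snapshot_ptr(const void *p)
{
    saved_value = (uintptr_t)p;
    saved_valid = 1;
}

/* Call at the end of the program, next to the existing late printf(). */
void report_snapshot(void)
{
    assert(saved_valid);
    printf("pointer at snapshot time: %p\n", (void *)saved_value);
}
```

The snapshot lives in static storage rather than on the stack, so it is probably less exposed to a stack overrun, but a stray write could still hit it.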

EDIT: The fact that you're always seeing 0x1 is encouraging. It should make it easier to track down. Not being multithreaded does make it slightly harder to explain, admittedly.

I wonder whether it's something to do with the extra printf call making a difference to the size of stack. What happens if you print the value of a different variable in the same place as the first printf call was?

EDIT: Okay, let's take the stack idea a bit further. Can you create another function with the same sort of signature as printf and with enough code to avoid it being inlined, but which doesn't actually print anything? Call that instead of printf, and see what happens. I suspect you'll still be okay.
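Something along these lines might do as the stand-in. The name `quiet_printf` is made up for illustration, and `__attribute__((noinline))` is a GCC/Clang extension; this is a sketch, and it will not reproduce printf()'s stack usage exactly:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for printf(): same signature, no output.
 * The noinline attribute is a GCC/Clang extension.
 */
__attribute__((noinline))
int quiet_printf(const char *fmt, ...)
{
    va_list ap;
    volatile size_t count = 0;   /* volatile so the loop is not optimized out */

    assert(fmt != NULL);
    va_start(ap, fmt);           /* set up variadic argument access */
    for (const char *c = fmt; *c != '\0'; ++c)
        count++;
    va_end(ap);

    return (int)count;           /* number of characters in the format string */
}
```

Replacing the original call with `quiet_printf("%p\n", (void *)ptr);` (where `ptr` is whatever was being printed) keeps a real function call at that spot without touching stdout.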

Basically I suspect you're screwing with your stack memory somewhere, e.g. by writing past the end of an array on the stack; changing how the stack is used by calling a function may be disguising it.

Jon Skeet
Only one thread. I'll see what I can do about moving the second print around. Also, I only know you by reputation, but DAMN! That was fast! I thought they were exaggerating about that, at least a little, but no.
Chris Lutz
And yes, I always see `0x1`. It's never another pointer value.
Chris Lutz
Jon Skeet answers you before you even start thinking about the question
Stefano Borini
@Chris: I'll keep adding bits to the answer as I think of them, always at the bottom. We'll get there...
Jon Skeet
Printing a different value has the same effect - success.
Chris Lutz
+1  A: 

Stack corruption due to some overflow?

Stefano Borini
+4  A: 

If you're running on a processor that supports hardware data breakpoints (like x86), just set a breakpoint on writes to the pointer.

Michael Burr
A: 

Have you tried setting a condition in your debugger which notifies you when that value is modified? Or running it through Valgrind? These are the two major things that I would try, especially Valgrind if you're using Linux. There's no better way to figure out memory errors.

I'm on OS X, but I do need to get Valgrind sometime soon.
Chris Lutz
+1  A: 

Do you have a debugger available to you? If so, what do the values look like in that? Can you set any kind of memory/hardware breakpoint on the value? Maybe there's something trampling over the memory elsewhere, and the printf moves things around enough to move or hide the bug?

Probably worth looking at the asm to see if there's anything obviously wrong there. Also, if you haven't already, do a full clean rebuild. If the definition of the struct has changed recently, there's a vague chance that the compiler could be getting it wrong if the dependency checking failed to correctly rebuild everything it needed to.

James Sutherland
This has been an informative one: I set a breakpoint at the place where the `printf()` would have been, and according to my debugger, all these values are getting optimized away by the compiler. I suppose `printf()` would do a good job of keeping them from being optimized away, so that explains that.
Chris Lutz
But what about the value of the pointer? As you're printing that out as well, that can't be optimized away. Why do you have variables which don't have any use anyway, out of interest?
Jon Skeet
They're part of a linked list. I make a node, and set `node->next` to `NULL`, and then add the node to the end of the current list. They may have use in the future, but are just being set to `NULL` in the current function.
Chris Lutz
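For reference, since no code was posted: a minimal sketch of the append described in that comment. The struct layout and the names `list_append` and `list_free` are assumptions, not the asker's code. The node is heap-allocated, so the write to `n->next` lands in the same object the list ends up pointing at:

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed node layout; the real struct was never shown. */
struct node {
    int value;
    struct node *next;
};

/* Appends a new node at the tail. Returns the node, or NULL if malloc fails. */
struct node *list_append(struct node **head, int value)
{
    assert(head != NULL);

    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->value = value;
    n->next = NULL;              /* new tail: nothing follows it yet */

    struct node **link = head;   /* walk to the last next-pointer in the list */
    while (*link != NULL)
        link = &(*link)->next;
    *link = n;
    return n;
}

/* Frees every node in the list. */
void list_free(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```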
If they're getting optimized away, what's being printed in the last printf()? Is the pointer that's being printed really the same variable when you have 2 printf() calls?
Michael Burr
I think you should post at least a bit of sample code - otherwise we're all just playing a 20-questions guessing game.
Michael Burr
I'm accepting this answer because it's what pushed me into seeing what was wrong - I was setting parts of a struct that was being optimized away, so my writes didn't exist - but thank you all for your expedient help.
Chris Lutz
@Chris: If that optimization was changing an observed value later, that sounds very odd...
Jon Skeet
A: 

Without code, it's a little hard to help, but I understand why you don't want to foist copious amounts on us.

Here's my first suggestion: use a debugger and set a watchpoint on that pointer location.

If that's not possible, or the bug disappears again, here's my second suggestion.

1/ Start with the buggy code, the one where you print the pointer value and you see 0x1.

2/ Insert another printf a little way back from there (in terms of code execution path).

3/ If it's still 0x1, go back to step 2, moving a little back through the execution path each time.

4/ If it's 0x0, you know where the problem lies.
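The steps above could be done with a small checkpoint macro, so each probe reports where it was placed. `TRACE_PTR` is a hypothetical name, and this is only a sketch:

```c
#include <stdio.h>

/*
 * Hypothetical checkpoint macro: prints the call site, the expression text,
 * and the pointer's current value to stderr (which is not fully buffered,
 * so output should appear even if the program crashes soon after).
 */
#define TRACE_PTR(p) \
    fprintf(stderr, "%s:%d: %s = %p\n", __FILE__, __LINE__, #p, (void *)(p))
```

Placing something like `TRACE_PTR(node->next);` at successively earlier points in the execution path narrows down the first place the value reads as 0x1. Bear in mind that in this particular case adding a call may itself hide the bug.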

If there's nothing obvious between the 0x0 printf and the 0x1 printf, it's likely to be corruption of some sort. Without a watchpoint, that'll be hard to track down - you need to check every single stack variable to ensure there's no possibility of overrun.

I'm assuming that pointer is a global since you set it and print it "a million miles away". If it is, look at the variables you define on either side of it (in the source). They're the ones most likely to be causing overrun.

Another possibility is to turn off the optimization to see if the problem still occurs. We've occasionally had to ship code like that in cases where we couldn't fix the bug before deadlines (we'll always go back and fix it later, of course).

paxdiablo