I have a small single-threaded C++ application, compiled and linked using Visual Studio 2005, that uses boost (crc, program_options, and tokenizer), a smattering of STL, and assorted other system headers.

(Its primary purpose is to read in a .csv and generate a custom binary .dat and a paired .h declaring structures that "explain" the format of the .dat.)

The tool crashes (access violation on NULL) only in release builds, and only when run outside the debugger: pressing F5 does not cause the crash, but Ctrl-F5 does. When I re-attach the debugger, I get this stack:

ntdll.dll!_RtlAllocateHeap@12()  + 0x26916 bytes    
csv2bin.exe!malloc(unsigned int size=0x00000014)  Line 163 + 0x63 bytes C
csv2bin.exe!operator new(unsigned int size=0x00000014)  Line 59 + 0x8 bytes C++
>csv2bin.exe!Record::addField(const char * string=0x0034aac8)  Line 62 + 0x7 bytes  C++
csv2bin.exe!main(int argc=0x00000007, char * * argv=0x00343998)  Line 253   C++
csv2bin.exe!__tmainCRTStartup()  Line 327 + 0x12 bytes  C

The line it's crashing on is a somewhat innocuous-looking allocation:

pField = new NumberField(this, static_cast<NumberFieldInfo*>(pFieldInfo));

...I don't believe it has reached the constructor yet; it's still allocating memory before jumping to the constructor. It has also executed this code dozens of times by the time it crashes, usually in a consistent (but otherwise non-suspicious) location.

The problem goes away when compiling with /MTd or /MDd (debug runtime), and comes back when using /MT or /MD.

The NULL is loaded from the stack, and I can see it in memory view. _RtlAllocateHeap@12 + 0x26916 bytes seems like a huge offset, like an incorrect jump has been made.

I've tried _HAS_ITERATOR_DEBUGGING in a debug build and that hasn't brought up anything suspicious.

Dropping a HeapValidate at the beginning and end of Record::addField shows an OK heap right up to when it crashes.

This used to work -- I'm not entirely sure what changed between now and the last time we compiled the tool (probably years ago, maybe under an older VS). We've tried an older version of boost (1.36 vs 1.38).

Before dropping back to manual investigation of the code or feeding this to PC-Lint and combing through its output, any suggestions on how to effectively debug this?

[I'll be happy to update the question with more info, if you request info in the comments.]

+3  A: 

One little-known difference between running with and without the debugger attached is the OS debug heap. You can turn the debug heap off with the environment variable _NO_DEBUG_HEAP. You can set it either in your computer properties or in the Project Settings in Visual Studio.

Once you turn the debug heap off, you should see the same crash even with debugger attached.

That said, be aware that memory corruption can be hard to debug, because the real cause (such as a buffer overrun) is often very far from where you see the symptoms (the crash).
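Because the variable silently changes which heap the process gets, it can help to have the tool report at startup whether it is set. The following is only a minimal sketch: the function names are hypothetical, and it assumes the variable is set to the value 1.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns true when `value` (the contents of the
// _NO_DEBUG_HEAP environment variable, or a null pointer if it is unset)
// holds "1". Treating only "1" as "disabled" is an assumption of this sketch.
bool isNoDebugHeapValue(const char* value)
{
    return value != 0 && std::strcmp(value, "1") == 0;
}

// Hypothetical helper: reads the variable from the current environment.
bool debugHeapDisabled()
{
    return isNoDebugHeapValue(std::getenv("_NO_DEBUG_HEAP"));
}
```

Calling debugHeapDisabled() at the top of main and printing the result makes it obvious which configuration a given crash report came from.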

Suma
+1: Thanks, didn't know about _NO_DEBUG_HEAP. Trying that now. (I've had the fun experience of tracing down memory corruptions that only occurred on retail embedded hardware without a debugger attached, so I hear you on the "may be very far from symptoms" part.)
leander
Yep, that did it -- got the crash in the debugger. =) Wish me luck...
leander
Hmm - the debug heap's hiding the corruption? Now that's bad luck...
Michael Burr
@Michael: yeah, it was a one-character buffer overflow. I guess the debug heap didn't exhibit it due to different padding...
leander
+2  A: 

Crashing inside new or malloc is usually a hint that the internal bookkeeping of the malloc implementation has been corrupted. Most often this is caused by writing past the end of a previous allocation (a buffer overflow). The next call to new or malloc then crashes because the internal structures contain invalid data.

Check whether you are writing past the end of any previously allocated block.
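One way to check by hand is to place guard bytes after each allocation and verify them later. This is only a rough sketch of the idea, not how any particular heap implements it; guardedAlloc, guardIntact, kGuardSize and kGuardByte are hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical constants: size and fill pattern of the guard region.
static const std::size_t kGuardSize = 8;
static const unsigned char kGuardByte = 0xFD;

// Allocates `size` usable bytes followed by kGuardSize guard bytes.
// Release the block with std::free.
void* guardedAlloc(std::size_t size)
{
    unsigned char* block =
        static_cast<unsigned char*>(std::malloc(size + kGuardSize));
    if (block == 0)
        return 0;
    std::memset(block + size, kGuardByte, kGuardSize);
    return block;
}

// Returns true while the guard bytes after the `size` usable bytes
// still hold the fill pattern, i.e. nothing has written past the end.
bool guardIntact(const void* ptr, std::size_t size)
{
    assert(ptr != 0);
    const unsigned char* guard =
        static_cast<const unsigned char*>(ptr) + size;
    for (std::size_t i = 0; i < kGuardSize; ++i) {
        if (guard[i] != kGuardByte)
            return false;
    }
    return true;
}
```

Calling guardIntact on suspect blocks at a few checkpoints narrows down which write clobbered the memory, instead of waiting for a later allocation to crash.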

If your application is portable, you could also try building it on Linux and running it under Valgrind.

lothar
Yeah, that's my guess too. Time to dig out electricfence or dmalloc, especially now that the _NO_DEBUG_HEAP is allowing me to crash inside the debugger.
leander
Yeah, I was thinking of porting it to linux just for valgrind earlier! =) The memcheck module is great, I've even used it to debug MMORPG servers in the past. Application Verifier seems to cover a lot of the same bases in Windows, fortunately, glad I found that.
leander
A: 

Application Verifier was super-useful for solving this once I had _NO_DEBUG_HEAP=1 set in the environment; see the accepted answer to "Finding where memory was last freed?"

It's probably also worth mentioning pageheap, which I found while looking at Application Verifier. Looks like it covers some similar ground.

(FYI, it was a one-character buffer overflow:

m_pEnumName = (char*)malloc(strlen(data) /* missing +1 here */);
strcpy(m_pEnumName, data);

...yet another ridiculously good argument to not use strcpy directly.)
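For comparison, a minimal sketch of a copy that reserves room for the terminator. The helper name duplicateCString is hypothetical and is not taken from the tool's source; where the design allows it, storing the text in a std::string avoids the manual size arithmetic entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns a malloc'ed copy of `data`, including the
// terminating '\0'. Returns a null pointer if the allocation fails.
// The caller releases the copy with std::free.
char* duplicateCString(const char* data)
{
    assert(data != 0);
    // +1 reserves the byte for the terminator that strlen does not count.
    const std::size_t length = std::strlen(data) + 1;
    char* copy = static_cast<char*>(std::malloc(length));
    if (copy != 0)
        std::memcpy(copy, data, length); // copies the '\0' as well
    return copy;
}
```

With a helper like this, the assignment above would become m_pEnumName = duplicateCString(data); and the length is computed in exactly one place.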

leander