Hi,

I have a fairly large program (>10k lines of C++ code). It works perfectly in debug mode, and in release mode when launched from within Visual Studio, but the release binary usually (though not always!) crashes when launched manually from the command line.

The line with delete causes the crash:

bool Save(const short* data, unsigned int width, unsigned int height, 
          const wstring* implicit_path, const wstring* name = NULL, 
          bool enable_overlay = false)
{
    char* buf = new char[17];
    delete [] buf;
}

EDIT: Upon request expanded the example.

In my test case "len" is 16 (hence the 17-byte allocation). It doesn't matter whether I do anything with buf or not; it crashes on the delete.

EDIT: The application works fine without the delete [] line, but I suppose it then leaks memory (since the block is never deallocated). The buf is never used after the delete line. It also seems that it does not crash with any type other than char. Now I am really confused.

The crash message is very unspecific (the typical Windows "xyz.exe has stopped working"). When I click the "Debug the program" option, it enters VS, where the error is reported as "Access violation writing location xxxxxxxx". It is unable to locate the place of the error, though: "No symbols were loaded for any stack frame".

I guess it is some pretty serious case of heap corruption, but how to debug this? What should I look for?

Thanks for help.

+3  A: 

The biggest difference between launching from the debugger and launching standalone is that when an application is launched from the debugger, Windows provides a "debug heap" that is filled with the 0xBAADF00D pattern. Note that this is not the debug heap provided by the CRT, which is instead filled with the 0xCD pattern (IIRC).

Here is one of the few mentions that Microsoft makes about this feature, and here you can find some links about it.

Matteo Italia
Now it crashes inside the debugger as well, but it is still unable to "Load symbols for any stack frames", so I am unable to debug it effectively. Thanks, at least some progress.
CommanderZ
Strange, usually it loads the symbols correctly. Try this: launch it without debugging from Visual Studio, then use the "Attach to process" command to connect the VS debugger to your application's process. This way VS should load your application's symbols correctly. If the crash happens inside an API call, trace it back to your code using the call stack window; in that case you may get some additional information about what's going on inside the OS by installing the Windows debugging symbols.
Matteo Italia
I guess the problem is specific to the Release build using /MT; it won't crash with /MTd.
CommanderZ
The multi-thread *debug* CRT (/MTd) masks the problem because, like Windows does for processes spawned by a debugger, it gives your program a debug heap, in this case one initialized to the 0xCD pattern. Probably somewhere you use an uninitialized area of heap memory as a pointer and dereference it; with the two debug heaps you get away with it for some reason (maybe because there is valid allocated memory at addresses 0xbaadf00d and 0xcdcdcdcd), but with the "normal" heap (which often happens to contain 0) you get an access violation, because you dereference a NULL pointer.
Matteo Italia
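To make the fill-pattern argument above concrete, here is a minimal sketch of what a pointer read from "uninitialized" memory looks like under different fills. It does not touch the real heap; it only imitates a pre-filled allocation with memset. The function names are hypothetical and exist purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical helper: returns the pointer-sized value you would read from
// memory whose bytes were all pre-filled with `fill`, the way a debug heap
// pre-fills fresh allocations.
std::uintptr_t pointer_bits_from_fill(unsigned char fill)
{
    unsigned char raw[sizeof(std::uintptr_t)];
    std::memset(raw, fill, sizeof raw);      // imitate the heap's fill pattern

    std::uintptr_t bits = 0;
    std::memcpy(&bits, raw, sizeof bits);    // reinterpret the bytes as an address
    return bits;
}

// Prints the "address" an uninitialized pointer would hold under each fill.
void print_fill_examples()
{
    assert(pointer_bits_from_fill(0x00) == 0);   // zero fill reads as a null pointer
    std::printf("0xCD fill -> %#llx\n",
                static_cast<unsigned long long>(pointer_bits_from_fill(0xCD)));
    std::printf("0x00 fill -> %#llx\n",
                static_cast<unsigned long long>(pointer_bits_from_fill(0x00)));
}
```

With a 0xCD fill the value is a non-null address made of repeated 0xCD bytes, which may or may not be mapped; with a zero fill it is a null pointer, which always faults when dereferenced. That difference is one plausible reason the same bug shows up in only some configurations.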
+5  A: 

Have you checked for memory problems elsewhere?

Weird delete behavior is usually caused by the heap getting corrupted at one point; it only becomes apparent much, much later, on some other heap operation.

The difference between debug and release can be caused by the way Windows lays out the heap in each context. For example, in debug the heap can be very sparse, so the corruption doesn't affect anything right away.

Eric
In the end, it was exactly this case. I missed array bounds by ONE in one place and the program was crashing like 5000 lines of code later.
CommanderZ
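An off-by-one like the one described above is silent with a raw array, but a bounds-checked container reports it at the faulty access. The following is only a sketch with a hypothetical function name, assuming the buffer can be replaced by a std::vector; it is not the code from the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical example: fills a table of `count` entries using the classic
// `<=` loop bound. Returns true if the out-of-range access was caught.
bool fill_with_off_by_one(std::size_t count)
{
    std::vector<short> table(count);
    assert(table.size() == count);

    try {
        for (std::size_t i = 0; i <= count; ++i)    // bug: should be i < count
            table.at(i) = static_cast<short>(i);    // at() checks the index
    } catch (const std::out_of_range&) {
        return true;    // the overrun is reported where it happens
    }
    return false;
}
```

With operator[] or a raw new[] array the same loop would write one element past the end and could corrupt the heap's bookkeeping, with the crash appearing only at some later delete.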
+1  A: 

There are many possible causes of crashes. It's always difficult to locate them, especially when they differ from debug to release mode.

On the other hand, since you are using C++, you could sidestep the issue by using a std::string instead of a manually allocated buffer; there is a reason RAII exists ;)

Matthieu M.
I use std wstring everywhere possible, but in this place I need to pass non-unicode char array to one third party function.
CommanderZ
Are you sure that the third-party function does not `delete` in some cases? Also, `std::string` has a `data()` member function which returns a `const char*` (a non-const `char*` overload exists only since C++17).
Matthieu M.
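If the third-party function needs a writable char array, one RAII option is a std::vector<char>, which frees its storage automatically. This is a sketch only: third_party_write below is a hypothetical stand-in, since the real function's signature is not given in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the third-party function that expects a
// writable, non-Unicode char array of the given capacity.
void third_party_write(char* out, std::size_t capacity)
{
    assert(capacity > 0);
    const char text[] = "0123456789abcdef";
    std::strncpy(out, text, capacity - 1);
    out[capacity - 1] = '\0';    // always terminate
}

// Owns the buffer through std::vector, so no manual delete [] is needed.
std::string call_with_owned_buffer(std::size_t len)
{
    std::vector<char> buf(len + 1, '\0');       // len characters + terminator
    third_party_write(&buf[0], buf.size());     // &buf[0] also works before C++11
    return std::string(&buf[0]);
}
```

The vector releases its memory when it goes out of scope, including on early returns and exceptions. This does not fix an overrun elsewhere in the program, but it removes one manual new[]/delete[] pair from the picture.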
A: 

These two are the first two lines in their function.

If you really mean that the way I interpret it, then the first line is declaring a local variable buf in one function, but the delete is deleting some different buf declared outside the second function.

Maybe you should show the two functions.

Steve Fallows
A: 

Have you tried simply isolating this with the same build file but code based just on what you've put above? Something like:

int main(int argc, char* argv[] )
{
    const int len( 16 );
    char* buf = new char[len + 1]; 

    delete [] buf;
}

The code you've given is absolutely fine and, on its own, should run with no problems either in debug or optimised builds. So if the problem isn't down to the specifics of your code, then it must be down to the specifics of the project (i.e. compilation / linkage).

Have you tried creating a brand new project and placing the 10K+ lines of C++ into it? Might not take too long to prove the point. Especially if the existing project has either been imported in or heavily altered.

Robin Welch
Tried this, works perfectly.
CommanderZ
Just a thought, but have you tried placing some debug output before and after the delete? From what you say, you've identified the delete as the source of the problem, but the error message is unclear about where the fault actually happens. It may be that the delete itself is fine but something then attempts to access that memory after the delete. It's also generally good practice to set buf to 0 after deleting it, to prevent double-delete problems and to make it easy to test whether the pointer is valid or not.
Robin Welch
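The "set the pointer to 0 after deleting" advice can be wrapped in a small helper. This is a sketch with hypothetical names, relying on the fact that delete [] on a null pointer is a no-op.

```cpp
#include <cassert>

// Hypothetical helper: frees an array and nulls the caller's pointer so that
// a second release is harmless and the pointer can be tested afterwards.
template <typename T>
void delete_array_and_null(T*& p)
{
    delete [] p;    // does nothing if p is already null
    p = 0;
}

// Releases the same buffer twice; the second call is a no-op.
bool double_release_is_harmless()
{
    char* buf = new char[17];
    delete_array_and_null(buf);
    delete_array_and_null(buf);
    return buf == 0;
}
```

This only protects against reuse through the same pointer variable; copies of the pointer held elsewhere still dangle.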
+1  A: 

You probably have a memory overwrite somewhere and the delete[] is simply the first time it causes a problem. But the overwrite itself can be located in a totally different part of your program. The difficulty is finding the overwrite.

Add the following function

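#include <malloc.h>   // _heapchk, _HEAPOK (Microsoft CRT)
#include <stdio.h>    // printf

#define CHKHEAP()  (check_heap(__FILE__, __LINE__))

// Reports the first call site at which the CRT heap fails its consistency
// check, together with the last call site at which it was still intact.
void check_heap(const char *file, int line)
{
    static const char *lastOkFile = "here";
    static int lastOkLine = 0;
    static int heapOK = 1;

    if (!heapOK) return;   // only the first failure is reported

    if (_heapchk() == _HEAPOK)
    {
        lastOkFile = file;
        lastOkLine = line;
        return;
    }

    heapOK = 0;
    printf("Heap corruption detected at %s (%d)\n", file, line);
    printf("Last OK at %s (%d)\n", lastOkFile, lastOkLine);
}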

Now call CHKHEAP() frequently throughout your program and run again. It should show you the source file and line where the heap becomes corrupted and where it was OK for the last time.

This returns OK when called just before the crashing line, so it seems the heap is OK.
CommanderZ
A: 

It sounds like you have an uninitialised variable somewhere in the code.

In debug mode all the memory is initialised to something standard, so you will get consistent behaviour.

In release mode the memory is not initialised unless you explicitly do something.

Run your compiler with the warnings set at the highest level possible.
Then make sure your code compiles with no warnings.

Martin York
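As an illustration of "explicitly do something", here is a minimal sketch of two ways to make sure memory starts in a known state regardless of build mode. The Header struct and function name are hypothetical; only the language rules they rely on are standard.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct: the constructor gives every member a defined value,
// so behaviour no longer depends on what the heap or stack happened to hold.
struct Header {
    short*       pixels;
    unsigned int width;
    unsigned int height;
    Header() : pixels(0), width(0), height(0) {}
};

// `new char[n]()` value-initialises the array (all bytes zero), whereas
// `new char[n]` leaves the contents indeterminate.
bool buffer_is_zeroed(std::size_t n)
{
    char* buf = new char[n]();
    bool all_zero = true;
    for (std::size_t i = 0; i < n; ++i)
        if (buf[i] != 0)
            all_zero = false;
    delete [] buf;
    return all_zero;
}
```

Compiler warnings catch many uninitialised locals, but generally not members that a constructor forgot or heap memory read before being written, so explicit initialisation is still worth doing.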