#include <iostream>
#include <vector>
using namespace std;

int main()
{
    vector< vector<int> > dp(50000, vector<int>(4, -1));
    cout << dp.size();
}

This tiny program takes a split second to execute when simply run from the command line. But when run in a debugger, it takes over 8 seconds. Pausing the debugger reveals that it is in the middle of destroying all those vectors. WTF?

Note - Visual Studio 2008 SP1, Core 2 Duo 6700 CPU with 2GB of RAM.

Added: To clarify, no, I'm not confusing Debug and Release builds. These results come from one and the same .exe, without any recompiling in between. In fact, switching between Debug and Release builds changes nothing.

+3  A: 

Running a program with the debugger attached is always slower than without.

This must be caused by VS hooking into the new/delete calls and doing more checking when attached, or by the runtime library using the IsDebuggerPresent API and doing things differently in that case.

You can easily try this from inside Visual Studio: start the program with Debug->Start Debugging or Debug->Start Without Debugging. Starting without debugging is like running from the command line, with exactly the same build configuration and executable.

Timbo
No doubt. But I've never seen a slowdown of this magnitude!
Vilx-
No, actually this is not what's causing this. Running a program with a debugger attached does not necessarily make it slower. A debugger is simply a process waiting for another process to throw an exception or hit a breakpoint.
Dave Van den Eynde
Try the sample program yourself, Start it with F5 or Ctrl+F5. It is blazing fast without debugger attached...
Timbo
Exactly. But why?
Vilx-
This also happens when I make my own debugger which does not hook anywhere. Actually, the problem came from this original question: http://stackoverflow.com/questions/531893/problems-with-running-an-application-under-controlled-environment-win32
Vilx-
A: 

Yeah, WTF indeed.

You know your compiler will optimize a lot of those function calls by inlining them, and then further optimize the code to remove anything that isn't actually doing anything. In the case of vectors of int, that leaves pretty much nothing.

In debug mode, inlining is not turned on because that would make debugging awful.

This is a nice example of how fast C++ code can really be.

Dave Van den Eynde
You miss the point. The same .exe, without recompiling, has the difference in speed.
Vilx-
Okay, well then my answer is wrong.
Dave Van den Eynde
+16  A: 

Running in the debugger changes the memory allocation library used to one that does a lot more checking. A program that does nothing but memory allocation and de-allocation is going to suffer much more than a "normal" program.

Edit: Having just tried running your program under VS, I get a call stack that looks like this:

ntdll.dll!_RtlpValidateHeapEntry@12()  + 0x117 bytes    
ntdll.dll!_RtlDebugFreeHeap@12()  + 0x97 bytes  
ntdll.dll!_RtlFreeHeapSlowly@12()  + 0x228bf bytes  
ntdll.dll!_RtlFreeHeap@12()  + 0x17646 bytes    
msvcr90d.dll!_free_base(void * pBlock=0x0061f6e8)  Line 109 + 0x13 bytes
msvcr90d.dll!_free_dbg_nolock(void * pUserData=0x0061f708, int nBlockUse=1)
msvcr90d.dll!_free_dbg(void * pUserData=0x0061f708, int nBlockUse=1) 
msvcr90d.dll!operator delete(void * pUserData=0x0061f708)
desc.exe!std::allocator<int>::deallocate(int * _Ptr=0x0061f708, unsigned int __formal=4)
desc.exe!std::vector<int,std::allocator<int> >::_Tidy()  Line 1134  C++

This shows the debug functions in ntdll.dll and the C runtime being used.

Ian G
But I don't recompile the program in between! It's the same .EXE!
Vilx-
You don't have to recompile the program if the memory allocation is in a DLL or .so.
Paul Tomblin
As Paul says, the memory allocation is in a DLL, so recompiling or not doesn't matter (unless you've statically linked everything; even then it may use the IsDebuggerPresent call if you've built against debug libraries. I don't know, I've never needed to go that deep).
Ian G
The memory deallocation hit you're noticing is actually at the OS level. Even if you linked the CRT statically, HeapFree() is still a Kernel32.DLL function.
MSalters
Are you certain that this is the function that does the slowdown in the presence of a debugger?
Vilx-
Yes it's definitely this HeapFree function. You can see the same effect in the program here: http://stackoverflow.com/questions/532092/weird-behaviour-of-c-destructors/532315#532315
Jim T
OK, that explains it then. :) Btw - I like the function "RtlFreeHeapSlowly". LOL. :D
Vilx-
Hehe. Any way to turn it off?
Dave Van den Eynde
A: 

Makes no sense to me. Attaching a debugger to a random binary in a normal configuration should mostly just trap breakpoint interrupts (asm int 3, etc).

Dustin Getz
Compile it and see for yourself! :)
Vilx-
Or swap in libraries to collect more debugger information.
David Thornley
+2  A: 

It's definitely HeapFree that's slowing this down; you can get the same effect with the program below.

Passing parameters like HEAP_NO_SERIALIZE to HeapFree doesn't help either.

#include "stdafx.h"
#include <iostream>
#include <windows.h>

using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    // Private heap, so only these allocations are involved.
    HANDLE heap = HeapCreate(0, 0, 0);

    void** pointers = new void*[50000];

    // Same allocation pattern as the question: 50000 blocks of 4 ints.
    int i = 0;
    for (i = 0; i < 50000; ++i)
    {
        pointers[i] = HeapAlloc(heap, 0, 4 * sizeof(int));
    }

    cout << i;

    // Free in reverse order; this loop is the slow part under the debugger.
    for (i = 49999; i >= 0; --i)
    {
        HeapFree(heap, 0, pointers[i]);
    }

    cout << "!";

    delete [] pointers;

    HeapDestroy(heap);
}
Jim T
A: 

8 seconds?? I tried the same in Debug mode. Not more than half a second I guess. Are you sure it's the destructors?

FYI. Visual Studio 2008 SP1, Core 2 Duo 6700 CPU with 2GB of RAM.

+2  A: 

The debug heap automatically gets enabled when you start your program in the debugger, as opposed to attaching to an already-running program with the debugger.

The book Advanced Windows Debugging by Mario Hewardt and Daniel Pravat has some decent information about the Windows heap, and it turns out that the chapter on heaps is up on the web site as a sample chapter.

Page 281 has a sidebar about "Attaching Versus Starting the Process Under the Debugger":

When starting the process under the debugger, the heap manager modifies all requests to create new heaps and change the heap creation flags to enable debug-friendly heaps (unless the _NO_DEBUG_HEAP environment variable is set to 1). In comparison, attaching to an already-running process, the heaps in the process have already been created using default heap creation flags and will not have the debug-friendly flags set (unless explicitly set by the application).

(Also: a semi-related question, where I posted part of this answer before.)

bk1e
+1  A: 

http://www.symantec.com/connect/articles/windows-anti-debug-reference

Read sections 2 "PEB!NtGlobalFlags" and 2 "Heap flags".

I think this may explain it.


EDIT: added solution

In your handler for CREATE_PROCESS_DEBUG_EVENT, add the following:

// hack 'Load Configuration Directory' in exe header to point to a new block that specifies GlobalFlags
IMAGE_DOS_HEADER dos_header;
ReadProcessMemory(cpdi.hProcess,cpdi.lpBaseOfImage,&dos_header,sizeof(IMAGE_DOS_HEADER),NULL);
IMAGE_OPTIONAL_HEADER32 pe_header;
ReadProcessMemory(cpdi.hProcess,(BYTE*)cpdi.lpBaseOfImage+dos_header.e_lfanew+4+sizeof(IMAGE_FILE_HEADER),&pe_header,offsetof(IMAGE_OPTIONAL_HEADER32,DataDirectory),NULL);
IMAGE_LOAD_CONFIG_DIRECTORY32 ilcd;
ZeroMemory(&ilcd,sizeof(ilcd));
ilcd.Size = 64; // not sizeof(ilcd), as 2000/XP didn't have SEHandler
ilcd.GlobalFlagsClear = 0xffffffff; // clear all flags.  this is as we don't want dbg heap
BYTE *p = (BYTE *)VirtualAllocEx(cpdi.hProcess,NULL,ilcd.Size,MEM_COMMIT|MEM_RESERVE,PAGE_READWRITE);
WriteProcessMemory(cpdi.hProcess,p,&ilcd,ilcd.Size,NULL);
BYTE *dde = (BYTE*)cpdi.lpBaseOfImage+dos_header.e_lfanew+4+sizeof(IMAGE_FILE_HEADER)+offsetof(IMAGE_OPTIONAL_HEADER32,DataDirectory)+sizeof(IMAGE_DATA_DIRECTORY)*IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG;
IMAGE_DATA_DIRECTORY temp;
temp.VirtualAddress = (DWORD)(p - (BYTE*)cpdi.lpBaseOfImage); // RVA of the new block; lpBaseOfImage is a void*, so cast before subtracting
temp.Size = ilcd.Size;
DWORD oldprotect;
VirtualProtectEx(cpdi.hProcess,dde,sizeof(temp),PAGE_READWRITE,&oldprotect);
WriteProcessMemory(cpdi.hProcess,dde,&temp,sizeof(temp),NULL);
VirtualProtectEx(cpdi.hProcess,dde,sizeof(temp),oldprotect,&oldprotect);
steelbytes