I have a C++ app that uses large arrays of data, and while testing I have noticed that it runs out of memory even though there is still plenty of memory available. I have reduced the code to the following sample test case:

void MemTest()
{
    size_t Size = 500*1024*1024;  // 500 MB (0x1F400000 bytes)
    if (Size > _HEAP_MAXREQ)
        TRACE("Invalid Size\n");
    void * mem = malloc(Size);
    if (mem == NULL)
        TRACE("allocation failed\n");
    free(mem);  // free(NULL) is a no-op, so this is safe either way
}

If I create a new MFC project, include this function, and run it from InitInstance, it works fine in debug mode (the memory is allocated as expected) but fails in release mode (malloc returns NULL). Single-stepping through the release build into the C runtime (my function gets inlined), I get the following:

// malloc.c

void * __cdecl _malloc_base (size_t size)

{
        void *res = _nh_malloc_base(size, _newmode);

        RTCCALLBACK(_RTC_Allocate_hook, (res, size, 0));

        return res;
}

which calls _nh_malloc_base:

void * __cdecl _nh_malloc_base (size_t size, int nhFlag)
{
        void * pvReturn;

        //  validate size
        if (size > _HEAP_MAXREQ)
            return NULL;
        // ...

And (size > _HEAP_MAXREQ) evaluates to true, so my memory doesn't get allocated. Putting a watch on size shows the expected 500 MB, which suggests the program is linking to a different runtime library with a much smaller _HEAP_MAXREQ. Grepping the VC++ folders for _HEAP_MAXREQ shows the expected 0xFFFFFFE0, so I can't figure out what is happening here. Does anyone know of any CRT changes or versions that would cause this problem, or am I missing something much more obvious?

Edit: As suggested by Andreas, looking at this in the disassembly view shows the following:

--- f:\vs70builds\3077\vc\crtbld\crt\src\malloc.c ------------------------------
_heap_alloc:
0040B0E5  push        0Ch  
0040B0E7  push        4280B0h 
0040B0EC  call        __SEH_prolog (40CFF8h) 
0040B0F1  mov         esi,dword ptr [size] 
0040B0F4  cmp         dword ptr [___active_heap (434660h)],3 
0040B0FB  jne         $L19917+7 (40B12Bh) 
0040B0FD  cmp         esi,dword ptr [___sbh_threshold (43464Ch)] 
0040B103  ja          $L19917+7 (40B12Bh) 
0040B105  push        4    
0040B107  call        _lock (40DE73h) 
0040B10C  pop         ecx  
0040B10D  and         dword ptr [ebp-4],0 
0040B111  push        esi  
0040B112  call        __sbh_alloc_block (40E736h) 
0040B117  pop         ecx  
0040B118  mov         dword ptr [pvReturn],eax 
0040B11B  or          dword ptr [ebp-4],0FFFFFFFFh 
0040B11F  call        $L19916 (40B157h) 
$L19917:
0040B124  mov         eax,dword ptr [pvReturn] 
0040B127  test        eax,eax 
0040B129  jne         $L19917+2Ah (40B14Eh) 
0040B12B  test        esi,esi 
0040B12D  jne         $L19917+0Ch (40B130h) 
0040B12F  inc         esi  
0040B130  cmp         dword ptr [___active_heap (434660h)],1 
0040B137  je          $L19917+1Bh (40B13Fh) 
0040B139  add         esi,0Fh 
0040B13C  and         esi,0FFFFFFF0h 
0040B13F  push        esi  
0040B140  push        0    
0040B142  push        dword ptr [__crtheap (43465Ch)] 
0040B148  call        dword ptr [__imp__HeapAlloc@12 (425144h)] 
0040B14E  call        __SEH_epilog (40D033h) 
0040B153  ret              
$L19914:
0040B154  mov         esi,dword ptr [ebp+8] 
$L19916:
0040B157  push        4    
0040B159  call        _unlock (40DDBEh) 
0040B15E  pop         ecx  
$L19929:
0040B15F  ret              
_nh_malloc:
0040B160  cmp         dword ptr [esp+4],0FFFFFFE0h 
0040B165  ja          _nh_malloc+29h (40B189h) 

The registers are as follows:

EAX = 009C8AF0 EBX = FFFFFFFF ECX = 009C8A88 EDX = 00747365 ESI = 00430F80 EDI = 00430F80 EIP = 0040B160 ESP = 0013FDF4 EBP = 0013FFC0 EFL = 00000206

So the comparison does appear to be against the correct constant: at 0040B160 the code executes cmp dword ptr [esp+4],0FFFFFFE0h, and the value at esp+4 (address 0013FDF8) is 1F400000, i.e. my 500 MB request.
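The numbers can be pinned down with a compile-time check. This is only a sketch with illustrative names: kRequestBytes and kHeapMaxReq32 are local stand-ins, the latter holding the 0xFFFFFFE0 value of _HEAP_MAXREQ quoted above, not the CRT macro itself. It confirms that the request is exactly 0x1F400000 bytes (500 MiB rather than 512 MiB) and far below the limit, so the ja following the cmp is not taken.

```cpp
#include <cstdint>

// Size requested in MemTest(): 500 * 1024 * 1024 bytes.
constexpr std::uint32_t kRequestBytes = 500u * 1024u * 1024u;

// Stand-in for the 0xFFFFFFE0 value of _HEAP_MAXREQ found in the VC++ headers.
constexpr std::uint32_t kHeapMaxReq32 = 0xFFFFFFE0u;

// The request is exactly the 1F400000 seen at [esp+4] ...
static_assert(kRequestBytes == 0x1F400000u, "500 MiB is 0x1F400000");

// ... which passes the size check, so _nh_malloc does not bail out there.
static_assert(kRequestBytes <= kHeapMaxReq32, "request is below the limit");

// For comparison, a true 512 MiB request would be 0x20000000.
static_assert(512u * 1024u * 1024u == 0x20000000u, "512 MiB is 0x20000000");
```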

Second edit: The problem was actually in HeapAlloc, as per Andreas' post. Switching to a new, separate heap for large objects using HeapCreate and HeapAlloc did not alleviate the problem, nor did an attempt to use VirtualAlloc with various parameters. Further experimentation has shown that where allocating one large contiguous block fails, two smaller blocks totalling the same amount succeed; e.g. where a single 300 MB malloc fails, two 150 MB mallocs work fine. So it looks like I'll need a new array class that can live in a number of biggish memory fragments rather than a single contiguous block. Not a major problem, but I would have expected a bit more out of Win32 in this day and age.
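A minimal sketch of such an array class, assuming standard C++14 (std::make_unique for arrays). ChunkedArray and chunkElems are hypothetical names, not an MFC or CRT type. Each chunk is its own heap allocation, so the largest single request is chunkElems * sizeof(T) bytes rather than the size of the whole array.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical ChunkedArray: holds `size` elements of T in separately
// allocated chunks of at most `chunkElems` elements each, so no single
// allocation needs more than chunkElems * sizeof(T) contiguous bytes.
template <typename T>
class ChunkedArray {
public:
    ChunkedArray(std::size_t size, std::size_t chunkElems)
        : size_(size), chunkElems_(chunkElems)
    {
        assert(chunkElems_ > 0);
        const std::size_t chunkCount = (size_ + chunkElems_ - 1) / chunkElems_;
        chunks_.reserve(chunkCount);
        for (std::size_t c = 0; c < chunkCount; ++c) {
            // The last chunk only needs to hold the remaining elements.
            const std::size_t n = std::min(chunkElems_, size_ - c * chunkElems_);
            // Value-initialized; throws std::bad_alloc if this chunk cannot be allocated.
            chunks_.push_back(std::make_unique<T[]>(n));
        }
    }

    std::size_t size() const { return size_; }

    // Element i lives in chunk i / chunkElems at offset i % chunkElems.
    T& operator[](std::size_t i)
    {
        assert(i < size_);
        return chunks_[i / chunkElems_][i % chunkElems_];
    }

    const T& operator[](std::size_t i) const
    {
        assert(i < size_);
        return chunks_[i / chunkElems_][i % chunkElems_];
    }

private:
    std::size_t size_;
    std::size_t chunkElems_;
    std::vector<std::unique_ptr<T[]>> chunks_;
};
```

For example, ChunkedArray<float> data(75 * 1024 * 1024, 1024 * 1024) would request 75 chunks of 4 MB each instead of one 300 MB block. The price is a division and a modulo on every element access.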

Last edit: The following yielded 1.875 GB of space, albeit non-contiguous:

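#define TenMB (1024*1024*10)

void SmallerAllocs()
{
    const int MaxBlocks = 200;
    size_t Total = 0;
    LPVOID p[MaxBlocks];
    int Count;
    // Allocate 10 MB blocks until malloc fails or the array is full
    for (Count = 0; Count < MaxBlocks; Count++)
    {
        p[Count] = malloc(TenMB);
        if (p[Count] == NULL)
            break;
        Total += TenMB;
    }
    CString Msg;
    Msg.Format(_T("Allocated %0.3lfGB"), Total/(1024.0*1024.0*1024.0));
    AfxMessageBox(Msg, MB_OK);

    // Release the blocks again
    for (int i = 0; i < Count; i++)
        free(p[i]);
}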
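Where SmallerAllocs measures how much memory is available in total, a bisection probe can estimate the largest single block malloc will currently hand out, which is the figure that matters for a 300 MB request. A sketch in portable C++; LargestMallocBlock, upper and granularity are made-up names, and it assumes that if malloc(n) fails, any larger request fails too. In a 32-bit process this roughly tracks the largest contiguous free range the CRT heap can obtain; on a 64-bit build, or on systems that overcommit, it will usually just return a value close to upper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: estimates, to within `granularity` bytes, the largest
// single block malloc will currently return, trying sizes below `upper`.
// Assumes that if malloc(n) fails, any larger request would fail as well.
std::size_t LargestMallocBlock(std::size_t upper, std::size_t granularity)
{
    assert(granularity > 0);
    std::size_t lo = 0;      // largest size seen to succeed so far
    std::size_t hi = upper;  // sizes at or above hi are never tried
    while (hi - lo > granularity) {
        // hi - lo >= 2 here, so lo < mid < hi and mid is never 0.
        const std::size_t mid = lo + (hi - lo) / 2;
        void* p = std::malloc(mid);
        if (p != nullptr) {
            // Touch the block through a volatile pointer so an optimizer
            // cannot drop the otherwise unused malloc/free pair.
            static_cast<volatile char*>(p)[0] = 0;
            std::free(p);
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return lo;
}
```

For example, LargestMallocBlock(std::size_t(2000) * 1024 * 1024, 1024 * 1024) searches up to about 2 GB to within 1 MB. The volatile write matters because some optimizers (Clang, for one) can remove a malloc/free pair whose result is never used.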
Answer (score +2):

Could it be that the debugger is playing a trick on you in release mode? Neither single-stepping nor the values of variables are reliable in release mode.

I tried your example in VS2003 in release mode. When single-stepping, it does at first look like the code lands on the return NULL line, but if I keep stepping it eventually continues into HeapAlloc, so I would guess that it's this function that's failing. Looking at the disassembly, if (size > _HEAP_MAXREQ) compiles to the following:

00401078  cmp         dword ptr [esp+4],0FFFFFFE0h 

so I don't think it's a problem with _HEAP_MAXREQ.

Andreas Brinck
I tried tracing past the return NULL and got similar results: in my original app the failure happens in 'return HeapAlloc(_crtheap, 0, size);' in malloc.c, which basically shows my test program to be flawed. I'll try another test program and re-post, as the problem still occurs in my main app.
Shane MacLaughlin
@Shane 512 MB is quite a lot of memory; there's certainly a possibility that you don't have this much *contiguous* memory in your process's virtual address space. (Allocations may have been placed differently in debug mode, which would explain why it works there.)
Andreas Brinck
I'm coming to that conclusion myself. I tried another test program with 2 mallocs of 512mb each, which worked, whereas my app is trying to allocate 2 300mb blocks and failing. My guess is that this is heap fragmentation related, and I'll have to review how I use the heap in my app. Thanks for all the feedback.
Shane MacLaughlin
@Shane You will never be able to allocate 2300 MB, the limit is ~2 GB (on x86), see: http://blogs.msdn.com/oldnewthing/archive/2004/08/17/215089.aspx#215553
Andreas Brinck
@ Andreas Brinck: I am pretty sure he meant 2 x 300 MB, not 2,300 MB. ;-)
DevSolar
@Devsolar: Yep, 2 x 300 MB blocks, my bad typo. Looks like it's time to move from malloc to a separate large heap for my bigger blocks of data, and to go with VirtualAlloc. I'm still baffled that a process that shows up as using only 350 MB in Task Manager can't acquire another 300 MB, with a total PF usage of 850 MB, a 4 GB limit and 2 GB of physical RAM. With more than half the physical memory still available, failing to allocate just over a third of that amount is kind of weak whichever way you look at it.
Shane MacLaughlin