How much memory or other resources is used for an individual VirtualAlloc (xxxx, yyy, MEM_RESERVE, zzz)?

Is there any difference in resource consumption (e.g. kernel paged/nonpaged pool) when I allocate one large block, like this:

VirtualAlloc( xxxx, 1024*1024, MEM_RESERVE, PAGE_READWRITE )

or multiple smaller blocks, like this:

VirtualAlloc( xxxx, 64*1024, MEM_RESERVE, PAGE_READWRITE );
VirtualAlloc( xxxx+1*64*1024, 64*1024, MEM_RESERVE, PAGE_READWRITE );
VirtualAlloc( xxxx+2*64*1024, 64*1024, MEM_RESERVE, PAGE_READWRITE );
...
VirtualAlloc( xxxx+15*64*1024, 64*1024, MEM_RESERVE, PAGE_READWRITE );

If someone does not know the answer but can suggest an experiment that would check it, that would be helpful as well.

The motivation is that I want to implement returning memory to the OS for TCMalloc under Windows. My idea is to replace each individual large VirtualAlloc call with a sequence of small (allocation granularity sized) calls, so that I can call VirtualFree on each of them separately. I am aware that allocating large blocks will be slower this way, but are there any resource consumption penalties to be expected?
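To make the idea concrete, here is a minimal sketch of the chunked reservation (not TCMalloc code). The names `Chunk`, `SplitIntoChunks` and `ReserveChunked` are hypothetical, and the sketch assumes `base` is non-null and already aligned to the allocation granularity. The Win32 part is guarded so the address arithmetic can be compiled anywhere.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper type: one separately reserved (and separately releasable) region.
struct Chunk {
    std::uintptr_t base;
    std::size_t size;
};

// Split [base, base + total) into pieces of at most `granularity` bytes.
std::vector<Chunk> SplitIntoChunks(std::uintptr_t base, std::size_t total,
                                   std::size_t granularity) {
    std::vector<Chunk> chunks;
    if (granularity == 0) return chunks;  // nothing sensible to do
    for (std::size_t offset = 0; offset < total; offset += granularity) {
        const std::size_t remaining = total - offset;
        const std::size_t size = remaining < granularity ? remaining : granularity;
        chunks.push_back(Chunk{base + offset, size});
    }
    return chunks;
}

#ifdef _WIN32
// Reserve the range as many small regions instead of one large one.
// Assumption: `base` is non-null and aligned to the allocation granularity.
bool ReserveChunked(void* base, std::size_t total) {
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    const std::vector<Chunk> chunks = SplitIntoChunks(
        reinterpret_cast<std::uintptr_t>(base), total, si.dwAllocationGranularity);
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        void* p = VirtualAlloc(reinterpret_cast<void*>(chunks[i].base),
                               chunks[i].size, MEM_RESERVE, PAGE_READWRITE);
        if (p == nullptr) {
            // Roll back: MEM_RELEASE takes the region's base address and size 0.
            for (std::size_t j = 0; j < i; ++j)
                VirtualFree(reinterpret_cast<void*>(chunks[j].base), 0, MEM_RELEASE);
            return false;
        }
    }
    return true;
}
#endif
```

Each chunk can later be returned on its own with `VirtualFree(chunkBase, 0, MEM_RELEASE)`, which is not possible for part of a single large reservation.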

A: 

In my understanding of the page table, you have chunks of, e.g., 1024 pages, with one word per page. In any case, it is the number of pages, not the number of allocations, that costs. However, there might be other mechanisms that cost "extra" per allocation (I just don't know).

Still: using VirtualFree you can selectively decommit individual pages or page ranges. For a decommitted page, the virtual address range (within your process) is still reserved, but no physical memory (RAM or swap file) is assigned to it. You can later use VirtualAlloc to commit these pages again.

So unless you need to free up address space for other allocators within your process, you can use this mechanism to selectively request and return memory to the OS.

[edit]

Measuring
For measuring, I thought of comparing the performance of both algorithms under one or more typical loads (artificial/random allocation pattern, an allocation-heavy "real world" application, etc.). Advantage: you get the "whole story" - kernel resources, page fragmentation, application performance etc. Disadvantage: you have to implement both algorithms, you don't learn the reason for any difference, and you probably need very special cases to get a measurable difference that stands out from the noise.

Address space fragmentation warning - be careful with your return algorithm. When returning individual pages in a "whatever is free" fashion, you might end up with a fragmented address space that has 80% of its memory free but no 100K of it consecutive.
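The fragmentation warning can be illustrated with a toy model. This is only a sketch under stated assumptions: a 4096-byte page size, an artificial pattern where every n-th page stays in use, and hypothetical helper names (`MakeEveryNthPageUsed`, `FreeFraction`, `LongestFreeRunBytes`). It does not touch the OS at all.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr std::size_t kPageSize = 4096;  // assumed page size

// Toy address space: true = page still in use, false = page returned.
// Every n-th page (starting with page 0) stays in use.
std::vector<bool> MakeEveryNthPageUsed(std::size_t pageCount, std::size_t n) {
    std::vector<bool> inUse(pageCount, false);
    if (n == 0) return inUse;  // treat n == 0 as "nothing in use"
    for (std::size_t i = 0; i < pageCount; i += n) inUse[i] = true;
    return inUse;
}

// Fraction of pages that are free.
double FreeFraction(const std::vector<bool>& inUse) {
    if (inUse.empty()) return 0.0;
    const std::size_t freePages = static_cast<std::size_t>(
        std::count(inUse.begin(), inUse.end(), false));
    return static_cast<double>(freePages) / static_cast<double>(inUse.size());
}

// Size in bytes of the largest run of consecutive free pages.
std::size_t LongestFreeRunBytes(const std::vector<bool>& inUse) {
    std::size_t best = 0;
    std::size_t run = 0;
    for (bool used : inUse) {
        run = used ? 0 : run + 1;
        best = std::max(best, run);
    }
    return best * kPageSize;
}
```

With 1000 pages and every 5th page in use, 80% of the pages are free, yet the largest contiguous free run is 4 pages (16 KB), far short of 100K.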

peterchen
Thanks, but it is the address space freeing I am concerned about. De-committing is therefore not enough for me.
Suma
Well... you could set up a test. Let us know what comes of it.
peterchen
But what to measure and how? How can I measure consumption of kernel internal resources?
Suma
I've updated the answer
peterchen
A: 

You can try using "perfmon" and adding counters (e.g. Memory) to get a feel for which resources VirtualAlloc uses. You will have to take a snapshot before and after the call to VirtualAlloc.

Another option is to debug the process that calls VirtualAlloc under WinDbg and use the memory-related commands (http://windbg.info/doc/1-common-cmds.html#20_memory_heap) to get an idea of what is actually happening.

Chubsdad
+1  A: 

Just FYI, you can use GetProcessMemoryInfo and GlobalMemoryStatusEx to get some memory usage measurements.

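#include <windows.h>
#include <psapi.h>    // GetProcessMemoryInfo; link with psapi.lib
#include <iostream>

void DisplayMemoryUsageInformation()
{
    // Per-process counters: working set, pool quota usage, pagefile usage.
    HANDLE hProcess = GetCurrentProcess();
    PROCESS_MEMORY_COUNTERS pmc;
    ZeroMemory(&pmc,sizeof(pmc));
    pmc.cb = sizeof(pmc);
    if (!GetProcessMemoryInfo(hProcess,&pmc, sizeof(pmc)))
        return; // counters unavailable, nothing meaningful to print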
    std::cout << "PageFaultCount:             " << pmc.PageFaultCount             << std::endl;
    std::cout << "PeakWorkingSetSize:         " << pmc.PeakWorkingSetSize         << std::endl;
    std::cout << "WorkingSetSize:             " << pmc.WorkingSetSize             << std::endl;
    std::cout << "QuotaPeakPagedPoolUsage:    " << pmc.QuotaPeakPagedPoolUsage    << std::endl;
    std::cout << "QuotaPagedPoolUsage:        " << pmc.QuotaPagedPoolUsage        << std::endl;
    std::cout << "QuotaPeakNonPagedPoolUsage: " << pmc.QuotaPeakNonPagedPoolUsage << std::endl;
    std::cout << "QuotaNonPagedPoolUsage:     " << pmc.QuotaNonPagedPoolUsage     << std::endl;
    std::cout << "PagefileUsage:              " << pmc.PagefileUsage              << std::endl;
    std::cout << "PeakPagefileUsage:          " << pmc.PeakPagefileUsage          << std::endl;

    MEMORYSTATUSEX msx;
    ZeroMemory(&msx,sizeof(msx));
    msx.dwLength = sizeof(msx);
    if (!GlobalMemoryStatusEx(&msx))
        return; // system-wide status unavailable
    std::cout << "MemoryLoad:                 " << msx.dwMemoryLoad               << std::endl;
    std::cout << "TotalPhys:                  " << msx.ullTotalPhys               << std::endl;
    std::cout << "AvailPhys:                  " << msx.ullAvailPhys               << std::endl;
    std::cout << "TotalPageFile:              " << msx.ullTotalPageFile           << std::endl;
    std::cout << "AvailPageFile:              " << msx.ullAvailPageFile           << std::endl;
    std::cout << "TotalVirtual:               " << msx.ullTotalVirtual            << std::endl;
    std::cout << "AvailVirtual:               " << msx.ullAvailVirtual            << std::endl;
    std::cout << "AvailExtendedVirtual:       " << msx.ullAvailExtendedVirtual    << std::endl;
}
Jeff Wilhite
+2  A: 

Zero, or practically zero, memory is used by a VirtualAlloc call with the reserve parameter. This just reserves the address space within the process. The memory is not used until you actually back the addresses with pages by calling VirtualAlloc with the commit parameter. This is essentially the difference between virtual bytes (the amount of address space taken) and private bytes (the amount of committed memory). Both of your uses of VirtualAlloc() reserve the same amount of memory, so they are equivalent from the resource consumption side.

I suggest that you do some reading on this before deciding to write your own allocator. One of the best sources is Mark Russinovich. You should check his blog; he has written a few entries called "Pushing the Limits" which cover some of this. If you want the real nitty-gritty details, you should read his book (Microsoft Windows Internals). It is by far the best reference I have read on how Windows manages memory (and everything else).

(Edit) Additional Information: The relevant pieces are the "Page Directory" and the "Page Table". According to my older copy of Microsoft Windows Internals... On x86, there is a single Page Directory for each process, with 1024 entries. There are up to 512 page tables. Each 32-bit pointer used in the process is broken into 3 pieces: bits [31-22] are the Page Directory Index, bits [21-12] are the Page Table Index, and bits [11-0] are the byte index within the page. When you use VirtualAlloc with the reserve parameter, the Page Directory Entry is created (32 bits), and the Page Table Entry is created (32 bits). At this time, the page itself is not created for the reserved memory. The best way to see this information is to use the kernel debugger. I would suggest using LiveKD (Sysinternals). You can use LiveKD without attaching a remote computer, but it doesn't allow live debugging. Load LiveKD and select your process. Then you can run the !PTE command to examine the page table for the process.

Again, I would suggest reading Microsoft Windows Internals. In my version (4th ed.) there is a chapter (over 100 pages) that covers all of this, with examples that walk through the various data structures in LiveKD.
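The three-way split of a 32-bit address described above can be sketched as plain bit arithmetic. This assumes the classic non-PAE x86 layout with 4 KB pages; `X86AddressParts`, `SplitX86Address` and `PageTablesSpanned` are hypothetical names used only for illustration, and nothing here queries the real page tables.

```cpp
#include <cstdint>

// Pieces of a 32-bit virtual address (non-PAE x86, 4 KB pages assumed).
struct X86AddressParts {
    std::uint32_t directoryIndex;  // bits 31-22: index into the page directory
    std::uint32_t tableIndex;      // bits 21-12: index into the page table
    std::uint32_t byteOffset;      // bits 11-0:  byte index within the page
};

constexpr X86AddressParts SplitX86Address(std::uint32_t va) {
    return X86AddressParts{va >> 22, (va >> 12) & 0x3FFu, va & 0xFFFu};
}

// Number of distinct page tables (one per 4 MB of address space) that the
// range [base, base + size) touches. 64-bit math avoids overflow at the top.
constexpr std::uint32_t PageTablesSpanned(std::uint32_t base, std::uint32_t size) {
    return size == 0
        ? 0u
        : static_cast<std::uint32_t>(
              ((static_cast<std::uint64_t>(base) + size - 1) >> 22) -
              (static_cast<std::uint64_t>(base) >> 22) + 1);
}
```

For example, `SplitX86Address(0x00401234)` yields directory index 1, table index 1 and byte offset 0x234. A 1 MB range starting at 0x10000000 lies within a single page table's 4 MB span, and that holds whether it is reserved as one region or as sixteen 64 KB regions.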

Mike
The effect of VirtualAlloc on the memory it is reserving / committing is clear. What is unclear to me is its overhead. Given the nature of VirtualAlloc, it is clear its overhead lies outside of the memory it manages. I have been unable to find anything about this overhead so far, even in the SysInternals resources. Reserving and committing memory are very different in one respect: by reserving you are creating a new region, while by committing you are only changing the state of individual pages.
Suma
Added additional info on the PTE structure and how to see it in LiveKD.
Mike