I'm trying to track down a huge slowdown in the heap memory functions on Vista and Windows 7 (I didn't test any server editions). It doesn't happen on XP at all, only on Microsoft's newer operating systems.

I originally ran into this problem with PHP compiled on Windows. The scripts themselves seemed to run at the expected speed, but after script execution I was seeing a 1-2 second delay in the internal PHP shutdown functions. After firing up the debugger I saw that it had to do with the PHP memory manager's use of HeapAlloc/HeapFree/HeapReAlloc.

I traced it down to the use of the flag HEAP_NO_SERIALIZE on the heap functions:

#ifdef ZEND_WIN32
#define ZEND_DO_MALLOC(size) (AG(memory_heap) ? HeapAlloc(AG(memory_heap), HEAP_NO_SERIALIZE, size) : malloc(size))
#define ZEND_DO_FREE(ptr) (AG(memory_heap) ? HeapFree(AG(memory_heap), HEAP_NO_SERIALIZE, ptr) : free(ptr))
#define ZEND_DO_REALLOC(ptr, size) (AG(memory_heap) ? HeapReAlloc(AG(memory_heap), HEAP_NO_SERIALIZE, ptr, size) : realloc(ptr, size))
#else
#define ZEND_DO_MALLOC(size) malloc(size)
#define ZEND_DO_FREE(ptr) free(ptr)
#define ZEND_DO_REALLOC(ptr, size) realloc(ptr, size)
#endif

and in the function start_memory_manager, where the HeapCreate call actually sets the default for HeapAlloc/HeapFree/HeapReAlloc on that heap:

#ifdef ZEND_WIN32
    AG(memory_heap) = HeapCreate(HEAP_NO_SERIALIZE, 256*1024, 0);
#endif

I removed the HEAP_NO_SERIALIZE parameter (replaced it with 0) and that fixed the problem. Scripts now clean up quickly in both the CLI and the Apache 2 SAPI version. This was for PHP 4.4.9, but the PHP 5 and 6 source code (in development) contains the same flag on the calls.

I'm not sure if what I did was dangerous or not. It's all a part of the PHP memory manager so I'm going to have to do some digging and research, but this brings up the question:

Why are the heap memory functions so slow on Vista and Windows 7 with HEAP_NO_SERIALIZE?

While researching this problem I came up with exactly one good hit. Please read the following blog post, where the poster explains the issue and offers a test case (both source and binaries available) that demonstrates it:

http://www.brainfarter.net/?p=69

My test on a Windows 7 x64 quad-core 8 GB machine gives 43,836. Ouch! The same test without the HEAP_NO_SERIALIZE flag gives 655, ~70x faster in my case.
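For anyone who can't fetch the linked test case, here is a rough sketch of the kind of measurement involved. This is not the blog's code: `churn_heap`, `CHURN_BLOCKS`, the block count, and the use of `clock()` are all my own assumptions for illustration. On non-Windows systems it falls back to malloc/free just so it compiles; only the Windows branch exercises the flag.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>
#ifdef _WIN32
#include <windows.h>
#endif

#define CHURN_BLOCKS 10000  /* arbitrary count chosen for this sketch */

/* Hypothetical helper: allocate CHURN_BLOCKS blocks of block_size bytes
   from a private heap, free them all, and return the elapsed time in
   milliseconds as reported by clock(), or -1.0 on failure.
   A nonzero no_serialize requests HEAP_NO_SERIALIZE (Windows only).
   Not thread-safe: it uses a static pointer table. */
double churn_heap(int no_serialize, size_t block_size)
{
    static void *blocks[CHURN_BLOCKS];
    size_t allocated = 0, i;
    clock_t start, stop;
#ifdef _WIN32
    DWORD flags = no_serialize ? HEAP_NO_SERIALIZE : 0;
    HANDLE heap = HeapCreate(flags, 256 * 1024, 0);
    if (heap == NULL)
        return -1.0;
#else
    (void)no_serialize;  /* flag has no meaning outside Windows */
#endif

    start = clock();
    while (allocated < CHURN_BLOCKS) {
#ifdef _WIN32
        void *p = HeapAlloc(heap, flags, block_size);
#else
        void *p = malloc(block_size);
#endif
        if (p == NULL)
            break;
        blocks[allocated++] = p;
    }
    /* Free in allocation order; the slowdown was first noticed at cleanup. */
    for (i = 0; i < allocated; i++) {
#ifdef _WIN32
        HeapFree(heap, flags, blocks[i]);
#else
        free(blocks[i]);
#endif
    }
    stop = clock();

#ifdef _WIN32
    HeapDestroy(heap);
#endif
    if (allocated < CHURN_BLOCKS)
        return -1.0;
    return 1000.0 * (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling `churn_heap(1, 64)` and `churn_heap(0, 64)` on the same machine and comparing the two results gives a crude with/without comparison. The absolute numbers won't match the ones above, since the block count and allocation pattern differ from the original test case.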

Lastly, it seems that any program built with Visual C++ 6 that uses malloc/free or new/delete is affected on these newer platforms. The Visual C++ 2008 compiler doesn't set this flag by default for those functions/operators, so they aren't affected -- but that still leaves a lot of programs affected!

I encourage you to download the proof of concept and give this a try. This problem explained why my normal PHP on Windows installation was crawling, and it may explain why Vista and Windows 7 seem slower at times.

UPDATE 1/26/2010: I received a response from Microsoft stating that the LFH is the de facto default policy for heaps that hold any appreciable number of allocations. In Vista they reorganized a lot of code to remove extra data structures and code paths that were no longer part of the common case for handling heap API calls. With the HEAP_NO_SERIALIZE flag, and in certain debugging situations, they do not allow the use of the LFH, and we get stuck on the slower, less optimized path through the heap manager. So... it's highly recommended not to use HEAP_NO_SERIALIZE, since you'll miss out on all the work that went into the LFH and on any future work in the Windows heap API.

+5  A: 

The first difference I noticed is that Vista always uses the Low Fragmentation Heap (LFH). XP does not seem to. RtlFreeHeap in Vista is a lot shorter as a result -- all the work is delegated to RtlpLowFragHeapFree. More info regarding LFH and its presence in various OSs. Note the red warning at the top.

More info (Remarks section):

Windows XP, Windows Server 2003, and Windows 2000 with hotfix KB 816542:

A look-aside list is a fast memory allocation mechanism that contains only fixed-sized blocks. Look-aside lists are enabled by default for heaps that support them. Starting with Windows Vista, look-aside lists are not used and the LFH is enabled by default.

Another important piece of information: LFH and NO_SERIALIZE are mutually exclusive (both cannot be active simultaneously). Combined with

Starting with Windows Vista, look-aside lists are not used

This implies that setting NO_SERIALIZE in Vista disables LFH, but does not (and cannot) fall back to standard look-aside lists (as a fast replacement), according to the above quote. I'm unclear as to what heap allocation strategy Vista uses when NO_SERIALIZE is specified. It looks like it's using something horribly naïve, based on its performance.
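One way to check this on a given machine is to ask the heap which front end it is using. The sketch below relies on HeapQueryInformation and HeapSetInformation with HeapCompatibilityInformation, which report or request 0 (standard heap), 1 (look-aside lists) or 2 (LFH). The function names `heap_front_end` and `heap_front_end_name` are hypothetical helpers of mine, and outside Windows the code just returns -1 so that it still compiles.

```c
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: create a private heap, optionally ask for the LFH,
   and return the front end the heap reports:
   0 = standard, 1 = look-aside lists, 2 = LFH, -1 = unknown / not Windows. */
int heap_front_end(int no_serialize, int request_lfh)
{
#ifdef _WIN32
    ULONG kind = 0;
    ULONG want_lfh = 2;  /* 2 requests the low-fragmentation heap */
    int result = -1;
    HANDLE heap = HeapCreate(no_serialize ? HEAP_NO_SERIALIZE : 0,
                             256 * 1024, 0);
    if (heap == NULL)
        return -1;

    /* The request is expected to fail on a HEAP_NO_SERIALIZE heap;
       the return value is ignored and the heap is queried afterwards. */
    if (request_lfh)
        HeapSetInformation(heap, HeapCompatibilityInformation,
                           &want_lfh, sizeof(want_lfh));

    if (HeapQueryInformation(heap, HeapCompatibilityInformation,
                             &kind, sizeof(kind), NULL))
        result = (int)kind;

    HeapDestroy(heap);
    return result;
#else
    (void)no_serialize;
    (void)request_lfh;
    return -1;
#endif
}

/* Map the reported value to a readable name. */
const char *heap_front_end_name(int kind)
{
    switch (kind) {
    case 0:  return "standard";
    case 1:  return "look-aside lists";
    case 2:  return "LFH";
    default: return "unknown";
    }
}
```

Comparing `heap_front_end(0, 1)` with `heap_front_end(1, 1)` should show whether the NO_SERIALIZE heap refuses the LFH. One caveat: on Vista and later a freshly created heap may report 0 until it has seen enough allocations for the LFH to kick in, so a 0 without the explicit request doesn't prove much on its own.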

Even more info:

Looking at a few stack snapshots of allocspeed.exe it seems to always be in a Ready state (not Running or Wait), and in TryEnterCriticalSection from HeapFree, and pegging the CPU at nearly 100% load for 40 seconds. (On Vista.)

Sample snapshot:

ntdll.dll!RtlInterlockedPushEntrySList+0xe8
ntdll.dll!RtlTryEnterCriticalSection+0x33b
kernel32.dll!HeapFree+0x14
allocspeed.EXE+0x11ad
allocspeed.EXE+0x1e15
kernel32.dll!BaseThreadInitThunk+0x12
ntdll.dll!LdrInitializeThunk+0x4d

Which is strange, because NO_SERIALIZE precisely tells it to skip lock acquisition. Something doesn't add up.

This is a question only Raymond Chen or Mark Russinovich could answer :)

Alex
Also how are you analyzing the ntdll functions?
wojo
`dumpbin /disasm ntdll.dll > dump.txt` then `gvim dump.txt`.
Alex
Looks like the removal of lookaside lists was due to the heap exploits that were possible. See http://www.blackhat.com/presentations/bh-usa-06/BH-US-06-Marinescu.pdf and http://blogs.technet.com/srd/archive/2009/08/04/preventing-the-exploitation-of-user-mode-heap-corruption-vulnerabilities.aspx
wojo
Good find re: the removal of lookaside lists.
Alex
Accepted this answer. Excellent research, Alex. I don't think we'll get a better answer from anyone but Microsoft themselves.
wojo