Windows malloc replacement (e.g., tcmalloc) and dynamic crt linking

A C++ program that uses several DLLs and Qt should be equipped with a malloc replacement (like tcmalloc), because it has performance problems that can be verified to be caused by the Windows malloc. On Linux this is no problem, but on Windows there are several approaches, and I find none of them appealing:

1. Put new malloc in lib and make sure to link it first (Other SO-question)

This has the disadvantage that, for example, strdup will still use the old malloc, so a later free (which now resolves to the replacement) may crash the program.
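The mismatch can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyHeap`, `crt_heap`, `replacement_heap`, `crt_strdup`, `app_malloc` and `app_free` are hypothetical names invented for the illustration, and the "heaps" merely record which pointers they handed out. It is not how the CRT or tcmalloc is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <set>

// Toy heap that remembers which pointers it handed out (hypothetical model).
class ToyHeap {
public:
    void* alloc(std::size_t n) {
        void* p = std::malloc(n ? n : 1);
        if (p) owned_.insert(p);
        return p;
    }
    bool owns(const void* p) const {
        return owned_.count(const_cast<void*>(p)) != 0;
    }
    // Returns false for a foreign pointer; a real allocator would
    // typically corrupt its bookkeeping or crash at this point.
    bool release(void* p) {
        if (owned_.erase(p) == 0) return false;
        std::free(p);
        return true;
    }
private:
    std::set<void*> owned_;
};

// Stand-in for the CRT's own heap.
inline ToyHeap& crt_heap() { static ToyHeap h; return h; }
// Stand-in for the replacement allocator (e.g. tcmalloc).
inline ToyHeap& replacement_heap() { static ToyHeap h; return h; }

// Code that lives inside the CRT keeps calling the CRT's allocator,
// no matter what the application linked first.
inline char* crt_strdup(const char* s) {
    std::size_t n = std::strlen(s) + 1;
    char* copy = static_cast<char*>(crt_heap().alloc(n));
    if (copy) std::memcpy(copy, s, n);
    return copy;
}

// Application code sees the replacement for malloc/free.
inline void* app_malloc(std::size_t n) { return replacement_heap().alloc(n); }
inline bool app_free(void* p) { return replacement_heap().release(p); }
```

In this model, a string obtained from `crt_strdup` is owned by `crt_heap()`, so `app_free` refuses it, while a block from `app_malloc` is released normally. In a real process the refused call would be the crash described above.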

2. Remove malloc from the static libcrt library with lib.exe (Chrome)

This is tested/used(?) for Chrome/Chromium, but has the disadvantage that it only works when the CRT is linked statically. Static linking has the problem that if a system library is linked dynamically against msvcrt, there may be mismatches in heap allocation/deallocation. If I understand it correctly, tcmalloc could be linked dynamically so that there is a common heap for all self-compiled DLLs (which is good).

3. Patch crt-source code (firefox)

Firefox's jemalloc apparently patches the Windows CRT source code and builds a new CRT. This again has the static/dynamic linking problem described above.

One could think of using this to generate a dynamic MSVCRT, but I think this is not possible, because the license forbids providing a patched MSVCRT with the same name.

4. Dynamically patching loaded CRT at run time

Some commercial memory allocators can do such magic. tcmalloc can do it too, but this seems rather ugly. It had some issues, but they have been fixed. Currently, tcmalloc's run-time patching does not work under 64-bit Windows.

Are there better approaches? Any comments?

Q: A C++ program that is split across several DLLs should:

A) replace malloc?

B) ensure that allocation and de-allocation happens in the same dll module?

A: The correct answer is B. A C++ application design that incorporates multiple DLLs SHOULD provide a mechanism to ensure that anything allocated on the heap in one DLL is freed by that same DLL module.
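One common way to get this is to pair every allocating function with a release function from the same module, and to have callers route the release back through it. The following is a minimal sketch, assuming hypothetical names (`mylib_make_greeting`, `mylib_free_greeting`, `mylib_live_count`, `GreetingPtr`); in a real build the `extern "C"` functions would be compiled into and exported from the DLL, while the deleter and wrapper would live on the caller's side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// --- "DLL" side (hypothetical names). Both functions are built into the
// --- same module, so they always use the same heap.
static int g_live = 0;  // buffers currently allocated by this module

extern "C" {

// Allocates and fills a buffer inside the module.
char* mylib_make_greeting(const char* name) {
    const char prefix[] = "hello, ";
    std::size_t len = (sizeof(prefix) - 1) + std::strlen(name) + 1;
    char* out = new char[len];
    std::strcpy(out, prefix);
    std::strcat(out, name);
    ++g_live;
    return out;
}

// Releases a buffer with the same allocator that created it.
void mylib_free_greeting(char* p) {
    if (!p) return;
    assert(g_live > 0);
    --g_live;
    delete[] p;
}

// Reports how many buffers are still outstanding.
int mylib_live_count() { return g_live; }

}  // extern "C"

// --- Caller side: the deleter hands the pointer back to the module
// --- instead of calling the caller's own free/delete.
struct GreetingDeleter {
    void operator()(char* p) const { mylib_free_greeting(p); }
};
using GreetingPtr = std::unique_ptr<char, GreetingDeleter>;

inline GreetingPtr make_greeting(const char* name) {
    return GreetingPtr(mylib_make_greeting(name));
}
```

A caller that holds the result as a `GreetingPtr` never calls its own `free` or `delete` on the buffer, so it does not matter which CRT or allocator the caller itself links against.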

Why would you split a C++ program into several DLLs anyway? By C++ program I mean that the objects and types you are dealing with are C++ templates, STL objects, classes, etc. You CAN'T pass C++ objects across DLL boundaries without either a lot of very careful design and lots of compiler-specific magic, or suffering from massive duplication of object code in the various DLLs and, as a result, an application that is extremely version sensitive. Any small change to a class definition will force a rebuild of all EXEs and DLLs, removing at least one of the major benefits of a DLL approach to app development.

Either stick to a straight C interface between the app and the DLLs, suffer hell, or just compile the entire C++ app as one exe.

The premise comes from large performance gains when using tcmalloc instead of the MSVCRT.

Weidenrinde 2009-05-25 15:56:53

If that's the case, then a C program or even a C++ program that doesn't use Qt could benefit from the change. However, I would disagree that they "should be equipped with a malloc replacement" until profiling indicates that MSVCRT is insufficient.

sean e 2009-05-25 16:44:48

1. The measured, large performance improvements were clearly related to MSVCRT's memory management. The impact depends on the allocation style of the program, thus I do not claim that MSVCRT is bad. 2. I just mentioned Qt to indicate that several sub-applications use the Qt lib and therefore static linking is not the preferred option.

Weidenrinde 2009-05-26 13:21:19

In the first sentence of your question, you write that "C++ programs... should be equipped with a malloc replacement" - it only follows that you assume MSVCRT performs poorly. That sentence is written as a prescription: "C++ applications should replace malloc". That's my only issue with the premise - it is highly context dependent whether or not MSVCRT performance is insufficient. Regarding Qt, yes, static link is not preferred but again it was how Qt was presented in the problem that I took issue with.

sean e 2009-05-28 02:35:01

After profiling has shown which allocations are hurting performance, the use of an additional allocator could be limited to the objects/resources that are affected by the performance issue. It may not be the case that everything, for example QObject-derived classes, needs wholesale allocator replacement.
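A minimal sketch of that idea, assuming a hypothetical hot type `Particle` and a hypothetical single-threaded `FixedPool`: only that one class gets its own `operator new`/`operator delete`, and every other allocation in the program keeps using the default allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Minimal fixed-size block pool (hypothetical; not thread-safe, and memory
// is only handed back to the system when the pool is destroyed).
class FixedPool {
public:
    explicit FixedPool(std::size_t block_size, std::size_t blocks_per_chunk = 256)
        : block_size_(round_up(block_size)), blocks_per_chunk_(blocks_per_chunk) {}
    ~FixedPool() {
        for (void* c : chunks_) ::operator delete(c);
    }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    void* allocate() {
        if (!free_list_) grow();
        Node* n = free_list_;
        free_list_ = n->next;
        ++live_;
        return n;
    }
    void deallocate(void* p) {
        if (!p) return;
        free_list_ = new (p) Node{free_list_};  // push block onto free list
        --live_;
    }
    std::size_t live() const { return live_; }

private:
    struct Node { Node* next; };

    // Blocks must be able to hold a Node and stay suitably aligned.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;
    }
    // Grabs one chunk from the default allocator and slices it into blocks.
    void grow() {
        char* chunk = static_cast<char*>(::operator new(block_size_ * blocks_per_chunk_));
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < blocks_per_chunk_; ++i)
            free_list_ = new (chunk + i * block_size_) Node{free_list_};
    }

    std::size_t block_size_;
    std::size_t blocks_per_chunk_;
    Node* free_list_ = nullptr;
    std::size_t live_ = 0;
    std::vector<void*> chunks_;
};

// Hypothetical hot type that profiling singled out. Only this class is
// routed to the pool.
struct Particle {
    double x, y, z;

    static FixedPool& pool() {
        static FixedPool p(sizeof(Particle));
        return p;
    }
    static void* operator new(std::size_t size) {
        assert(size == sizeof(Particle));  // sketch does not handle derived types
        return pool().allocate();
    }
    static void operator delete(void* p) noexcept { pool().deallocate(p); }
};
```

Call sites keep writing `new Particle{...}` and `delete p`, so the change stays local to the class definition. Whether this is practical depends on how many types are affected, which is the concern raised in the reply below.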

sean e 2009-05-28 02:38:55

Read carefully: I said "A program", not "C++ programs". MSVCRT has a special small-object treatment in its heap, but from what I understood it has a fixed maximum size for small objects. For this particular program, that behaves badly.

Weidenrinde 2009-06-04 08:04:04

For this particular case I found no way to isolate which allocations cause the problem. My conjecture is that the general pattern of allocation generates fragmentation, and this slows down more or less everything. In the end it seems easier to replace the overall malloc procedure than to change the allocator manually in many individual places.

Weidenrinde 2009-06-04 08:05:07

Well, what was actually written was "A C++ program". The way it is written led me to believe that what was written was prescriptive behavior for all C++ programs. Consider the sentence "A tired person should get some sleep" - does "a tired person" refer to one particular person or to any person in general who is tired? I see now that what you meant was "I have a C++ program that uses several DLLs and QT in which for performance reasons I need to replace malloc". I only took issue because it seemed that you were espousing a premise for all C++ programs.

sean e 2009-06-04 15:26:16

Replacing the system memory allocation strategy is often desirable, due to the performance characteristics of said system memory allocator versus alternatives (such as dlmalloc, which I have used extensively). Clearly, not requiring memory allocation and using only static pools would get around the problem, but if one is inheriting code that does a lot of memory allocation and calls malloc() and friends, performance can usually be increased by going with an alternative. The default may be sufficient for many applications, but it is by no means the performance leader.

dash-tom-bang 2009-09-25 00:18:28

Do you know how nedmalloc treats the problem?

Weidenrinde 2009-05-26 13:12:33

Not sure, though I know it "doesn't automagically replace the system malloc()". http://www.nedprod.com/programs/portable/nedmalloc/ might have more.

rogerdpack 2009-06-18 15:39:56
