My tools are Linux, gcc and pthreads. When my program calls new/delete from several threads and there is contention for the heap, new arenas are created (for reference, see http://www.bozemanpass.com/info/linux/malloc/Linux_Heap_Contention.html). My program runs 24x7, and arenas are still occasionally being created after 2 weeks. I think there may eventually be as many arenas as threads. ps(1) shows alarming memory consumption, but I suspect that only a small portion of it is actually mapped.

What is the 'overhead' of an empty arena? (How much more memory is used per arena than if all allocation were confined to the traditional heap?)

Is there any way to force the creation in advance of n arenas? Is there any way to force the destruction of empty arenas?
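
One way to check the suspicion about ps(1) is to compare the virtual size of the process with its resident size. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper name proc_status_kb is made up for illustration and is not part of any library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the value (in kB) of a field such as
   "VmSize" or "VmRSS" from /proc/self/status, or -1 if it is missing. */
static long proc_status_kb(const char *field)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;
    size_t n = strlen(field);

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Lines look like "VmRSS:     1234 kB". */
        if (strncmp(line, field, n) == 0 && line[n] == ':') {
            if (sscanf(line + n + 1, "%ld", &kb) != 1)
                kb = -1;
            break;
        }
    }
    fclose(f);
    return kb;
}
```

Calling proc_status_kb("VmSize") and proc_status_kb("VmRSS") periodically from the long-running process should show whether the address space reserved for arenas is actually resident.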

+1  A: 

The size of struct malloc_state (aka mstate, aka the arena descriptor) is:

glibc-2.2: (256+18)*4 bytes =~ 1 KB in 32-bit mode and ~2 KB in 64-bit mode.
glibc-2.3: (256+256/32+11+NFASTBINS)*4 bytes =~ 1.1-1.2 KB in 32-bit mode and 2.4-2.5 KB in 64-bit mode.

See the definition of struct malloc_state in glibc-x.x.x/malloc/malloc.c.
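
The arithmetic above can be sketched in C as follows. This is only a rough estimate that counts every field of the descriptor as one machine word; the function names are invented for illustration, and the NFASTBINS value is passed in as a parameter because it depends on the glibc build (10 is used below purely as an assumed example). The real sizeof(struct malloc_state) in a given glibc remains authoritative.

```c
#include <assert.h>
#include <stddef.h>

/* Rough estimate: number of word-sized fields times the word size.
   word_size is 4 (32-bit mode) or 8 (64-bit mode). */
static size_t arena_descriptor_estimate(size_t words, size_t word_size)
{
    assert(word_size == 4 || word_size == 8);
    return words * word_size;
}

/* Word count quoted above for glibc-2.2. */
static size_t words_glibc_2_2(void)
{
    return 256 + 18;
}

/* Word count quoted above for glibc-2.3.
   nfastbins is an ASSUMED input, not taken from the answer. */
static size_t words_glibc_2_3(size_t nfastbins)
{
    return 256 + 256 / 32 + 11 + nfastbins;
}
```

For example, arena_descriptor_estimate(words_glibc_2_2(), 4) gives 1096 bytes, which matches the "about 1 KB" figure for 32-bit mode.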

osgx
Don't you have to round it up to the next MMU page size? Thanks for the answer!
rleir
That is the internal arena descriptor. Each arena descriptor is placed in an mmap-ed segment. A limit of at most 65k mmaps is hardcoded. Each mmap takes some resources from the OS kernel (a VMA).
osgx
All arena descriptors are in a circular linked list that begins at main_arena. Every new arena descriptor is placed at the beginning of an mmap-ed region, at an offset of sizeof(heap_info) = 4*sizeof(void*) = 16 or 32 bytes. The heap (the mmap-ed segment) is aligned and has a size from HEAP_MIN_SIZE to HEAP_MAX_SIZE. It has the native alignment of mmap calls (= page = 4k). The rest of the heap (after heap_info and mstate) is used for malloc_chunks (malloc-ed data).
osgx
Sorry, HEAP_MIN_SIZE = 32*1024 (32 KB) and HEAP_MAX_SIZE = 1024*1024 (1 MB).
osgx
HEAP_MAX_SIZE = 1 MB is the max size of an ARENA. So there will be a LOT of arenas in a big program.
osgx
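
The heap layout described in these comments can be sketched as below. The struct is a hypothetical stand-in for glibc's heap_info (four pointer-sized fields, with illustrative field names), and first_chunk_space is an invented helper; only the two size constants are taken from the comments.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_MIN_SIZE (32 * 1024)    /* 32 KB, as quoted in the comments */
#define HEAP_MAX_SIZE (1024 * 1024)  /* 1 MB, as quoted in the comments */

/* Hypothetical mock of heap_info: four pointer-sized fields, so it is
   16 bytes in 32-bit mode and 32 bytes in 64-bit mode. */
struct mock_heap_info {
    void  *arena;  /* owning arena descriptor */
    void  *prev;   /* previous heap of the same arena */
    size_t size;   /* current size of this heap */
    size_t pad;    /* padding */
};

/* Bytes left for malloc chunks in a heap that also holds the arena
   descriptor: heap size minus heap_info minus the descriptor. */
static size_t first_chunk_space(size_t heap_size, size_t descriptor_size)
{
    assert(heap_size >= HEAP_MIN_SIZE && heap_size <= HEAP_MAX_SIZE);
    return heap_size - sizeof(struct mock_heap_info) - descriptor_size;
}
```

With a descriptor of roughly 1 KB, even a minimal 32 KB heap loses only a few percent of its space to bookkeeping.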
A: 

From malloc.c (glibc 2.3.5), line 1546:

/*
  -------------------- Internal data structures --------------------
   All internal state is held in an instance of malloc_state defined
   below. 
 ...
   Beware of lots of tricks that minimize the total bookkeeping space
   requirements. **The result is a little over 1K bytes** (for 4byte
   pointers and size_t.)
*/

This matches the result I got for 32-bit mode: a little over 1K bytes.

osgx
+1  A: 

Destruction of arenas: I don't know yet, but the text below, from an analysis published in 2000 (so a bit outdated) at http://www.citi.umich.edu/techreports/reports/citi-tr-00-5.pdf, says in brief NO to the possibility of destroying arenas or trimming their memory. Please name your glibc version.

Ptmalloc maintains a linked list of subheaps. To reduce lock contention, ptmalloc searches for the first unlocked subheap and grabs memory from it to fulfill a malloc() request. If ptmalloc doesn’t find an unlocked heap, it creates a new one. This is a simple way to grow the number of subheaps as appropriate without adding complicated schemes for hashing on thread or processor ID, or maintaining workload statistics. However, there is no facility to shrink the subheap list and nothing stops the heap list from growing without bound.
osgx
There is code for heap (aka arena) trimming (heap_trim), but it works only for a completely free arena.
osgx
Such a "simple way" of growing the number of subheaps leads to continuous creation of arenas (subheaps). The number of arenas can also grow because of heap fragmentation.
osgx
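
If the glibc in use is new enough, the unbounded growth described above can be limited. The sketch below assumes glibc 2.10 or later, where mallopt() accepts M_ARENA_MAX; older versions (including those discussed in this thread) do not have it, so the code falls back to doing nothing. The wrapper name cap_arena_count is made up for illustration.

```c
#include <malloc.h>

/* Ask glibc to cap the number of arenas at max_arenas.
   Returns mallopt()'s result (nonzero on success), or 0 when
   M_ARENA_MAX is not available in this C library. */
static int cap_arena_count(int max_arenas)
{
#ifdef M_ARENA_MAX
    return mallopt(M_ARENA_MAX, max_arenas);
#else
    (void)max_arenas;  /* option not supported here */
    return 0;
#endif
}
```

It should be called early, before the worker threads start allocating. On such glibc versions the MALLOC_ARENA_MAX environment variable sets the same limit without recompiling.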
A: 

Consider using TCMalloc from google-perftools. It is simply better suited to threaded and long-lived applications, and it is very FAST. Take a look at http://goog-perftools.sourceforge.net/doc/tcmalloc.html, especially at the graphs (higher is better). By those measurements, tcmalloc performs about twice as well as ptmalloc.

osgx
The best answer here!
osgx
Thanks for the idea. Note: the original question is not about speed, I do not need it to be faster.
rleir
High speed is a bonus there :)
osgx