views:

190

answers:

3

hello

I am toying with certain caching algorithm, which is challenging somewhat. Basically, it needs to allocate lots of small objects (double arrays, 1 to 256 elements), with objects accessible through mapped value, map[key] = array. time to initialized array may be quite large, generally more than 10 thousand cpu cycles.

By lots I mean around gigabyte in total. objects may need to be popped/pushed as needed, generally in random places, one object at a time. lifetime of an object is generally long, minutes or more, however, object may be subject to allocation/deallocation several times during duration of program.

What would be good strategy to avoid memory fragmentation, while still maintaining reasonable allocate deallocate speed?

I am using C++, so I can use new and malloc. Thanks.

I know there a similar questions on website, http://stackoverflow.com/questions/2156745/efficiently-allocating-many-short-lived-small-objects, are somewhat different, thread safety is not immediate issue for me.

my development platform is Intel Xeon, linux operating system. Ideally I would like to work on PPC linux as well, but it is not the most important for me.

A: 

If you do know the maximum size of you arrays, you may use a custom allocator. You will have to write the allocator class yourself. What it should do is allocate a large chunk of memory at once and cast it to a linked list. Each time an object instance needs to be created, you delete the tail from the list. Each time the object needs to be freed, you add an entry to the list.

EDIT: here's a sample from Bjarne Stroustrup's The C++ Programming Language, 3rd Edition:

class Pool
{
private:
  struct Link
    { Link * next; };

  struct Chunk
  {
    enum {size = 8*1024-16};

    Chunk * next;
    char mem[size];
  };

private:
  Chunk * chunks;
  const unsigned int esize;
  Link * head;

private:
  Pool (const Pool &) { }      // copy protection
  void operator = (Pool &) { } // copy protection

public:
  // sz is the size of elements
  Pool(unsigned int sz)
    : esize(sz < sizeof(Link*) ? sizeof(Link*) : sz),
      head(0), chunks(0)
    { }

  ~Pool()
  {
    Chunk * n = chunks;

    while(n)
    {
      Chunk * p = n;
      n = n->next;
      delete p;
    }
  }


public:

  // allocate one element
  void * alloc()
  {
    if(head == 0)
      grow();

    Link * p = head;
    head = p->next;

    return p;
  }

  // put an element back into the pool
  void free(void * b)
  {
    Link * p = static_cast<Link*>(b);
    p->next = head; //put b back as first element
    head = p;
  }

private:
  // make pool larger
  void grow()
  {
    Chunk* n = new Chunk;
    n->next = chunks;
    chunks = n;

    const int nelem = Chunk::size / esize;
    char * start = n->mem;
    char * last = &start [ (nelem - 1) * esize ];

    for(char * p = start; p < last; p += esize) // assume sizeof(Link) <= esize
      reinterpret_cast<Link>(p)->next = reinterpret_cast<Link *>(p + esize);

    reinterpret_cast<Link *>(last)->next = 0;
    head = reinterpret_cast<Link *>(start);
  }
};
Kerido
This answer is kind of vague, but is sounds like your telling him to reimplement the "free-list" that is already in the OS memory allocator. He will still run into major memory fragmentation slow-downs unless his list is actually a smarter data structure.
A. Levy
@ALevy: this can't fragment because all the chunks are the same size.
Ben Voigt
@ALevy: there will be no fragmentation because I suggest that single-size elements be allocated. The size should be chosen enough to store the array that @aaa mentioned. Regarding speed, it's faster than calling OS built-in allocation routines. It can be even faster if Chunks are the size of a memory page and allocated with page allocation routines as @DanO mentioned. Regarding 'vagueness', too bad you've downvoted.
Kerido
+1  A: 

If you can predict the size of the allocated object ahead of time you will probably be best to go with a linearly allocated chunk of memory and your own custom allocator (as @Kerido suggested). To that I would add that the most efficient method would be to zero and swap for positions within the allocation, and not worry about repartitioning and compacting the array (leave dead space between allocations) so you do not have to deal with index updates and index maintenance.

If you can partition your objects ahead of time (i.e. you know you have non-fixed size elements, but the group easily) divide them into buckets and preallocate chunks of memory into each bucket and swap the the items into the appropriate bucket. If your objects can change size over their lifetime that can get tricky to manage so consider this approach carefully.

GrayWizardx
bucket idea sounds good
aaa
+3  A: 

Create a slotted allocator:

Allocator is created with many pages of memory, each of equal size (512k, 256k, the size should be tuned for your use).

The first time an object asks this allocator for memory it allocates a page. Allocating a page consists of removing it from the free list (no search, all pages are the same size) and setting the size of objects that will be allocated on this page. Typically this size calculated by taking the requested size and rounding it up to the nearest power of 2. Subsequent allocations of the same size just require a little bit of pointer math and incrementing the number of objects on the page.

Fragmentation is prevented because slots are all the same size and can be refilled on subsequent allocations. Efficiency is maintained (in some cases improved) because there is no memheader per allocation (which makes a big difference when the allocations are small, once allocations become large this allocator starts to waste almost 50% of available memory).

Both allocations and deallocations can be performed in constant time (no searches of the free list for correct slots). The only tricky part about a deallocation is that you don't typically want a memheader preceding the allocation, so you have to figure out the page and index in the page yourself... It's saturday and I haven't had my coffee so I don't have any good advice about doing that, it's easy enough to figure out from the deallocated address, though.

Hope this was helpful, Dan O.

Dan O
And Boost implements this in the pool library.
GMan