views:

83

answers:

2

Hello. Sorry if this has been asked before, I haven't been able to find just what I am looking for.

I am reading fields from a list and writing them to a block of memory. I could

  • Walk the whole list, find the total needed size, do one malloc and THEN walk the list again and copy each field;
  • Walk the whole list and realloc the block of memory as I write the values;

Right now the first seems the most efficient to me (smallest number of calls). What are the pros and cons of either approach ?

Thank you for your time.

+1  A: 

You're probably better off allocating a decent amount of space initially, based on what you think is the most likely maximum.

Then, if you find you need more space, don't just allocate enough for the extra, allocate a big chunk extra.

This will minimise the number of re-allocations while still only processing the list once.

By way of example, initially allocate 100K. If you then find you need more, re-allocate to 200K, even if you only need 101K.

paxdiablo
I would recommend looking at Alexandrescu's small object allocator for a similar idea and a very good implementation.
taspeotis
@pax: ah, the "what factor to exponentially re-allocate by" argument. Happy days. I've seen 2, I've seen 3/2, I've seen 577/408...
Steve Jessop
I've seen the suggestion to double it each time but I generally just add the initial amount (so that toal alloc is 100K, 200K, 300K, ...). This allows for a fairly efficient expansion but you're never wasting more than that initial 100K amount.
paxdiablo
@paxdiablo: if you don't know anything about the final size, I'd go with exponential growth and a final `realloc()` to the correct size; linear growth only makes sense if you know beforehand that you'll only need a few expansions for most inputs
Christoph
@Christoph: you shouldn't expect that final `realloc` to release any memory.
Steve Jessop
@Steve: sure, but if the amount of memory which was unnecessarily allocated is large enough, reasonable `realloc()` implementations will (eg on linux via using `mremap()`); see http://blog.kowalczyk.info/article/realloc-on-Windows-vs-Linux-1.html for anecdotal evidence on linear vs. exponential growth performance
Christoph
+1  A: 

The first approach is almost always better. A realloc() typically works by copying the entire contents of a memory block into the freshly allocated, larger block. So n reallocs can mean n copies, each one larger than the last. (If you are adding m bytes to your allocation each time, then the first realloc has to copy m bytes, the next one 2m, the next 3m, ...).

The pedantic answer would be that the internal performance implications of realloc() are implementation specific, not explicitly defined by the standard, in some implementation it could work by magic fairies that move the bytes around instantly, etc etc etc - but in any realistic implementation, realloc() means a copy.

Crashworks
"magic fairies that move the bytes around instantly" - "fairies" a.k.a. "the VMM". It's absurdly heavy to allocate at least a page for every allocation, just to optimize realloc, but I suppose you could do it for large allocations without a huge amount of magic required. I think you sometimes see a trick a bit like it in I/O stacks, where buffers are "handed off" from kernel to user space without a copy by re-mapping it.
Steve Jessop
Unless your malloc code is written by a mental deficient :-), you'll generally only get a copy if you can't expand the current block. That's not usually the case if the block you're trying to re-alloc is at the end of the arena. Again, you're right that it's an implementation issue but I've never seen an implementation that doesn't take advantage of the properties of the underlying data structures. If the re-alloc is the only thing you're doing, you most likely won't get a lot of copy operations.
paxdiablo
Alarmingly, this means the realloc() provided by a certain major manufacturer's standard library must have been written by a mental deficient. That's a disturbing thought.
Crashworks
Which one? Please, let it be Microsoft, I do love a bit of MS bashing occasionally :-) Seriously though, there may be mitigating reasons for a particular implementation to prefer 'copy' to 'coalesce' so it would depend on the circumstances. But, if there's a big enough free block immediately following the one you're trying to realloc, and your arena implementation supports it, it's incredibly unlikely that a copy would be faster than a simple coalesce of the two blocks.
paxdiablo
You can also look the code of `realloc` on Solaris, you will see that it will copy the block and does not use vmm tricks, which is logical as most implementations of `malloc` et al. are in user space, vmm tricks would imply switching to kernel mode. The copying is often avoided by overallocating the blocks in the first place. For small blocks it allocates 100% more, for big blocks 50%.
tristopia