tags:

views:

192

answers:

3

It doesn't seem to make sense, unless we just ignore any potential excess space at the beginning of a segment, and then have the first allocated chunk be at the first multiple of 8 (with its corresponding first header being that address -4). This would leave however many bytes before that unused. Is that what's generally done?

edit: thanks to paxdiablo for the detailed explanation below. that all makes sense for 16 byte headers. however, i'm working with a 4 byte header, that looks something like this:

struct mhdr {
    int size;  // size of this block
} tMallocHdr;

now, if my heap starts on an address that is a multiple of 8, and any address returned by malloc needs to be a multiple of 8, and i need to use 4 byte headers, I seem to be forced to 'waste' the first 4 bytes of my heap. for example:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
              ^
              (heap starts)

If the heap starts at the carrot above at address 8, using the addressing scheme in this example, the first returnable address i could return to a user following a malloc call would be 16; i need 4 bytes of header, and the first address that's a multiple of 8 which allows 4 bytes of header is 16 (header starts at 12). This means I've wasted the first 4 bytes of my internal heap memory to line things up (8-11).

Is this an acceptable sacrifice, or am I thinking about this wrong?

+5  A: 

Generally, there is wasted space in a block header of an allocated region. Many implementations I've seen use a 16-byte (I'm using the classical definition of byte here, one of eight bits, rather than the ISO C definition) header, immediately before the address returned from malloc, and pad out the allocated region to 16 bytes at the end as well.

This greatly simplifies the algorithms used for allocation and also guarantees that memory addresses returned will be aligned appropriately for the architecture.

But keep in mind that this is an implementation detail, not a feature of the C standard. If there are no alignment requirements and memory allocations are limited to 255 bytes, it's quite plausible that only one byte will be wasted (albeit at the expense of a more complicated algorithm).


It's quite plausible that you could have an embedded application that only ever allocates 256-byte chunks in which case you could have a simple bitmap-based allocator (with the bitmap stored elsewhere rather than in-line with the memory blocks being allocated), wasting only one bit per chunk (we've done this before in low memory environments).

Or maybe you have an allocator in a large address-space and memory environment that gives you 4G no matter what you ask for. Then there's no wastage for a header, but probably plenty for the padding :-)

Or maybe you get a chunk from a size-specific arena (1-64 bytes from arena A, 65-128 from arena B and so on). This means no header required but still allowing variable size (up to a maximum) and substantially less wastage than the 4G solution above.

Bottom line, it depends on the implementation.


In a (reasonably simple) implementation of malloc, you could have a doubly-linked list of "chunks" which are given out by the allocator. These chunks consist of a header and a data section and, to ensure alignment of the data section is correct, the header must be on a 16-byte boundary and be a multiple of 16 bytes in length (that's for a 16-byte alignment requirement - your actual requirement may not be that stringent).

In addition, the chunk itself is padded so it's a multiple of 16 bytes so that the next chunk is suitably aligned as well.

That guarantees that the data section of any chunk, which is the address given to the caller of malloc, is aligned correctly.

So you may well get wastage in that area. A header (with 4-byte integers and pointers) may simply be:

struct mhdr {
    int size;          // size of this block.
    struct mhdr *prev; // prev chunk.
    struct mhdr *next; // next chunk.
    int junk;          // for padding.
} tMallocHdr;

meaning that four of those sixteen bytes will be wasted. Sometimes that's necessary to meet the other requirements (alignment) and, in any case, you can use that padding space for other purposes. As one commenter pointed out, guard bytes can be used to detect some forms of arena corruption:

struct mhdr {
    int guard_bytes;   // set to 0xdeadbeef to detect some corruptions.
    int size;          // size of this block.
    struct mhdr *prev; // prev chunk.
    struct mhdr *next; // next chunk.
} tMallocHdr;

And, while that padding is technically wasted space, it only becomes important if it's a sizeable proportion of the entire arena. If you're allocating 4K blocks of memory, the four bytes of wastage is only about a thousandth of the total size. Actually, the wastage to you as a user is probably the entire 16 bytes of the header since that's memory you can't use, so it's about 0.39% (16 / (4096 + 16)).

That's why a malloc'ed linked list of characters is a very bad idea - you tend to use, for every single character:

  • 16 bytes of header.
  • 1 byte for the character (assuming 8-bit characters).
  • 15 bytes of padding.

This would change your 0.39% figure into 96.9% wastage (31 / (31 + 1)).


And, in response to your further question:

Is this an acceptable sacrifice, or am I thinking about this wrong?

I would say, yes, this is acceptable. Generally, you allocate larger chunks of memory where four bytes won't make a difference in the grand scheme of things. As I stated earlier, it's not a good solution if you allocate lots of little things but that's not the general use case of malloc.

paxdiablo
@paxdiablo: i think i should be more clear. when i use the command to get another page of memory from the OS, i imagine I get an address where it starts. What if that address is a multiple of 8? then i need my 4 byte header, which puts the 'chunk' address to a multiple of 4. This would leave me a 4 byte gap between the next address I can return, and the one which is actually 'free'
hatorade
@paxdiablo: also, why on earth would you need a 16 byte header? everything i've read so far has just needed 4 bytes, to store the size and then some flags
hatorade
I'm not sure I totally understand, @hatorade. When you request memory, any header before the address given back is not yours to play with. You're limited to the addresses between (that returned) and (that plus the size minus 1). Housekeeping information belongs to the memory management functions (malloc/heapalloc). If you're doing something like _implementing_ malloc on top of heapalloc, then you have to follow the alignment rules. If your header is 4 bytes and alignment 8, each malloc block within your heapalloc superblock will need an 8-byte header (4 of which are wasted, yes).
paxdiablo
@paxdiablo: yes sorry i should have been clearer. i want to implement malloc.
hatorade
And you don't _need_ a 16-byte header. As I said, it depends on the implementation. Some may store actual size, maximum size, header checksum, overall checksum, flags, and any other stuff. Some architectures require 16-byte alignment which means the simplest algorithm comes from ensuring your headers and date areas are multiples of that.
paxdiablo
@hatorade: I think part of the problem your having is that you've made an assumption that "the way" that you're thinking `malloc` works is the only possible way it could be implemented. In other words, there is no need to keep a "4 byte header" in-line with the allocations returned, and while that is certainly one way, it is definitely not the only way. One potential reason for avoiding the way you've chosen is any "accidental writes past the allocation" are less likely to corrupt the arena. Some implementations have a mode which places special "guard bytes" to check just for this.
johne
@paxdiablo: well, i haven't made the assumption that's how it works. rather, i've narrowed down that that's how i'm going to do it.
hatorade
@hatorade, that was actually johne's comment rather than mine but I agree with him that there are other ways to do it. If you've chosen the inline accounting information option, there's nothing wrong with that but, unless you use all the bytes in your headers, you may need "wastage" to get correct alignment. If your headers are only required to be 4 bytes but your alignment is 8, there's no way to avoid it with the inline way. It is a _requirement_ of the standard that addresses from malloc are suitably aligned. But see my update as to why this isn't normally an issue.
paxdiablo
@paxdiablo: i guess i've made a mess of this. specifically, i'm looking at http://gee.cs.oswego.edu/dl/html/malloc.html. There are 4byte headers there. i'm guessing he must use 'dead' space at the beginning of the heap's memory to make sure that the first header is -4 from the first address that's a multiple of 8, and allocated the payload as a multiple of 8, to ensure everything lines up.
hatorade
From a cursory glance (it's too much code for an in-depth analysis), it appears that implementation has the (combined) 4-byte size/status at the front and the size at the back, to facilitate forward and backward scanning of chunks. With four-byte ints, that would be eight bytes of overhead and you could start each chunk on an 8-byte boundary plus 4 bytes (0x0004, 0x000c, 0x0014, etc), guaranteeing each data section would be on an 8-byte boundary, with no wastage. But, as I said, I haven't done a deep dive into the code.
paxdiablo
+1  A: 

Some allocators do not have any headers for allocated blocks. Some eliminate headers for small blocks. There is no need for headers for allocated blocks at all, other than for safety. This can be achieved in other ways, e.g., by administrative data completely physically separated from the allocated memory.

Even typed memory allocators can eliminate headers. See for example BIBOP.

Doug Currie
+1  A: 

You'd probably find that if the blocks returned are always aligned on an 8-byte boundary, the implementation uses an 8-byte block to hold any control information. I believe the implementation in K&R (both first and second editions) does this. Indeed, it uses a pointer and a size, so it has 8 bytes of control information. And if it is using 8-byte aligned blocks, there is no real penalty for doing so. (With 64-bit machines, the most stringent alignment might be 16-bytes - it is on SPARC, for instance - and the block overhead is therefore somewhat bigger, but for basically the same reason.)

Jonathan Leffler