views:

12471

answers:

8

I just finished a test as part of a job interview, and one question stumped me - even using google for reference. I'd like to see what the stackoverflow crew can do with it:

The “memset_16aligned” function requires a 16byte aligned pointer passed to it, or it will crash.

a) How would you allocate 1024 bytes of memory, and align it to a 16 byte boundary?
b) Free the memory after the memset_16aligned has executed.

{

   void *mem;

   void *ptr;

   // answer a) here

   memset_16aligned(ptr, 0, 1024);

   // answer b) here

}
+109  A: 
{
void *mem = malloc(1024+16);
void *ptr = ((char *)mem+16) & ~ 0x0F;
memset_16aligned(ptr, 0, 1024);
free(mem);
}

Edited: Explanation as requested.

The first step is to allocate enough spare space, just in case. Since the memory must be 16-byte aligned (meaning that the leading byte address needs to be a multiple of 16), adding 16 extra bytes guarantees that we have enough space. Somewhere in the first 16 bytes, there is a 16-byte aligned pointer. (Note that malloc() is supposed to return a pointer that is sufficiently well aligned for any purpose. However, the meaning of 'any' is primarily for things like basic types - long, double, long double, long long. When you are doing more specialized things, like playing with graphics systems, they can need more stringent alignment than the rest of the system - hence questions and answers like this.)

The next step is to convert the void pointer to a char pointer; GCC notwithstanding, you are not supposed to do pointer arithmetic on void pointers (and GCC has warning options to tell you when you abuse it). Then add 16 to the start pointer. Suppose malloc() returned you an impossibly badly aligned pointer: 0x800001. Adding the 16 gives 0x800011. Now I want to round down to the 16-byte boundary - so I want to reset the last 4 bits to 0. 0x0F has the last 4 bits set to one; therefore, ~ 0x0F has all bits set to one except the last four. Anding that with 0x800011 gives 0x800010. You can iterate over the other offsets and see that the same arithmetic works.

The last step, free(), is easy: you always, and only, return to free() a value that one of malloc(), calloc() or realloc() returned to you - anything else is a disaster. You correctly provided mem to hold that value - thank you. The free releases it.

Finally, if you know about the internals of your system's malloc package, you could guess that it might well return 16-byte aligned data (or it might be 8-byte aligned). If it was 16-byte aligned, then you'd not need to dink with the values. However, this is dodgy and non-portable -- other malloc packages have different minimum alignments, and therefore assuming one thing when it does something different would lead to core dumps. Within broad limits, this solution is portable.

Someone else mentioned posix_memalign() as another way to get the aligned memory; that isn't available everywhere, but could often be implemented using this as a basis. Note that it was convenient that the alignment was a power of 2; other alignments are messier.

One more comment - this code does not check that the allocation succeeded.

Amendment Windows Programmer pointed out that you can't do bit mask operations on pointers, and, indeed, GCC (3.4.6 and 4.3.1 tested) does complain like that. So, an amended version of the basic code - converted into a main program, follows. I've also taken the liberty of adding just 15 instead of 16, as has been pointed out. I'm using uintptr_t since C99 has been around long enough to be accessible on most platforms. If it wasn't for the use of PRIXPTR in the printf() statements, it would be sufficient to #include <stdint.h> instead of using #include <inttypes.h>.

#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

static void memset_16aligned(void *space, char byte, size_t nbytes)
{
    assert((nbytes & 0x0F) == 0);
    assert(((uintptr_t)space & 0x0F) == 0);
}

int main(void)
{
    void *mem = malloc(1024+15);
    void *ptr = (void *)(((uintptr_t)mem+15) & ~ 0x0F);
    printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", mem, ptr);
    memset_16aligned(ptr, 0, 1024);
    free(mem);
    return(0);
}

And here is a marginally more generalized version, which will work for sizes which are a power of 2:

#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

static void memset_16aligned(void *space, char byte, size_t nbytes)
{
    assert((nbytes & 0x0F) == 0);
    assert(((uintptr_t)space & 0x0F) == 0);
}

static void test_mask(size_t align)
{
    uintptr_t mask = ~(uintptr_t)(align - 1);
    void *mem = malloc(1024+align-1);
    void *ptr = (void *)(((uintptr_t)mem+align-1) & mask);
    assert((align & (align - 1)) == 0);
    printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", mem, ptr);
    memset_16aligned(ptr, 0, 1024);
    free(mem);
}

int main(void)
{
    test_mask(16);
    test_mask(32);
    test_mask(64);
    test_mask(128);
    return(0);
}

To convert test_mask() into a general purpose allocation function, the single return value from the allocator would have to encode the release address, as several people have indicated in their answers.

[Added Uri commented: Maybe I am having reading [a] comprehension problem this morning, but if the interview question specifically says: "How would you allocate 1024 bytes of memory" and you clearly allocate more than that. Wouldn't that be an automatic failure from the interviewer?

My response won't fit into a 300-character comment...
It depends, I suppose. I think most people (including me) took the question to mean "How would you allocate a space in which 1024 bytes of data can be stored, and where the base address is a multiple of 16 bytes". If the interviewer really meant how can you allocate 1024 bytes (only) and have it 16-byte aligned, then the options are more limited.

  • Clearly, one possibility is to allocate 1024 bytes and then give that address the 'alignment treatment'; the problem with that approach is that the actual available space is not properly determinate (the usable space is between 1008 and 1024 bytes, but there wasn't a mechanism available to specify which size), which renders it less than useful.
  • Another possibility is that you are expected to write a full memory allocator and ensure that the 1024-byte block you return is appropriately aligned. If that is the case, you probably end up doing an operation fairly similar to what the proposed solution did, but you hide it inside the allocator.

However, if the interviewer expected either of those responses, I'd expect them to recognize that this solution answers a closely related question, and then to reframe their question to point the conversation in the correct direction. (Further, if the interviewer got really stroppy, then I wouldn't want the job; if the answer to an insufficiently precise requirement is shot down in flames without correction, then the interviewer is not someone for whom it is safe to work.)
]

Jonathan Leffler
An explanation would make wonders for this answer
Vinko Vrsalovic
What the mask is doing, is making sure that the lower four bits are 0, which is equivalent with saying that the address is 16-byte aligned (because the address, viewed as an integer, is divisible by 16).
florin
But first he adds 16 because setting the lowest bits to zero is going to drop the memory address which won't be in the range he allocated. Note also that he adds to the size of the allocation to make room for the fact that it will go up some...
Bill K
And i'm rusty with C++, but I don't really trust that ~ 0x0F will properly expand to the size of the pointer. If it doesn't, all hell will break loose because you will mask off the most significant bits of your pointer as well. I could be wrong about that though.
Bill K
BTW '+15' works as well as '+16'...no practical impact in this situation though.
Menkboy
The '+ 15' comments from Menkboy and Greg are correct, but malloc() would almost certainly round that up to 16 anyway. Using +16 is marginally easier to explain. The generalized solution is fiddly, but doable.
Jonathan Leffler
The 'enough bits' question is interesting. I suppose that writing: ~(uintptr_t)0x0F would guaranteee the right number of bits, improving reliability. It requires C99 for that, though, whereas what's there will work with C89 too (and some pre-standard C; the void pointers get dodgy then.)
Jonathan Leffler
My earlier comment (and Menkboy's) are a little incorrect, if you want to only add +15 to the malloc'ed size, you need to special case when malloc returns a 16 byte aligned pointer and not add the +16 to it or you have a buffer overrun by one byte.
Greg Rogers
@Jonathan: size_t instead of uintptr_t should work on all architectures, although as far as I know it isn't guaranteed that sizeof(size_t) == sizeof(void*), as it is with uintptr_t.
Greg Rogers
No need to special case: just add 15 in the alignment step the same as the alloc step.
Steve Jessop
Nice explanation.
mdec
Windows programmer
@Greg: if you allocate +15, you add +15 before masking :)
Jonathan Leffler
@Windows Programmer: Interesting; yes, GCC (3.4.6 and 4.3.1) does complain about the mixture. I'll make an edit to the main answer.
Jonathan Leffler
Maybe I am having reading comprehension problem this morning, but if the interview question specifically says: "How would you allocate 1024 bytes of memory" and you clearly allocate more than that. Wouldn't that be an automatic failure from the interviewer?
Uri
excellent explanation!
Lazer
+13  A: 

You could also try posix_memalign (on POSIX platforms, of course).

florin
And _aligned_malloc on Windows.
Steve Jessop
+5  A: 

Perhaps they would have been satisfied with a knowledge of memalign?

Oops, florin beat me to it. However, if you read the man page I linked to, you'll most likely understand the example supplied by an earlier poster.

Don Wakefield
+15  A: 

Three slightly different answers depending how you look at the question:

1) Good enough for the exact question asked is Jonathan Leffler's solution, except that to round up to 16-aligned, you only need 15 extra bytes, not 16.

A:

/* allocate a buffer with room to add 0-15 bytes to ensure 16-alignment */
void *mem = malloc(1024+15);
ASSERT(mem);
/* round up to multiple of 16: add 15 and then round down by masking */
void *ptr = ((char*)mem+15) & ~ (size_t)0x0F;

B:

free(mem);

2) For a more generic memory allocation function, the caller doesn't want to have to keep track of two pointers (one to use and one to free). So you store a pointer to the 'real' buffer below the aligned buffer.

A:

void *mem = malloc(1024+15+sizeof(void*));
ASSERT(mem);
void *ptr = ((char*)mem+sizeof(void*)+15) & ~ (size_t)0x0F;
((void**)ptr)[-1] = mem;

B:

free(((void**)ptr)[-1]);

Note that unlike (1), where only 15 bytes were added to mem, this code could actually reduce the alignment if your implementation happens to guarantee 32-byte alignment from malloc (unlikely, but in theory a C implementation could have a 32-byte aligned type). That doesn't matter if all you do is call memset_16aligned, but if you use the memory for a struct then it could matter.

I'm not sure off-hand what a good fix is for this (other than to warn the user that the buffer returned is not necessarily suitable for arbitrary structs) since there's no way to determine programatically what the implementation-specific alignment guarantee is. I guess at startup you could allocate two or more 1-byte buffers, and assume that the worst alignment you see is the guaranteed alignment. If you're wrong, you waste memory. Anyone with a better idea, please say so...

[Added: The 'standard' trick is to create a union of 'likely to be maximally aligned types' to determine the requisite alignment. The maximally aligned types are likely to be (in C99) 'long long', 'long double', 'void *', or 'void (*)(void)'; if you include <stdint.h>, you could presumably use 'intmax_t' in place of long long (and, on Power 6 (AIX) machines, intmax_t would give you a 128-bit integer type). The alignment requirements for that union can be determined by embedding it into a struct with a single char followed by the union:

struct alignment
{
    char     c;
    union
    {
        intmax_t      imax;
        long double   ldbl;
        void         *vptr;
        void        (*fptr)(void);
    }        u;
} align_data;
size_t align = (char *)&align_data.u.imax - &align_data.c;

You would then use the larger of the requested alignment (in the example, 16) and the align value calculated above.

On (64-bit) Solaris 10, it appears that the basic alignment for the result from malloc() is a multiple of 32 bytes.
]

In practice, aligned allocators often take a parameter for the alignment rather than it being hardwired. So the user will pass in the size of the struct they care about (or the least power of 2 greater than or equal to that) and all will be well.

3) Use what your platform provides: posix_memalign for POSIX, _aligned_malloc on Windows.

Steve Jessop
I agree - I think the intent of the question is that the code that frees the memory block would have access only to the 'cooked' 16-byte aligned pointer.
Michael Burr
For a general solution - you are right. However, the code template in the question clearly shows both.
Jonathan Leffler
Sure, and in a good interview what happens is that you give your answer, then if the interviewer does want to see my answer, they change the question.
Steve Jessop
+6  A: 

Here's an alternate approach to the 'round up' part. Not the most brilliantly coded solution but it gets the job done, and this type of syntax is a bit easier to remember (plus would work for alignment values that aren't a power of 2). The uintptr_t cast was necessary to appease the compiler; pointer arithmetic isn't very fond of division or multiplication.

void *mem = malloc(1024 + 15);
void *ptr = (void*) ((uintptr_t) mem + 15) / 16 * 16;
memset_16aligned(ptr, 0, 1024);
free(mem);
Andrew
In general, where you have 'unsigned long long', you also have uintptr_t which is explicitly defined to be big enough to hold a data pointer (void *). But your solution does indeed have merits if, for some reason, you needed an alignment that was not a power of 2. Unlikely, but possible.
Jonathan Leffler
A: 

On the 16 vs 15 byte-count padding front, the actual number you need to add to get an alignment of N is max(0,N-M) where M is the natural alignment of the memory allocator (and both are powers of 2).

Since the minimal memory alignment of any allocator is 1 byte, 15=max(0,16-1) is a conservative answer. However, if you know your memory allocator is going to give you 32-bit int aligned addresses (which is fairly common), you could have used 12 as a pad.

This isn't important for this example but it might be important on an embedded system with 12K of RAM where every single int saved counts.

The best way to implement it if you're actually going to try to save every byte possible is as a macro so you can feed it your native memory alignment. Again, this is probably only useful for embedded systems where you need to save every byte.

In the example below, on most systems, the value 1 is just fine for MEMORY_ALLOCATOR_NATIVE_ALIGNMENT, however for our theoretical embedded system with 32-bit aligned allocations, the following could save a tiny bit of precious memory:

#define MEMORY_ALLOCATOR_NATIVE_ALIGNMENT    4
#define ALIGN_PAD2(N,M) (((N)>(M)) ? ((N)-(M)) : 0)
#define ALIGN_PAD(N) ALIGN_PAD2((N), MEMORY_ALLOCATOR_NATIVE_ALIGNMENT)
Adisak
A: 

Unfortunately, in C99 it seems pretty tough to guarantee alignment of any sort in a way which would be portable across any C implementation conforming to C99. Why? Because a pointer is not guaranteed to be the "byte address" one might imagine with a flat memory model. Neither is the representation of 'uintptr_t' so guaranteed, which itself is an option type anyway.

We might know of some implementations which use a representation for 'void *' (and by definition, also 'char *') which is a simple byte address, but by C99 it is opaque to us, the programmers. An implementation might represent a pointer with by a set {segment, offset} where 'offset' could have who-knows-what alignment "in reality." Why, a pointer could even be some form of hash table lookup value, or even a linked-list lookup value. It could encode bounds information.

In a recent C1X draft for a C Standard, we see the '_Alignas' keyword. That might help a bit.

The only guarantee C99 gives us is that the memory allocation functions will return a pointer suitable for assignment to a pointer pointing at any object type. Since we cannot specify the alignment of objects, we cannot implement our own allocation functions with responsibility for alignment in a well-defined, portable manner.

It would be good to be wrong about this claim.

Shao
A: 

usage of memalign, Aligned-Memory-Blocks might be a good solution for the problem.

neuron