+5  Q: 

OSX lacks memalign

I'm working on a project in C and it requires memalign(). Really, posix_memalign() would do as well, but darwin/OSX lacks both of them.

What is a good way to shoehorn in memalign? I don't understand the licensing of the POSIX C library code well enough to lift memalign.c and put it in my project, and I don't want any viral-type license LGPL-ing my whole project.

+7  A: 

Mac OS X appears to align memory on 16-byte boundaries.

Quote from the website:

I had a hard time finding a definitive statement on MacOS X memory alignment so I did my own tests. On 10.4/intel, both stack and heap memory is 16 byte aligned. So people porting software can stop looking for memalign() and posix_memalign(). It’s not needed.

ceretullis
SSE guarantees this on x86: Malloc has to return a pointer to memory suitably aligned for any type of object, which on x86 includes SSE vectors (which require 16-byte alignment)
+3  A: 

Should be easy enough to do yourself, no? Something like the following (not tested):

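#include <stdint.h>
#include <stdlib.h>

/* align must be a power of two and at least sizeof(void*) */
void *aligned_malloc( size_t size, size_t align )
{
    /* room for the worst-case shift (a full align bytes) plus the saved pointer */
    void *mem = malloc( size + align + sizeof(void*) );
    if( mem == NULL )
        return NULL;

    char *amem = ((char*)mem) + sizeof(void*);
    amem += align - ((uintptr_t)amem & (align - 1));

    /* stash the pointer malloc returned just below the aligned block */
    ((void**)amem)[-1] = mem;
    return amem;
}

void aligned_free( void *mem )
{
    if( mem != NULL )
        free( ((void**)mem)[-1] );
}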

(thanks Jonathan Leffler)

Edit: Regarding ripping off another memalign implementation, the problem with that is not licensing. Rather, you'd run into the difficulty that any good memalign implementation will be an integral part of the heap-manager codebase, not simply layered on top of malloc/free. So you'd have serious trouble transplanting it to a different heap manager, especially when you have no access to its internals.

+2  A: 

Why does the software you are porting need memalign() or posix_memalign()? Does it use it for alignments bigger than the 16-byte alignments referenced by austirg?

I see Mike F posted some code - it looks relatively neat, though I think the while loop may be sub-optimal (if the alignment required is 1KB, it could iterate quite a few times).

Doesn't:

amem += align - ((uintptr_t)amem & (align - 1));

get there in one operation?
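
To make the arithmetic concrete, here is a minimal sketch that works on plain integers rather than pointers. The helper names bump_to_next and round_up are made up for illustration, and align is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the one-step expression quoted above, applied to an
   integer address. align is assumed to be a power of two. */
static uintptr_t bump_to_next(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* addr & (align - 1) is the offset past the previous boundary. */
    return addr + (align - (addr & (align - 1)));
}

/* Hypothetical helper: the common round-up form, which leaves an address
   that is already aligned where it is. */
static uintptr_t round_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

With align = 16, bump_to_next(0x1009, 16) lands on 0x1010 in a single step. An address that is already aligned, such as 0x1010, moves on to 0x1020, whereas round_up leaves it alone, so a caller using the first form has to reserve a full align bytes of slack.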

Jonathan Leffler
+1  A: 

Yes, Mac OS X does have 16-byte memory alignment in the ABI. You should not need to use memalign(). If your alignment requirement is a factor of 16, then I would not implement it and would maybe just add an assert.
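
One way such an assert could look, as a rough sketch: is_aligned and malloc_expect16 are hypothetical names, and the 16-byte figure is the assumption being checked, not something the C standard promises.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns nonzero if p sits on an align-byte boundary.
   align is assumed to be a power of two. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical wrapper: plain malloc plus a check of the 16-byte assumption.
   The assert disappears when NDEBUG is defined. */
static void *malloc_expect16(size_t size)
{
    void *p = malloc(size);
    assert(p == NULL || is_aligned(p, 16));
    return p;
}
```

The result is released with free() as usual. If the assert ever fires on some platform, that is the signal to switch to a real aligned allocator there.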

witkamp
Why is everyone assuming the guy only needs 16-byte alignment?
A: 

If you need an arbitrarily aligned malloc, check out x264's malloc (common/common.c in the git repository), which has a custom memalign for systems without malloc.h. It's extremely trivial code, to the point where I would not even consider it copyrightable, but you should easily be able to implement your own after seeing it.

Of course, if you only need 16-byte alignment, as stated above, it's in the OS X ABI.

Dark Shikari
+1  A: 

From the macosx man pages:

The malloc(), calloc(), valloc(), realloc(), and reallocf() functions allocate memory. The allocated memory is aligned such that it can be used for any data type, including AltiVec- and SSE-related types. The free() function frees allocations that were created via the preceding allocation functions.

ceretullis
A: 

Statements to the effect that malloc's 16-byte alignment is sufficient for all purposes are not quite to the point.

There are lots of reasons to push for other alignments. For instance, files opened with O_DIRECT on other platforms require buffer alignments that are multiples of the logical block size of the file system. Maybe not important to you, but there are folks who care. Other places where it can matter are squeezing out the last bit of performance on large matrix operations and MPI implementations. (I do this all the time.)

As I say, it may not matter to you, but there are folks who get a real performance boost from considering the effects of cache alignment, collisions, and the like. And some of us write code that runs under OS X.
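
For the larger alignments mentioned here, a minimal sketch on a platform that does provide posix_memalign() (it is part of POSIX, though the question reports it missing on the OS X release being targeted) might look like this. The wrapper name alloc_aligned is made up for illustration.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate size bytes on an align-byte boundary via
   posix_memalign(), assuming the platform provides it. align must be a
   power of two and a multiple of sizeof(void *). Returns NULL on failure;
   release the result with free(). */
static void *alloc_aligned(size_t align, size_t size)
{
    void *p = NULL;
    assert(align % sizeof(void *) == 0 && (align & (align - 1)) == 0);
    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    return p;
}
```

Something like alloc_aligned(64, n) would suit a 64-byte cache line and alloc_aligned(4096, n) a 4 KB block size. Both numbers are assumptions here, so check the actual line and block sizes of the target machine.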

A: 

It might be worthwhile to use Doug Lea's malloc in your code.