Hi there,

I am new to optimizing code with SSE/SSE2 instructions and until now I have not gotten very far. To my knowledge a common SSE-optimized function would look like this:

void sse_func(const float* const ptr, int len){
    if( ptr is aligned )
    {
        for( ... ){
            // unroll loop by 4 or 2 elements
        }
        for( ....){
            // handle the rest
            // (non-optimized code)
        }
    } else {
        for( ....){
            // regular C code to handle non-aligned memory
        }
    }
}

However, how do I correctly determine whether the memory that ptr points to is aligned to, e.g., 16 bytes? I think I have to include the regular C code path for non-aligned memory, as I cannot make sure that every buffer passed to this function will be aligned. And using the intrinsics to load data from unaligned memory into the SSE registers seems to be horribly slow (even slower than regular C code).

Thank you in advance...

+5  A: 

EDIT: casting to long is a cheap way to protect oneself against the most likely possibility of int and pointers being different sizes nowadays.

As pointed out in the comments below, there are better solutions if you are willing to include a header...

A pointer p is aligned on a 16-byte boundary iff (((unsigned long)p) & 15) == 0. Note the parentheses around the & expression: in C, == binds more tightly than &.
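A minimal sketch of this check as a function, assuming the compiler provides uintptr_t in <stdint.h>. The name is_aligned_16 is made up for illustration. The extra parentheses matter because == binds tighter than &, so an unparenthesized `x & 15 == 0` parses as `x & (15 == 0)` and is always 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if p sits on a 16-byte boundary. */
static bool is_aligned_16(const void *p)
{
    /* Mask off everything except the low four bits of the address;
       they are all zero exactly when the address is a multiple of 16. */
    return ((uintptr_t)p & 15u) == 0;
}
```

Because the parameter is a const void *, both const and non-const buffers can be passed without a cast.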

Pascal Cuoq
I think casting a pointer to int is a bad idea? My code will be compiled on both x86 and x64 systems. I hoped there would be some secret system macro `is_aligned_mem()` or so.
You could instead use `uintptr_t` - it is guaranteed the correct size to hold a pointer. Provided that your compiler defines it, of course.
Anon.
No, a pointer *is* an int. It just isn't used as a numeric generally.
Paul Nathan
It doesn't really matter if the pointer and integer sizes don't match. You only care about the bottom few bits.
Richard Pennington
Well, if there were a secret system macro, you can be sure that it would work by casting the pointer to int. There is nothing magic going on with this cast; you are just asking the compiler to let you look at how the pointer is represented in bits. If you don't do that, how can you ever know whether it is aligned?
Bill Forster
I would usually use `p % 16 == 0`, as compilers usually know the powers of 2 just as well as I do, and I find this more readable
Hasturkun
@random-name, Anon.: I edited the answer.
Pascal Cuoq
`int` traditionally was the size of the system word, aka a pointer. Is that changing in the 32-bit to 64-bit transition? (curious)
Paul Nathan
Pascal Cuoq
Thanks for all the answers. @Richard Pennington: That's a good point. @Bill Forster: I know someone eventually has to compare the actual bits, but I wanted a safe and cross-platform (x86, x64) way. It scares me a bit that there are so many self-made solutions. And I have not found a recommended one on MSDN or on Intel's website.
@Paul Nathan: It depends on the data model of your x64 system. E.g. Windows on x64 is LLP64, which means int and long are both still 32 bits and only long long and pointers have 64 bits. I am not sure about Linux on x64 though.
Hasturkun
Hasturkun
@Hasturkun I just compiled `int d(int x) { return x / 8; }` with `gcc -S`. It is both beautiful and sad... Mostly sad...
Pascal Cuoq
@Pascal Cuoq: I do agree about that, but it still handles the modulus and the comparison to 0 correctly, so long as the optimizer is being used. Otherwise it may emit an actual modulus (which it doesn't in my case, but what it does emit is far less efficient).
Hasturkun
+1  A: 

Can you just 'and' the ptr with 0x03 (aligned on 4s), 0x07 (aligned on 8s) or 0x0f (aligned on 16s) to see if any of the lowest bits are set?
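As a rough sketch of that idea, the mask for a power-of-two alignment is simply the alignment minus one (0x03 for 4, 0x07 for 8, 0x0f for 16). The helper name misalignment is hypothetical, and uintptr_t is assumed to be available from <stdint.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes p lies past the previous
   `alignment` boundary; 0 means p is aligned.
   Assumes alignment is a power of two (4, 8, 16, ...). */
static size_t misalignment(const void *p, size_t alignment)
{
    /* alignment - 1 has exactly the low bits set: 4 -> 0x03, 16 -> 0x0f. */
    return (size_t)((uintptr_t)p & (uintptr_t)(alignment - 1));
}
```

A result of zero means none of the lowest bits are set, which is the aligned case.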

Paul Tomblin
Steve Jessop
+1  A: 

Try this:

if (((uintptr_t)ptr & (sizeof(*ptr) - 1)) == 0) {
    // ptr is aligned
}

This assumes that sizeof(*ptr) is a power of two (2^n), and checks that the lowest n bits of ptr are all zero.
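Note that a check based on sizeof(*ptr) tests alignment to the element size, which for a float pointer is usually 4 bytes rather than the 16 bytes that aligned SSE loads require. The sketch below contrasts the two; both function names are made up for illustration, and it assumes sizeof(float) is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Natural alignment: the address is a multiple of the element size. */
static bool is_naturally_aligned(const float *p)
{
    return ((uintptr_t)(const void *)p & (sizeof *p - 1)) == 0;
}

/* SSE alignment: the address is a multiple of 16, whatever the element size. */
static bool is_sse_aligned(const float *p)
{
    return ((uintptr_t)(const void *)p & 15u) == 0;
}
```

The second element of a 16-byte-aligned float array passes the first test but fails the second.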

Greg Hewgill
+5  A: 

Other answers suggest an AND operation with low bits set, and comparing to zero.

But a more straightforward test would be to do a MOD with the desired alignment value, and compare to zero.

#include <stdint.h>  /* uintptr_t */

#define ALIGNMENT_VALUE     16u

if (((uintptr_t)ptr % ALIGNMENT_VALUE) == 0)
{
    // ptr is aligned
}
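For a power-of-two alignment the MOD test and the AND test give the same answer, as the following sketch illustrates. The function names are hypothetical. The modulo form also stays correct for alignments that are not powers of two, while the mask form does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Modulo form: valid for any nonzero alignment. */
static bool aligned_by_mod(const void *p, uintptr_t alignment)
{
    return (uintptr_t)p % alignment == 0;
}

/* Mask form: only valid when alignment is a power of two. */
static bool aligned_by_mask(const void *p, uintptr_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

With unsigned operands and a constant power-of-two divisor, an optimizing compiler will typically reduce the modulo to the same AND, so the choice is mostly about readability.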
Craig McQueen
I upvoted you, but only because you are using unsigned integers :)
Pascal Cuoq
+3  A: 
#define is_aligned(POINTER, BYTE_COUNT) \
    (((uintptr_t)(void *)&*(POINTER)) % (BYTE_COUNT) == 0)

The cast to void * is necessary because the standard only guarantees the round trip through uintptr_t for void * values, not for other pointer types.

The &* is there to produce a compilation error for non-pointer arguments.

edit:

I added &* to add a measure of type safety to the code; the problem is that it now won't work for void * arguments, as these can't be dereferenced.

If you want type safety, you're probably better off with an inline function like this

#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uintptr_t */
static inline _Bool is_aligned(const void *pointer, size_t byte_count)
{ return (uintptr_t)pointer % byte_count == 0; }

and hope for compiler optimizations if byte_count is a compile-time constant.
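To tie this back to the question, here is a hedged sketch of how such a function might select the code path. sum_floats and its took_aligned_path out-parameter are invented for illustration, and is_aligned is repeated (with a const-qualified parameter) so the block compiles on its own. Plain C stands in for the SSE code: the aligned branch is where aligned loads such as _mm_load_ps could go.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static inline bool is_aligned(const void *pointer, size_t byte_count)
{
    return (uintptr_t)pointer % byte_count == 0;
}

/* Hypothetical example: sums len floats and reports which path was taken. */
static float sum_floats(const float *ptr, size_t len, bool *took_aligned_path)
{
    float sum = 0.0f;
    size_t i = 0;

    *took_aligned_path = is_aligned(ptr, 16);
    if (*took_aligned_path) {
        /* Main loop: 4 floats (16 bytes) per iteration. */
        for (; i + 4 <= len; i += 4)
            sum += ptr[i] + ptr[i + 1] + ptr[i + 2] + ptr[i + 3];
    }
    /* Remainder of the aligned case, or everything in the unaligned case. */
    for (; i < len; ++i)
        sum += ptr[i];
    return sum;
}
```

Since 16 is a compile-time constant at the call site, an optimizing compiler has the chance to fold the modulo into a single AND once is_aligned is inlined.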

Christoph
This macro looks really nasty and sophisticated at once. I will definitely test it.