Does struct member alignment in VC bring a performance benefit? If so, what is the performance implication of using it, and which alignment size is best for current CPU architectures (x86_64, SSE2+, ...)?

A: 

The default alignment used by the compiler should be appropriate for the target platform (32- or 64-bit Intel/AMD) for general data. To take advantage of SIMD, you might have to use a more restrictive alignment on those arrays, but that's usually done with a #pragma or special data type that applies just to the data you'll be using in the SIMD instructions.

Adrian McCarthy
Which alignment size is the best fit for SIMD instructions?
uray
16. Use __declspec(align(16)), or use __m128i directly.
Christopher
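To illustrate the comment above, here is a minimal sketch of a 16-byte-aligned type fed to SSE2 intrinsics. It uses the standard C++11 `alignas(16)`, which plays the same role as `__declspec(align(16))` in Visual C++. The names `Vec4i` and `add` are hypothetical, and the scalar fallback is there only so the sketch still compiles on targets without SSE2.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#endif

// Hypothetical type for illustration. alignas(16) is the standard C++11
// spelling of what __declspec(align(16)) does in Visual C++.
struct alignas(16) Vec4i {
    std::int32_t v[4];
};

// Adds two vectors lane by lane.
inline Vec4i add(const Vec4i& a, const Vec4i& b) {
    Vec4i out;
#ifdef HAVE_SSE2
    // _mm_load_si128 and _mm_store_si128 require 16-byte-aligned addresses.
    // The alignas(16) on Vec4i is what makes these calls safe.
    __m128i x = _mm_load_si128(reinterpret_cast<const __m128i*>(a.v));
    __m128i y = _mm_load_si128(reinterpret_cast<const __m128i*>(b.v));
    _mm_store_si128(reinterpret_cast<__m128i*>(out.v), _mm_add_epi32(x, y));
#else
    // Scalar fallback for targets without SSE2.
    for (int i = 0; i < 4; ++i) out.v[i] = a.v[i] + b.v[i];
#endif
    return out;
}
```

If the data cannot be guaranteed aligned, the unaligned intrinsics `_mm_loadu_si128` and `_mm_storeu_si128` accept any address instead.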
A: 

Perf takes a nose-dive on x86 and x64 cores when a member straddles a cache line boundary. The common compiler default is 8-byte packing, which ensures you're okay on long long, double and 64-bit pointer members.

The aligned SSE2 load and store instructions (movdqa, movapd and friends) require an alignment of 16; the code will bomb if it is off. You cannot get that out of a packing pragma: the heap allocator, for example, will only provide an 8-byte alignment guarantee in 32-bit code. Find out what your compiler and CRT support. You want something like __declspec(align(16)) for declarations and an aligned allocator like _aligned_malloc() for heap memory. Or over-allocate the memory and tweak the pointer yourself.

Hans Passant
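The "over-allocate and tweak the pointer" option can be sketched roughly as follows. The helper names `aligned_alloc_manual` and `aligned_free_manual` are hypothetical, and the sketch assumes the requested alignment is a power of two and at least `sizeof(void*)`. With the Visual C++ CRT, `_aligned_malloc()` and `_aligned_free()` do this job for you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper. Assumes alignment is a power of two >= sizeof(void*).
inline void* aligned_alloc_manual(std::size_t size, std::size_t alignment) {
    // Reserve room for the payload, the worst-case adjustment,
    // and one slot that remembers the pointer malloc returned.
    void* raw = std::malloc(size + alignment - 1 + sizeof(void*));
    if (!raw) return nullptr;

    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    // Round up to the next multiple of alignment.
    std::uintptr_t aligned =
        (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);

    // Stash the original pointer just below the aligned block.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Frees a block obtained from aligned_alloc_manual.
inline void aligned_free_manual(void* p) {
    if (p) std::free(static_cast<void**>(p)[-1]);
}
```

The block must be released through `aligned_free_manual`, never through `free()` directly, because the adjusted pointer is generally not the one `malloc` returned.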
It's not just about crossing cache-line boundaries. If your 32-bit int isn't aligned to a 4-byte boundary, there will still be a perf hit even if it's within the cache line. The compiler generally pads elements to start on a multiple of their size (up to 8 bytes). That's not to say it'll pad everything to 8-byte boundaries. With the compiler's default packing, a built-in type will always be appropriately aligned. If you have an array of structs, some of the array elements may cross a cache-line boundary, but none of the individual members will.
Adrian McCarthy
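The padding rule described in the comment above can be checked at compile time. This is a minimal sketch with a hypothetical struct `Sample`. The byte counts in the comments are what typical x86-64 ABIs produce with default packing, and they are an assumption rather than a guarantee for every compiler setting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct, used only to illustrate default (natural) alignment.
struct Sample {
    char         tag;    // 1 byte, typically followed by 3 bytes of padding
    std::int32_t count;  // starts on a multiple of its own alignment
    char         flag;   // 1 byte, typically followed by 7 bytes of padding
    double       value;  // starts on a multiple of its own alignment
};

// Each member starts on a multiple of its alignment requirement.
static_assert(offsetof(Sample, count) % alignof(std::int32_t) == 0,
              "count is naturally aligned");
static_assert(offsetof(Sample, value) % alignof(double) == 0,
              "value is naturally aligned");

// The struct size is a multiple of its alignment, so every element of a
// Sample array keeps its members aligned as well.
static_assert(sizeof(Sample) % alignof(Sample) == 0,
              "array elements stay aligned");
```

Reordering the members from largest to smallest usually shrinks the struct, because less padding is needed between them.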