does struct member alignment in VC bring performance benefit? if it is what is the best performance implication by using this and which size is best for current cpu architecture (x86_64, SSE2+, ..)
The default alignment used by the compiler should be appropriate for the target platform (32- or 64-bit Intel/AMD) for general data. To take advantage of SIMD, you might have to use a more restrictive alignment on those arrays, but that's usually done with a #pragma
or special data type that applies just to the data you'll be using in the SIMD instructions.
Perf takes a nose-dive on x86 and x64 cores when a member straddles a cache line boundary. The common compiler default is 8 byte packing which ensures you're okay on long long, double and 64-bit pointer members.
SSE2 instructions require an alignment of 16, the code will bomb if it is off. You cannot get that out of a packing pragma, the heap allocator for example will only provide an 8-byte alignment guarantee. Find out what your compiler and CRT support. Something like __declspec(align(16)) and a custom allocator like _aligned_malloc(). Or over-allocate the memory and tweak the pointer yourself.