Hello all, I'm currently working on video processing software in which the picture data (8-bit signed and unsigned) is stored in 16-byte-aligned arrays of integers, allocated as:

__declspec(align(16)) int *pData = (__declspec(align(16)) int *)_mm_malloc(width*height*sizeof(int),16);

In general, wouldn't reading and writing be faster with signed/unsigned char arrays like this?

__declspec(align(16)) unsigned char *pData = (__declspec(align(16)) unsigned char *)_mm_malloc(width*height*sizeof(unsigned char),16);

I know little about cache line size and data transfer optimization, but at least I know that it is an issue. Beyond that, SSE will be used in the future, and in that case char arrays, unlike int arrays, are already in a packed format. So which version would be faster?

A: 

On the contrary, packing and unpacking cost extra CPU instructions.

If you do a lot of random per-pixel operations, an array of int is faster, so that each pixel has its own address.

But if you iterate through your image sequentially, a char array is the better choice: it is smaller, which reduces the chance of page faults (especially for large images).

kiwi
+2  A: 

If you're planning to use SSE, storing the data in its native size (8-bit) is almost certainly the better choice: many operations can be done without unpacking, and even if you need to unpack for pmaddwd or similar instructions, it's still faster because you have less data to load.
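As a rough illustration of an operation that needs no unpacking, here is a minimal sketch that brightens 16 unsigned 8-bit pixels per iteration with a saturating add. The function name `brighten_u8` and its parameters are made up for this example, and it uses unaligned loads so it works on any pointer; with a buffer from `_mm_malloc(..., 16)` the aligned variants could be used instead.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add `delta` to every 8-bit pixel, clamping at 255. */
void brighten_u8(uint8_t *pixels, size_t count, uint8_t delta)
{
    assert(pixels != NULL || count == 0);

    const __m128i vdelta = _mm_set1_epi8((char)delta);
    size_t i = 0;

    /* 16 pixels per iteration; the data stays packed the whole time. */
    for (; i + 16 <= count; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(pixels + i));
        v = _mm_adds_epu8(v, vdelta);   /* PADDUSB: saturating unsigned add */
        _mm_storeu_si128((__m128i *)(pixels + i), v);
    }

    /* Scalar tail for the remaining 0..15 pixels. */
    for (; i < count; ++i) {
        unsigned sum = (unsigned)pixels[i] + delta;
        pixels[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

With an int-per-pixel layout the same 128-bit register would hold only 4 pixels, so the loop would need four times as many loads and stores for the same image.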

Even in scalar code, loading 8-bit or 16-bit values is no slower than loading 32-bit, since movzx/movsx is no different in speed from mov. So you just save memory, which surely can't hurt.
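A small sketch of what that looks like in scalar C, with made-up helper names (`sum_pixels_u8`, `sum_pixels_s8`). Each element read widens a byte to a register-sized value; on x86 a compiler will typically emit movzx for the unsigned case and movsx for the signed one, though an optimizing compiler may also vectorize the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum unsigned 8-bit pixels.
   Each pixels[i] read is a zero-extending byte load. */
uint64_t sum_pixels_u8(const uint8_t *pixels, size_t count)
{
    assert(pixels != NULL || count == 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < count; ++i)
        sum += pixels[i];
    return sum;
}

/* Signed variant: each pixels[i] read is a sign-extending byte load. */
int64_t sum_pixels_s8(const int8_t *pixels, size_t count)
{
    assert(pixels != NULL || count == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < count; ++i)
        sum += pixels[i];
    return sum;
}
```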

Dark Shikari
A: 

It really depends on your target CPU -- you should read up on its specs and run some benchmarks as everyone has already suggested. Many factors could influence performance. The first obvious one that comes to my mind is that your array of ints is 2 to 4 times larger than an array of chars and, hence, if the array is big enough, you'll get fewer data cache hits, which will hurt performance.
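To put a number on the size difference, here is a back-of-the-envelope sketch. The helper name `cache_lines_spanned` is hypothetical, and the cache line size is an assumption you pass in (64 bytes is common on current x86 CPUs, but check your target); it also assumes the buffer is tightly packed and starts on a line boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of cache lines a tightly packed,
   line-aligned image buffer spans. */
size_t cache_lines_spanned(size_t width, size_t height,
                           size_t bytes_per_pixel, size_t line_size)
{
    assert(line_size > 0);
    size_t bytes = width * height * bytes_per_pixel;
    return (bytes + line_size - 1) / line_size;   /* round up */
}
```

For a 1920x1080 frame and 64-byte lines, one byte per pixel spans 32400 lines, while four bytes per pixel (a typical int) spans 129600, so a full pass over the int version has to pull four times as many lines through the cache.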

Alexander
A: 

Char arrays can be slower in some cases. As a very general rule of thumb, the native word size is the best to go for, which will more than likely be 4 bytes (32-bit) or 8 bytes (64-bit). Even better is to have everything aligned to 16 bytes, as you have already done. This enables faster copies if you use SSE instructions (for example the aligned move MOVDQA or the non-temporal store MOVNTDQ). If you are only concerned with moving items around, this will have a much greater impact than the type used by the array.
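A minimal sketch of such an aligned copy with SSE2 intrinsics, assuming both buffers are 16-byte aligned (for example from `_mm_malloc(size, 16)`) and the size is a multiple of 16. The name `copy_aligned16` is made up for this example. Whether the non-temporal store actually helps depends on buffer size and on whether the destination is read again soon, so treat it as something to benchmark rather than a guaranteed win.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `bytes` bytes between 16-byte-aligned buffers.
   `bytes` must be a multiple of 16. */
void copy_aligned16(void *dst, const void *src, size_t bytes)
{
    assert(((uintptr_t)dst % 16) == 0 && ((uintptr_t)src % 16) == 0);
    assert(bytes % 16 == 0);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < bytes / 16; ++i) {
        __m128i v = _mm_load_si128(s + i);  /* MOVDQA: aligned load */
        _mm_stream_si128(d + i, v);         /* MOVNTDQ: non-temporal store */
    }
    _mm_sfence();  /* order the streaming stores before any later stores */
}
```

The element type does not appear anywhere in the copy: 16 bytes move per iteration whether they hold 16 chars or 4 ints, so the char layout simply has fewer bytes to move.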

jheriko