views:

434

answers:

2

I am working on auto vectorization with GCC. I am not in a position to use intrinsics or attributes due to customer requirement. (I cannot get user input to support vectorization)

If the alignment information of the array that can be vectorized is unknown, GCC invokes a pass for 'loop versioning'. Loop versioning will be performed when loop vectorization is done on trees. When a loop is identified to be vectorizable, and the constraint on data alignment or data dependence is hindering it, (because they cannot be determined at compile time), then two versions of the loop will be generated. These are the vectorized and non-vectorized versions of the loop along with runtime checks for alignment or dependence to control which version is executed.

My question is how we have to enforce the alignment? If I have found a loop that is vectorizable, I should not generate two versions of the loop because of missing alignment information.

For example. Consider the below code

short a[15]; short b[15]; short c[15];
int i;

void foo()
{
    for (i=0; i<15; i++)
    {
      a[i] = b[i] ;
    }
}

Tree dump (options: -fdump-tree-optimized -ftree-vectorize)

<SNIP>
     vector short int * vect_pa.49;
     vector short int * vect_pb.42;
     vector short int * vect_pa.35;
     vector short int * vect_pb.30;

    bb 2>:
     vect_pb.30 = (vector short int *) &b;
     vect_pa.35 = (vector short int *) &a;
     if (((signed char) vect_pa.35 | (signed char) vect_pb.30) & 3 == 0) ;; <== (A)
       goto <bb 3>;
     else
       goto <bb 4>;

    bb 3>:
</SNIP>

At 'bb 3' version of vectorized code is generated. At 'bb 4' code without vectorization is generated. These are done by checking the alignment (statement 'A'). Now without using intrinsics and other attributes, how should I get only the vectorized code (without this runtime alignment check.)

+2  A: 

If the data in question is being allocated statically, then you can use the __align__ attribute that GCC supports to specify that it should be aligned to the necessary boundary. If you are dynamically allocating these arrays, you can over-allocate by the alignment value, and then bump the returned pointer up to the alignment you need.

You can also use the posix_memalign() function if you're on a system that supports it. Finally, note that malloc() will always allocate memory aligned to the size of the largest built-in type, generally 8 bytes for a double. If you don't need better than that, then malloc should suffice.

Edit: If you modify your allocation code to force that check to be true (i.e. overallocate, as suggested above), the compiler should oblige by not conditionalizing the loop code. If you needed alignment to an 8-byte boundary, as it seems, that would be something like a = (a + 7) & ~3;.

Novelocrat
If you use <code>posix_memalign()</code>, may sure you declare the variable with `int * __attribute__ ((aligned (XX))) ptr`
J-16 SDiZ
A: 

I get only one version of the loop, using your exact code with these options: gcc -march=core2 -c -O2 -fdump-tree-optimized -ftree-vectorize vec.c

My version of GCC is gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu8).

GCC is doing something clever here. It forces the arrays a and b to be 16-byte aligned. It doesn't do that to c, presumably because c is never used in a vectorizable loop.

Jason Orendorff