Hello.

Does GCC have a memory alignment pragma, akin to #pragma vector aligned in the Intel compiler? I would like to tell the compiler to optimize a particular loop using aligned load/store instructions. To avoid possible confusion, this is not about struct packing.

e.g:

#if defined (__INTEL_COMPILER)
#pragma vector aligned
#endif
        for (int a = 0; a < int(N); ++a) {
            q10 += Ix(a,0,0)*Iy(a,1,1)*Iz(a,0,0);
            q11 += Ix(a,0,0)*Iy(a,0,1)*Iz(a,1,0);
            q12 += Ix(a,0,0)*Iy(a,0,0)*Iz(a,0,1);
            q13 += Ix(a,1,0)*Iy(a,0,0)*Iz(a,0,1);
            q14 += Ix(a,0,0)*Iy(a,1,0)*Iz(a,0,1);
            q15 += Ix(a,0,0)*Iy(a,0,0)*Iz(a,1,1);
        }

Thanks

+3  A: 

From http://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html

typedef double aligned_double __attribute__((aligned (16)));
// Note: sizeof(aligned_double) is 8, not 16
void some_function(aligned_double *x, aligned_double *y, int n)
{
    for (int i = 0; i < n; ++i) {
        // math!
    }
}

This won't make aligned_double 16 bytes wide. It only tells the compiler that the data is aligned to a 16-byte boundary; in an array, that holds for the first element rather than for every element. Looking at the disassembly on my computer, as soon as I use the alignment attribute, I start to see a lot of vector ops. I am using a Power architecture computer at the moment, so it's AltiVec code, but I think this does what you want.

(Note: I wasn't using double when I tested this, because AltiVec doesn't support double-precision floats.)
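For a heap-allocated array, the typedef can be applied to the pointer that receives the allocation. The following is only a sketch under assumptions: alloc_aligned_doubles and dot_aligned are hypothetical names made up for illustration, and C11 aligned_alloc stands in for whatever manual alignment scheme is actually in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same typedef as above: alignment is 16, sizeof is still 8. */
typedef double aligned_double __attribute__((aligned(16)));

/* Hypothetical helper: allocate room for n doubles on a 16-byte boundary.
 * The byte count is rounded up because C11 aligned_alloc expects a size
 * that is a multiple of the alignment. */
static aligned_double *alloc_aligned_doubles(size_t n)
{
    size_t bytes = (n * sizeof(double) + 15) / 16 * 16;
    return aligned_alloc(16, bytes);
}

/* Hypothetical kernel: the parameter type carries the alignment promise
 * for the start of each array. */
static double dot_aligned(const aligned_double *x, const aligned_double *y, int n)
{
    /* Debug-build check that the caller kept the promise. */
    assert((uintptr_t)x % 16 == 0);
    assert((uintptr_t)y % 16 == 0);

    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

Whether the loop is actually vectorized still depends on the optimization flags and the target, so it is worth checking the disassembly as described above.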

You can see some other examples of autovectorization using the type attributes here: http://gcc.gnu.org/projects/tree-ssa/vectorization.html
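An alternative that is not part of the type-attribute approach above: GCC 4.7 and later provide __builtin_assume_aligned, which attaches an alignment promise to a plain double* inside the function, so it can be applied per loop. This is a minimal sketch; axpy_aligned and its parameter names are hypothetical, and the promise must really hold or the behavior is undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical kernel: y[i] += a * x[i], with both raw pointers
 * promised by the caller to be 16-byte aligned. */
static void axpy_aligned(double a, const double *x_raw, double *y_raw, int n)
{
    /* Debug-build check that the promise holds. */
    assert((uintptr_t)x_raw % 16 == 0);
    assert((uintptr_t)y_raw % 16 == 0);

    /* The builtin returns its first argument; the compiler may assume
     * the returned pointer is aligned to at least 16 bytes. */
    const double *x = __builtin_assume_aligned(x_raw, 16);
    double *y = __builtin_assume_aligned(y_raw, 16);

    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

The loop has to use the pointers returned by the builtin, not the original ones, for the promise to reach the vectorizer.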

Dietrich Epp
Neither. I have an array whose alignment the compiler cannot determine. I have to tell it specifically to use aligned loads and stores. It cannot be a compiler option; it must be a pragma, for each individual loop to be vectorized.
aaa
Why can't you use a variable attribute on the array?
Dietrich Epp
The array is malloc'ed, plus the structure of the array is pretty complicated. Specifically, it is a four-dimensional tensor.
aaa
You can put the alignment on the type then, instead of the variable.
Dietrich Epp
The type is double*. If I put alignment on that, all I will get is an aligned pointer variable. The array is aligned manually; there is no way around that. The Intel pragma specifically tells the compiler to use aligned load instructions (movapd). I need the gcc equivalent.
aaa
Yes, put the alignment on the `double` not on the `double*`. Use a typedef to make an aligned_double or equivalent.
Dietrich Epp
That defeats the purpose. The performance benefit of aligned instructions comes from loading two double variables at a time, not from loading one double that is 128 bits wide.
aaa
http://www.intel.com/software/products/compilers/docs/clin/main_cls/cref_cls/common/cppref_pragma_vector.htm
aaa
It won't make it 16 bytes wide, just aligned to a 16 byte boundary.
Dietrich Epp
If you do that, with alignment on the type, you must guarantee that each element of that type starts at a 16-byte boundary. If you create an array of such types, the compiler must assume that the distance between two consecutive elements is 16 bytes.
aaa
Yes, but that's demonstrably not how the `aligned` attribute works.
Dietrich Epp
I am not following; can you give me an example? I have a raw pointer that is aligned to 16 bytes. How do I inform gcc that it's really 16-byte aligned?
aaa
Okay, thanks. I will give it a try. Hopefully it works.
aaa
Thank you. That, together with fast-math, makes the compiler report vectorized loops. Unfortunately, performance is below the Intel compiler's. I will probably need to play with the parameters more.
aaa