views:

478

answers:

5

Is it a good idea to vectorize the code? What are good practices in terms of when to do it? What happens underneath?

+3  A: 

It's SSE code Generation.

You have a loop with float matrix code in it matrix1[i][j] + matrix2[i][j] and the compiler generates SSE code.

toto
SEE is not the only vector instruction set. PPC has Altivec and other architectures have their own vector instructions as well.
Amuck
+17  A: 

Vectorization means that the compiler detects that your independent instructions can be executed as one SIMD instruction. Usual example is that if you do something like

for(i=0; i<N; i++){
  a[i] = a[i] + b[i];
}

It will be vectorized as (using vector notation)

for (i=0; i<(N-N%VF); i+=VF){
  a[i:i+VF] = a[i:i+VF] + b[i:i+VF];
}

Basically the compiler picks one operation that can be done on VF elements of the array at the same time and does this N/VF times instead of doing the single operation N times.

It increases performance, but puts more requirement on the architecture.

Zed
So is there anything a programmer can do to ensure vectorization (apart from turning optimization on)?
Jacob
As far as I know compilers are limited in auto-vectorization, so your best bet is to keep your code as trivial as it can be. Also you can check the assembly code generated to see whether the compiler vectorized or not.
Zed
@jacob: you can't really "ensure" it, you might want to take a look at http://openmp.org/ for method of explicitly telling the compiler to vectorize.
D.Shawley
Vectorization doesn't mean that the compiler does it, just that SIMD instructions are used. When the compiler generates SIMD code it's generally called auto-vectorization.
jalf
A: 

Maybe also have a look at libSIMDx86 (source code).

A nice example well explained is:

Choosing to Avoid Branches: A Small Altivec Example

funco
+3  A: 

As mentioned above, vectorization is used to make use of SIMD instructions, which can perform identical operations of different data packed into large registers.

A generic guideline to enable a compiler to autovectorize a loop is to ensure that there are no flow- and anti-dependencies b/w data elements in different iterations of a loop.

http://en.wikipedia.org/wiki/Data%5Fdependency

Some compilers like the Intel C++/Fortran compilers are capable of autovectorizing code. In case it was not able to vectorize a loop, the Intel compiler is capable of reporting why it could not do that. There reports can be used to modify the code such that it becomes vectorizable (assuming it's possible)

Dependencies are covered in depth in the book 'Optimizing Compilers for Modern Architectures: A Dependence-based Approach'

Gautham Ganapathy
A: 

Vectorization need not be limited to single register which can hold large data. Like using '128' bit register to hold '4 x 32' bit data. It depends on architectural limitations. Some architecture have different execution units which have registers of their own. In that case, a part of the data can be fed to that execution unit and the result can be taken from a register corresponding to that execution unit.

For example, consider the below case.

for(i=0; i < N; i++)
{
a[i] = a[i] + b[i];
}



If I am working on an architecture which has two execution units, then my vector size is defined as two. The loop mentioned above will be reframed as

for(i=0; i<(N/2); i+=2)
{
a[i] = a[i] + b[i] ;


a[i+1] = a[i+1] + b[i+1];
}

NOTE: The 2 inside the for statement is derived from the vector size.

As I am having two execution units the two statements inside the loop will be fed into the two execution units. The sum will be accumulated in the execution units separately. Finally the sum of accumulated values (from two execution units) will be carried out.

The good practices are
1. The constraints like dependency (between different iterations of the loop) needs to be checked before vectorizing the loop.
2. Function calls needs to be prevented.
3. Pointer access can create aliasing and it needs to be prevented.

Ganesh Gopalasubramanian