The v4 series of the gcc compiler can automatically vectorize loops using the SIMD unit found on some modern CPUs, such as the AMD Athlon or Intel Pentium/Core chips. How is this done?
This page offers details on getting gcc to automatically vectorize loops, including a few examples:
http://gcc.gnu.org/projects/tree-ssa/vectorization.html
In summary, the following options work for x86 chips with SSE2 and print a log of the loops that were vectorized:
gcc -O2 -ftree-vectorize -msse2 -ftree-vectorizer-verbose=5
Note that -msse is also a possibility, but it will only vectorize loops using floats, not doubles or ints.
There is a GIMPLE (one of GCC's intermediate representations) pass named "pass_vectorize". This pass performs auto-vectorization at the GIMPLE level.
To enable auto-vectorization for a port (GCC 4.4.0), follow these steps:
1) Specify the number of words in a vector for the target architecture.
This can be done by defining the macro UNITS_PER_SIMD_WORD.
2) Define the possible vector modes in a separate file, usually -modes.def.
This file must reside in the same directory as the other files containing the machine description, as expected by the configuration script. (If you can change the script, you can place the file in any directory you like.)
3) Specify the modes to be considered for vectorization on the target architecture: for example, whether four words, eight half-words or two double-words constitute a vector. These details go in the -modes.def file.
For example,
VECTOR_MODES (INT, 8); /* V8QI V4HI V2SI */
VECTOR_MODES (INT, 16); /* V16QI V8HI V4SI V2DI */
VECTOR_MODES (FLOAT, 8); /* V4HF V2SF */
4) Build the port. Vectorization can be enabled using the command line options -O2 -ftree-vectorize.