views: 573
answers: 4

Has anyone taken advantage of the automatic vectorization that gcc can do? In the real world (as opposed to example code)? Does it take restructuring of existing code to take advantage? Are there a significant number of cases in any production code that can be vectorized this way?

+1  A: 

It is hard to use in business logic, but it gives speed-ups when you are processing large volumes of data in the same way.

A good example is sound/video processing, where you apply the same operation to every sample/pixel. I have used VisualDSP for this, and you had to check the results after compiling to see whether vectorization was actually applied where it should have been.
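
As a rough illustration (my sketch, not rkj's actual code), this is the kind of per-sample loop that auto-vectorizers tend to handle well: the same independent operation applied to every element. The function and parameter names are made up.

    #include <stddef.h>

    /* Apply a fixed gain to every sample: the same operation on every
     * element, with no dependency between iterations, so the compiler
     * can process several samples per SIMD instruction. */
    void apply_gain(float *samples, size_t n, float gain)
    {
        for (size_t i = 0; i < n; ++i)
            samples[i] *= gain;
    }

As the answer suggests, it is worth checking what the compiler actually did: with GCC that means building with optimization (e.g. -O3, which enables -ftree-vectorize) and asking for a vectorization report (-fopt-info-vec in newer releases, -ftree-vectorizer-verbose in older ones).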

rkj
A: 

Vectorization is primarily useful for numerical programs. Vectorized programs run faster on vector processors such as the STI Cell processor used in the PS3 games console, where the numerical computations used in, for example, rendering game graphics can be sped up considerably by vectorization. Such processors are called SIMD (Single Instruction, Multiple Data) processors.

On other processors, vectorization won't be used: vectorized programs rely on a SIMD instruction set, which a non-SIMD processor does not provide.

Intel's Nehalem series of processors (released in late 2008) implements the SSE 4.2 instructions, which are SIMD instructions. Source: Wikipedia.

Amit Kumar
@Amit, many modern CPUs are equipped with SIMD/vector units, which are accessed through special instructions. GCC can sometimes emit these instructions itself when given the appropriate flags, but only for a fairly limited set of vectorizable code (see the sketch below).
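
To illustrate the "limited set of vectorizable code" point, here is a hedged sketch (mine, not from the original posts): a plain floating-point sum reduction, which GCC typically declines to vectorize at -O3 alone because reordering the additions changes rounding; allowing reassociation (e.g. -ffast-math or -fassociative-math) usually unlocks it.

    #include <stddef.h>

    /* A floating-point sum reduction. Vectorizing it reorders the
     * additions, which changes rounding, so GCC normally leaves it
     * scalar unless the flags permit FP reassociation. */
    float sum(const float *x, size_t n)
    {
        float acc = 0.0f;
        for (size_t i = 0; i < n; ++i)
            acc += x[i];
        return acc;
    }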
A: 

Vectorized instructions are not limited to Cell processors - most modern workstation-class CPUs have them (PPC, x86 since the Pentium III, SPARC, etc.). When used well for floating-point operations, they can help quite a lot with very compute-intensive tasks (filters, etc.). In my experience, automatic vectorization does not work so well.
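
A hedged sketch (my example, not the answerer's code) of why automatic vectorization often falls short in practice: a one-pole IIR filter, a staple of audio processing, carries a dependency from one iteration to the next that a straightforward auto-vectorizer cannot break.

    #include <stddef.h>

    /* One-pole IIR filter: y[i] depends on y[i-1], a loop-carried
     * dependency, so the iterations cannot simply be mapped onto
     * SIMD lanes. */
    void one_pole(float *y, const float *x, size_t n, float a)
    {
        for (size_t i = 1; i < n; ++i)
            y[i] = x[i] + a * y[i - 1];
    }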

David Cournapeau
x86 has had SIMD since the Pentium MMX, and AMD's K6-2 had 3DNow - both much earlier than the P3.
Novelocrat
A: 

I have yet to see either GCC or Intel C++ automatically vectorize anything but very simple loops, even when given the code of algorithms that can be vectorized (and were, after I rewrote them by hand using SSE intrinsics).
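
For context, a manual SSE rewrite of a loop looks something like the following. This is only a sketch, not the answerer's actual code; it assumes n is a multiple of 4 and 16-byte-aligned pointers, and a real version would handle the tail and alignment.

    #include <stddef.h>
    #include <xmmintrin.h>  /* SSE intrinsics */

    /* a[i] += b[i], four floats per iteration using SSE registers. */
    void add_arrays_sse(float *a, const float *b, size_t n)
    {
        for (size_t i = 0; i < n; i += 4) {
            __m128 va = _mm_load_ps(a + i);
            __m128 vb = _mm_load_ps(b + i);
            _mm_store_ps(a + i, _mm_add_ps(va, vb));
        }
    }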

Part of this is the compilers being conservative: especially in the face of possible pointer aliasing, it can be very difficult for a C/C++ compiler to 'prove' to itself that a vectorization would be safe, even if you as the programmer know that it is. Most compilers (sensibly) prefer not to optimize code rather than risk miscompiling it. This is one area where higher-level languages have a real advantage over C, at least in theory (I say in theory because I'm not actually aware of any automatically vectorizing ML or Haskell compilers).
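
A small sketch of the aliasing point (my example, not the answerer's): without the C99 restrict qualifiers the compiler has to assume dst and src might overlap, which can block vectorization; with them, the programmer promises the arrays do not alias, and GCC is far more willing to vectorize the loop.

    #include <stddef.h>

    /* The 'restrict' qualifiers promise the compiler that dst and src
     * do not overlap, removing the aliasing obstacle to vectorization.
     * (In C++ or older C dialects, GCC accepts __restrict__ instead.) */
    void scale_into(float *restrict dst, const float *restrict src,
                    size_t n, float k)
    {
        for (size_t i = 0; i < n; ++i)
            dst[i] = k * src[i];
    }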

Another part of it is simply analytical limitations: most research in vectorization, as I understand it, is aimed at optimizing classical numerical problems (fluid dynamics, say), which were the bread and butter of vector machines until a few years ago, when, between CUDA/OpenCL, AltiVec/SSE, and the STI Cell, vector programming in various forms became widely available in commercial systems.

It's fairly unlikely that code written with a scalar processor in mind will be easy for a compiler to vectorize. Happily, many of the things you can do to make it easier for a compiler to see how to vectorize a loop, like loop tiling and partial loop unrolling, also (tend to) help performance on modern processors even when the compiler never figures out how to vectorize it; one such restructuring is sketched below.
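
As a hedged example of that restructuring (mine, not the answerer's): a dot product with the accumulator split into four independent partial sums. Breaking the single dependency chain makes it easier for the vectorizer to map the loop onto SIMD lanes, and also helps a scalar out-of-order core even if vectorization never happens. It assumes n is a multiple of 4 for brevity.

    #include <stddef.h>

    /* Dot product with four independent partial sums; a real version
     * would also handle the remainder when n is not a multiple of 4. */
    float dot(const float *a, const float *b, size_t n)
    {
        float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
        for (size_t i = 0; i < n; i += 4) {
            s0 += a[i]     * b[i];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
        }
        return (s0 + s1) + (s2 + s3);
    }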

Jack Lloyd