ansaurus

Question

C++: how to write code the compiler can easily optimize for SIMD?

Answer 1

A:

The compiler is not almighty, and it has some limitations. If it can (and if the right flags are passed to it), it will use SSE instructions. The only way to see what it actually did is to examine the assembly code it generated.

Another option is to use the SSE/SSE2 intrinsics from C. For Windows you can find them here:

http://msdn.microsoft.com/en-us/library/y0dh78ez%28VS.80%29.aspx
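As a rough illustration of what using those intrinsics looks like, here is a minimal sketch that adds two float arrays four elements at a time. The function name add_arrays_sse is made up for this example; the intrinsics (_mm_loadu_ps, _mm_add_ps, _mm_storeu_ps from <xmmintrin.h>) are the standard SSE ones. It assumes an x86 target with SSE available and makes no claim about being the fastest way to do this.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#include <cstddef>

// Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
void add_arrays_sse(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    // Main loop: one SSE register holds four floats, so step by four.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned load of a[i..i+3]
        __m128 vb = _mm_loadu_ps(b + i);  // unaligned load of b[i..i+3]
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    // Scalar tail for the remaining 0 to 3 elements.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The unaligned load/store variants are used so the sketch works for any pointers; if the data is known to be 16-byte aligned, _mm_load_ps and _mm_store_ps can be used instead.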

VJo 2010-10-26 18:36:39

Answer 2

+4 A:

I'm working in Visual Studio 2008 and in the project settings I see the option for "activate Extended Instruction set" which I can set to None, SSE or SSE2.

So the compiler will try to batch instructions together in order to make use of SIMD instructions?

No, the compiler will not use vector instructions on its own. It will use scalar SSE instructions instead of x87 ones.

What you describe is called "automatic vectorization". The Microsoft compiler in Visual Studio 2008 does not do this; Intel compilers do.

With the Microsoft compiler you can use intrinsics to perform manual SSE optimizations.
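For a compiler that does auto-vectorize, the kind of loop it tends to handle well looks roughly like the sketch below. The function name saxpy and its parameters are chosen for illustration only, and whether a given compiler actually emits packed SSE instructions for it depends on the compiler, its version, and the flags; checking the generated assembly (for example with g++ -O3 -S, or cl /O2 /FA for a listing) is the only way to be sure.

```cpp
#include <cstddef>

// Hypothetical example: y[i] = a * x[i] + y[i].
// Properties that tend to make a loop easy to vectorize automatically:
//   - a simple counted loop whose trip count is known on entry
//   - contiguous, unit-stride array accesses
//   - no function calls or data-dependent branches in the body
//   - every iteration is independent of the others
void saxpy(float a, const float* x, float* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

One thing that can still get in the way here is possible overlap between x and y. Some compilers insert a runtime overlap check; the non-standard __restrict qualifier (an extension in GCC, Clang and MSVC) is one way to promise that the arrays do not alias.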

Suma 2010-10-26 18:40:28

So in order to make use of vector instructions, is it necessary to write assembly?

Mat 2010-10-26 20:13:28

I think Suma means one of these: http://software.intel.com/en-us/articles/intel-compilers/

Matt Kane 2010-10-26 20:29:25

@Mat - you can use compiler intrinsics to write SIMD code. See http://msdn.microsoft.com/en-us/library/y0dh78ez%28VS.71%29.aspx

celion 2010-10-26 22:43:26

Answer 3

+2 A:

Three observations.

1. The best speedups come not from low-level optimizations but from good algorithms, so make sure you get that part right first. Often this means just using the right libraries for your specific domain.
2. Once you have your algorithms right, it is time to measure. Often there is an 80/20 rule at work: 20% of your code takes 80% of the execution time. To locate that part you need a good profiler. Intel VTune can give you a sampling profile for every function and nice reports that pinpoint the performance killers. A free alternative is AMD CodeAnalyst, if you have an AMD CPU.
3. The compiler's autovectorization capability is not a silver bullet. Although it will try really hard (especially Intel C++), you will often need to help it by rewriting the algorithms in vector form. You can often get much better results by handcrafting small portions of the bottleneck code to use SIMD instructions. You can do that in C code using intrinsics (see VJo's link above) or with inline assembly.
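One common example of "rewriting in vector form" is changing the data layout from an array of structures to a structure of arrays, so that the values processed together sit next to each other in memory. The sketch below is only an illustration under that assumption; the types Vec3 and Vec3Array and both function names are hypothetical, and how much either version benefits depends on the compiler and the target.

```cpp
#include <cstddef>
#include <vector>

// Array-of-structures element: x, y, z of one point are adjacent in memory,
// so the x values of consecutive points are 12 bytes apart.
struct Vec3 { float x, y, z; };

// Structure-of-arrays: all x values are contiguous, likewise y and z.
// The three vectors are assumed to have the same size.
struct Vec3Array {
    std::vector<float> x, y, z;
};

// AoS version: strided access; a vectorizer typically has to shuffle the
// loaded data before it can multiply four x values at once.
std::vector<float> length_squared_aos(const std::vector<Vec3>& v) {
    std::vector<float> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
        out[i] = v[i].x * v[i].x + v[i].y * v[i].y + v[i].z * v[i].z;
    }
    return out;
}

// SoA version: unit-stride loads from three arrays, which maps directly
// onto packed SIMD loads and multiplies.
std::vector<float> length_squared_soa(const Vec3Array& v) {
    const std::size_t n = v.x.size();
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = v.x[i] * v.x[i] + v.y[i] * v.y[i] + v.z[i] * v.z[i];
    }
    return out;
}
```

Both functions compute the same result; the difference is only in how the input is laid out, which is the part the compiler cannot change for you. As with any such change, it is worth profiling before and after.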

Of course, parts 2 and 3 form an iterative process. If you are really serious about this, there are some good books on the subject by Intel folks, such as The Software Optimization Cookbook, as well as the processor reference manuals.

renick 2010-10-26 20:59:14
