Hello there!

I'm working in Visual Studio 2008, and in the project settings I see the option "activate Extended Instruction set", which I can set to None, SSE or SSE2.

So the compiler will try to batch instructions together in order to make use of SIMD instructions?

Are there any rules one can follow when optimizing code so that the compiler can generate efficient assembly using these extensions?

For example, I'm currently working on a raytracer. A shader takes some input and calculates an output color from it, like this:

PixelData data = RayTracer::gatherPixelData(pixel.x, pixel.y);
Color col = shadePixel(data);

Would it, for example, be beneficial to write the shader code so that it shades four different pixels in one function call? Something like this:

PixelData data1 = RayTracer::gatherPixelData(pixel1.x, pixel1.y);
...
shadePixels(data1, data2, data3, data4, &col1out, &col2out, &col3out, &col4out);

That is, to process multiple data units at once. Would this help the compiler use SSE instructions?

Thanks!

A: 

The compiler is not almighty; it has limitations. If it can (and if the right flags are passed to it), it will use SSE instructions. The only way to see what it actually did is to examine the assembly code it generated.

Another option is to use the SSE/SSE2 intrinsics from C. For Windows, you can find them here:

http://msdn.microsoft.com/en-us/library/y0dh78ez%28VS.80%29.aspx
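To give an idea of what the intrinsics look like, here is a minimal sketch that adds four floats with one packed SSE instruction. The helper name add4 is hypothetical and only for illustration; the intrinsics themselves come from <xmmintrin.h>.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128 and the _mm_*_ps functions

// Hypothetical helper: adds four floats from a to four floats from b.
// Unaligned loads and stores are used so the arrays need no special alignment.
void add4(const float* a, const float* b, float* out)
{
    __m128 va  = _mm_loadu_ps(a);      // load a[0..3]
    __m128 vb  = _mm_loadu_ps(b);      // load b[0..3]
    __m128 sum = _mm_add_ps(va, vb);   // four additions in one instruction
    _mm_storeu_ps(out, sum);           // write out[0..3]
}
```

Calling add4 on {1, 2, 3, 4} and {10, 20, 30, 40} fills out with {11, 22, 33, 44}.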

VJo
+4  A: 

I'm working in Visual Studio 2008, and in the project settings I see the option "activate Extended Instruction set", which I can set to None, SSE or SSE2.

So the compiler will try to batch instructions together in order to make use of SIMD instructions?

No, the compiler will not use vector instructions on its own. It will use scalar SSE instructions instead of x87 ones.

What you describe is called "automatic vectorization". The Microsoft compiler in Visual Studio 2008 does not do this; Intel compilers do.

With the Microsoft compiler you can use intrinsics to perform manual SSE optimizations.
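Applied to the four-pixels-at-once idea from the question, a manual version could look roughly like the sketch below. It assumes the four pixels' normals are stored component-wise (all x values together, and so on) and computes a simple clamped diffuse term for each pixel. Normals4 and diffuse4 are hypothetical names, not part of the asker's raytracer.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical layout: lane i of each register belongs to pixel i.
struct Normals4 {
    __m128 x, y, z;
};

// Computes max(0, dot(normal, light)) for four pixels at once.
// (lx, ly, lz) is one light direction shared by all four pixels.
__m128 diffuse4(const Normals4& n, float lx, float ly, float lz)
{
    __m128 dot = _mm_add_ps(
        _mm_add_ps(_mm_mul_ps(n.x, _mm_set1_ps(lx)),
                   _mm_mul_ps(n.y, _mm_set1_ps(ly))),
        _mm_mul_ps(n.z, _mm_set1_ps(lz)));
    return _mm_max_ps(dot, _mm_setzero_ps());  // clamp negative values to 0
}
```

The point of this layout is that every arithmetic instruction does useful work in all four lanes, which is what makes shading several pixels per call worthwhile.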

Suma
So in order to use vector instructions, is it necessary to write assembly?
Mat
I think Suma means one of these: http://software.intel.com/en-us/articles/intel-compilers/
Matt Kane
@Mat - you can use compiler intrinsics to write SIMD code. See http://msdn.microsoft.com/en-us/library/y0dh78ez%28VS.71%29.aspx
celion
+2  A: 

Three observations.

  1. The best speedups come not from low-level optimizations but from good algorithms, so make sure you get that part right first. Often this just means using the right libraries for your specific domain.

  2. Once your algorithms are right, it is time to measure. Often there is an 80/20 rule at work: 20% of your code takes 80% of the execution time. To locate that part you need a good profiler. Intel VTune can give you a sampling profile for every function and reports that pinpoint the performance killers. A free alternative is AMD CodeAnalyst, if you have an AMD CPU.

  3. The compiler's autovectorization capability is not a silver bullet. Although it will try really hard (especially Intel C++), you will often need to help it by rewriting the algorithms in vector form. You can often get much better results by handcrafting small portions of the bottleneck code to use SIMD instructions. You can do that in C code using intrinsics (see VJo's link above) or with inline assembly.

Of course, steps 2 and 3 form an iterative process. If you are really serious about this, there are some good books on the subject by Intel folks, such as The Software Optimization Cookbook, as well as the processor reference manuals.
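As an illustration of handcrafting a small hot loop (point 3), here is a sketch that multiplies an array of floats by a constant, four elements per iteration, with a scalar loop for the leftover elements. The function name scaleArray is hypothetical, and whether this beats the compiler's own output is something only measurement (point 2) can tell.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: multiplies n floats in place by gain.
void scaleArray(float* data, std::size_t n, float gain)
{
    const __m128 g = _mm_set1_ps(gain);  // gain replicated into all four lanes
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);           // unaligned load of 4 floats
        _mm_storeu_ps(data + i, _mm_mul_ps(v, g));   // 4 multiplies, then store
    }
    for (; i < n; ++i)  // scalar tail for the remaining n % 4 elements
        data[i] *= gain;
}
```

The scalar tail keeps the function correct for any n, not just multiples of four.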

renick