I've been reading up on the x86 instruction set extensions, and they only seem useful in some quite specific circumstances (eg HADDPD - (Horizontal-Add-Packed-Double) in SSE3). These require a certain register layout that needs to be either deliberately set up, or occur from the series of instructions before it. How often do general-purpose compilers like gcc actually use these instructions (or a subset thereof), or are they mainly to be used in hand-coded assembler? How does the compiler detect where it is appropriate to use SIMD instructions?

+9  A: 

Generally, few compilers use them. GCC and Visual Studio usually aren't able to use the SIMD instructions. If you enable SSE as a compiler flag, they will use the scalar SSE instructions for regular floating-point operations, but generally, don't expect the vectorized ones to be used automatically. Recent versions of GCC might be able to use them in some cases, but it didn't work the last time I tried. Intel's C++ compiler is the only big compiler I know of that is able to auto-vectorize some loops.

In general, though, you'll have to use them yourself, either in raw assembler or through compiler intrinsics. I'd say intrinsics are the better approach, since they allow the compiler to understand the code and so schedule and optimize around it; in practice, though, I know MSVC at least doesn't always generate very efficient code from intrinsics, so plain asm may be the best solution there. Experiment and see what works. But don't expect the compiler to use these instructions for you unless you 1) use the right compiler, and 2) write fairly simple loops that can be trivially vectorized.

jalf
Have intrinsics become (a lot) better in the last few years? The last time I checked, both MSVC and ICC had quite lousy register allocation, and even I was easily able to beat the compiler-intrinsic version with hand-coded assembly.
snemarch
I believe recent versions of MSVC have made *some* improvements to intrinsics-generated code. But I don't know how much difference that has made.
jalf
MSVC's output for scalar SSE is still just terrible, especially if you use an intrinsic anywhere.
Crashworks
A: 

I probably wouldn't use them even if I could. Beware of Intel/AMD incompatibilities. This may be obsolete advice now, or it may not; I have no way to tell.

EDIT: obsolete, probably by a very long time.

Joshua
That's quite a weak reason to avoid such instructions. They wouldn't exist if they didn't serve a purpose.
TURBOxSPOOL
Got a compiler that does both sets TURBO? I sure don't.
Joshua
This is just incorrect: all modern Intel and AMD processors, and all modern compilers (GCC, VS), support SSE and MMX.
Zifre
I did say it might be obsolete. My last assembly manuals suggested otherwise.
Joshua
Might want to brush up on facts before posting? :) - MMX is universal, and both AMD and Intel share very large subsets of SSE; relatively few instructions are limited to just one of the platforms. 3DNow! was AMD-only and is pretty dead by now.
snemarch
+6  A: 

Mono can use SIMD extensions as long as you use its classes for vectors. You can read about it here: http://tirania.org/blog/archive/2008/Nov-03.html

GCC should do some automatic vectorisation as long as you're using -O3 or the specific flag -ftree-vectorize. They have an info page here: http://gcc.gnu.org/projects/tree-ssa/vectorization.html

viraptor
+2  A: 

The question of how to exploit SSE and other small vector units automatically (without direction from the programmer in the form of special language constructs or specially blessed compiler "intrinsics") has been a topic of compiler research for some time. Most results seem to be specialized to a particular problem domain, such as digital signal processing. I have not kept up with the literature on this topic, but what I have read suggests that exploiting the vector (SSE) unit is still a topic for research, and that one should have low expectations of general-purpose compilers commonly used in the field.

Suggested search term: vectorizing compiler

Norman Ramsey
A: 

If you use the Vector Pascal compiler you will get efficient SIMD code for types for which SIMD gives an advantage. Basically this is anything of length less than 64 bits (for 64-bit reals it is actually slower to do SIMD). The latest versions of the compiler will also automatically parallelise across cores.

Paul Cockshott