Hello, I want to learn more about using the SSE.
What ways are there to learn, besides the obvious reading the Intel® 64 and IA-32 Architectures Software Developer's Manuals ?
Mainly I'm interested to work with the GCC X86 Built-in Functions.
Hello, I want to learn more about using the SSE.
What ways are there to learn, besides the obvious reading the Intel® 64 and IA-32 Architectures Software Developer's Manuals ?
Mainly I'm interested to work with the GCC X86 Built-in Functions.
First, I don't recommend on using the built-in functions - they are not portable (across compilers of the same arch).
Use intrinsics, GCC does a wonderful job optimizing SSE intrinsics into even more optimized code. You can always have a peek at the assembly and see how to use SSE to it's full potential.
Intrinsics are easy - just like normal function calls:
#include <xmmintrin.h>
__m128 vector1 = _mm_set1_ps(4, 3, 2, 1); // Little endian, stored in 'reverse'
__m128 vector2 = _mm_set1_ps(7, 8, 9, 0);
// Addition
__m128 result = _mm_add_ps(vector1, vector2); // result = vector1 + vector 2
// A more advanced function, called shuffle
vector1 = _mm_shuf_ps(vector1, vector1, _MM_SHUFFLE(0,1,2,3));
// vector1 is now (1, 2, 3, 4) (above shuffle reversed it)
Of course there are way more options, SSE is really powerful and in my opinion relatively easy to learn.
Since you asked for resources:
A practical guide to using SSE with C++: Good conceptual overview on how to use SSE effectively, with examples.
ompf.org: This is a raytracing forum, but has quite a lot of threads on SIMD, giving useful snippets, discussing pitfalls, etc.
MSDN Listing of Compiler Intrinsics: Comprehensive reference for all your intrinsic needs. It's MSDN, but pretty much all the intrinsics listed here are supported by GCC and ICC as well.
Christopher Wright's SSE Page: Quick reference on the meanings of the SSE opcodes. I guess the Intel Manuals can serve the same function, but this is faster.
It's probably best to write most of your code in intrinsics, but do check the objdump of your compiler's output to make sure that it's producing efficient code. SIMD code generation is still a fairly new technology and it's very possible that the compiler might get it wrong in some cases.