Is there a simple tutorial for me to get up to speed in SSE, SSE2 and SSE3 in GNU C++? How can you do code optimization in SSE?
Check out the -mtune and -march options, -msse*, and -mfpmath of course. All of those enable GCC to do SSE-specific optimizations.
Anything beyond that is the realm of Assembler, I am afraid.
A simple tutorial? Not that I know of.
But any information about using MMX or any version of SSE will be useful for learning, whether for GCC or for ICC or VC.
To learn about GCC's vector extensions, type "info gcc" and go to Node: Vector Extensions.
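As a rough illustration of what that node describes, here is a minimal sketch using the vector_size attribute. The type name v4sf and the helpers mul_add and horizontal_sum are made up for this example, and it assumes a reasonably recent GCC (or Clang) that allows subscripting vector types in C++.

```cpp
#include <cassert>

// GCC vector extension: four floats packed into one 16-byte value.
// On x86 this typically maps onto a single SSE register.
// The name v4sf is just a label chosen for this sketch.
typedef float v4sf __attribute__((vector_size(16)));

// Element-wise a * b + c on four floats at once.
// The ordinary arithmetic operators work lane by lane on vector types.
inline v4sf mul_add(v4sf a, v4sf b, v4sf c) {
    return a * b + c;
}

// Add up the four lanes of a vector using subscripting.
inline float horizontal_sum(v4sf v) {
    return v[0] + v[1] + v[2] + v[3];
}
```

Vectors can be initialized with braces, e.g. v4sf a = {1.0f, 2.0f, 3.0f, 4.0f};. The compiler chooses the actual instructions, so the generated assembler should be checked if specific SSE instructions matter.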
Sorry, I don't know of a tutorial.
Your best bet (IMHO) is to use SSE via the "intrinsic" functions Intel provides to wrap (generally) single SSE instructions. These are made available via a set of include files named *mmintrin.h; for example, xmmintrin.h covers the original SSE instruction set, emmintrin.h covers SSE2 and pmmintrin.h covers SSE3.
Being familiar with the contents of Intel's Optimization Reference Manual is a good idea (see section 4.3.1.2 for an example of intrinsics), and the SIMD sections are essential reading. The instruction set reference manuals are pretty helpful too, in that each instruction's documentation includes the "intrinsic" function it corresponds to.
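To give a feel for what intrinsics look like, here is a small sketch that adds two float arrays four elements at a time with the original SSE intrinsics from xmmintrin.h. The function name add_arrays and its signature are invented for the example. On 32-bit x86 you would likely need -msse on the command line; x86-64 targets have SSE enabled by default.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // original SSE intrinsics: __m128, _mm_*_ps

// out[i] = a[i] + b[i] for i in [0, n).
// add_arrays is a hypothetical helper written for this example.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a && b && out));

    std::size_t i = 0;
    // Main loop: process four floats per iteration.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  // packed add, then store
    }
    // Scalar tail for the remaining 0 to 3 elements.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The unaligned load and store variants are used so the sketch works with any pointers. If the data is known to be 16-byte aligned, _mm_load_ps and _mm_store_ps are the aligned counterparts.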
Do spend some time inspecting the assembler produced by the compiler from intrinsics (you'll learn a lot) and on profiling/performance measurement (you'll avoid wasting time SSE-ing code for little return on the effort).
The simplest optimization is to let GCC emit SSE code.
Flags: -msse, -msse2, -msse3, -march=, -mfpmath=sse
For a concise list of the i386 and x86-64 options, see http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/i386-and-x86_002d64-Options.html#i386-and-x86_002d64-Options . More exact documentation for your specific compiler version is available at http://gcc.gnu.org/onlinedocs/ .
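As a sketch of the kind of code those flags can help with, here is a plain loop that GCC is usually able to vectorize on its own at -O3. The function name saxpy and the file name in the comment are made up for the example, and whether the loop really gets vectorized depends on the compiler version and options, so inspect the output of -S to be sure.

```cpp
#include <cassert>
#include <cstddef>

// Possible command line (saxpy.cpp is a hypothetical file name):
//   g++ -O3 -msse2 -mfpmath=sse -S saxpy.cpp
// -S writes the assembler to saxpy.s so you can look for packed SSE
// instructions such as mulps and addps.

// y[i] = a * x[i] + y[i] for i in [0, n).
// __restrict__ is a GCC extension that promises x and y do not overlap,
// which makes it easier for the compiler to vectorize the loop.
void saxpy(float a, const float* __restrict__ x, float* __restrict__ y,
           std::size_t n) {
    assert(n == 0 || (x && y));
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

No intrinsics or assembler are involved here; the source stays portable and the flags alone decide which instructions are emitted.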
For optimization, always check out Agner Fog's site: http://agner.org/optimize/ . I don't think he has SSE tutorials for intrinsics, but he has some really neat std-c++ tricks and also provides lots of information about coding SSE assembly (which can often be transcribed to intrinsics).