I am using the SSE extensions available on a Core 2 Duo processor (compiler: gcc 4.4.1). I see that there are 16 registers available, each 128 bits wide. I can pack four 32-bit integer values into one register and four into another, and using intrinsics I can add them with one instruction. The obvious advantage is that I need only 1 instruction instead of 4.
My question is: is that all there is to SIMD? Suppose I have a1, a2, a3, a4, a5, a6, a7, a8 and b1, b2, b3, b4, b5, b6, b7, b8. Let A1 and B1 be vector registers. Now A1 <<< (a1, a2, a3, a4) and B1 <<< (b1, b2, b3, b4), and add(A1, B1) performs the vector addition.
Now let A2 <<< (a5, a6, a7, a8) and B2 <<< (b5, b6, b7, b8). Is there an add instruction that can perform add(A1, B1) and add(A2, B2) simultaneously?
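To make the scenario concrete, here is a minimal sketch of the two additions written with SSE2 intrinsics from <emmintrin.h>. The function name add8 and the use of unaligned loads and stores are my own choices for illustration, and I am assuming 32-bit int. The two _mm_add_epi32 calls do not depend on each other; whether the hardware can actually execute them at the same time is exactly what I am asking.

```c
#include <emmintrin.h>  /* SSE2 integer intrinsics */

/* Hypothetical helper (name chosen for illustration only):
 * adds eight 32-bit ints as two independent 4-wide vector adds. */
void add8(const int *a, const int *b, int *out)
{
    /* Unaligned loads, so the arrays need no special alignment. */
    __m128i A1 = _mm_loadu_si128((const __m128i *)a);        /* a1..a4 */
    __m128i B1 = _mm_loadu_si128((const __m128i *)b);        /* b1..b4 */
    __m128i A2 = _mm_loadu_si128((const __m128i *)(a + 4));  /* a5..a8 */
    __m128i B2 = _mm_loadu_si128((const __m128i *)(b + 4));  /* b5..b8 */

    /* Two separate add instructions with no data dependency between them. */
    __m128i S1 = _mm_add_epi32(A1, B1);
    __m128i S2 = _mm_add_epi32(A2, B2);

    _mm_storeu_si128((__m128i *)out, S1);
    _mm_storeu_si128((__m128i *)(out + 4), S2);
}
```

With gcc this should build with -msse2 (on x86-64 targets SSE2 is enabled by default).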
How many vector functional units are available in the Core 2 Duo, and where can I find this information?
Any other sources of information on this topic would be highly appreciated.