ansaurus

Question

Intrinsics program (SSE) - g++ - help needed

Answer 1

+2 A:

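You're using ASM blocks, not intrinsics.

Since those xmmX are registers, you should prefix them with a % (or with %% inside an extended asm block that has operands, as in the corrected code further down):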

      "\n\tmovups  (%eax), %xmm0"
      // etc.

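And your ASM has several errors.

You should not modify the ebx register without telling the compiler: it is callee-saved, and on 32-bit x86 it also serves as the base register for position-independent code.
$a etc. is treated by the assembler as a reference to a global symbol named a, which it is not.
addps %xmm0, %xmm1 stores the result in xmm1. Remember that in AT&T syntax the destination is on the right.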

The corrected ASM block would look like this:

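    asm volatile (
      "movl %1, %%eax"
      "\n\tmovl %2, %%ecx"
      "\n\tmovups (%%eax), %%xmm0"   // xmm0 = a
      "\n\tmovups (%%ecx), %%xmm1"   // xmm1 = b
      "\n\taddps %%xmm1, %%xmm0"     // xmm0 += xmm1 (destination on the right)
      "\n\tmovups %%xmm0, %0"        // result = xmm0
      : "=m"(result)
      : "r"(&a), "r"(&b)
      : "eax", "ecx", "xmm0", "xmm1", "memory");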

Basically, %0 is replaced by a memory operand that refers to result, and %1 and %2 are replaced by registers holding &a and &b. See http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html for a detailed explanation. Listing "eax" and "ecx" as clobbers tells the compiler that the block overwrites these two registers, so it will not pick them for any of the %n operands.

But the first two movl's are unnecessary, since %1 and %2 are already registers holding the addresses:

    asm volatile(
      "movups (%1), %%xmm0"
      "\n\tmovups (%2), %%xmm1"
      "\n\taddps %%xmm1, %%xmm0"
      "\n\tmovups %%xmm0, %0"
      : "=m"(result)
      : "r"(&a), "r"(&b)
      : "xmm0", "xmm1", "memory");

Since you mentioned intrinsic, why not use __builtin_ia32_addps?
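
For comparison, here is a rough sketch of the same four-float addition written with intrinsics instead of an ASM block. It uses _mm_loadu_ps, _mm_add_ps and _mm_storeu_ps from <xmmintrin.h>, which is the usual portable spelling of what __builtin_ia32_addps does in g++. The function name add4 is made up for illustration and is not from the original post, and the scalar fallback is only there so the sketch still compiles where SSE is not enabled.

```cpp
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>   // SSE intrinsics: __m128, _mm_add_ps, ...
#define SKETCH_HAVE_SSE 1
#endif

// add4 is a hypothetical helper: result[i] = a[i] + b[i] for i = 0..3.
void add4(const float* a, const float* b, float* result) {
#ifdef SKETCH_HAVE_SSE
    __m128 va = _mm_loadu_ps(a);      // unaligned load, like movups
    __m128 vb = _mm_loadu_ps(b);
    __m128 sum = _mm_add_ps(va, vb);  // packed single-precision add, like addps
    _mm_storeu_ps(result, sum);       // unaligned store, like movups
#else
    // Fallback for targets without SSE (assumption: only so the sketch builds).
    for (int i = 0; i < 4; ++i) result[i] = a[i] + b[i];
#endif
}
```

With intrinsics the compiler picks the registers itself, so there are no operand constraints or clobber lists to get wrong. On 32-bit x86 you will probably need to pass -msse to g++.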

KennyTM 2010-05-26 09:16:34

Sriram 2010-05-26 09:25:42

Wow! that did it. I had been going through the documentation you referred me to, but I suppose I was not thorough enough. I have a question: Is this not the same as intrinsics? Whenever I read a document on the net regarding the same, it says that intrinsics are a way of using inline assembly code in C/C++ programs using the instructions I have used above. Am I right in saying that?

Sriram 2010-05-26 10:12:35

@Sriram: See http://en.wikipedia.org/wiki/Intrinsic_function. An ASM block is not a function.

KennyTM 2010-05-26 10:24:06

The Wikipedia entry for SSE, http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions, gives an example that adds two vectors. It uses the same instructions I have used above (in pseudocode). What do I need to do to "convert" this code into intrinsic functions?

Sriram 2010-05-26 10:41:44
