I'm writing a program in C that needs to do some fast math calculations. I'm using inline SSE assembly instructions to get some SIMD action (using packed double precision floating point numbers). I'm compiling using GCC on Linux.

I'm in a situation where I need to loop over some data, and I use a constant factor in my calculations. I'd like to keep that factor tucked away in a safe register during the loop, so I don't have to re-load it every time.

To clarify with some code:

struct vect2 {
    fltpt x;
    fltpt y;
}__attribute__((aligned(16))); /* Align on 16B boundary for SSE2 instructions */
typedef struct vect2 vect2_t;


void function()
{
    /* get a specific value set up in xmm1, and keep it there for the 
     * rest of the loop. */
    for( int i = 0; i<N; i++ ){
     asm(
      "Some calculations;"
      "on an element of;"
      "a data set.;"
      "The value in xmm1;"
      "is needed;"
     );
    }
}

I've tried using the "register" keyword, but if I'm not mistaken, I can only keep a pointer to that structure in a (general-purpose) register. The pointer would then have to be dereferenced on every iteration, wasting precious time.

register vect2_t hVect asm("xmm1") = {h, h};
/* Gives error: data type of 'hVect' isn't suitable for a register */

register vect2_t *hVect2 asm("rax");
*hVect2 = (vect2_t){h,h};
/* Seems to work, but not what I'm looking for */

I don't like to just assume that GCC won't change the xmm1 register; that is too much of a "demons flying out of one's nose" kind of thing :-). So I'm hoping there is a proper way to do this.

+5  A: 

I think the solution here is to make GCC aware that your vec2_t type is actually a vector; then you can just calculate the loop-invariant value and treat it as a normal variable (except that the compiler knows it is a vector type):

typedef double vec2_t __attribute__ ((vector_size (16)));

void function()
{
  /* get a specific value set up, e.g. */
  vec2_t invariant;
  asm( "some calculations, storing result in invariant."
       : "=x" (invariant) );

  for( int i = 0; i<N; i++ ){
    asm(
            "Some calculations;"
            "on an element of;"
            "a data set.;"
            "The value in xmm1;"
            "is needed;"
            : /* output operands, if any */
            : "x" (invariant) // and other SSE arguments
       );
   }
}

I just compiled this with a simple calculation inside the loop, and with at least optimisation level 1 the value of invariant is kept in an XMM register during the loop.
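To make the idea concrete, here is a minimal sketch of such a loop. The function name scale_all, the choice of mulpd as the "simple calculation", and the fallback branch for targets without SSE2 are all assumptions made for illustration; they are not taken from the code above.

```c
#include <assert.h>
#include <stddef.h>

/* GCC vector extension: two packed doubles in one 16-byte value. */
typedef double vec2_t __attribute__ ((vector_size (16)));

/* scale_all is a hypothetical name: multiply both lanes of every element by h. */
void scale_all(vec2_t *data, size_t n, double h)
{
    assert(data != NULL || n == 0);

    /* Loop-invariant value; GCC decides which XMM register holds it. */
    const vec2_t invariant = { h, h };

    for (size_t i = 0; i < n; i++) {
        vec2_t v = data[i];
#if defined(__SSE2__)
        /* AT&T syntax: %0 = %0 * %1.
         * "+x" marks v as read and written, "x" marks invariant as read only.
         * __asm__ is spelled this way so it also compiles with -std=c99. */
        __asm__ ("mulpd %1, %0" : "+x" (v) : "x" (invariant));
#else
        v = v * invariant;  /* plain vector arithmetic on non-SSE2 targets */
#endif
        data[i] = v;
    }
}
```

Because the asm statement only names operands (%0, %1) and never a fixed register, the register allocator is free to keep invariant wherever it likes across iterations. Compiling with -O1 -S and reading the loop body is one way to check what it actually did.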

(This all assumes that you don't need your loop invariant in an explicit XMM register, and that you can use GCC's normal register allocation.)
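If a specific register really is required, one possible variation is to combine the vector type with a local register variable. This is only a sketch: the function name add_offset and the use of addpd are made up for illustration, and GCC documents that a local register variable is only guaranteed to sit in the named register at the point where it is used as an asm operand, not throughout the loop.

```c
#include <assert.h>
#include <stddef.h>

typedef double vec2_t __attribute__ ((vector_size (16)));

/* add_offset is a hypothetical name: add h to both lanes of every element. */
void add_offset(vec2_t *data, size_t n, double h)
{
    assert(data != NULL || n == 0);

#if defined(__SSE2__)
    /* A vector type is accepted here, unlike the struct in the question.
     * The binding to xmm1 is only guaranteed where hVect is an asm operand. */
    register vec2_t hVect __asm__ ("xmm1") = { h, h };
#else
    vec2_t hVect = { h, h };  /* no xmm1 on this target; plain variable */
#endif

    for (size_t i = 0; i < n; i++) {
        vec2_t v = data[i];
#if defined(__SSE2__)
        /* %1 is the operand bound to hVect, so it expands to %xmm1. */
        __asm__ ("addpd %1, %0" : "+x" (v) : "x" (hVect));
#else
        v = v + hVect;
#endif
        data[i] = v;
    }
}
```

Between asm statements the compiler may still use xmm1 for other values, so the asm text should not rely on it holding hVect unless hVect is listed as an operand.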

Dave Rigby
+3  A: 
kmm
+3  A: 

I'm used to working with assembly and C, and what I would do here is write the entire function in assembly. If you have a flexible make system, I recommend assembling the ASM function separately and linking it into your application. The only problem with this is that the function cannot be inlined by the compiler.

void function(void); //C

extern "C" void function(void); //C++
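If the same declaration has to serve both C and C++ callers, a common arrangement is a single header guarded by __cplusplus. The sketch below assumes a hypothetical header name, function.h, and assumes the body of function is provided by the separately assembled object file.

```c
/* function.h: hypothetical header shared by C and C++ callers. */
#ifndef FUNCTION_H
#define FUNCTION_H

#ifdef __cplusplus
extern "C" {            /* C++ only: use C linkage, no name mangling */
#endif

/* Defined in a separately assembled file and resolved at link time. */
void function(void);

#ifdef __cplusplus
}
#endif

#endif /* FUNCTION_H */
```

The assembly file then needs to export a symbol with the matching unmangled name, following the calling convention of the target platform.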

toto