Hello,

I'm trying to come up with a way to make the computer do some work for me. I'm using SIMD (SSE2 & SSE3) to calculate the cross product, and I was wondering if it could go any faster. Currently I have the following:

const int maskShuffleCross1 = _MM_SHUFFLE(3,0,2,1); // y z x
const int maskShuffleCross2 = _MM_SHUFFLE(3,1,0,2); // z x y

__m128 QuadCrossProduct(__m128* quadA, __m128* quadB)
{
   // (y * other.z) - (z * other.y)
   // (z * other.x) - (x * other.z)
   // (x * other.y) - (y * other.x)

   return
   (
      _mm_sub_ps
      (
         _mm_mul_ps
         (
            _mm_shuffle_ps(*quadA, *quadA, maskShuffleCross1),
            _mm_shuffle_ps(*quadB, *quadB, maskShuffleCross2)
         ),
         _mm_mul_ps
         (
            _mm_shuffle_ps(*quadA, *quadA, maskShuffleCross2),
            _mm_shuffle_ps(*quadB, *quadB, maskShuffleCross1)
         )
      )
   );
}

As you can see, there are four _mm_shuffle_ps calls in there, and I wondered if I could replace them with a combination of _mm_unpackhi_ps and _mm_unpacklo_ps, which interleave their inputs as a2 b2 a3 b3 and a0 b0 a1 b1 respectively and which I understand to be slightly faster.
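For reference, here is a small sketch of what the two unpack intrinsics produce, lane by lane. The function name unpackDemo and the input values are made up for illustration; _mm_setr_ps is used so that the arguments are listed lowest lane first.

```cpp
#include <xmmintrin.h>

// Hypothetical demo: writes the lanes of unpacklo/unpackhi into plain arrays,
// lowest lane first, so they can be inspected or printed.
void unpackDemo(float lo[4], float hi[4])
{
    __m128 a = _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f);     // a0 a1 a2 a3
    __m128 b = _mm_setr_ps(10.0f, 11.0f, 12.0f, 13.0f); // b0 b1 b2 b3

    // Interleaves the low halves: a0 b0 a1 b1
    _mm_storeu_ps(lo, _mm_unpacklo_ps(a, b));

    // Interleaves the high halves: a2 b2 a3 b3
    _mm_storeu_ps(hi, _mm_unpackhi_ps(a, b));
}
```

Both intrinsics interleave the two inputs rather than concatenating halves, which limits the permutations they can reach on their own.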

I couldn't figure it out on paper, but I thought of a solution. What if I let the computer brute-force the steps required? Just recursively step through the different options and see which gives the correct answer.

I got it to work with multiply; it returns this when I want it to return (3, 12, 27, 0):

startA = _mm_set_ps(1.00, 2.00, 3.00, 0.00);
startB = _mm_set_ps(3.00, 3.00, 3.00, 0.00);
result0 = _mm_mul_ps(startA, startB);
// (3.00, 6.00, 9.00, 0.00)
result1 = _mm_mul_ps(startA, result0);
// (3.00, 12.00, 27.00, 0.00)

Very nice, if I say so myself.

However, when I wanted to implement divide I stumbled on a problem. Multiply doesn't just have to call multiply, it also has to call divide. Okay, so we put divide above multiply. But divide doesn't just have to call divide, it also has to call multiply, which is lower in the script, so it doesn't exist yet.

I started with an empty console application in Visual C++ and put everything in QuadTests.cpp.

How do I make sure these two functions can call each other?

Thanks in advance.

+1  A: 

Just to confirm, your problem is that functions arranged like this don't work, because doStuff isn't declared by the time you call it from getFoo:

int getFoo(int bar) {
    return doStuff(bar + 1);  // error: doStuff has not been declared yet
}

int doStuff(int bar) {
    if (bar == 2) {
        return getFoo(bar);
    }

    return bar * 8;
}

To fix this, you need to make a forward declaration of int doStuff(int). Often, this is done with a header file -- either way, you just need to add something like this:

// #includes, etc. go here

int doStuff(int);
int getFoo(int);

// methods follow
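Applied to a multiply/divide search like the one in the question, the same idea might look like the sketch below. The names tryMultiply and tryDivide, the scalar float arguments, and the depth limit are all assumptions made for illustration; the real code would work on __m128 values instead.

```cpp
#include <cassert>

// Forward declarations: both names are known before either body is compiled,
// so each function can call the other regardless of definition order.
bool tryMultiply(float value, float operand, float target, int depth);
bool tryDivide(float value, float operand, float target, int depth);

// Hypothetical search step: multiply once, then try both operations again.
bool tryMultiply(float value, float operand, float target, int depth)
{
    if (depth == 0) return false;       // give up when the step budget runs out
    float next = value * operand;
    if (next == target) return true;    // exact comparison, fine for a sketch
    return tryMultiply(next, operand, target, depth - 1)
        || tryDivide(next, operand, target, depth - 1);
}

// Hypothetical search step: divide once, then try both operations again.
bool tryDivide(float value, float operand, float target, int depth)
{
    if (depth == 0) return false;
    assert(operand != 0.0f);            // avoid dividing by zero
    float next = value / operand;
    if (next == target) return true;
    return tryMultiply(next, operand, target, depth - 1)
        || tryDivide(next, operand, target, depth - 1);
}
```

For example, tryMultiply(1.0f, 3.0f, 9.0f, 2) finds the target after two multiplications. Without the two declarations at the top, the call to tryDivide inside tryMultiply would not compile.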
ojrac
Of course! :D It just slipped my mind, but I knew the human search engine that is Stack Overflow could help me out. ;)
knight666