I am trying to get SSE functionality in my vector class (I've rewritten it three times so far. :\) and I'm doing the following:

#ifndef POINT_FINAL_H
#define POINT_FINAL_H

#include <math.h>

#define SSE_VERSION 3

// System headers belong at global scope, never inside a namespace.
#if SSE_VERSION >= 2

    #include <emmintrin.h>  // SSE2 (also pulls in __m128 from <xmmintrin.h>)

    #if SSE_VERSION >= 3

        #include <pmmintrin.h>  // SSE3

    #endif

#else

#include <stdlib.h>  // malloc / free

#endif

namespace Vector3D
{

#if SSE_VERSION >= 2

    typedef union { __m128 vector; float numbers[4]; } VectorData;
    //typedef union { __m128 vector; struct { float x, y, z, w; }; } VectorData;

#else

    typedef struct { float x, y, z, w; } VectorData;

#endif

class Point3D
{

public:

    Point3D();
    Point3D(float a_X, float a_Y, float a_Z);
    Point3D(VectorData* a_Data);
    ~Point3D();  // must release m_Point with the free function matching _NewData()

    // a lot of not-so-interesting functions

private:

    VectorData* _NewData();

    VectorData* m_Point;

}; // class Point3D

} // namespace Vector3D

#endif

It works! Hurray! But it's slower than my previous attempt. Boo.

I've determined that my bottleneck is the malloc I'm using to get a pointer to a struct.

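VectorData* Point3D::_NewData()
{

#if SSE_VERSION >= 2

    // _aligned_malloc (MSVC, declared in <malloc.h>) returns 16-byte-aligned
    // memory; it must be released with _aligned_free, not free().
    return ((VectorData*) _aligned_malloc(sizeof(VectorData), 16));

#else

    return ((VectorData*) malloc(sizeof(VectorData)));

#endif

}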
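A minimal sketch of the same allocation with a matching release function, assuming an x86 compiler that provides `_mm_malloc` / `_mm_free` through `<xmmintrin.h>` (GCC, Clang, ICC and MSVC do). `NewVectorData` and `DeleteVectorData` are hypothetical names. This does not make the per-vector heap allocation any cheaper; it only shows that every aligned allocation needs its own aligned release.

```cpp
#include <xmmintrin.h>  // __m128, _mm_malloc, _mm_free
#include <cassert>
#include <cstdint>
#include <new>          // std::bad_alloc

// Same layout as the question's SSE VectorData union.
typedef union { __m128 vector; float numbers[4]; } VectorData;

// Hypothetical helper: allocate one 16-byte-aligned VectorData.
// Unlike _aligned_malloc, _mm_malloc is also available on GCC and Clang.
VectorData* NewVectorData()
{
    void* p = _mm_malloc(sizeof(VectorData), 16);
    if (!p) throw std::bad_alloc();
    assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0);
    return static_cast<VectorData*>(p);
}

// Memory from _mm_malloc must go back through _mm_free, never free().
void DeleteVectorData(VectorData* d)
{
    _mm_free(d);
}
```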

One of the main problems with using SSE in a class is that the data has to be aligned in memory (on a 16-byte boundary) for it to work, which means overloading the new and delete operators, resulting in code like this:

BadVector* test1 = new BadVector(1, 2, 3);
BadVector* test2 = new BadVector(4, 5, 6);
*test1 *= *test2;

You can no longer use the default constructor and you have to avoid new like the plague.
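For reference, the "overload new and delete" approach might look roughly like this. `AlignedVec` is a hypothetical stand-in for `BadVector`, and `_mm_malloc` / `_mm_free` are assumed to be available. Since C++17, a plain `new` already honors `alignof` for over-aligned types, so a class-specific overload like this mainly matters on older compilers.

```cpp
#include <xmmintrin.h>  // __m128, _mm_set_ps, _mm_mul_ps, _mm_store_ps, _mm_malloc, _mm_free
#include <cassert>
#include <cstddef>
#include <new>          // std::bad_alloc

// Hypothetical class illustrating class-specific aligned new/delete.
class AlignedVec
{
public:
    // _mm_set_ps takes elements from highest to lowest: w, z, y, x.
    AlignedVec(float x, float y, float z) : m_Data(_mm_set_ps(0.0f, z, y, x)) {}

    AlignedVec& operator*=(const AlignedVec& other)
    {
        m_Data = _mm_mul_ps(m_Data, other.m_Data);  // component-wise multiply
        return *this;
    }

    float Get(int i) const
    {
        assert(i >= 0 && i < 4);
        alignas(16) float out[4];
        _mm_store_ps(out, m_Data);  // aligned store into a 16-byte-aligned buffer
        return out[i];
    }

    // Every heap-allocated AlignedVec lands on a 16-byte boundary.
    // (Array new[] / delete[] would need the same treatment.)
    static void* operator new(std::size_t size)
    {
        void* p = _mm_malloc(size, 16);
        if (!p) throw std::bad_alloc();
        return p;
    }

    static void operator delete(void* p) { _mm_free(p); }

private:
    __m128 m_Data;
};
```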

My new approach is basically to keep the data outside the class, so the class itself doesn't have to be aligned.

My question is: is there a better way to get a pointer to a memory-aligned instance of a struct, or is my approach really dumb and there's a much cleaner way?

+2  A: 

How about:

__declspec( align( 16 ) ) VectorData vd;

?
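`__declspec( align( 16 ) )` is MSVC-specific. Assuming a C++11 compiler, `alignas(16)` is the standard spelling and works on a type or on a single variable (`alignas(16) VectorData vd;`). A sketch with a hypothetical `Float4` type and `IsAligned16` / `Load` helpers:

```cpp
#include <xmmintrin.h>  // __m128, _mm_load_ps
#include <cassert>
#include <cstdint>

// Standard C++11 equivalent of __declspec(align(16)) (MSVC)
// or __attribute__((aligned(16))) (GCC/Clang).
struct alignas(16) Float4
{
    float x, y, z, w;
};

static_assert(alignof(Float4) == 16, "Float4 must be 16-byte aligned");

// True when p sits on a 16-byte boundary, as _mm_load_ps requires.
inline bool IsAligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Aligned SSE load; valid because every Float4 object is 16-byte aligned.
inline __m128 Load(const Float4& v)
{
    assert(IsAligned16(&v));
    return _mm_load_ps(&v.x);
}
```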

You can also create your own version of operator new as follows

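#include <malloc.h>  // _aligned_malloc (MSVC)
#include <new>       // std::bad_alloc

void* operator new( size_t size, size_t alignment )
{
     void* p = _aligned_malloc( size, alignment );
     if ( !p ) throw std::bad_alloc();  // operator new must not return null
     return p;
}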

which can then be used to allocate objects as follows

AlignedData* pData = new( 16 ) AlignedData;

to align them at a 16-byte boundary.
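Two things are worth adding to the `operator new( size_t, size_t )` version. First, an object created with `new( 16 )` must not be released with a plain `delete`, because that hands the pointer to the default `operator delete` instead of the aligned free function. Second, since C++14 the global `operator delete( void*, size_t )` is the standard sized deallocation function, so it cannot double as the matching placement delete; the standard makes the new-expression ill-formed when a usual two-parameter delete would be selected that way. A sketch that sidesteps both, using a hypothetical `Alignment` tag type and `AlignedDelete` helper, with `_mm_malloc` / `_mm_free` standing in for `_aligned_malloc` / `_aligned_free` so it also builds outside MSVC:

```cpp
#include <xmmintrin.h>  // _mm_malloc / _mm_free (stand-ins for _aligned_malloc / _aligned_free)
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>          // std::bad_alloc

// Hypothetical tag carrying the requested alignment. A distinct type keeps the
// placement delete below from colliding with operator delete(void*, std::size_t).
struct Alignment { std::size_t value; };

void* operator new(std::size_t size, Alignment a)
{
    assert(a.value != 0 && (a.value & (a.value - 1)) == 0);  // power of two
    void* p = _mm_malloc(size, a.value);
    if (!p) throw std::bad_alloc();
    assert(reinterpret_cast<std::uintptr_t>(p) % a.value == 0);
    return p;
}

// Matching placement delete: called automatically only if the
// constructor throws inside `new (Alignment{16}) T(...)`.
void operator delete(void* p, Alignment) noexcept
{
    _mm_free(p);
}

// Hypothetical helper: destroy the object explicitly, then free the memory
// with the function that matches the allocation.
template <typename T>
void AlignedDelete(T* p)
{
    if (p)
    {
        p->~T();
        _mm_free(p);
    }
}

// Stand-in for the answer's AlignedData.
struct AlignedData { float numbers[4]; };
```

Usage: `AlignedData* pData = new (Alignment{16}) AlignedData;` followed later by `AlignedDelete(pData);`.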

If that's no help, then I may be misunderstanding what you are asking for...

Goz
You mean _declspec, I suppose?
cedrou
LOL really didn't notice that typo!!
Goz
+1  A: 

You should probably not expect improved performance for single-use vectors. Parallel processing shines brightest when you can combine it with some volume, i.e. when processing many vectors in sequence.
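A sketch of what "volume" can mean here, assuming a structure-of-arrays layout (the hypothetical `Vectors` struct and `DotProducts` function are not from the question): one SSE register holds the same component of four different vectors, so each loop iteration computes four dot products without any shuffling.

```cpp
#include <xmmintrin.h>  // _mm_load_ps, _mm_mul_ps, _mm_add_ps, _mm_store_ps
#include <cassert>
#include <cstddef>

// Hypothetical structure-of-arrays layout: x, y and z live in separate
// 16-byte-aligned arrays, one float per vector.
struct Vectors
{
    const float* x;
    const float* y;
    const float* z;
};

// Computes out[i] = a[i] . b[i] for n vectors, four at a time.
// Assumes n is a multiple of 4 and every pointer is 16-byte aligned.
void DotProducts(const Vectors& a, const Vectors& b, float* out, std::size_t n)
{
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4)
    {
        __m128 d = _mm_mul_ps(_mm_load_ps(a.x + i), _mm_load_ps(b.x + i));
        d = _mm_add_ps(d, _mm_mul_ps(_mm_load_ps(a.y + i), _mm_load_ps(b.y + i)));
        d = _mm_add_ps(d, _mm_mul_ps(_mm_load_ps(a.z + i), _mm_load_ps(b.z + i)));
        _mm_store_ps(out + i, d);
    }
}
```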

unwind
A: 

I fixed it. :O

It was really rather easy. All I had to do was turn

VectorData* m_Point;

into

VectorData m_Point;

and my problems are gone, with no need for malloc or aligning.
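A sketch of the by-value version, assuming the question's `VectorData` union and showing only a constructor, `operator+=` and a hypothetical `Get` accessor. Because the union contains an `__m128`, the class inherits its 16-byte alignment requirement, which the `static_assert` checks at compile time. The compiler must honor that for stack and static objects; for objects created with `new`, it is only guaranteed for over-aligned types from C++17 on, which bears on the doubts raised in the comments.

```cpp
#include <xmmintrin.h>  // __m128, _mm_set_ps, _mm_add_ps, _mm_store_ps
#include <cassert>

// Same union as in the question.
typedef union { __m128 vector; float numbers[4]; } VectorData;

// Sketch: Point3D stores its VectorData inline instead of behind a pointer.
class Point3D
{
public:
    Point3D(float a_X, float a_Y, float a_Z)
    {
        // _mm_set_ps takes elements from highest to lowest: w, z, y, x.
        m_Point.vector = _mm_set_ps(0.0f, a_Z, a_Y, a_X);
    }

    Point3D& operator+=(const Point3D& a_Other)
    {
        m_Point.vector = _mm_add_ps(m_Point.vector, a_Other.m_Point.vector);
        return *this;
    }

    // Copies the register out with an aligned store and returns component i.
    float Get(int i) const
    {
        assert(i >= 0 && i < 4);
        alignas(16) float out[4];
        _mm_store_ps(out, m_Point.vector);
        return out[i];
    }

private:
    VectorData m_Point;  // by value: no malloc, no separate aligned allocation
};

// The __m128 member propagates its 16-byte alignment to the whole class.
static_assert(alignof(Point3D) == 16, "Point3D should be 16-byte aligned");
```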

But I appreciate everyone's help! :D

knight666
Sorry, but I doubt that. Yes, the MS compiler on x86-64 aligns on a 16-byte boundary (but not on 32-bit platforms). I also doubt that ICC will _always_ align to 16 bytes on the stack unless explicitly told to, exactly _because_ it tries to generate really speedy code. A declspec will be necessary, or the corresponding gcc option.
gimpf
And, yes, malloc() was a bad idea in the first place...
gimpf