I am trying to get SSE functionality in my vector class (I've rewritten it three times so far. :\) and I'm doing the following:
#ifndef _POINT_FINAL_H_
#define _POINT_FINAL_H_
#include <math.h>
#include <stdlib.h> // malloc
#define SSE_VERSION 3
#if SSE_VERSION >= 2
#include <malloc.h> // _aligned_malloc (MSVC)
#include <emmintrin.h> // SSE2
#if SSE_VERSION >= 3
#include <pmmintrin.h> // SSE3
#endif
#endif
namespace Vector3D
{
#if SSE_VERSION >= 2
typedef union { __m128 vector; float numbers[4]; } VectorData;
//typedef union { __m128 vector; struct { float x, y, z, w; }; } VectorData;
#else
typedef struct { float x, y, z, w; } VectorData;
#endif
class Point3D
{
public:
Point3D();
Point3D(float a_X, float a_Y, float a_Z);
Point3D(VectorData* a_Data);
~Point3D();
// a lot of not-so-interesting functions
private:
VectorData* _NewData();
}; // class Point3D
} // namespace Vector3D
#endif
It works! Hurray! But it's slower than my previous attempt. Boo.
I've determined that my bottleneck is the malloc I'm using to get a pointer to a struct.
VectorData* Point3D::_NewData()
{
#if SSE_VERSION >= 2
return ((VectorData*) _aligned_malloc(sizeof(VectorData), 16));
#else
return ((VectorData*) malloc(sizeof(VectorData)));
#endif
}
One of the main problems with using SSE in a class is that the data has to be 16-byte aligned in memory for the aligned load and store instructions to work, which means overloading the new and delete operators, resulting in code like this:
BadVector* test1 = new BadVector(1, 2, 3);
BadVector* test2 = new BadVector(4, 5, 6);
*test1 *= test2;
You can no longer use the default constructor and you have to avoid new like the plague.
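For reference, here is a minimal sketch of what the "overload new and delete" approach tends to look like. AlignedVector is a hypothetical stand-in for BadVector, and I've used _mm_malloc/_mm_free here instead of _aligned_malloc on the assumption that the code should also build outside MSVC; treat it as an illustration rather than the actual class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <xmmintrin.h> // SSE: __m128, _mm_malloc, _mm_free

// Hypothetical stand-in for the BadVector used above.
class AlignedVector
{
public:
    AlignedVector(float x, float y, float z)
        : m_data(_mm_set_ps(0.0f, z, y, x)) // lanes: [x, y, z, 0]
    {
        // The object, and therefore its __m128 member, must sit on a 16-byte boundary.
        assert(reinterpret_cast<std::uintptr_t>(this) % 16 == 0);
    }

    AlignedVector& operator*=(const AlignedVector& rhs)
    {
        m_data = _mm_mul_ps(m_data, rhs.m_data); // component-wise multiply
        return *this;
    }

    // Copies the lanes out with an unaligned store and returns lane i (0..3).
    float get(int i) const
    {
        float out[4];
        _mm_storeu_ps(out, m_data);
        return out[i];
    }

    // Route heap allocations of this class through a 16-byte-aligned allocator.
    static void* operator new(std::size_t size)
    {
        void* p = _mm_malloc(size, 16);
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void* p) { _mm_free(p); }

private:
    __m128 m_data;
};
```

With this in place, new AlignedVector(1, 2, 3) comes back 16-byte aligned, but every heap-allocated vector still pays for an aligned allocation.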
My new approach is basically to keep the data external to the class so the class itself doesn't have to be aligned.
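To make that concrete, here is a cut-down sketch of the idea. The member name m_Data, the Get accessor, operator+= and the deleted copy operations are assumptions added for illustration (the real class has many more functions), and _mm_malloc/_mm_free stand in for _aligned_malloc so the sketch is not tied to MSVC.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <xmmintrin.h> // SSE: __m128, _mm_malloc, _mm_free

// Mirrors the SSE variant of VectorData from the header above.
union VectorData { __m128 vector; float numbers[4]; };

// Hypothetical cut-down Point3D: the object holds only a pointer,
// so the object itself needs no special alignment.
class Point3D
{
public:
    Point3D(float a_X, float a_Y, float a_Z) : m_Data(NewData())
    {
        assert(reinterpret_cast<std::uintptr_t>(m_Data) % 16 == 0);
        m_Data->vector = _mm_set_ps(0.0f, a_Z, a_Y, a_X); // lanes: [x, y, z, 0]
    }
    ~Point3D() { _mm_free(m_Data); }

    // Copying is disabled in this sketch; a shallow copy would free the block twice.
    Point3D(const Point3D&) = delete;
    Point3D& operator=(const Point3D&) = delete;

    Point3D& operator+=(const Point3D& a_Other)
    {
        m_Data->vector = _mm_add_ps(m_Data->vector, a_Other.m_Data->vector);
        return *this;
    }

    // Copies the lanes out with an unaligned store and returns lane a_Index (0..3).
    float Get(int a_Index) const
    {
        float lanes[4];
        _mm_storeu_ps(lanes, m_Data->vector);
        return lanes[a_Index];
    }

private:
    // One aligned heap allocation per point.
    static VectorData* NewData()
    {
        void* p = _mm_malloc(sizeof(VectorData), 16);
        if (!p) throw std::bad_alloc();
        return static_cast<VectorData*>(p);
    }

    VectorData* m_Data;
};
```

A Point3D built this way can live on the stack or inside any container without alignment concerns, but each instance costs one aligned allocation, which is exactly where my bottleneck shows up.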
My question is: is there a better way to get a pointer to a memory-aligned instance of a struct, or is my approach really dumb and there's a much cleaner way?