I am building a class hierarchy that uses SSE intrinsic functions, so some of the members of the class need to be 16-byte aligned. For stack instances I can use __declspec(align(#)), like so:

typedef __declspec(align(16)) float Vector[4];
class MyClass{
...
private:
Vector v;
};

Now, since __declspec(align(#)) is a compilation directive, the following code may result in an unaligned instance of Vector on the heap:

MyClass *myclass = new MyClass;

This, too, I know I can easily solve by overloading the new and delete operators to use _aligned_malloc and _aligned_free accordingly. Like so:

//inside MyClass:
public:
void* operator new (size_t size) throw (std::bad_alloc){
    void * p = _aligned_malloc(size, 16);   // request 16-byte aligned storage
    if (p == 0)  throw std::bad_alloc();
    return p; 
}

void operator delete (void *p){
    _aligned_free(p);                       // must pair with _aligned_malloc
}
...

So far so good, but here is my problem. Consider the following code:

class NotMyClass{ //Not my code, which I have little or no influence over
...
MyClass myclass;
...
};
int main(){
    ...
    NotMyClass *nmc = new NotMyClass;
    ...
}

Since the myclass instance of MyClass is stored by value inside a dynamically allocated instance of NotMyClass, myclass WILL be 16-byte aligned relative to the beginning of nmc because of Vector's __declspec(align(16)) directive. But this is worthless, since nmc is dynamically allocated on the heap with NotMyClass's new operator, which doesn't necessarily ensure (and most probably does NOT ensure) 16-byte alignment.

So far, I can only think of 2 approaches to deal with this problem:

  1. Preventing MyClass users from being able to compile the following code:

MyClass myclass;

meaning, instances of MyClass can only be created dynamically, using the new operator, thus ensuring that all instances of MyClass are truly dynamically allocated with MyClass's overloaded new. I asked in another thread how to accomplish this and got a few great answers: http://stackoverflow.com/questions/3092198/c-preventing-class-instance-from-being-created-on-the-stack-during-compiltaio

  2. Revert from having Vector members in my class and only have pointers to Vector as members, which I will allocate and deallocate using _aligned_malloc and _aligned_free in the ctor and dtor respectively. This method seems crude and prone to error, since I am not the only programmer writing these classes (MyClass derives from a Base class and many of these classes use SSE).

However, since both solutions have been frowned upon in my team, I come to you for suggestions of a different solution.

+3  A: 

If you're set against heap allocation, another idea is to over-allocate on the stack and manually align (manual alignment is discussed in this SO post). The idea is to allocate byte data (unsigned char) with a size guaranteed to contain an aligned region of the necessary size (+15), then find the aligned position by rounding down from the most-shifted position ((x+15) - (x+15) % 16, or (x+15) & ~0x0F). I posted a working example of this approach with vector operations on codepad (for g++ -O2 -msse2). Here are the important bits:
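The rounding step can be sketched on its own as follows. The function names are hypothetical and only illustrate the two equivalent forms of the arithmetic quoted above; this is a minimal sketch, not the code from the codepad example.

```cpp
#include <cstdint>

// Round addr up to the next multiple of 16 (unchanged if already aligned).
// Mask form: add 15, then clear the low four bits.
inline std::uintptr_t align_up_16(std::uintptr_t addr) {
    return (addr + 15) & ~static_cast<std::uintptr_t>(15);
}

// Remainder form: add 15, then subtract whatever is left over modulo 16.
inline std::uintptr_t align_up_16_mod(std::uintptr_t addr) {
    return (addr + 15) - (addr + 15) % 16;
}
```

Because the result is never more than 15 bytes past the input, a buffer of size N+15 always contains an aligned region of N bytes.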

class MyClass{
   ...
   unsigned char dPtr[sizeof(float)*4+15]; //over-allocated data
   float* vPtr;                            //float ptr to be aligned

   public:
      MyClass(void) : 
         vPtr( reinterpret_cast<float*>( 
            (reinterpret_cast<uintptr_t>(dPtr)+15) & ~ 0x0F
         ) ) 
      {}
   ...
};
...

The constructor ensures that vPtr is aligned (note the order of members in the class declaration is important).

This approach works (heap/stack allocation of containing classes is irrelevant to alignment), is portable-ish (I think most compilers provide a pointer-sized unsigned integer type, uintptr_t), and will not leak memory. But it's not particularly safe (you must be sure to keep the aligned pointer valid under copy, etc.), wastes (nearly) as much memory as it uses, and some may find the reinterpret_casts distasteful.

The risks of aligned operation/unaligned data problems could be mostly eliminated by encapsulating this logic in a Vector object, thereby controlling access to the aligned pointer and ensuring that it gets aligned at construction and stays valid.
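One way the encapsulation might look is sketched below. The class name AlignedVec4 and its members are hypothetical. Instead of storing the aligned pointer, this variant recomputes it on each access, so a copied object can never hold a pointer into another object's buffer. It assumes that treating the aligned bytes as four floats is acceptable on the target compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical wrapper: owns an over-allocated byte buffer and hands out
// a 16-byte aligned float* into it, wherever the object itself lives.
class AlignedVec4 {
public:
    AlignedVec4() { std::memset(raw_, 0, sizeof raw_); }

    // Copy the four floats rather than the raw bytes: the aligned region
    // may start at a different offset inside each object's buffer.
    AlignedVec4(const AlignedVec4& other) {
        std::memcpy(data(), other.data(), kBytes);
    }
    AlignedVec4& operator=(const AlignedVec4& other) {
        if (this != &other) std::memcpy(data(), other.data(), kBytes);
        return *this;
    }

    float* data() {
        return reinterpret_cast<float*>(aligned(raw_));
    }
    const float* data() const {
        return reinterpret_cast<const float*>(aligned(raw_));
    }

private:
    static const std::size_t kBytes = sizeof(float) * 4;

    // Round the buffer address up to the next multiple of 16.
    static std::uintptr_t aligned(const unsigned char* p) {
        return (reinterpret_cast<std::uintptr_t>(p) + 15)
               & ~static_cast<std::uintptr_t>(15);
    }

    unsigned char raw_[kBytes + 15];  // 15 spare bytes for the alignment slack
};
```

The trade-off is a little arithmetic on every access in exchange for copy safety; the 15 bytes of slack per vector remain.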

academicRobot
+1. Definitely a solution worth exploring. I will definitely HAVE to encapsulate this behavior in a class to make it usable. Still, its memory consumption overhead will probably disqualify it. I am now leaning towards encapsulating a Vector class that can only be allocated on the heap and be done with it. That way I don't have to force MyClass into any aligned behaviour, and I encapsulate the solution where the problem is: in Vector.
eladidan
This is what I used before I encountered `aligned_storage`. I just could not adapt it to arbitrary alignment requirement because of the bit trickery :(.
Matthieu M.
A: 

You can use "placement new."

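#include <new>       // declares placement new; do not define it yourself
#include <malloc.h>  // _aligned_malloc / _aligned_free (MSVC)

int main() {
    // allocate raw storage with the required 16-byte alignment
    void* p = _aligned_malloc(sizeof(NotMyClass), 16);
    NotMyClass* nmc = new (p) NotMyClass;   // construct in place
    // ...

    nmc->~NotMyClass();                     // destroy explicitly
    _aligned_free(p);                       // then release the storage
}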

Of course you need to take care when destroying the object, by calling the destructor and then releasing the space. You can't just call delete. You could use shared_ptr<> with a different function to deal with that automatically; it depends if the overhead of dealing with a shared_ptr (or other wrapper of the pointer) is a problem to you.
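The shared_ptr idea might be sketched like this. The helper names alloc16, free16 and make_aligned16 are hypothetical, as is the stand-in definition of NotMyClass; the allocation calls are _aligned_malloc on MSVC and posix_memalign elsewhere, which is an assumption about the target platforms.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#if defined(_MSC_VER)
#include <malloc.h>
#endif

// Hypothetical helpers: 16-byte aligned raw allocation and release.
inline void* alloc16(std::size_t size) {
#if defined(_MSC_VER)
    return _aligned_malloc(size, 16);
#else
    void* p = nullptr;
    return posix_memalign(&p, 16, size) == 0 ? p : nullptr;
#endif
}

inline void free16(void* p) {
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    std::free(p);
#endif
}

// Stand-in for the real class, for illustration only.
struct NotMyClass { float payload[4]; };

// Construct a T in aligned storage and return a shared_ptr whose deleter
// runs the destructor and then releases the aligned block.
template <typename T>
std::shared_ptr<T> make_aligned16() {
    void* raw = alloc16(sizeof(T));
    if (!raw) throw std::bad_alloc();
    T* obj = nullptr;
    try {
        obj = new (raw) T();          // placement new into the aligned block
    } catch (...) {
        free16(raw);                  // constructor threw: no object to destroy
        throw;
    }
    return std::shared_ptr<T>(obj, [](T* p) {
        p->~T();
        free16(p);
    });
}
```

A caller would write `std::shared_ptr<NotMyClass> nmc = make_aligned16<NotMyClass>();` and never call delete on the result.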

janm
This will not work for me, since NotMyClass instances are created outside the scope of my code and it's difficult to force programmers to use placement new, just like I don't want to force NotMyClass to overload its new operator.
eladidan
In that case, I don't understand your question. You seem to be asking about how to influence programmers who are not under your control. If that is the case, the best answer is probably "ask them nicely to Do The Right Thing." It is easy to align memory you allocate, but getting code you don't control to do it is harder.
janm
It is harder, but (I hope) not impossible. NotMyClass doesn't care about alignment, nor should it; it's just not that class's problem. A good solution for me would be one that ensures that, no matter what, where, or how, an instance of my class is always truly 16-byte aligned. I don't need NotMyClass to be 16-byte aligned as well, and forcing it to be so seems distasteful.
eladidan
Sorry, I misunderstood your problem; I thought you were trying to align NotMyClass. If your requirement is correctly aligned member variables (which it seems to be), go with academicRobot's solution; allocate enough memory in the class to ensure alignment, and then fix up the pointer.
janm
A: 

The upcoming C++0x standard proposes facilities for dealing with raw memory. They are already incorporated in VC++2010 (within the tr1 namespace).

std::tr1::alignment_of // get the alignment
std::tr1::aligned_storage // get aligned storage of required dimension

Those are types, you can use them like so:

static const std::size_t floatalign = std::tr1::alignment_of<float>::value; // demo only

typedef std::tr1::aligned_storage<sizeof(float)*4, 16>::type raw_vector;
        // first parameter is size, second is desired alignment

Then you can declare your class:

class MyClass
{
public:
  float* AccessVector(); // typed access to the raw storage

private:
  raw_vector mVector; // alignment guaranteed
};

Finally, you need some cast to manipulate it (it's raw memory until now):

float* MyClass::AccessVector()
{
  return reinterpret_cast<float*>(&mVector);
}
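What aligned_storage does and does not provide can be sketched as follows, using the C++11 std:: names rather than std::tr1 (an assumption; the struct Holder is hypothetical). It does not allocate anything: it is a type with a 16-byte alignment requirement, which fixes the member's offset inside its enclosing object. Where the enclosing object itself lands is still decided by whatever allocated it.

```cpp
#include <cstddef>
#include <type_traits>

// Raw storage for four floats with a 16-byte alignment requirement.
typedef std::aligned_storage<sizeof(float) * 4, 16>::type raw_vector;

static_assert(std::alignment_of<raw_vector>::value == 16,
              "the type itself carries the alignment requirement");
static_assert(sizeof(raw_vector) >= sizeof(float) * 4,
              "large enough for four floats");

// Hypothetical enclosing type: the char forces padding before v.
struct Holder {
    char tag;
    raw_vector v;
};

// The requirement propagates to the enclosing type, and the member offset
// is a multiple of 16. The absolute address of v is therefore aligned only
// if the Holder object itself starts on a 16-byte boundary.
static_assert(std::alignment_of<Holder>::value == 16,
              "enclosing type inherits the strictest member alignment");
```

So the guarantee is relative to the start of the enclosing object, which is the same situation as with __declspec(align(16)).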
Matthieu M.
We use Intel's C++ Compiler ver. 11.1, which doesn't yet have tr1 support. I tried your suggestion using boost's tr1 extensions, but encountered the same problem. Everything worked fine until I allocated NotMyClass on the heap using the new operator. I admit I am unfamiliar with C++0x in general, and with aligned_storage specifically (I had never heard of it until reading your answer). Could you expand on how aligned_storage allocates data? I am baffled as to how it manages to accomplish alignment both on the stack and on the heap...
eladidan
I don't know much about the internals; unfortunately I have only used it and never really dug into the code.
Matthieu M.
@eladidan - At least in gcc 4.5, `aligned_storage` is implemented with `__attribute__((__aligned__((...))))` in tr1::type_traits, which is the equivalent of your `__declspec`.
academicRobot
If this is the case, then it only guarantees alignment on the stack, which is consistent with the results I got using boost's tr1 extensions. Which, of course, doesn't solve the problem.
eladidan