Is there any portable way to determine what the maximum possible alignment for any type is?

For example, on x86, SSE instructions require 16-byte alignment, but as far as I'm aware, no instructions require more than that, so any type can safely be stored in a 16-byte-aligned buffer.

I need to create a buffer (such as a char array) into which I can write objects of arbitrary types, so I need to be able to rely on the beginning of the buffer being suitably aligned.

If all else fails, I know that allocating a char array with new is guaranteed to give maximum alignment, but with the TR1/C++0x templates alignment_of and aligned_storage, I am wondering whether it would be possible to create the buffer in place inside my buffer class, rather than requiring the extra pointer indirection of a dynamically allocated array.

Ideas?

I realize there are plenty of options for determining the max alignment for a bounded set of types: a union, or just alignment_of from TR1. My problem is that the set of types is unbounded; I don't know in advance which objects must be stored in the buffer.
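For the bounded case mentioned above, the maximum can be computed at compile time. A minimal sketch, assuming a C++11 compiler with variadic templates; `max_alignment_of` and `bounded_buffer` are hypothetical names, not standard library facilities:

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper (not a standard trait): the largest alignment among a
// fixed, known-in-advance list of types, built from std::alignment_of.
template <typename... Ts>
struct max_alignment_of;

// Empty list: alignment 1 is the weakest possible requirement.
template <>
struct max_alignment_of<> : std::integral_constant<std::size_t, 1> {};

// Recursive case: compare the first type against the maximum of the rest.
template <typename T, typename... Rest>
struct max_alignment_of<T, Rest...>
    : std::integral_constant<
          std::size_t,
          (std::alignment_of<T>::value > max_alignment_of<Rest...>::value
               ? std::alignment_of<T>::value
               : max_alignment_of<Rest...>::value)> {};

// 64 bytes of raw storage aligned for int, double and void*.
typedef std::aligned_storage<64, max_alignment_of<int, double, void*>::value>::type
    bounded_buffer;
```

This only covers the types you list, which is exactly the limitation the question runs into. (std::aligned_storage is deprecated as of C++23; an alignas-qualified array is the newer spelling.)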

+5  A: 

Short of some maximally_aligned_t type that all compilers promised faithfully to support for all architectures everywhere, I don't see how this could be solved at compile time. As you say, the set of potential types is unbounded. Is the extra pointer indirection really that big a deal?

David Seiler
It might not be, but I'm curious whether there is a solution. C++0x adds a couple of other alignment-related functions, and the implementation already has to determine the maximum possible alignment in other cases (when dynamically allocating a char array), so I thought there might be some obscure standard library template that exposes this value.
jalf
Yeah. It's an interesting question, and I wish I had a better answer for you, but I don't think there's any standards-conformant way. maximally_aligned_t (or better, maximal_alignment) wouldn't be hard to implement, though; perhaps you should propose it for c++1x :)
David Seiler
Heh, looking through the C++0x draft, there is actually a std::max_align_t type defined. Guess that solves the problem. :)
jalf
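With std::max_align_t (standardized in C++11, declared in <cstddef>), the in-place buffer from the question can be written directly, with no heap pointer. A sketch assuming a C++11 compiler; `inplace_buffer` and `construct_front` are hypothetical names. max_align_t only covers fundamental types, so over-aligned types (e.g. declared with alignas(32)) are rejected by a static_assert instead of being silently misplaced:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical class: fixed-size storage inside the object itself, aligned
// for every fundamental type via std::max_align_t.
class inplace_buffer {
public:
    static constexpr std::size_t capacity = 256;

    void* data() { return storage_; }

    // Copy-constructs a T at the start of the buffer. The caller is
    // responsible for destroying it later (p->~T()).
    template <typename T>
    T* construct_front(const T& value) {
        static_assert(sizeof(T) <= capacity, "object too large for buffer");
        static_assert(alignof(T) <= alignof(std::max_align_t),
                      "over-aligned types need extra handling");
        assert(reinterpret_cast<std::uintptr_t>(storage_) % alignof(T) == 0);
        return ::new (static_cast<void*>(storage_)) T(value);
    }

private:
    alignas(std::max_align_t) unsigned char storage_[capacity];
};
```

Usage: `inplace_buffer b; double* d = b.construct_front(3.5);` places the double at the start of `b` without any dynamic allocation.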
+1  A: 

Allocating aligned memory is trickier than it looks; see for example http://jongampark.wordpress.com/2008/06/12/implementation-of-aligned-memory-alloc/

Martin Beckett
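For reference, a common way to build aligned allocation on top of malloc is to over-allocate, round the address up, and stash the original pointer just in front of the aligned block. This is a generic sketch, not necessarily the linked article's code; `aligned_malloc`/`aligned_free` are hypothetical names, and the alignment is assumed to be a power of two:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Over-allocate, round up, and stash the original pointer in the slot just
// before the aligned block. Hypothetical names; size overflow is not checked.
void* aligned_malloc(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);  // power of two
    if (alignment < alignof(void*)) {
        alignment = alignof(void*);  // keep the stashed pointer's slot aligned
    }
    void* raw = std::malloc(size + alignment - 1 + sizeof(void*));
    if (raw == nullptr) {
        return nullptr;
    }
    const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    const std::uintptr_t aligned =
        (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
    reinterpret_cast<void**>(aligned)[-1] = raw;  // remember what malloc returned
    return reinterpret_cast<void*>(aligned);
}

void aligned_free(void* p) {
    if (p != nullptr) {
        std::free(static_cast<void**>(p)[-1]);  // free the original block
    }
}
```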
I know it's tricky. That wasn't my question. ;) But the standard does give some guarantees, and especially when you take C++0x into account, you do have a couple of *standard* tools to help out.
jalf
The trickiness doesn't apply to jalf because he's not writing a general-purpose allocator. All he needs is some extra space in his buffer, so he can round the in-buffer pointer up to the next multiple of the desired alignment.
Potatoswatter
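The round-up approach can be sketched with std::align from <memory> (C++11): give the buffer alignof(T) - 1 bytes of slack and bump the placement pointer forward. `place_in` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// The raw buffer only guarantees char alignment, so the placement pointer is
// rounded up inside it before constructing the object.
template <typename T>
T* place_in(unsigned char* buf, std::size_t buf_size, const T& value) {
    void* p = buf;
    std::size_t space = buf_size;
    // Advances p to the next multiple of alignof(T) and shrinks space by the
    // padding used; returns nullptr (leaving p unchanged) if sizeof(T) no
    // longer fits.
    if (std::align(alignof(T), sizeof(T), p, space) == nullptr) {
        return nullptr;
    }
    assert(reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0);
    return ::new (p) T(value);
}
```

If the buffer has at least sizeof(T) + alignof(T) - 1 bytes, the placement always succeeds, whatever the buffer's own starting alignment.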
+2  A: 

Unfortunately, ensuring maximum alignment is a lot tougher than it should be, and there are no guaranteed solutions AFAIK. From GotW (the Fast Pimpl article):

union max_align {
  short       dummy0;
  long        dummy1;
  double      dummy2;
  long double dummy3;
  void*       dummy4;
  /*...and pointers to functions, pointers to
       member functions, pointers to member data,
       pointers to classes, eye of newt, ...*/
};

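// used as a class member; sizeofx is a compile-time constant
// at least as large as the hidden implementation object
union {
  max_align m;
  char x_[sizeofx];
};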

This isn't guaranteed to be fully portable, but in practice it's close enough because there are few or no systems on which this won't work as expected.

That's about the closest 'hack' I know for this.
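A self-contained sketch of how the union is used in the Fast Pimpl setting, assuming C++11. `widget`, `widget_impl`, and the function-pointer member `dummy5` (standing in for the elided pointer types) are hypothetical additions, and sizeofx is taken from sizeof(widget_impl) only so the example compiles on its own; in a real Fast Pimpl it is a hand-picked constant so the header need not see the implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// The max_align union from the answer, plus one function pointer as a
// representative of the elided pointer types.
union max_align {
    short       dummy0;
    long        dummy1;
    double      dummy2;
    long double dummy3;
    void*       dummy4;
    void      (*dummy5)();
};

// Hypothetical hidden implementation type.
struct widget_impl {
    double value;
    long   count;
};

class widget {
public:
    widget() { ::new (static_cast<void*>(u_.x_)) widget_impl{1.5, 7}; }
    ~widget() { impl().~widget_impl(); }
    widget(const widget&) = delete;             // copying raw bytes would be wrong
    widget& operator=(const widget&) = delete;

    // C++17 code would wrap this cast in std::launder.
    widget_impl& impl() { return *reinterpret_cast<widget_impl*>(u_.x_); }

private:
    static const std::size_t sizeofx = sizeof(widget_impl);
    union {
        max_align m;
        char x_[sizeofx];
    } u_;
    static_assert(alignof(widget_impl) <= alignof(max_align),
                  "max_align is not aligned enough for widget_impl");
};
```

The static_assert is the useful safety net here: if some platform's max_align turns out weaker than the hidden type needs, the build fails instead of misbehaving at runtime.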

There is another approach that I've used personally for super-fast allocation. It's admittedly evil, but I work in raytracing, where speed is one of the greatest measures of quality and we profile code on a daily basis. It involves a heap allocator over pre-allocated memory that works like the local stack: it just increments a pointer on allocation and decrements it on deallocation.
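The core of such a stack-like allocator can be sketched in a few lines. This is a hypothetical `stack_arena`, not the answerer's allocator, and it leaves out the stack-range detection and per-thread handling described in this answer; it assumes power-of-two alignments and strict LIFO deallocation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// LIFO arena over pre-allocated memory: allocation rounds the top up to the
// requested alignment and bumps it; deallocation moves the top back.
class stack_arena {
public:
    stack_arena(unsigned char* mem, std::size_t size)
        : base_(mem), size_(size), top_(0) {}

    // 'align' must be a power of two.
    void* allocate(std::size_t n, std::size_t align = alignof(std::max_align_t)) {
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(base_);
        const std::uintptr_t start = base + top_;
        const std::uintptr_t aligned =
            (start + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
        const std::size_t offset = static_cast<std::size_t>(aligned - base);
        if (offset > size_ || n > size_ - offset) {
            return nullptr;  // out of space
        }
        top_ = offset + n;
        return base_ + offset;
    }

    // LIFO only: p must be the most recently allocated live block.
    void deallocate(void* p) {
        const std::size_t offset =
            static_cast<std::size_t>(static_cast<unsigned char*>(p) - base_);
        assert(offset <= top_);
        top_ = offset;  // also reclaims any alignment padding above p
    }

    std::size_t used() const { return top_; }

private:
    unsigned char* base_;
    std::size_t size_;
    std::size_t top_;
};
```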

I use it particularly for pimpls. However, the allocator alone is not enough. For it to work, we have to assume that the memory for a class Foo is allocated in its constructor, that the same memory is deallocated only in the destructor, and that Foo itself is created on the stack. To make this safe, I needed a function that checks whether a class's 'this' pointer is on the local stack, to decide whether we can use our super-fast heap-based stack allocator. That required OS-specific solutions: I used TIBs and TEBs for win32/win64, and my co-workers found solutions for Linux and OS X.

The result, after a week of researching OS-specific methods to detect the stack range, working out alignment requirements, and a lot of testing and profiling, was an allocator that could allocate memory in 4 clock cycles according to our tick-counter benchmarks, as opposed to about 400 cycles for malloc/operator new (our test involved thread contention, so malloc is likely to be somewhat faster in single-threaded cases, perhaps a couple hundred cycles). We added a per-thread heap stack plus detection of the calling thread, which increased the time to about 12 cycles, though the client can keep track of the thread's allocator to get the 4-cycle allocations. It wiped memory-allocation hotspots off the map.
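The per-thread part can be approximated with C++11 thread_local, so each thread bumps its own region without locking. This is a hypothetical sketch (the names and the 4096-byte region size are arbitrary choices), rounding every allocation up to alignof(std::max_align_t) and again assuming LIFO frees; it does not attempt the OS-specific stack detection:

```cpp
#include <cassert>
#include <cstddef>

namespace per_thread {

constexpr std::size_t kRegionSize = 4096;

// Region type aligned for any fundamental type.
struct alignas(std::max_align_t) region_t {
    unsigned char bytes[kRegionSize];
};

// Each thread gets its own region and top-of-stack offset.
thread_local region_t region;
thread_local std::size_t top = 0;

void* allocate(std::size_t n) {
    const std::size_t a = alignof(std::max_align_t);
    const std::size_t start = (top + a - 1) / a * a;  // round up to max alignment
    if (start > kRegionSize || n > kRegionSize - start) {
        return nullptr;  // region exhausted
    }
    top = start + n;
    return region.bytes + start;
}

// LIFO only: p must be this thread's most recent live allocation.
void deallocate(void* p) {
    const std::size_t offset =
        static_cast<std::size_t>(static_cast<unsigned char*>(p) - region.bytes);
    assert(offset <= top);
    top = offset;
}

}  // namespace per_thread
```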

While you don't have to go through all that trouble, writing a fast allocator might be easier and more generally applicable (e.g., it lets the amount of memory to allocate/deallocate be determined at runtime) than something like max_align here. max_align is easy enough to use, but if you're after speed for memory allocations (and assuming you've already profiled your code and found hotspots in malloc/free/operator new/delete whose major contributors are in code you control), writing your own allocator can really make the difference.

+1. Wow, 100 times faster allocation. Thanks for sharing this information.
Peter Mortensen