Unfortunately, ensuring maximum alignment is a lot tougher than it should be, and as far as I know there are no guaranteed solutions. From GotW (the Fast Pimpl article):
union max_align {
    short       dummy0;
    long        dummy1;
    double      dummy2;
    long double dummy3;
    void*       dummy4;
    /* ...and pointers to functions, pointers to
       member functions, pointers to member data,
       pointers to classes, eye of newt, ... */
};

union {
    max_align m;
    char x_[sizeofx];
};
This isn't guaranteed to be fully portable, but in practice it's close enough because there are few or no systems on which this won't work as expected.
That's about the closest 'hack' I know of for this.
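For reference, here's roughly how that anonymous-union buffer tends to get wired into a Fast-Pimpl-style class. The class name X, the impl struct, and the hard-coded sizeofx are illustrative; the real constant has to stay at least as large as sizeof(XImpl), which is why the .cpp checks it:

// x.h -- only a forward declaration of the impl is visible here
#include <cstddef>

class X {
public:
    X();
    ~X();
private:
    struct XImpl;                            // defined in x.cpp
    static const std::size_t sizeofx = 64;   // must be >= sizeof(XImpl); checked in x.cpp

    union max_align {
        short       dummy0;
        long        dummy1;
        double      dummy2;
        long double dummy3;
        void*       dummy4;
    };

    union {                    // anonymous union: m and x_ share storage
        max_align m;           // forces worst-case alignment on the buffer
        char      x_[sizeofx]; // raw bytes the XImpl is constructed into
    };
};

// x.cpp -- construct/destroy the impl inside the embedded buffer
#include <new>
#include <cassert>

struct X::XImpl { int value; };

X::X()  { assert(sizeof(XImpl) <= sizeofx); new (x_) XImpl(); }
X::~X() { reinterpret_cast<XImpl*>(x_)->~XImpl(); }

If you have C++11 available, alignas (or std::aligned_storage / std::max_align_t) expresses the same intent more directly; the union above is the pre-C++11 approximation.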
There is another approach that I've used personally for super fast allocation. Note that it is evil, but I work in raytracing, a field where speed is one of the greatest measures of quality and we profile code on a daily basis. It involves a heap allocator backed by pre-allocated memory that behaves like the local stack: it just increments a pointer on allocation and decrements it on deallocation.
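A minimal sketch of that kind of pointer-bumping allocator (the class and method names are mine, just for illustration) might look like this:

#include <cstddef>
#include <cassert>

// Stack-like allocator over a pre-allocated buffer: allocation bumps a
// pointer, deallocation pops it back.  Only valid when frees happen in
// strict LIFO order, which is why it suits ctor-allocate/dtor-free pimpls.
class StackAllocator {
public:
    StackAllocator(char* buffer, std::size_t size)
        : end_(buffer + size), top_(buffer) {}

    void* allocate(std::size_t bytes)
    {
        bytes = align_up(bytes);
        if (bytes > static_cast<std::size_t>(end_ - top_))
            return 0;                  // reserve exhausted: caller falls back to the heap
        void* p = top_;
        top_ += bytes;
        return p;
    }

    void deallocate(void* p, std::size_t bytes)
    {
        bytes = align_up(bytes);
        assert(static_cast<char*>(p) + bytes == top_);  // must be the most recent allocation
        top_ = static_cast<char*>(p);
    }

private:
    // Round up to 16 bytes -- an assumption about the worst-case alignment needed.
    static std::size_t align_up(std::size_t n) { return (n + 15) & ~std::size_t(15); }

    char* end_;
    char* top_;
};

Usage is just StackAllocator alloc(buffer, size) over a block grabbed once at startup; everything after that is bookkeeping to decide when it's safe to use it.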
I use it for pimpls in particular. However, just having the allocator is not enough; for such an allocator to work, we have to assume that memory for a class, Foo, is allocated only in its constructor, deallocated only in its destructor, and that Foo itself is created on the stack. To make it safe, I needed a function to check whether the 'this' pointer of a class lies on the local stack, to determine whether we can use our super fast heap-based stack allocator. For that we had to research OS-specific solutions: I used TIBs and TEBs for win32/win64, and my co-workers found solutions for Linux and OS X.
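For the win32/win64 part, the check boils down to comparing the pointer against the stack bounds stored in the Thread Information Block; a sketch of that idea (not our exact code) is:

#include <windows.h>

// Returns true if p lies within the calling thread's stack, using the
// NT_TIB at the start of the TEB (StackBase is the high end, StackLimit
// the low, currently committed end).
static bool is_on_current_thread_stack(const void* p)
{
    const NT_TIB* tib = reinterpret_cast<const NT_TIB*>(NtCurrentTeb());
    return p >= tib->StackLimit && p < tib->StackBase;
}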
The result, after a week of researching OS-specific ways to detect the stack range and alignment requirements, plus a lot of testing and profiling, was an allocator that could allocate memory in 4 clock cycles according to our tick-counter benchmarks, as opposed to about 400 cycles for malloc/operator new (our test involved thread contention, so malloc is likely to be a bit faster than that in single-threaded cases, perhaps a couple hundred cycles). We added a per-thread heap stack and detected which thread was being used, which increased the time to about 12 cycles, though the client can keep track of the thread's allocator to get the 4-cycle allocations. It wiped memory-allocation hotspots off the map.
While you don't have to go through all that trouble, writing a fast allocator might be easier and more generally applicable (e.g., it lets the amount of memory to allocate/deallocate be determined at runtime) than something like max_align here. max_align is easy enough to use, but if you're after speed for memory allocations (and assuming you've already profiled your code and found hotspots in malloc/free/operator new/delete, with the major contributors in code you control), writing your own allocator can really make the difference.
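To make the pimpl usage concrete, here's roughly how the pieces above fit together; Foo, FooImpl, and thread_stack_allocator() (returning the calling thread's StackAllocator) are illustrative names, not the code we actually shipped:

#include <new>

class Foo {
public:
    Foo()
    {
        void* mem = 0;
        from_stack_ = is_on_current_thread_stack(this);
        if (from_stack_)
            mem = thread_stack_allocator().allocate(sizeof(FooImpl));
        if (!mem) {                   // Foo isn't on the stack, or the reserve is full
            mem = ::operator new(sizeof(FooImpl));
            from_stack_ = false;
        }
        impl_ = new (mem) FooImpl();  // construct the impl in whichever block we got
    }

    ~Foo()
    {
        impl_->~FooImpl();
        if (from_stack_)
            thread_stack_allocator().deallocate(impl_, sizeof(FooImpl));
        else
            ::operator delete(impl_);
    }

private:
    struct FooImpl { int data; };     // normally hidden in the .cpp
    FooImpl* impl_;
    bool     from_stack_;
};

A real version also has to deal with copying, exceptions thrown from FooImpl's constructor, and the alignment of what the allocator hands back, but that's the general shape of it.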