views:

779

answers:

3

I've recently encountered what I think is a false-sharing problem in my application, and I've looked up Sutter's article on how to align my data to cache lines. He suggests the following C++ code:

// C++ (using C++0x alignment syntax)
template<typename T>
struct cache_line_storage {
   [[ align(CACHE_LINE_SIZE) ]] T data;
   char pad[ CACHE_LINE_SIZE > sizeof(T)
        ? CACHE_LINE_SIZE - sizeof(T)
        : 1 ];
};

I can see how this would work when CACHE_LINE_SIZE > sizeof(T) is true -- the struct cache_line_storage just ends up taking up one full cache line of memory. However, when the sizeof(T) is larger than a single cache line, I would think that we should pad the data by CACHE_LINE_SIZE - T % CACHE_LINE_SIZE bytes, so that the resulting struct has a size that is an integral multiple of the cache line size. What is wrong with my understanding? Why does padding with 1 byte suffice?

+4  A: 

You can't have arrays of size 0, so 1 is required to make it compile. However, the current draft version of the spec says that such padding is unecessary; the compiler must pad up to the struct's alignment.

Note also that this code is ill-formed if CACHE_LINE_SIZE is smaller than alignof(T). To fix this, you should probably use [[align(CACHE_LINE_SIZE), align(T)]], which will ensure that a smaller alignment is never picked.

coppro
+1, although in practice, I'm not aware of any case where a struct alignment can be higher than the cacheline size. After all type alignment constraints usually come from cacheline constraints.
Bahbar
Is the align applied to the struct or the first element of the struct in your suggestion?
Heath Hunnicutt
But as for the mechanics of false-sharing itself.. to prevent it, shouldn't `T` be aligned to an integral multiple of the cache line size, even if `sizeof(T) > CACHE_LINE_SIZE`? Or is this not necessary?
int3
@splicer: alignements are always power of 2. So if align(T) > CACHE_LINE_SIZE, it _is_ a multiple of CACHE_LINE_SIZE
Bahbar
@splicer, @Bahbar: Specifying two alignments causes the loosest alignment that is satisfied by both to be used (e.g. a [[align(2), align(3)]] would be aligned to 6 bits if you had a platform that could support a non-power-of-2 alignment, which can in theory exist. The standard allows for any alignment conceivable.
coppro
@coppro. Couple things: (1) I think it's bytes, not bits. (2) Not sure about the standard, but http://gcc.gnu.org/gcc-4.4/cxx0x_status.html (and more to the point, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2341.pdf) states p3:[ Footnote: It is intended that every valid alignment value is an integral power of two. – end footnote ]. It does not hurt to be conservative though.
Bahbar
Huh... This had me ticked enough to go verify the spec <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2798.pdf>. In 3.1.5, we have this: "Alignments have an order from weaker to stronger or stricter alignments. Stricter alignments have larger alignment values. An address that satisfies an alignment requirement also satisfies any weaker valid alignmentrequirement." That last bit implies alignments are multiple of each other. you still can imagine non-power of 2 alignments, but the multiple-ness still stands. 2 and 3 cannot both be valid alignments on the same implementation.
Bahbar
@Bahbar: yes, I did mean bytes. I looked it up and you're right about that - there was some discussion of allowing alignment requirements where each alignment wasn't necessarily a multiple of the previous one. I guess they decided to fix it by adding that paragraph, rather than to adjust the definition to allow odd alignments like 3 and 6 (which is what I thought they did). Thanks!
coppro
+2  A: 

Imagine

#define CACHE_LINE_SIZE 32
sizeof(T) == 48

Now, consider how [[ align(CACHE_LINE_SIZE) ]], works. eg:

[[ align(32) ]] Foo foo;

This will force sizeof(Foo) == 32n for some n. ie align() will pad for you, if necessary, in order for things like Foo foo[10]; to have each foo[i] aligned as requested.

So, in our case, with sizeof(T) == 48, this means sizeof(cache_line_storage<T>) == 64.

So the alignment gives you the padding you were hoping for.

However, this is one 'error' in the template. Consider this case:

#define CACHE_LINE_SIZE 32
sizeof(T) == 32

Here we end up with char pad[1];. Which means sizeof(cache_line_storage<T>) == 64. Probably not what you want!

I think the template would need to be modified somewhat:

template <typename T, int padding>
struct pad_or_not
{
   T data;
   char pad[padding];
};

// specialize the 0 case
// As it is late, I am SURE I've got the specialization syntax wrong...
template <typename T, int>
struct pad_or_not<0>
{
   T data;
};

template<typename T>
struct cache_line_storage {
   [[ align(CACHE_LINE_SIZE) ]] pad_or_not<T, (sizeof(T) > CACHE_LINE_SIZE ? 0 : CACHE_LINE_SIZE - sizeof(T) ) > data;
};

or something like that.

tony
A: 

"You can't have arrays of size 0, so 1 is required to make it compile" - GNU C does allow arrays dimensioned as zero. See also http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Zero-Length.html

MikeW