So I'm optimizing some code by unrolling some loops (yes, I know I should rely on my compiler to do this for me, but I'm not working with my choice of compilers). I wanted to do it somewhat gracefully, so that if my data size changes due to some future edit, the code degrades elegantly.

Something like:

typedef struct {
    uint32_t alpha;
    uint32_t two;
    uint32_t iii;
} Entry;

/*...*/

uint8_t * bytes = (uint8_t *) entry;
#define PROCESS_ENTRY(i) bytes[i] ^= 1; /*...etc, etc, */ 
#if (sizeof(Entry) == 12)
    PROCESS_ENTRY( 0);PROCESS_ENTRY( 1);PROCESS_ENTRY( 2);
    PROCESS_ENTRY( 3);PROCESS_ENTRY( 4);PROCESS_ENTRY( 5);
    PROCESS_ENTRY( 6);PROCESS_ENTRY( 7);PROCESS_ENTRY( 8);
    PROCESS_ENTRY( 9);PROCESS_ENTRY(10);PROCESS_ENTRY(11);
#else
#   warning Using non-optimized code
    size_t i;
    for (i = 0; i < sizeof(Entry); i++)
    {
        PROCESS_ENTRY(i);
    }
#endif
#undef PROCESS_ENTRY

This doesn't work, of course, since sizeof isn't available to the preprocessor (at least, that's what this answer seemed to indicate).

Is there an easy workaround I can use to get the size of a data structure for use in a C preprocessor conditional, or am I just SOL?

+9  A: 

If you are using autoconf or another build configuration system, you could check the size of the data structures at configuration time and write out headers (like #define SIZEOF_Entry 12). Of course this gets more complicated when cross-compiling and such, but I am assuming your build and target architectures are the same.

Otherwise yes, you are out of luck.
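
For illustration, the configure step can compile and run a tiny probe and redirect its output into the generated header; the file and program below are hypothetical, not part of autoconf itself:

/* conf_sizeof.c - compiled and run at configure time;
   its stdout becomes the generated header */
#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint32_t alpha;
    uint32_t two;
    uint32_t iii;
} Entry;

int main(void)
{
    /* Prints e.g. "#define SIZEOF_Entry 12" */
    printf("#define SIZEOF_Entry %u\n", (unsigned)sizeof(Entry));
    return 0;
}

The unrolling code can then test #if SIZEOF_Entry == 12, because the generated macro is a plain integer literal that the preprocessor can evaluate.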

Sean Bright
A: 

If you want the smallest possible size for the struct (or to align it to a 4-byte boundary, or whatever), you can use the packed or aligned attributes.

In Visual C++, you can use #pragma pack, and in GCC you can use __attribute__((packed)) and __attribute__((aligned(num-bytes))).
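
A minimal sketch of both forms, using the question's struct (selecting between them by compiler detection is an assumption here; pick whichever matches your toolchain):

#include <stdint.h>

#if defined(__GNUC__)
/* GCC/Clang: remove padding so sizeof(Entry) is exactly the sum of its fields */
typedef struct {
    uint32_t alpha;
    uint32_t two;
    uint32_t iii;
} __attribute__((packed)) Entry;
#elif defined(_MSC_VER)
/* Visual C++: the same effect via a pack pragma */
#pragma pack(push, 1)
typedef struct {
    uint32_t alpha;
    uint32_t two;
    uint32_t iii;
} Entry;
#pragma pack(pop)
#endif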

Matthew Iselin
That won't give the size. Further, while packing will save space, it is likely to cost time, and that's what he's trying to gain.
David Thornley
@David Thornley: It won't give the size, but it will do what he *wants* to do in this code block. By packing the structure he will know for certain that padding is not in the way, so the structure *is* exactly 12 bytes (or the sum of the sizes of each item), and therefore there's *no need* to use a preprocessor macro to find the size of the struct. PROCESS_ENTRY uses byte-level access on an unpadded structure, so packing the structure makes it possible to use this macro without worrying about padding.
Matthew Iselin
I should add that it's generally best not to meddle anyway and just let the optimiser do its job (especially since it's far better at it than most programmers).
Matthew Iselin
+5  A: 

You're out of luck - the preprocessor doesn't even know what a struct is, let alone have any way to work out its size.

In a case like this you could just #define a constant to what you know the struct's size to be, then statically assert that it's actually equal to the real size using the negative-sized array trick.
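
A minimal sketch of that trick (the macro and typedef names are illustrative):

#define ENTRY_SIZE 12

/* Compile-time check: the array size evaluates to -1, a compile error,
   if ENTRY_SIZE ever disagrees with the real sizeof(Entry). */
typedef char entry_size_is_wrong[(sizeof(Entry) == ENTRY_SIZE) ? 1 : -1];

Since ENTRY_SIZE is a plain integer literal, the #if in the question works against it, and the typedef guarantees it stays in sync with the struct.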

Also you could try just doing if (sizeof(Entry) == 12), and see whether your compiler is capable of evaluating the branch condition at compile time and removing dead code. It's not that big an ask.

Steve Jessop
+1 for the 'see whether the optimizer optimizes' suggestion.
Jonathan Leffler
The idea of using an assert() rather than a #warning is also a good one. Do you really want graceful degradation, or do you want to be clearly informed about the problem so you can fix it?
Brooks Moses
And there are two reasons it might not be 12. One is that there's some unexpected padding (in which case you probably want an error, so you can use compiler-specific pragmas to pack the structure), and one is that you've added a field (in which case you want to update the #define so that it represents the new size, and then also decide whether to update the unrolled loop to handle the new size). So actually, I'd have the assert at the point of definition of the struct, and *also* the warning at the point of unrolling.
Steve Jessop
But I should add that I never allow warnings to live anyway, at least not in my own code, so to me a warning means "this is an error which, if absolutely necessary, you can temporarily ignore just to get the rest of the code compiling".
Steve Jessop
A: 

This probably won't help, but if you have the ability to do this in C++, you can use a template to have the compiler dispatch to the appropriate loop at compile time:

template <std::size_t SizeOfEntry>
void process_entry_loop(uint8_t *bytes)
{
    // ... the nonoptimized version of the loop
}

template <>
void process_entry_loop<12>(uint8_t *bytes)
{
    // ... the optimized version of the loop
}

// ...

process_entry_loop<sizeof(Entry)>(bytes);
fbrereto
Good idea, but it would be rather clearer to make sizeof(Entry) the template parameter, and specialize for 12.
Brooks Moses
+1 for the tip - I like that!
fbrereto
+13  A: 

You cannot do it in the preprocessor, but you do not need to. Just use a plain if instead of #if:

#define PROCESS_ENTRY(i) bytes[i] ^= 1; /*...etc, etc, */
if (sizeof(Entry) == 12) {
    PROCESS_ENTRY( 0);PROCESS_ENTRY( 1);PROCESS_ENTRY( 2);
    PROCESS_ENTRY( 3);PROCESS_ENTRY( 4);PROCESS_ENTRY( 5);
    PROCESS_ENTRY( 6);PROCESS_ENTRY( 7);PROCESS_ENTRY( 8);
    PROCESS_ENTRY( 9);PROCESS_ENTRY(10);PROCESS_ENTRY(11);
} else {
    size_t i;
    for (i = 0; i < sizeof(Entry); i++) {
        PROCESS_ENTRY(i);
    }
}
#undef PROCESS_ENTRY

sizeof is a constant expression, and comparing a constant against a constant is also a constant. Any sane C compiler will optimize away a branch that is always false at compile time; constant folding is one of the most basic optimizations. You lose the #warning, though.

Pavel Minaev
It is always good to look at the problem from a slightly different angle. +1 and kudos!
qrdl
+1  A: 

Two other approaches spring to mind - either write a small program to generate the unrolled loop, or use a variation on Duff's device with the expected size of the struct.
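
For reference, a variation on Duff's device for this loop might look like the sketch below; the helper name and the unroll factor of four are assumptions:

#include <stddef.h>
#include <stdint.h>

/* Duff's-device-style unrolling, factor 4. The switch jumps into the
   middle of the do/while so the first pass handles the n % 4 leftover
   bytes; every later pass processes 4 bytes. Assumes n > 0. */
static void process_bytes(uint8_t *bytes, size_t n)
{
    size_t passes = (n + 3) / 4;
    size_t i = 0;
    switch (n % 4) {
    case 0: do { bytes[i] ^= 1; i++;
    case 3:      bytes[i] ^= 1; i++;
    case 2:      bytes[i] ^= 1; i++;
    case 1:      bytes[i] ^= 1; i++;
            } while (--passes > 0);
    }
}

Called as process_bytes((uint8_t *)entry, sizeof(Entry)), this keeps a single copy of the code whatever the struct size, at the cost of rather opaque control flow.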

Pete Kirkham
Any modern compiler that generates code that is slower than Duff's device isn't a very good compiler.
Carson Myers
Doesn't that depend on how paranoid it's being about code size? Compilers don't necessarily multiply the size of your code by 12 just for fun. Without profiler input, the optimizer has no way of knowing which loops are worth paying code size to gain speed. Unrolling everything results in large code, lots of icache misses, and slowdowns. You, on the other hand, can intelligently select which loops to unroll (or, even more intelligently, use a compiler that can optimise using profiler data). Unless you mean that Duff's Device is now obsolete due to newer, better tricks that I don't know about.
Steve Jessop