I have a highly optimized piece of C++ code, and even small changes in places far from the hot spots can hit performance by as much as 20%. After deeper investigation, it turned out to be caused (probably) by slightly different registers being used in the hot spots. I can control inlining with the always_inline attribute, but can I control register allocation?
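For reference, here is a minimal sketch of the always_inline attribute mentioned above (a GCC attribute that Clang also accepts). The function names are hypothetical and only illustrate the syntax.

```cpp
#include <cassert>

// Hypothetical helper: always_inline asks GCC/Clang to inline every call
// to it, even at -O0 or when the inliner's own heuristics would decline.
static inline __attribute__((always_inline)) int mul_add(int acc, int a, int b) {
    return acc + a * b;
}

// Hypothetical hot loop: after inlining, the register allocator works on
// one function body instead of seeing a call boundary.
int dot(const int* a, const int* b, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i)
        acc = mul_add(acc, a[i], b[i]);
    return acc;
}
```

Note that this only controls whether the body is inlined; it says nothing about which registers the inlined code ends up using, which is the actual problem here.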

A: 

It depends on the processor you are using. Or rather: yes, you can, with the register keyword, but this is frowned upon unless you are using a simple processor with no pipelining and a single core. These days GCC does a far better job of register allocation than you can. Trust it.

gbrandt
His results show that it is not trustworthy in this case. He's getting 20% performance swings in different builds!
Novelocrat
+2  A: 

In general the register keyword is simply ignored by all modern compilers. The only exception is the (relatively) recent addition of an error if you attempt to take the address of a variable you've marked with the register keyword.

I've experienced this sort of pain as well, and eventually found that the only real way around it was to look at the output assembly and try to determine what was causing GCC to go off the deep end. There are other things you can do, but it depends on exactly what your code is trying to do. I was working in a very, very large function with a large amount of computed-goto mayhem, in which minor (seemingly innocuous) changes could cause catastrophic performance hits. If you're doing something similar, there are a few things you can do to try to mitigate the problem, but the details are somewhat icky, so I'll forgo discussing them here unless they're actually relevant.

olliej
AFAIK, goto will seriously screw with a compiler's ability to optimize, which probably led to the funny output.
Calyth
Yes, but computed goto is extraordinarily useful for interpreters, where you're effectively writing a series of gotos anyway (a standard interpreter is while(1) switch(...){pile of code}, or statementNode->execute(..), where execute is a virtual function).
olliej
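To make the two dispatch styles concrete, here is a minimal sketch of a toy stack interpreter written both ways: the classic while/switch loop and GCC's "labels as values" computed goto (a GNU extension that Clang also supports). The opcodes and function names are hypothetical, and the sketch itself makes no performance claim.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bytecode, for illustration only.
enum Op : unsigned char { PUSH1, ADD, DUP, HALT };

// Classic dispatch: while(1) switch(...) with one shared indirect branch.
long run_switch(const std::vector<Op>& code) {
    std::vector<long> stack;
    std::size_t pc = 0;
    while (true) {
        switch (code[pc++]) {
            case PUSH1: stack.push_back(1); break;
            case DUP:   stack.push_back(stack.back()); break;
            case ADD: { long b = stack.back(); stack.pop_back(); stack.back() += b; break; }
            case HALT:  return stack.back();
        }
    }
}

// Computed goto: every handler ends with its own indirect jump.
// The table order must match the enum values above.
long run_goto(const std::vector<Op>& code) {
    static void* const table[] = { &&do_push1, &&do_add, &&do_dup, &&do_halt };
    std::vector<long> stack;
    std::size_t pc = 0;
#define DISPATCH() goto *table[code[pc++]]
    DISPATCH();
do_push1:
    stack.push_back(1);
    DISPATCH();
do_add:
    { long b = stack.back(); stack.pop_back(); stack.back() += b; }
    DISPATCH();
do_dup:
    stack.push_back(stack.back());
    DISPATCH();
do_halt:
    return stack.back();
#undef DISPATCH
}
```

Both functions return 4 for the program {PUSH1, PUSH1, ADD, DUP, ADD, HALT}. The per-handler indirect jump in run_goto is the property credited further down in this thread for better interaction with the branch predictor.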
@Calyth: Regular gotos don't hurt optimisation -- the compiler has to do essentially the same thing for break/continue and for return statements in functions that get inlined. Computed gotos on the other hand reduce both optimisation opportunities and execution speed on pipelined CPUs.
j_random_hacker
Actually, regular gotos can hurt optimisation, as they interfere with higher-level behaviour in a way that is not necessarily optimiser-friendly (but trivial gotos can easily be converted internally to more readily optimisable alternatives).
olliej
Also, computed goto is not necessarily slower or worse for pipelined CPUs. It depends entirely on the use case. In the case I mentioned above (an interpreter), computed goto is substantially better for the pipeline than the alternatives, due to better interaction with the branch predictor.
olliej
olliej: Yes, what you describe is exactly my problem. How did you deal with it?
Łukasz Lew
@Łukasz: If it's the addition of a specific block of code that is causing problems, try adding either "goto foo; foo:" or "goto *&&foo; foo:" beforehand. The computed-goto version effectively forces GCC to treat the label as an unavoidable block boundary, so it *can* stop GCC from doing the wrong thing.
olliej
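A minimal sketch of where such a barrier would go, assuming a hypothetical function in which the code after the label stands in for the newly added block that perturbed the hot loop. Both forms are valid GNU C++; whether either one actually changes GCC's register allocation is the commenter's empirical observation rather than documented behaviour, so the only way to know is to compare the generated assembly.

```cpp
#include <cassert>

// Hypothetical function: the block after `tail:` represents the newly
// added code that was perturbing register allocation in the loop above it.
int sum_then_scale(const int* data, int n, int scale) {
    int acc = 0;
    for (int i = 0; i < n; ++i)
        acc += data[i];

    // Variant 1 (plain goto):    goto tail;
    // Variant 2 (computed goto via GNU labels-as-values), used here:
    goto *&&tail;
tail:
    return acc * scale;
}
```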
+3  A: 

If you really want to mess with register allocation, you can force GCC to allocate local and global variables in certain registers.

You do this with a special variable declaration like this:

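 register int test_integer asm ("ebx");

This works for other architectures as well; just replace ebx with a target-specific register name. (GCC's register names are case-sensitive and lowercase on x86, so "EBX" is rejected as an invalid register name.)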
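A minimal sketch of a local explicit register variable, assuming GCC or Clang on x86/x86-64; the function name is hypothetical. The GCC manual states that the only supported use of local register variables is to specify registers for operands of an extended asm statement, so the sketch passes the variable to one. On other targets it falls back to plain C++.

```cpp
#include <cassert>

int add_one_in_ebx(int x) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    // GNU extension: request that `v` live in ebx. This is only
    // guaranteed while `v` is an operand of the asm statement below.
    register int v asm("ebx") = x;
    asm("addl $1, %0" : "+r"(v));  // %0 expands to %ebx here
    return v;
#else
    return x + 1;  // portable fallback for non-x86 targets
#endif
}
```

As the comments below point out, this pins the variable to the register, not the register to the variable: GCC remains free to use ebx for other values elsewhere.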

For more information, take a look at the GCC documentation:

http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/Local-Reg-Vars.html

My suggestion, however, is not to mess with register allocation unless you have very good reasons for it. If you allocate some registers yourself, the allocator has fewer registers to work with, and you may end up with code that is worse than the code you started with.

If your function is so performance-critical that you get 20% performance differences between compiles, it may be a good idea to write it in inline assembler.


EDIT: As strager pointed out, the compiler is not forced to use the register for the variable. It's only forced to use the register if the variable is used at all; e.g. if the variable does not survive an optimization pass, the register won't be used for it. Also, the register can still be used for other variables.

Nils Pipenbrinck
Actually, the syntax says "use this register when messing with test_integer". It does not force EBX to be test_integer. Rather, it forces test_integer to be EBX. (At least from what I gather from the docs.) I think you should make this more clear in your answer.
strager
Good link. Inline asm is not an option; the hot spot is way too big.
Łukasz Lew
Lukasz, if you don't want to write inline assembler, you can also take the object code from a build that performs well, disassemble it, and use the generated assembly.
Nils Pipenbrinck