views:

737

answers:

6

Say I have a function like this:

inline int shift( int what, int bitCount )
{
    return what >> bitCount;
}

It will be called from different sites each time bitCount will be non-negative and within the number of bits in int. I'm particularly concerned about call with bitCount equal to zero - will it work correctly then?

Also is there a chance that a compiler seeing the whole code of the function when compiling its call site will reduce calls with bitCount equal to zero to a no-op?

+3  A: 

It will work correctly on any widely used architecture (I can vouch for x86, PPC, ARM). The compiler will not be able to reduce it to a noop unless the function is inlined.

Crashworks
That would be compiler-specific, I don't believe the standard *requires* the function call to be created. Global optimization could quite easily cull the function call if bitCount were passed in as a constant 0 and possibly even a variable containing zero (although there'd be code to examine the variable at least, so it wouldn't be a no-op). It depends entirely on the compiler.
paxdiablo
Of course, having said that, I've never seen a compiler in the wild do that level of optimization - I suspect the return on investment wouldn't be enough to warrant it.
paxdiablo
Pax, VC9 does "out of the box" (default release settings) for: ---- int Adjust(int x, int shift){ if (shift > 0) { /* large block that prevents inlining */ } return x << shift;} ----Even when calling from a different translation unit, when passing shift=0, the call and the >> will be culled. Quite impressive!
peterchen
I should have been clearer and said "inlined, explicity or implicitly." What you're seeing there is the compiler pushing something to inline even though it's not been marked. If the function isn't inline then by definition it's called by stack convention and can't be elided (unless inlined later in LTCG).
Crashworks
+1  A: 

About the correctness of arg << 0 or arg >> 0, no problem, absolutely fine.

About the eventual optimizations: This will not be reduced to a >nop< when called with a constant what=0 and/or bitcount=0, unless you declare it as inline and choose optimizations (and your compiler of choice understands what inline is).

So, bottom line, optimize this code by conditionally calling the function only if the OR of arguments is non zero (about the fastest way I figure to test that both args are non-zero).

jpinto3912
@jpinto3912, I think question is asking what happens when the shift is zero, not the value (as your answer seems to assume).
paxdiablo
He is not shifting zero, he is shifting something zero times.
anon
@Pax, @Neil: if you're shifting zero, you'll also effectively get a no-op. The question asks about one case, the answer notes both cases.
Stobor
Surely the shift by zero would be faster than a potential branch with an OR?
kibibu
@kibibu, likely, but that's not the scenario. It's pushing args, calling a function, shifting by zero, and pop stack for return. If the situation is often a zero-shift, that is a lot slower than OR and IFNZ prior to calling. Anyway, this is just in case your compiler don't know what's "inline" (or you have a tiny code space req).
jpinto3912
+3  A: 

The compiler could only perform this optimisation do that if it knew at compile time that the bitCount value was zero. That would mean that the passed parameter would have to be a constant:

const int N = 0;
int x = shift( 123, N );

C++ certainly allows such an optimisation to be performed, but I'm not aware of any compilers that do so. The alternative approach the compiler could take:

int x = n == 0 ? 123 : shift( 123, n );

would be a pessimisation in the majority of cases and I can't imagine compiler writer implementing such a thing.

Edit: AA shift of zero bits is guaranteed to have no effect on the thing being shifted.

anon
That's nice, but is shift by zero guaranteed to have the same effect as no-op (not necessarily emit no code)?
sharptooth
C++ makes no guarntees about emitted object code, and has no concept of a no-op. However, a shift of zero bits can have no effect on any C++ objects. Perhaps you could expand in your question why you are asking about this?
anon
Expanded - I will call it many times, sometimes bitCount may be zero. If it's guaranteed to have no effect on the variable will you please add it to the answer?
sharptooth
@Neil: unless of course you override the shift operator...
kibibu
@kibibu You can't override the shift operator (or any other operator) for built-in types.
anon
I wouldn't go with the ternary version. A shift, whether it alters the value or not (and a shift by zero bits does not alter it), is usually a single CPU cycle on most CPUs. The ternary operation has a comparison and at least one (if not two) jumps, additionally to a possible shift. That will definitely always be slower than a simple shift, even if the CPU is very good at branch prediction. In the very best case it could be equally fast, but I rather doubt that.
Mecki
+6  A: 

According to K&R "The result is undefined if the right operand is negative, or greater than or equal to the number of bits in the left expression's type." (A.7.8) Therefore >> 0 is the identity right shift and perfectly legal.

plinth
+8  A: 

It is certain that at least one C++ compiler will recognize the situation (when the 0 is known at compile time) and make it a no-op:

Source

inline int shift( int what, int bitcount)
{
  return what >> bitcount ;
}

int f() {
  return shift(42,0);
}

Compiler switches

icpc -S -O3 -mssse3 -fp-model fast=2 bitsh.C

Intel C++ 11.0 assembly

# -- Begin  _Z1fv
# mark_begin;
       .align    16,0x90
        .globl _Z1fv
_Z1fv:
..B1.1:                         # Preds ..B1.0
        movl      $42, %eax                                     #7.10
        ret                                                     #7.10
        .align    16,0x90
                                # LOE
# mark_end;
        .type   _Z1fv,@function
        .size   _Z1fv,.-_Z1fv
        .data
# -- End  _Z1fv
        .data
        .section .note.GNU-stack, ""
# End

As you can see at ..B1.1, Intel compiles "return shift(42,0)" to "return 42."

Intel 11 also culls the shift for these two variations:

int g() {
  int a = 5;
  int b = 5;
  return shift(42,a-b);
}

int h(int k) {
  return shift(42,k*0);
}

For the case when the shift value is unknowable at compile time ...

int egad(int m, int n) {
  return shift(42,m-n);
}

... the shift cannot be avoided ...

# -- Begin  _Z4egadii
# mark_begin;
       .align    16,0x90
        .globl _Z4egadii
_Z4egadii:
# parameter 1: 4 + %esp
# parameter 2: 8 + %esp
..B1.1:                         # Preds ..B1.0
        movl      4(%esp), %ecx                                 #20.5
        subl      8(%esp), %ecx                                 #21.21
        movl      $42, %eax                                     #21.10
        shrl      %cl, %eax                                     #21.10
        ret                                                     #21.10
        .align    16,0x90
                                # LOE
# mark_end;

... but at least it's inlined so there's no call overhead.

Bonus assembly: volatile is expensive. The source ...

int g() {
  int a = 5;
  volatile int b = 5;
  return shift(42,a-b);
}

... instead of a no-op, compiles to ...

..B3.1:                         # Preds ..B3.0
        pushl     %esi                                          #10.9
        movl      $5, (%esp)                                    #12.18
        movl      (%esp), %ecx                                  #13.21
        negl      %ecx                                          #13.21
        addl      $5, %ecx                                      #13.21
        movl      $42, %eax                                     #13.10
        shrl      %cl, %eax                                     #13.10
        popl      %ecx                                          #13.10
        ret                                                     #13.10
        .align    16,0x90
                                # LOE
# mark_end;

... so if you're working on a machine where values you push on the stack might not be the same when you pop them, well, this missed optimization is likely the least of your troubles.

Thomas L Holaday
+1 for exquisite detail and for using the intel compiler :)
Jamie Cook
+1  A: 

To make the function somewhat self documenting, you may want to change bitCount to unsigned to signify to callers that a negative value is not valid.

Dolphin