views: 1317

answers: 10

Shifting bits left and right is apparently faster than multiplication and division operations on most (all?) CPUs if you happen to be using a power of 2. However, it can reduce the clarity of code for some readers and some algorithms. Is bit-shifting really necessary for performance, or can I expect the compiler/VM to notice the case and optimize it (in particular, when the power-of-2 is a literal)? I am mainly interested in the Java and .NET behavior but welcome insights into other language implementations as well.

+14  A: 

You can almost certainly depend on the literal-power-of-two multiplication optimisation to a shift operation. This is one of the first optimisations that students of compiler construction will learn. :)

However, I don't think there's any guarantee for this. Your source code should reflect your intent, rather than trying to tell the optimiser what to do. If you're making a quantity larger, use multiplication. If you're moving a bit field from one place to another (think RGB colour manipulation), use a shift operation. Either way, your source code will reflect what you are actually doing.
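A sketch of that distinction in Java (the packed 0xAARRGGBB layout and the helper names are mine, not from the answer):

```java
public class Rgb {
    // Moving a bit field from one place to another: a shift says what you mean.
    static int red(int argb)   { return (argb >> 16) & 0xFF; }
    static int green(int argb) { return (argb >> 8) & 0xFF; }
    static int blue(int argb)  { return argb & 0xFF; }

    // Making a quantity larger: plain multiplication says what you mean,
    // and the compiler is free to emit a shift if that happens to be faster.
    static int doubled(int x) { return x * 2; }

    public static void main(String[] args) {
        int teal = 0xFF008080;
        System.out.println(red(teal) + " " + green(teal) + " " + blue(teal));
        System.out.println(doubled(21));
    }
}
```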

Greg Hewgill
What if you're really interested in performance, disregarding source code clarity? For example in a hash function... Will it help in .NET or Java?
Jorge Córdoba
It's possible that such micro-optimisation may help for a *particular* JIT compiler. However, your code will be longer lived than the version of the JIT compiler on which it runs, and it's impossible to tell what optimisations the next version of the JIT compiler will perform. It's even possible that something you did to make it faster in the previous version has *worse* performance in the next version! This situation is very different from languages like C where the compile step is performed only once. JIT-compiled bytecode languages can increase in performance long after the code is written.
Greg Hewgill
A: 

If the compiler (compile-time constant) or JIT (runtime constant) knows that the divisor or multiplicand is a power of two and integer arithmetic is being performed, it will convert it to a shift for you.
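One reason the substitution is always safe for int multiplication: the two forms agree for every int, including values that overflow, because Java ints wrap in two's complement. A quick check (class and method names are mine):

```java
public class MulShift {
    // x * 8 and x << 3 agree for every int, including overflow cases,
    // so a compiler or JIT can substitute one for the other freely.
    static boolean agrees(int x) {
        return (x * 8) == (x << 3);
    }

    public static void main(String[] args) {
        int[] samples = { 0, 1, -1, 7, -7, Integer.MAX_VALUE, Integer.MIN_VALUE };
        for (int x : samples) {
            System.out.println(x + ": " + agrees(x));
        }
    }
}
```

Note this holds for multiplication only; signed division by 2 is not a bare shift, as a later answer demonstrates.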

280Z28
+2  A: 

I am stunned, as I just wrote this code and realized that shifting by one is actually slower than multiplying by 2!

(EDIT: changed the code to stop overflowing after mmyers' suggestion, but the results are the same! What is wrong here?)

import java.util.Date;

public class Test {
    public static void main(String[] args) {
        Date before = new Date();
        for (int j = 1; j < 50000000; j++) {
            int a = 1;
            for (int i = 0; i < 10; i++) {
                a *= 2;
            }
        }
        Date after = new Date();
        System.out.println("Multiplying " + (after.getTime() - before.getTime()) + " milliseconds");
        before = new Date();
        for (int j = 1; j < 50000000; j++) {
            int a = 1;
            for (int i = 0; i < 10; i++) {
                a = a << 1;
            }
        }
        after = new Date();
        System.out.println("Shifting " + (after.getTime() - before.getTime()) + " milliseconds");
    }
}

The results are:

Multiplying 639 milliseconds
Shifting 718 milliseconds

Savvas Dalkitsis
Can you benchmark it so that it doesn't overflow? If I remember right, overflowing could be expensive.
Michael Myers
That's virtually identical for a single benchmark on a machine not specifically prepared for it.
Joel Coehoorn
Have you tried running multiple times to see if the results are kept?
luiscubal
I tried it many times... It makes no sense...
Savvas Dalkitsis
can you try it on your machines?
Savvas Dalkitsis
FYI, most of the time: Multiplying 1219 milliseconds / Shifting 1219 milliseconds. Sometimes +/- 20 ms, but that's normal deviation.
MicSim
Weird... Can more people test this and post their results? I am REALLY curious as to why this is happening.
Savvas Dalkitsis
Makes sense enough: the path is Java -> bytecode -> i586/CISC -> i586/RISC. All those steps will be better optimized for the common * than the rare <<.
Henk Holterman
Continued: and the actual << or * will be executed in parallel with the for-loop j++/branch code, which will probably dominate.
Henk Holterman
This may be naive--my knowledge of architecture is archaic at this point--but wouldn't the multiply go to the math co-processor, where it is crunched out of band while the CPU starts on the next operation, whereas the >> is probably done directly in the ALU, which takes CPU time?
Bill K
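For what it's worth, here is a more careful version of the benchmark above (a sketch, not from the thread): it warms up the JIT before timing and feeds each loop's result into a sink so neither loop can be eliminated as dead code.

```java
public class ShiftVsMultiply {
    static long multiplyLoop(int iterations) {
        long sink = 0;
        for (int j = 0; j < iterations; j++) {
            int a = 1;
            for (int i = 0; i < 10; i++) {
                a *= 2;
            }
            sink += a; // consume the result so the JIT cannot drop the loop
        }
        return sink;
    }

    static long shiftLoop(int iterations) {
        long sink = 0;
        for (int j = 0; j < iterations; j++) {
            int a = 1;
            for (int i = 0; i < 10; i++) {
                a = a << 1;
            }
            sink += a;
        }
        return sink;
    }

    public static void main(String[] args) {
        int n = 50000000;
        // Warm-up pass so both loops are JIT-compiled before timing.
        multiplyLoop(n);
        shiftLoop(n);

        long t0 = System.nanoTime();
        long m = multiplyLoop(n);
        long t1 = System.nanoTime();
        long s = shiftLoop(n);
        long t2 = System.nanoTime();

        System.out.println("Multiplying " + (t1 - t0) / 1000000 + " ms");
        System.out.println("Shifting    " + (t2 - t1) / 1000000 + " ms");
        System.out.println("Same result: " + (m == s));
    }
}
```

Even then, differences of a few percent between runs are usually noise; a real harness such as JMH is the reliable way to measure this.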
+3  A: 

I would ask "what are you doing that it would matter?". First design your code for readability and maintainability. The likelihood that bit shifting versus standard multiplication will make a performance difference is EXTREMELY small.

Chris Brandsma
A: 

Most compilers will turn multiplication and division into bit shifts when appropriate. It is one of the easiest optimizations to do. So, you should do what is more easily readable and appropriate for the given task.

Polaris878
+21  A: 

Almost any environment worth its salt will optimize this away for you. And if it doesn't, you've got bigger fish to fry. Seriously, do not waste one more second thinking about this. You will know when you have performance problems. And after you run a profiler, you will know what is causing it, and it should be fairly clear how to fix it.

You will never hear anyone say "my application was too slow, then I started randomly replacing x * 2 with x << 1 and everything was fixed!" Performance problems are generally solved by finding a way to do an order of magnitude less work, not by finding a way to do the same work 1% faster.

Jason Creighton
+1 for "And after you run a profiler..."
Grant Wagner
Actually I am translating (and trying to understand / make clear) an audio processing algorithm from C to managed code, and the original is riddled with N4=N2>>2, etc. I started to change them to multiplies and divides, but figured it was worth asking if I will be shooting myself in the foot.
James M.
Use whichever is clearer. If it's a constant factor in multiplication or divide, the compiler will almost certainly do the right thing.
Jason Creighton
+9  A: 

Humans are wrong in these cases.

99% when they try to second guess a modern(and all future) compilers.
99.9% when they try to second guess modern(and all future) JITs at the same time.
99.999% when they try to second guess modern(and all future) CPU optimizations.

Program in a way that accurately describes what you want to accomplish, not how to do it. Future versions of the JIT, VM, compiler, and CPU can all be independently improved and optimized. If you specify something so tiny and specific, you lose the benefit of all future optimizations.

William Crim
... and 83.7% of statistics are made up :-)
Stephen C
+1 for thinking about the future
Trampas Kirk
+12  A: 

Most compilers today will do more than convert a multiply or divide by a power of two into a shift operation. When optimizing, many compilers can optimize a multiply or divide with a compile-time constant even if it's not a power of 2. Often a multiply or divide can be decomposed into a series of shifts and adds, and if that series of operations will be faster than the multiply or divide, the compiler will use it.

For division by a constant, the compiler can often convert the operation to a multiply by a 'magic number' followed by a shift. This can be a major clock-cycle saver since multiplication is often much faster than a division operation.

Henry Warren's book, Hacker's Delight, has a wealth of information on this topic, which is also covered quite well on the companion website:

See also a discussion (with a link or two) here:

Anyway, all this boils down to allowing the compiler to take care of the tedious details of micro-optimizations. It's been years since doing your own shifts outsmarted the compiler.
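The shift-and-add decomposition mentioned above can be sketched in Java (the method name is mine; a compiler would do this at the instruction level, not in source):

```java
public class MulDecompose {
    // What a compiler might emit for x * 10: decompose 10 as 8 + 2,
    // so x * 10 becomes (x << 3) + (x << 1).
    static int timesTenShifted(int x) {
        return (x << 3) + (x << 1);
    }

    public static void main(String[] args) {
        for (int x : new int[] { 0, 1, -3, 12345 }) {
            System.out.println(x + ": " + (timesTenShifted(x) == x * 10));
        }
    }
}
```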

Michael Burr
I accepted this as I think it was very underrated. The other answers were helpful as well. Thanks.
James M.
+1  A: 

On the computers I tested, integer division is 4 to 10 times slower than other operations.

While compilers may replace divisions by powers of 2 so that you see no difference, divisions by non-powers of 2 are significantly slower.

For example, I have a (graphics) program with many, many divisions by 255. Actually my computation is:

r = (((top.R - bottom.R) * alpha + (bottom.R * 255)) * 0x8081) >> 23;

I can confirm that it is a lot faster than my previous computation:

r = ((top.R - bottom.R) * alpha + (bottom.R * 255)) / 255;

so no, compilers cannot do all the tricks of optimization.
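The equivalence of the two computations can be checked exhaustively over the range the blend expression can actually produce, 0 to 255*255, assuming 8-bit channels and alpha (the helper name is mine):

```java
public class DivBy255 {
    // (n * 0x8081) >> 23 equals n / 255 for every n in [0, 255*255];
    // the intermediate product stays within int range on that interval.
    static int div255(int n) {
        return (n * 0x8081) >> 23;
    }

    public static void main(String[] args) {
        boolean ok = true;
        for (int n = 0; n <= 255 * 255; n++) {
            if (div255(n) != n / 255) {
                ok = false;
                break;
            }
        }
        System.out.println(ok);
    }
}
```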

+1  A: 

Note that shifting down and division will (in Java, certainly) give different results for negative odd numbers.

int a = -7;
System.out.println("Shift: "+(a >> 1));
System.out.println("Div:   "+(a / 2));

Prints:

Shift: -4
Div:   -3

Since Java doesn't have any unsigned numbers, it's not really possible for a Java compiler to optimise this.
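That said, a compiler can still strength-reduce signed division by 2 by shifting after a rounding correction for negative inputs. A sketch of that standard transformation (the method name is mine):

```java
public class SignedDiv {
    // Truncating division by 2 via shift plus a correction for negatives:
    // add 1 before shifting when the value is negative, so -7 / 2 == -3
    // rather than the -4 a bare arithmetic shift would give.
    static int divideBy2(int a) {
        return (a + (a >>> 31)) >> 1;
    }

    public static void main(String[] args) {
        for (int a : new int[] { -7, -6, -1, 0, 1, 7 }) {
            System.out.println(a + " / 2 = " + divideBy2(a)
                    + " (plain division gives " + (a / 2) + ")");
        }
    }
}
```

So the shift/divide mismatch shown above doesn't prevent the optimization; it just means the compiler must emit the correction rather than a bare shift.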

izb