views:

195

answers:

2

I need to perform some integer divisions in the hot path of my code. I've already determined via profiling and cycle counting that the integer divisions are costing me. I'm hoping there's something I can do to strength reduce the divisions into something cheaper.

In this path, I am dividing by 2^n+1, where n is variable. Essentially I want to optimize this function to remove the division operator:

unsigned long compute(unsigned long a, unsigned int n)
{
    return a / ((1 << n) + 1);
}

If I were dividing by 2^n, I would just replace the div with a shift-right by n. If I were dividing by a constant, I would let the compiler strength reduce that specific division, likely turning it into a multiply and some shifts.

Is there a similar optimization that applies to 2^n+1?

Edit: a here can be an arbitrary 64-bit integer. n takes only a few values between 10 and, say, 25. I can certainly precompute some values for each n, but not for a.

+12  A: 

Since you can only shift an int so many places, you can put all those cases into a choice of one of several divisions by a constant:

unsigned long compute(unsigned long a, unsigned int n)
{
    // assuming a 32-bit architecture (making this work for 64-bits 
    // is left as an exercise for the reader):
    switch (n) {
        case  0: return a / ((1 << 0) + 1);
        case  1: return a / ((1 << 1) + 1);
        case  2: return a / ((1 << 2) + 1);

            // cases 3 through 30...

        case 31: return a / ((1 << 31) + 1);
    }
}

So now each division is by a constant, which compilers will typically reduce to a series of multiply/shift/add instructions (as the question mentioned). See http://stackoverflow.com/questions/2580680/does-a-c-c-compiler-optimize-constant-divisions-by-power-of-two-value-into-shif/2580985#2580985 for deatils.

Michael Burr
Interesting idea. Perhaps I can write this code, then extract the strength reduction parameters into a table.
J.S.
Worth a try, but you're trading off an unpredictable jump against the difference between shift+division instructions and the equivalent strength-reduced division. Any ideas when that's a good trade-off?
Steve Jessop
+1,makes sense; although you would probably want to benchmark this to confirm that the indirection and/or conditional branches required to implement the `switch()` are actually faster than the integer divide...
David Gelhar
You can even put a default to fall back on general cases.
GMan
@J.S.: it doesn't matter what `a` can be - since `n` is limited to less than the number of bits in an `unsigned int` you have a limited number of divisors.
Michael Burr
I agree that branching might make this not speed things up, so a benchmark comparison is in order. If that doesn't work out, there's another possible solution I'll try to post later (unfortunately, today's moving day...).
Michael Burr
@Paul: I've added a reference to division by constants can be optimized by the compiler.
Michael Burr
@MB, the hackersdelight pdf is enligtening. Those techniques should work nicely here...
Paul
You are doing the `<<` on `int`. This is bad, since on a platform with 32 bit `int` this will lead to undefined behavior. Use `1UL << something` instead of `1 << something`.
Jens Gustedt
@JG, we're told n is 10 to 25 or so, so not an issue here.
Paul
Also, a little help would be writing `(1 << n) + 1` as `(1ul << n) | 1ul` since all the values of the expression end with 1
Ed.C
@Jens: The `<<` operation was in the original specification - it's not undefined unless the `n` value is greater than or equal to the width of the promoted left operand. Since it's in the original function, I assumed that the value of `n` would be appropriate.
Michael Burr
@Ed.C: I'm not sure what help that would be.
Michael Burr
Probably nothing, but it's a rule of thumb to use binary operators whenever possible instead of arithmetic operators.
Ed.C
This switch is almost surely a lot slower than the division on modern hardware. Instead you should make a table of the multiply-shift-add operations the compiler generates for each and then use `n` as an index into this table and apply them yourself.
R..
+6  A: 

You can replace integer division by a constant, by multiplication (modulo wordsize) with a magic number and a shift.

The magic numbers can be pre-calculated for known constants.

Since n can't take many values e.g. 0..31 it is "easy" to pre-calculate these magic numbers for all n and store it in a table with 32 elements.

Javascript Page for calculating the magic numbers

A good compiler can compute the magic numbers and replace integer division by multiplication and shift if the divisor is constant at compile time. Depending on how the rest of the code is structured around the performance critical code you could use macro or inline tricks to unroll for all possible values of n and let the compiler do the work of finding the magic numbers (similar to the answer with the switch, but I would put more code in the constant region otherwise it might be a tradeof not worth it -- branching can cost you performance also)

Detailed description together with code for calculating the magic numbers can be fund in the Book "Hackers Delight" by Henry S. Warren, Jr. (highly recommended must have book!) pp. 180ff.

Link to Google Books for the relevant chapter:

Chapter 10-9 Unsigned Division by Divisors >= 1

Peer Stritzinger