I have to minimize the cost of calculating a modulus in C. Say I have a number x, and n is the number that will divide x.

When n == 65536 (which happens to be 2^16):

mod = x % n (11 assembly instructions as produced by GCC), or
mod = x & 0xffff, which is equal to mod = x & 65535 (4 assembly instructions)

So GCC doesn't optimize it to this extent.

In my case, n is not a power of 2; it is the largest prime less than 2^16, which is 65521.

As I showed for n == 2^16, bitwise operations can optimize the computation. What bitwise operations can I perform to calculate the modulus when n == 65521?

+2  A: 

idiv — Integer Division

The idiv instruction divides the contents of the 64 bit integer EDX:EAX (constructed by viewing EDX as the most significant four bytes and EAX as the least significant four bytes) by the specified operand value. The quotient result of the division is stored into EAX, while the remainder is placed in EDX.

source: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html

Krystian
`idiv` or a similar opcode (depending on the CPU) might be the best option if the denominator is a variable whose value isn't known until runtime, but if the denominator is a known constant there are optimizations that can be performed that may run faster than `idiv`. However, today's compilers are aware of those optimizations (as well as how to use the `div` opcode to get a remainder), so there's generally nothing special that needs to be done in a C program to take advantage of them.
Michael Burr
+1  A: 

The bitwise trick only works if the divisor is a power of two (2^k). In the general case, there is no such bitwise operation.

Danvil
+3  A: 

x mod 65536 is only equivalent to x & 0xffff if x is unsigned - for signed x, it gives the wrong result for negative numbers. For unsigned x, gcc does indeed optimise x % 65536 to a bitwise and with 65535 (even on -O0, in my tests).
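A minimal sketch of that difference, assuming a 32-bit two's-complement int and C99 or later, where % truncates toward zero. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* For unsigned x, x % 65536 and x & 0xffff always give the same value. */
static uint32_t mod65536_unsigned(uint32_t x)
{
    return x & 0xffffu;
}

/* For signed x, C99 % truncates toward zero, so a negative dividend
   gives a remainder that is negative (or zero). A mask can never
   produce a negative result, so it is not a valid replacement here. */
static int32_t mod65536_signed(int32_t x)
{
    return x % 65536;
}
```

For example, with x = -1 the signed remainder is -1, while -1 & 0xffff is 65535.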

Because 65521 is not a power of 2, x mod 65521 can't be calculated so simply. gcc 4.3.2 on -O3 calculates it using x - (x / 65521) * 65521; the integer division by a constant is done using integer multiplication by a related constant.
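A hand-written sketch of the same idea, assuming 32-bit unsigned inputs. The quotient x / 65521 is computed as (x * m) >> 47 with m = ceil(2^47 / 65521) = 2147975281, and the remainder then follows from x - q * 65521. These constants were chosen for this illustration: m * 65521 - 2^47 = 31073, which is at most 2^(47 - 32) = 32768, the usual condition for the multiply-and-shift quotient to be exact for every 32-bit x. The sequence a particular compiler emits may use different constants; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x % 65521 for any 32-bit unsigned x, computed as x - (x / 65521) * 65521,
   with the division replaced by a multiply and a shift.
   m = ceil(2^47 / 65521); since m < 2^32, x * m stays below 2^64. */
static uint32_t mod65521_magic(uint32_t x)
{
    const uint64_t m = 2147975281u;                    /* ceil(2^47 / 65521) */
    uint32_t q = (uint32_t)(((uint64_t)x * m) >> 47);  /* equals x / 65521   */
    return x - q * 65521u;
}
```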

caf
-1. Using the processor's integer divide function is probably not optimal. On my machine (Intel Core 2 Duo, running in 64-bit mode), a simple C test program with gcc 4.4 and -O3 turns x % 65521 into two multiplications, two shifts and two subtractions. The way to find out for sure would be to do some timings, of course. :)
Mark Dickinson
@Mark Dickinson: Quite so, updated.
caf
A: 

Least cost implementation of Modulus in C


How about implementing MOD as follows:

To find: y = X mod n

y = X-(X/n)*n

(Assuming both X and n are integers)
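A minimal C sketch of this formula, assuming C99 or later, where integer division truncates toward zero. With that rule, X - (X/n)*n matches X % n for every nonzero n, including the sign of the result for negative X (INT_MIN / -1 is excluded because the quotient overflows). The helper name is hypothetical, and whether this form is any faster than % depends on the compiler, as the comments below discuss.

```c
#include <assert.h>

/* Remainder computed as y = X - (X / n) * n.
   Relies on C99 truncating division; n must be nonzero, and the
   INT_MIN / -1 case overflows just as it would for %. */
static int mod_via_div(int x, int n)
{
    assert(n != 0);
    return x - (x / n) * n;
}
```

For example, mod_via_div(-7, 3) is -7 - (-2 * 3) = -1, the same as -7 % 3.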

NOTE: For assembly-level optimisation, use idiv as explained above by Krystian.

CVS-2600Hertz
Why would this be fast? It includes division *and* multiplication.
GregS
@GregS: **if** *n* is *const* **and** the compiler does not optimize *modulus* **and** it optimizes division by a constant **and** it optimizes multiplication by a constant, it can be faster. :) (It might be an edge case, but as far as I remember the above was handy in e.g. the .NET 64 bit JIT. Different flavors of compilers might treat the optimization of modulus in different ways. The above may make sense if it optimizes division and multiplication by a constant, but does not optimize mod by a constant.)
andras
+10  A: 

First, make sure you're looking at optimized code before drawing conclusions about what GCC is producing (and make sure this particular expression really needs to be optimized). Finally, don't count instructions to draw your conclusions; an 11-instruction sequence may well perform better than a shorter sequence that includes a div instruction.

Also, you can't conclude that, because x mod 65536 can be calculated with a simple bit mask, any mod operation can be implemented that way. Consider how easy dividing by 10 is in decimal, as opposed to dividing by an arbitrary number.

With all that out of the way, you may be able to use some of the 'magic number' techniques from Henry Warren's book Hacker's Delight.

There's an added chapter on the book's website that contains "two methods of computing the remainder of division without computing the quotient!", which you may find of some use. The first technique applies only to a limited set of divisors, so it won't work for your particular instance. I haven't actually read the online chapter, so I don't know exactly how applicable the other technique might be for you.

Michael Burr
+1. Hacker's Delight and the attendant website are excellent resources. Note that it's sometimes possible to do better than the compiler if you have some extra information that the compiler lacks: for example, if you know (from context or from algorithm analysis) an upper bound on the possible size of the dividend.
Mark Dickinson
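To illustrate Mark Dickinson's point about a known bound on the dividend, here is a small sketch under an assumption that is not in the question: the caller guarantees x < 2 * 65521, for example because x is the sum of two values already reduced modulo 65521. In that range the quotient is either 0 or 1, so a single compare-and-subtract gives the exact remainder. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Exact x % 65521, valid only when x < 2 * 65521.
   The quotient can only be 0 or 1, so one conditional subtraction suffices. */
static uint32_t mod65521_lt_2n(uint32_t x)
{
    assert(x < 2u * 65521u);
    return x >= 65521u ? x - 65521u : x;
}
```

For instance, adding two reduced values 65520 + 65520 = 131040 and reducing gives 65519.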
+3  A: 

If you don't have to fully reduce your integers modulo 65521, then you can use the fact that 65521 is close to 2**16. That is, if x is an unsigned int you want to reduce, you can do the following:

unsigned int low = x & 0xffff;   /* x mod 2^16 */
unsigned int hi = x >> 16;       /* x / 2^16 */
x = low + 15 * hi;               /* since 2^16 == 15 (mod 65521) */

This uses the fact that 2**16 % 65521 == 15. Note that this is not a full reduction: starting with a 32-bit input, you are only guaranteed that the result is at most 20 bits and that it is, of course, congruent to the input modulo 65521.
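A sketch of how the partial reduction in the three-line snippet above can be finished when a canonical result is eventually needed, assuming a 32-bit unsigned input. The first fold leaves a value below 2^20, so its high part is at most 15; a second fold therefore gives at most 65535 + 15 * 15 = 65760, which is below 2 * 65521, and one conditional subtraction completes the reduction. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One folding step, using 2^16 = 65521 + 15:
   hi * 2^16 + lo == hi * 15 + lo (mod 65521).
   For a 32-bit input the result is congruent to x and below 2^20. */
static uint32_t fold65521(uint32_t x)
{
    return (x & 0xffffu) + 15u * (x >> 16);
}

/* Full reduction to [0, 65520] with two folds and one conditional
   subtraction; no division instruction is needed. */
static uint32_t mod65521_fold(uint32_t x)
{
    x = fold65521(x);   /* x < 2^20                      */
    x = fold65521(x);   /* x <= 65535 + 15 * 15 = 65760  */
    if (x >= 65521u)
        x -= 65521u;
    return x;
}
```

For example, 0xFFFFFFFF folds to 1048560, then to 65745, and the final subtraction gives 224.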

This trick can be used in applications where many operations have to be reduced modulo the same constant, and where intermediate results do not have to be the smallest element in their residue class.

E.g. one application is the implementation of Adler-32, which uses the modulus 65521. This hash function does a lot of operations modulo 65521. To implement it efficiently, one would only do modular reductions after a carefully computed number of additions. A reduction as shown above is enough; only the final computation of the hash needs a full modulo operation.
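A sketch of that deferred-reduction pattern, loosely modeled on zlib's adler32, which uses the same block length under the name NMAX. The block length 5552 is the largest n for which 255 * n * (n + 1) / 2 + (n + 1) * 65520 still fits in 32 bits, so neither running sum can overflow before it is reduced. This sketch simply uses % at the end of each block; Accipitridae's folding step could stand in for it. The function and macro names are hypothetical, and this is not zlib's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD   65521u
/* Largest n with 255*n*(n+1)/2 + (n+1)*65520 <= 2^32 - 1. */
#define ADLER_BLOCK 5552u

/* Adler-32 with the modular reductions deferred to once per block. */
static uint32_t adler32_deferred(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;

    while (len > 0) {
        size_t n = len < ADLER_BLOCK ? len : ADLER_BLOCK;
        len -= n;
        while (n--) {
            a += *data++;   /* no reduction inside the block */
            b += a;
        }
        a %= ADLER_MOD;     /* reduce only at block boundaries */
        b %= ADLER_MOD;
    }
    return (b << 16) | a;
}
```

For instance, adler32_deferred((const unsigned char *)"Wikipedia", 9) should give 0x11E60398, the value commonly used as a worked Adler-32 example.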

Accipitridae