If your C compiler is targeting a CPU with no divide instruction, you can modify your code as follows:
mod(a, b) {
int s = b + b + b + b;
int r = a;
while(r >= s) {
r -= s;
}
while(r >= b) {
r -= b;
}
return r;
}
This works by subtracting the values in chunks of four rather than one, right up until the last one then it switches to subtracting chunks of one.
This should make your code run about four times as fast (assuming 4*b
isn't outside the range of your integers). You could even insert more loops (say an 8*b
one) before the 4*b
one for even more speed.
Other than that, hand-coding assembler may help but I think you'll find quite a boost from the above code without it.
If you know more detail on the way you'll be using the mod call, you can optimize it for your particular cases. For example, if you only want to know modulo 25 of a 16-bit integer, the following code will be much faster than a simplistic loop with variable denominator.
int mod25 (int a) { // a has maximum value of 2^15-1 = 32767
while (a >= 15625) a-= 15625; // at most 2 times.
while (a >= 625) a-= 625; // at most 24 times.
while (a >= 25) a-= 25; // at most 24 times.
return a;
}
Running a test, I find that you have to do 10 million iterations before a noticeable difference appears between that modulo code and the use of the %
operator (2 seconds vs. 0 seconds). Up until that point, they were both 0 seconds, although that was run on a fast machine (better for mod25
) and with a div
instruction (better for %
operator) so you'd need to benchmark it on your own hardware.
This is about as fast as you're likely to get without making your code unreadable (although even that shouldn't stop you if you're willing to add lots of comments explaining how it works).
A more general solution for any denominator is to first double the denominator (with bit shifts for speed) as far as possible so that the ensuing subtractions are minimized. Then, as the numerator reduces below the increased denominator, halve the denominator and keep going (until the denominator is back at the start).
int mod (int n, int d) {
/* dx is the adjusted denom, don't let it overflow though. */
int dx = d;
while (((dx << 1) >>1) == dx)
dx <<= 1;
/* This loop processes the dx values until they get too small. */
while (dx >= d) {
/* This loop subtracts the large dx value. */
while (n >= dx)
n -= dx;
dx >>= 1;
}
return n;
}
This actually performs on par with the optimized version of mod25
above while providing a more general solution.