views:

548

answers:

12

I'm in a need to optimize this really tiny, but pesky function.

unsigned umod(int a, unsigned b)
{
    while(a < 0)
        a += b;

    return a % b;
}

Before you cry out "You don't need to optimize it", please keep in mind that this function is called 50% of the entire program lifetime, as it's being called 21495808 times for the smallest test case benchmark.

The function is already being inlined by the compiler so please don't offer to add the inline keyword.

+13  A: 

This avoids looping:

int tmp = a % b;
if (tmp < 0) tmp += b;

Notice that both a and b need to be signed.

Tronic
+1, the result of x = a % b, must be |x|<b.
Ninefingers
`int tmp = a % b; return tmp + b * (tmp < 0);` Is faster.
LiraNuna
There should be no difference when you use an optimizing compiler (but sometimes compilers miss things).
Tronic
I'm testing this live with GCC 4.4.1 with -O3, there IS a difference. Don't give compilers too much credit, else I won't be here asking this question :)
LiraNuna
@LiraNura: What processor? Can you give an assembly dump?
Anon.
you might even try `return a % b + b * (a < 0);` since b is guaranteed to be unsigned (untested and unprofiled)
cobbal
This doesn't give the same result as the OP's original. Eg with `a = -10` and `b = 3`, the original gives 2 but this gives 0.
caf
(-10) % 3 gives -1, add 3 and you have 2. How did you get 0 instead?
Tronic
Using multiplication might be slower than a (single!) conditional statement. It depends on the processor. I've been developing for the arm7TDMI lately - conditional statements (Which can be done without branching) use 1 extra clock cycle each, where as multiplication uses 3.
Wallacoloo
`(-10) % 3U` does not give -1.
caf
Good point, you obviously need to use signed modulus there to avoid wrapping around. Added a note about that.
Tronic
updated version `umod(-10, 3) == 2`: `return a % (int)b + b * (a < 0);` although this will lose half of the range of b.
cobbal
+1 for mentioning that this will lose half range of b, which will defy the purpose of declaring b _unsigned_.
legends2k
Some anonymous coward downvoting without comment :(
Tronic
+1  A: 

In your original function, you could have returned after the while loop finished for negative numbers, thus skipping the mod. This is in the same spirit, replacing the loop with a multiply - although it could be made to have fewer characters...

unsigned int umod2(int a, unsigned int b)
{
    return (a < 0) ? a + ((-a/b)+1)*b : a % b;
}

Here's the loop version:

unsigned int umod2_works(int a, unsigned int b)
{
    if (a < 0)
    {
        while (a < 0)
            a += b;
        return a;
    } else {
        return a % b;
    }
}

Both have been tested to match the OP's original function.

mtrw
Sorry, I meant to say 'replacing the loop with a *divide and* multiply.' I'm not sure which is faster, but this is at least correct.
mtrw
Doesn't make it go faster, you're adding another branch, thus slowing the routine down. I got even distribution of numbers as input.
LiraNuna
I'm surprised that another branch is slower than a mod. Anyway, Alok's answer looks like the nicest so far.
mtrw
It's not another branch, if it's written properly.
Stephen Canon
A: 

If a and b are both much smaller than an int, then you can just add a sufficiently large multiple of b to every value before you mod.

unsigned umod(int a, unsigned b)
{
    return (unsigned)(a + (int)(b * 256)) % b;
}

Of course, this trick doesn't work if a + (b * 256) can overflow, but for a lot of the uses I can see for this code, you can be certain that it never will.

John Knoeller
+1  A: 

In a % b, if any of the operands is unsigned both are converted to unsigned. This means that if a is negative, you get a modulo UINT_MAX + 1 value instead of a. If UINT_MAX+1 is evenly divisible by b, then things are fine, and you can just return a % b. If not, you have do do the modulo in int type.

unsigned int umod(int a, unsigned int b)
{
    int ret;
    if (a >= 0) return a % b;
    if (b > INT_MAX) return a + b;
    ret = a % (int)b;
    if (ret < 0) ret += b;
    return ret;
}

Edit: Updated, but you should use caf's answer as that's simpler (or maybe not?!). This is here for the record.

Alok
What if `b` is outside of the range of `int`?
caf
Good point. In that case you want `a + b` I think.
Alok
+9  A: 

This should do it:

unsigned umod(int a, unsigned b)
{
    if (a < 0)
    {
        unsigned r = (-a % b);
        if (r)
            return b - r;
        else
            return 0;
    }
    else
        return a % b;
}

Tested to match original. Limitation is that a > INT_MIN on 2s complement machines.

caf
So obvious. +1.
Alok
How can `a` be < `INT_MIN`?
Alok
that's a greater than sign...
Wallacoloo
+1. Good answer even if somewhat cryptic :)
Tronic
@wallacoloo: He changed it from `>= INT_MIN` to `> INT_MIN`. Now it's correct.
Alok
Dammit, missed an edge case - updated to fix but it's nowhere near as clean anymore.
caf
That's pretty weird - since for the `a >= 0` case it should be executing exactly the same instructions at least. How are you getting gprof to profile the inline functions, btw?
caf
FWIW, in my testing (gcc 4.3.2, -O3) my version runs about 6 to 7 times faster than the loop version, with an even spread of `a` values between -1073741823 and 1073741824. With the `a` values all positive, it runs the same.
caf
In my case, it's okay for `0 == b`, so I have changed this to be `return b - (-a % b);` for one less branch.
LiraNuna
+6  A: 

Using the ~ :)

unsigned umod(int a, unsigned b)
{
    if (a<0) return b-1-~a%b;
    return a%b;
}

The % has higher precedence than -

If it's ok to return b instead of 0 when -a is a multiple of b you can save some ops

unsigned umod(int a, unsigned b)
{
    if (a<0) return b - (-a % b);
    return a%b;
}

slightly golfed version :)

unsigned umod(int a, unsigned b)
{
return(a<0)?b-(-a%b):a%b;
}

Here is the resulting assembly

1    .globl umod3
2       .type   umod3, @function
3    umod3:
4    .LFB3:
5       .cfi_startproc
6       testl   %edi, %edi
7       js      .L18
8       movl    %edi, %eax
9       xorl    %edx, %edx
10      divl    %esi
11      movl    %edx, %eax
12      ret
13      .p2align 4,,10
14      .p2align 3
15   .L18:
16      movl    %edi, %eax
17      xorl    %edx, %edx
18      negl    %eax
19      divl    %esi
20      subl    %edx, %esi
21      movl    %esi, %edx
22      movl    %edx, %eax
23      ret
gnibbler
Hm you seem very pegged on this golf think.. get it.. pegged.. ;)
Filip Ekberg
This has less ops than caf's answer, especially if the negataion of `a` is a simple -a ;) (resulting in `return b - (-a % b);`)
LiraNuna
Yeah that was my original answer too, before I added the extra conditional to make it precisely match the behaviour of the code in the question. By the way, it is interesting that `gcc` creates an apparently useless `movl` instruction at the end of the function - I was seeing that too (it could just do `movl %esi, %eax`). I wonder if there is some subtle architectural reason for that?
caf
@caf, apparently the divl puts the result of the mod into edx (unless it is zero). The calling convention requires it to be shifted to eax
gnibbler
It does, but I am talking about the last two mnemonics before `ret` (lines `21` and `22`). Those lines move the final result from `%esi` to `%edx`, then from `%edx` to `%eax` - that could be done with just one move from `%esi` to `%eax`
caf
A: 

Other than the while loop, Not sure whether the % operation can be optimized as a whole, but the optimization can happen on the pattern of the values for a & b.

If in those 21495808 times the operation is executed.

If the chances of passing a value for a which is less than b ( a < b ) is atleast half of the that. Adding the following statement will definetly improve the overall performance of the function.

if ( abs(a) < b ) // not specifically the abs function, can be your own implementation.
    return 0;
else
    return a%b;

If b is a power of 2 for atleast 80% of the cases, we can use bitwise operators as in

return ( abs(a) & (b-1) );

If the numbers are expected to be anything less than that, it would degrade the performance, as we need to verify whether b is power of 2 [ even after using bitwise operators for the same ] for everything.

Even the functionality to achieve abs(a) can be optimized using bitwise operators, with their own limitations, but is faster than verifying whether a is < 0.

n = (a ^ (a >> 31)) - (a >> 31); // instead of n = a < 0 ? -a : a;

There would be more such things, if you can explore.

Narendra N
% is cheap compared to the while loop which is non-deterministic
Clifford
@Clifford, you are right, Did I say anything converse of that?
Narendra N
Not really, but the first sentence makes it look like you are targetting the % for optimisation rather than the loop.
Clifford
By the time I saw this one, there were other answers handling that loop part and no answer was accepted, hence I thought I will take a look at the % part. I will change the answer mentioning it.
Narendra N
+1  A: 
int temp;

temp= (a > 0)? ( a % b ) :   b -( (-a) % b ) ;

code below:

int main()
{
int a;
unsigned b;
int temp;
printf("please enter an int and a unsigned number\n");
scanf("%d",&a);
scanf("%u",&b);
modulus(a,b);
temp= (a > 0)? ( a % b ) :   b -( (-a) % b ) ;
printf("\n temp is %d", temp);
return 0;
}
void modulus(int x,unsigned y)
{
int c;
if(x>0)
{
c=x%y;
printf("\n%d\n",c);}
else
{
while(x<0)
x+=y;
printf("\n%d\n",x);}
}


./a.out
please enter an int and a unsigned number
-8 3

1

 temp is 1
Vijay Sarathi
+1 for good manners
gnibbler
I did not understand...could you please tell the good manners i showed over here?
Vijay Sarathi
The "please" in printfs
Nicolás
+2  A: 

Portable edition, still with only one division, no branching, and no multiplication:

unsigned umod(int a, unsigned b) {
    int rem = a % (int) b;
    return rem + (-(rem < 0) & b);
}
Chris Jester-Young
How is this better than doing a divide and a conditional in C code? Also, might this need clobber args? And maybe `__asm__ __volatile__`?
asveikau
@asveikau: No, no clobbering because of the constraints. Also, no volatile needed, as the compiler's ordering will guarantee the right results. You can do conditional in C code, if you can ensure no branching. (My portable version does that.)
Chris Jester-Young
I was thinking you might need to clobber `cc` because of the `FLAGS` register. This has bit me sometimes while doing atomic ops with inline asm. At any rate I like the C version better.
asveikau
surely you can #ifdef the asm version though
gnibbler
@gnibbler: You can, but why would you? The C version generates code (under gcc) that's even better than the assembly version I wrote.
Chris Jester-Young
oh +1 then!!!!!111
gnibbler
+4  A: 

Since the looping version seems to be quite fast, lets try eliminating the division :)

unsigned umod(int a, unsigned b){
    while(a>0)a-=b;
    while(a<0)a+=b;
    return a;
}
gnibbler
That's a nice optimization for the case when `a` and `b` don't differ much. For the case when, say, `a = MAX_INT`, `b = 2` the code would be terribly slow.Anyway, +1 for non-standard approach.
Vlad
@Vlad, Since LiraNuna says the original looping version does quite well in the real world tests, I wondered if this might be worth trying also, since we don't know the distribution of a and b
gnibbler
simply because you don't understand division?
GregS
@GregS, are you serious?
gnibbler
+1  A: 

Here's one that works over the whole range of unsigned without branching, but it uses multiplications and 2 divisions

unsigned umod(int a, unsigned b)
{
    return (a>0)*a%b+(a<0)*(b-1-~a%b);
}
gnibbler
A: 

My preferred solution is to mod twice. I haven't tried this in C/C++ or with unsigned, but my test cases work in Java:

((a % b) + b) % b

The benefit is no branching and simplicity. The downside is the double mod. I haven't compared performance, but my understanding is that it is branching that hurts performance these days.

JodaStephen
I'm surprised no one else came out with it... I will have to test it out!
LiraNuna