views:

186

answers:

8

If have the following C function, used to determine if one number is a multiple of another to an arbirary tolerance

#include <math.h>

#define TOLERANCE 0.0001

int IsMultipleOf(double x,double mod)
{
    return(fabs(fmod(x, mod)) < TOLERANCE);
}

It works fine, but profiling shows it to be very slow, to the extent that it has become a candidate for optimization. About 75% of the time is spent in modulo and the remaining in fabs. I'm trying to figure a way of speeding things up, using something like a look-up table. The parameter x changes regularly, whereas mod changes infrequently. The number of possible values of x is small enough that the space for a look-up would not be an issue, typically it will be one of a few hundred possible values. I can get rid of the fabs easily enough, but can't figure out a reasonable alternative to the modulo. Any ideas on how to optimize the above?

Edit The code will be running on a wide range of Windows desktop and mobile devices, hence processors could include Intel, AMD on desktop, and ARM or SH4 on mobile devices. VisualStudio 2008 is the compiler.

A: 

Does it need to be double precision ? Depending on how good your math library is, this ought to be faster:

#include <math.h>

#define TOLERANCE 0.0001f

bool IsMultipleOf(float x, float mod)
{
    return(fabsf(fmodf(x, mod)) < TOLERANCE);
}
Paul R
fmodf() is *slower* than fmod().
Hans Passant
@Hans Passant: that will depend on what CPU and math library you are using - in general it *should* be faster.
Paul R
@Hans Passant: I just tested this on a Core i7, compiling with Intel ICC 11, and the single precision version with `fmodf` is around 25% faster. **YMMV** of course.
Paul R
A: 

I presume modulo looks a little like this on the inside:

mod(x,m) {
  while (x > m) {
    x = x - m
  }
  return x
}

I think that through some sort of search i could be optimised: eg:

fastmod(x,m) {
  q = 1

  while (m * q < x) {
    q = q * 2
  }

  return mod((x - (q / 2) * m), m)
}

You might even choose to replace the finall call to mod with annother call to fastmod, adding the condition that if x < m then to return x.

thomasfedb
I wrote all this in C-like pesudocode by the way.
thomasfedb
+3  A: 

Do you really have to use modulo for this?

Wouldn't it be possible to just result = x / mod and then check if the decimal part of result is close to 0. For instance:

11 / 5.4999 = 2.000003  ==> 0.000003 < TOLERANCE 

Or something like that.

Martin Wickman
pretty sure that's exactly what `modulo` or `fmod` does (: only it may do it a *smarter* way depending on the CPU (it may be a single instruction for example to calculate the division and modulus together)
drfrogsplat
@drfrogsplat - My assumption was that fmod and division were going to have similar performance, but this is something I should obviously test.
Shane MacLaughlin
+1  A: 

Division (floating point or not, fmod in your case) is often an operation where the execution time varies a lot depending on the cpu and compiler:

  • gcc has a builtin replacement for
    that if you give it the right compile flags or if you use __builtin_fmod explicitly. This then might map the operation on a small number of assembler instructions.
  • there may be special units like SSE on intel processors where this operation is implemented more efficiently

By such tricks, depending on your environment (you didn't tell which) the time may vary from some clock cycles to some hundred. I think best is to look into the documentation of your compiler and cpu for that particular operation.

Jens Gustedt
SSE would only help if you want to perform several divisions at once (actually I'm not even sure if SSE instructions include division).The second 'S' is for 'SIMD' = Single Instruction Multiple Data, i.e. perform 8 additions at once, so I think this'd only help if one were to be searching for valid 'mod' values by testing a large list of them (i.e. you could test 8 at once, perhaps)...
drfrogsplat
No, this S in there is only for historical reasons. You have some instructions that are faster if you use them just for one operator. But the downside is really that there is not one simple rule for it. You have to dig through manuals, realize different implementations and benchmark the whole to know what is possible.
Jens Gustedt
+1  A: 

The following is probably overkill, and sub-optimal. But for what it is worth here is one way on how to do it.

We know the format of the double ...

  • 1 bit for the sign
  • 11 bits for the biased exponent
  • 52 fraction bits

Let ...

  • value = x / mod;
  • exp = exponent bits of value - BIAS;
  • lsb = least sig bit of value's fraction bits;

Once you have that ...

/*
 * If applying the exponent would eliminate the fraction bits
 * then for double precision resolution it is a multiple.
 * Note: lsb may require some massaging.
 */
if (exp > lsb)
    return (true);

if (exp < 0)
    return (false);

The only case remaining is the tolerance case. Build your double so that you are getting rid of all the digits to the left of the decimal.

  • sign bit is zero (positive)
  • exponent is the BIAS (1023 I think ... look it up to be sure)
  • shift the fraction bits as appropriate

Now compare it against your tolerance.

Sparky
+1  A: 

I think you need to inspect the bowels of your C RTL fmod() function: X86 FPU's have 'FPREM/FPREM1' instructions which computes remainders by repeated subtraction.

While floating point division is a single instruction, it seems you may need to call FPREM repeatedly to get the right answer for modulus, so your RTL may not use it.

Roddy
A: 

I have not tested this at all, but from the way I understand fmod this should be equivalent inlined, which might let the compiler optimize it better, though I would have thought that the compiler's math library (or builtins) would work just as well. (also, I don't even know for sure if this is correct).

#include <math.h>

int IsMultipleOf(double x, double mod) {
    long n = x / mod;  // You should probably test for /0 or NAN result here
    double new_x = mod * n;
    double delta = x - new_x;
    return fabs(delta) < TOLERANCE;  // and for NAN result from fabs
}
nategoose
+1  A: 

Maybe you can get away with long long instead of double if you have comparable scale of data. For example long long would be enough for over 60 astronomical units in micrometer resolution.

Tometzky