+1  A: 

In the embedded world, the "modulus" operations you need to do are often the ones that break down nicely into bit operations that you can do with '&' and '|' and sometimes '>>'.

Paul Tomblin
How would you handle the case where it wasn't a neat power of two number?
JeffV
Jeff V: Interesting fact about math: x * 7 == x + (x * 2) + (x * 4), or x + x >> 1 + x >> 2. Integer addition is usually pretty cheap.
Paul Tomblin
+4  A: 

Unless you really need high performance on multiple embedded platforms, don't change how you code for performance reasons until you profile!

Code that's written awkwardly to optimize for performance is hard to debug and hard to maintain. Write a test case, and profile it on your target. Once you know the actual cost of modulus, then decide if the alternate solution is worth coding.

shmuelp
I agree with you 100%, however, do you have an alternative to modulus in the case where performance matters?
JeffV
I don't have a specific alternative. If I couldn't use a bitmask or right shift like @Paul-Tomblin suggested, I'd keep a counter, just like you suggested in your question. Of course, I'm used to working with more overhead (we're running CORBA on our processors).
shmuelp
A: 

Not that this is necessarily better, but you could have an inner loop which always goes up to FIZZ, and an outer loop which repeats it all some certain number of times. You've then perhaps got to special case the final few steps if MAXCOUNT is not evenly divisible by FIZZ.

That said, I'd suggest doing some research and performance profiling on your intended platforms to get a clear idea of the performance constraints you're under. There may be much more productive places to spend your optimisation effort.

Matt Sheppard
+1  A: 

Do you have access to any programmable hardware on the embedded device? Like counters and such? If so, you might be able to write a hardware based mod unit, instead of using the simulated %. (I did that once in VHDL. Not sure if I still have the code though.)

Mind you, you did say that division was 5-10 times faster. Have you considered doing a division, multiplication, and subtraction to simulated the mod? (Edit: Misunderstood the original post. I did think it was odd that division was faster than mod, they are the same operation.)

In your specific case, though, you are checking for a mod of 6. 6 = 2*3. So you could MAYBE get some small gains if you first checked if the least significant bit was a 0. Something like:

if((!(x & 1)) && (x % 3))
{
    print("Fizz\n");
}

If you do that, though, I'd recommend confirming that you get any gains, yay for profilers. And doing some commenting. I'd feel bad for the next guy who has to look at the code otherwise.

Rob Rolnick
I meant division and modulus is 5-10 times faster on a micro with an integer division op code in the instruction set.
JeffV
Oh, sorry, I misunderstood. (Clearly.) Have you profiled the other suggestion?
Rob Rolnick
+6  A: 

If you are calculating a number mod some power of two, you can use the bit-wise and operator. Just subtract one from the second number. For example:

x % 8 == x & 7
x % 256 == x & 255

A few caveats:

  1. This only works if the second number is a power of two.
  2. It's only equivalent if the modulus is always positive. The C and C++ standards don't specify the sign of the modulus when the first number is negative. A bit-wise and gets rid of the sign bit, so it will always be positive (i.e. it's a true modulus, not a remainder). It sounds like that's what you want anyways though.
  3. Your compiler probably already does this when it can, so in most cases it's not worth doing it manually.
Matthew Crumley
+2  A: 

@Matthew is right. Try this:

int main() {
  int i;
  for(i = 0; i<=1024; i++) {
    if (!(i & 0xFF)) printf("& i = %d\n", i);
    if (!(i % 0x100)) printf("mod i = %d\n", i);
  }
}
hoyhoy
A: 

@Jeff V: I see a problem with it! (Beyond that your original code was looking for a mod 6 and now you are essentially looking for a mod 8). You keep doing an extra +1! Hopefully your compiler optimizes that away, but why not just test start at 2 and go to MAXCOUNT inclusive? Finally, you are returning true every time that (x+1) is NOT divisible by 8. Is that what you want? (I assume it is, but just want to confirm.)

Rob Rolnick
Yes, I see that too. I changed it to 7 because 6 evently divisible by two. I'm looking for an answer that could be used for any number known up front.
JeffV
A: 

You should really check the embedded device you need. All the assembly language I have seen (x86, 68000) implement the modulus using a division.

Actually, the division assembly operation returns the result of the division and the remaining in two different registers.

Vincent Robert
+12  A: 
Adam Davis
Wow, if I ever have to do it for a number other than 7, I'll have to come back and re-read this post.
JeffV
A: 

The print statement will take orders of magnitude longer than even the slowest implementation of the modulus operator. So basically the comment "slow on some systems" should be "slow on all systems".

Also, the two code snippets provided don't do the same thing. In the second one, the line

if(fizzcount >= FIZZ)

is always false so "FIZZ\n" is never printed.

Greg Rogers
+2  A: 

There is an overhead most of the time in using modulo that are not powers of 2. This is regardless of the processor as (AFAIK) even processors with modulus operators are a few cycles slower for divide as opposed to mask operations.

For most cases this is not an optimisation that is worth considering, and certainly not worth calculating your own shortcut operation (especially if it still involves divide or multiply).

However, one rule of thumb is to select array sizes etc. to be powers of 2.

so if calculating day of week, may as well use %7 regardless if setting up a circular buffer of around 100 entries... why not make it 128. You can then write % 128 and most (all) compilers will make this & 0x7F

itj