I'm playing through project Euler in my spare time, and it's come to the point where I need to do some refactoring. I've implemented Miller-Rabin, as well as a few sieves. I've heard before that sieves are actually faster for small-ish numbers, as in under a few million. Does anybody have any information on this? Google wasn't very helpful.

+1  A: 

The only way is to benchmark yourself. When you do, write it up, and post it online somewhere.

Rich Bradshaw
Seriously. You've already done the implementations, why not time them yourself? If you're afraid you might have missed the fastest algorithm, post your best as a new question and see if someone can do better.
Mark Ransom
I could do this, but my implementations of a few of these tests are really crappy. I'm pretty sure the way I've written my Miller-Rabin is pretty bad. Actually, I know it's bad. I was asking about best-case scenarios, so I can just work on the "correct" implementation without refactoring every single one of mine to be "good enough" before testing them.
Josh
Java also has Miller-Rabin in the library, for BigInteger.
starblue
+8  A: 

Yes, you'll find with most algorithms that you can trade space for time. In other words, by allowing the use of more memory, the speed is greatly increased *a.

I don't actually know the Miller-Rabin algorithm but, unless it's simpler than a single shift-left/add and memory extraction, it will be blown out of the water by a pre-calculated sieve.

The important thing here is pre-calculated. It's a good idea, in terms of performance, to pre-calculate things like this since the first million primes will be unlikely to change in the near future :-)

In other words, create your sieve with something like:

int primeTbl[] = {0,0,1,1,0,1,0,1,0,0,0,1};
#define isPrime(x) \
    (((x) < sizeof(primeTbl) / sizeof(*primeTbl)) ? primeTbl[x] : isPrimeFn(x))

with all the usual caveats about not passing things like a++ into macros. This gives you the best of both worlds, a blindingly fast table lookup for "small-ish" primes, dropping back to a calculation method for those outside the range.

Obviously you would write a program using one of the other methods to generate that lookup table - you don't really want to have to type it all in by hand.
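
For illustration, here is one way that generator could look -- a plain Sieve of Eratosthenes that prints the initializer (a sketch only; the LIMIT and the output format are my choices, not anything prescribed above):

#include <stdio.h>
#include <string.h>

#define LIMIT 1000000   /* table size: pick whatever "small-ish" means to you */

static char sieve[LIMIT];   /* sieve[i] == 1 when i is prime */

int main(void)
{
    memset(sieve, 1, sizeof(sieve));
    sieve[0] = sieve[1] = 0;

    /* Sieve of Eratosthenes: cross off the multiples of each prime */
    for (long i = 2; i * i < LIMIT; i++)
        if (sieve[i])
            for (long j = i * i; j < LIMIT; j += i)
                sieve[j] = 0;

    /* Emit the table as a C initializer, ready to paste in as primeTbl[] */
    printf("int primeTbl[] = {");
    for (long i = 0; i < LIMIT; i++)
        printf("%s%d", i ? "," : "", sieve[i]);
    printf("};\n");
    return 0;
}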

But, as with all optimisation questions, measure, don't guess!


*a A classic case of this was some trig functions I once had to write for an embedded system. This was a competitive contract bid and the system had a little more storage than CPU grunt.

We actually won the contract since our benchmark figures for the functions blew the competition away.

Why? Because we pre-calculated the values into a lookup table originally calculated on another machine. By judicious use of reduction (bringing the input values down below 90 degrees) and trig properties (the fact that cosine is just a phase shift of sine and that the other three quadrants are related to the first), we got the lookup table down to 180 entries (one per half degree).
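
A rough sketch of how that kind of reduction can be wired up (every name here, the 16-bit fixed-point scale, and the table-filling main() are my own illustration, not the original code; on the real system the table would be baked in as constants):

#include <math.h>
#include <stdio.h>

/* sin(x) for x = 0.0, 0.5, ..., 89.5 degrees, scaled to 16-bit fixed point.
 * On the target this would be a precomputed const table; it is filled at
 * startup here purely so the example runs. */
static short sinTbl[180];

/* sine of an angle given in half-degree units (720 units per full turn) */
static int sinLookup(int halfDeg)
{
    halfDeg = ((halfDeg % 720) + 720) % 720;            /* reduce to [0, 720)  */
    int sign = 1;
    if (halfDeg >= 360) { sign = -1; halfDeg -= 360; }  /* 3rd/4th quadrant    */
    if (halfDeg > 180) halfDeg = 360 - halfDeg;         /* mirror 2nd quadrant */
    if (halfDeg == 180) return sign * 32767;            /* sin(90 deg) == 1    */
    return sign * sinTbl[halfDeg];
}

/* cosine is just sine shifted by a quarter turn (180 half-degrees) */
static int cosLookup(int halfDeg) { return sinLookup(halfDeg + 180); }

int main(void)
{
    for (int i = 0; i < 180; i++)
        sinTbl[i] = (short)(sin(i * 0.5 * 3.14159265358979 / 180.0) * 32767.0);

    printf("sin(30)  ~ %d / 32767\n", sinLookup(60));   /* ~  16384 */
    printf("cos(120) ~ %d / 32767\n", cosLookup(240));  /* ~ -16384 */
    return 0;
}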

The best solutions are those that are elegant and devious :-)

paxdiablo
+1 for "the first million primes will be unlikely to change in the near future", LOL. I'm not familiar with Project Euler rules, perhaps this isn't allowed?
Mark Ransom
@Mark: There are no formal rules in Project Euler.
You
Yes, if you're willing to blow your whole L1 cache (61 kB) on the table, you can check odds under a million for primality with very fast amortized performance. But for Project Euler you're going to need primes in a much wider range, and performance for larger numbers will dominate the runtime.
Charles
You can also store the differences between consecutive primes starting with 3. This allows 1 byte per prime for a really large range but precludes lookups. So checking for factors is faster, but checking if something is in the list is poor.
phkahler
+1  A: 

As a variant on the notion of pre-computation, you can first cheaply check whether the candidate number p is divisible by 2, 3, 5, 7, or 11. If not, then declare p prime if 2^(p-1) = 1 (mod p). This will fail at some point, but it works up to 100 million because I tested it (pre-computation).

In other words, all the small-ish Fermat pseudo-primes to the base 2 are divisible by one of 3, 5, 7, or 11.

EDIT:

As correctly noted by @starblue, the above is simply wrong. I had a bug in my program. The best I can do is amend the above to:

If candidate p is divisible by 2, 3, 5, 7, or 11, declare it composite;
Else if p is one of {4181921, 4469471, 5256091, 9006401, 9863461}, declare it composite;
Else if p passes the Miller-Rabin test for bases 2 and 5 then declare it prime;
Else declare it composite.

This I tested for integers less than 10,000,000. Perhaps a different pair of bases would do even better.
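
For concreteness, here is roughly what that amended test looks like in C (a sketch; the helper names are arbitrary, and it is only as trustworthy as the testing described above):

#include <stdint.h>

/* (a^d) mod n in 64-bit arithmetic; safe because n < 10,000,000, so every
 * intermediate product stays far below 2^63 */
static uint64_t powmod(uint64_t a, uint64_t d, uint64_t n)
{
    uint64_t r = 1;
    a %= n;
    while (d) {
        if (d & 1) r = r * a % n;
        a = a * a % n;
        d >>= 1;
    }
    return r;
}

/* one Miller-Rabin round: is odd n > 2 a strong probable prime to base a? */
static int sprp(uint64_t n, uint64_t a)
{
    uint64_t d = n - 1;
    int s = 0;
    while ((d & 1) == 0) { d >>= 1; s++; }
    uint64_t x = powmod(a, d, n);
    if (x == 1 || x == n - 1) return 1;
    for (int i = 1; i < s; i++) {
        x = x * x % n;
        if (x == n - 1) return 1;
    }
    return 0;
}

/* the amended tiered test, intended for 1 < p < 10,000,000 */
static int isPrimeSmall(uint64_t p)
{
    static const uint64_t small[] = { 2, 3, 5, 7, 11 };
    static const uint64_t exceptions[] =
        { 4181921, 4469471, 5256091, 9006401, 9863461 };

    if (p < 2) return 0;
    for (int i = 0; i < 5; i++) {
        if (p == small[i]) return 1;
        if (p % small[i] == 0) return 0;     /* divisible -> composite        */
    }
    for (int i = 0; i < 5; i++)
        if (p == exceptions[i]) return 0;    /* composites that fool both bases */
    return sprp(p, 2) && sprp(p, 5);         /* passes both -> prime          */
}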

Please accept my apologies for my mistakes.

EDIT 2:

Well, it appears that the information I was after is already on the Wikipedia page for the Miller-Rabin algorithm, the section titled "Deterministic variants of the test".

GregS
@Greg, that final test looks slightly weird to me (1 mod p is always 1 for p > 1). I assume you mean `if (2^(p-1) mod p) = 1`, yes?
paxdiablo
by "=", I mean congruent. I should have parenthesized the mod p part. Corrected.
GregS
This seems like very useful information to keep around for a rainy day - what's the first number that it fails on?
caf
I still haven't found one. Maybe I'll run it overnight sometime and see if I hit one.
GregS
Isn't this just Miller-Rabin using 2 as a in a^(n-1)=1 (mod n)? It's interesting that it works up to 100 million if it is, as I maybe incorrectly assumed that Miller-Rabin was probabilistic and needed to be run with multiple different numbers as a to be accurate...
Josh
@Josh: The Miller-Rabin primality test is based on, and is an improvement of, the Fermat primality test. The Miller-Rabin and Fermat test are probabilistic and are normally run multiple times, but this is a special case.
GregS
@caf: It first fails for 1387. There are 1458 exceptions up to 100 million.
Charles
@GregS: It still looks like the base case of Miller-Rabin to me. Why does this work?
Josh
@GregS: Oh duh, I didn't read the part about division by primes.
Josh
-1: 29341 = 13 * 37 * 61 is a Carmichael number, on which it must fail. http://oeis.org/classic/A002997
starblue
Yep, bug in my program.
GregS
+1: the new test works up to 14,709,241. Bases 2 and 11 are pretty nice; 5 exceptions make it work until 63,388,033. Bases 2 and 733 are slightly better with 74,927,161.
Charles
+4  A: 

I recommend a tiered approach. First, make sure there are no small prime factors. Trial-dividing by the first 20 or 30 primes works, though if you use a clever approach you can reduce the number of divisions needed by using gcds. This step filters out about 90% of the composites.
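
One common form of the gcd trick, as a sketch (the constant is the product of the odd primes up to 53, which still fits in 64 bits -- the next prime would overflow; a zero result is not a primality proof, just "no cheap factor found"):

#include <stdint.h>

/* Euclid's algorithm */
static uint64_t gcd64(uint64_t a, uint64_t b)
{
    while (b) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

/* product of the odd primes 3, 5, ..., 53 -- one gcd against this constant
 * stands in for 15 trial divisions */
static const uint64_t ODD_PRIMORIAL_53 = 16294579238595022365ULL;

/* Quick filter for n > 1: a nonzero return means the gcd exposed a proper
 * factor, so n is certainly composite.  A zero return proves nothing; n
 * still needs a real test (e.g. Miller-Rabin, as in the next step). */
static int hasCheapFactor(uint64_t n)
{
    if ((n & 1) == 0) return n != 2;        /* even: composite unless n == 2 */
    uint64_t g = gcd64(n, ODD_PRIMORIAL_53);
    return g != 1 && g != n;                /* g == n: n may itself be prime */
}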

Next, test if the number is a strong probable prime (Miller-Rabin test) to base 2. This step removes almost all remaining composites, but some rare composites can pass.

The final proving step depends on how large you want to go. If you are willing to work in a small range, do a binary search on a list of 2-pseudoprimes up to the largest you allow. If that's 2^32, your list will have only 10,403 members, so the lookup should take only 14 queries.
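
A sketch of that lookup step (the table here is truncated to the first few base-2 strong pseudoprimes purely so the example compiles and runs; the real list up to your bound has to be generated or downloaded and kept sorted):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* First few base-2 strong pseudoprimes -- truncated for illustration only;
 * the full sorted table below 2^32 has the 10,403 entries mentioned above */
static const uint64_t psp2[] = { 2047, 3277, 4033, 4681, 8321 };
static const size_t psp2Count = sizeof psp2 / sizeof *psp2;

static int cmpU64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* assumes n already survived trial division and the base-2 strong probable
 * prime test: n is prime exactly when it is absent from the table */
static int primeAfterSprp2(uint64_t n)
{
    return bsearch(&n, psp2, psp2Count, sizeof *psp2, cmpU64) == NULL;
}

int main(void)
{
    /* 2047 = 23 * 89 fools the base-2 test, but the lookup rejects it;
     * 2053 is genuinely prime */
    printf("%d %d\n", primeAfterSprp2(2047), primeAfterSprp2(2053));
    return 0;
}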

If you want to go up to 2^64, it now suffices (thanks to the work of Jan Feitsma) to check that the number passes the BPSW test. (You could also download the 3 GB list of all exceptions, remove those which trial division would remove, and write a disk-based binary search.) T. R. Nicely has a nice page explaining how to implement this reasonably efficiently.

If you need to go higher, implement the above method and use it as a subroutine for a Pocklington-style test. This stretches the definition of "small-ish"; if you want more information on these methods, just ask.

Charles
+1 nice answer and great links.
Accipitridae