views:

721

answers:

10

I've been interested in the problem of finding a better prime number recognizer for years. I realize this is a huge area of academic research and study - my interest in this is really just for fun. Here was my first attempt at a possible solution, in C (below).

My question is, can you suggest an improvement (without citing some other reference on the net, I'm looking for actual C code)? What I'm trying to get from this is a better understanding of determining performance complexity of a solution like this.

Am I right in concluding that the complexity of this solution is O(n^2)?

#include <stdio.h>
#include <math.h>

/* isprime                                                           */
/* Test if each number in the list from stdin is prime.              */
/* Output will only print the prime numbers in the list.             */

int main(int argc, char* argv[]) {

    int returnValue = 0;
    int i;
    int ceiling;
    int input = 0;
    int factorFound = 0;

    while (scanf("%d", &input) != EOF) {

        ceiling = (int)sqrt(input);
        if (input == 1) {
            factorFound = 1;
        }

        for (i = 2; i <= ceiling; i++) {
            if (input % i == 0) {
                factorFound = 1;
            } 
        }

        if (factorFound == 0) {
            printf("%d\n", input);
        }

        factorFound = 0;    
    } 

    return returnValue;
}
A: 

There is no way to improve the algorithm. There may be tiny ways to improve your code, but not the base speed (and complexity) of the algorithm.

EDIT: Of course, since he doesn't need to know all the factors, just whether or not it is a prime number. Great spot.

Alexander Rafferty
This is false. There exist O((log n)^12) algorithms for primality testing. http://en.wikipedia.org/wiki/Primality_test#Fast_deterministic_tests Some of those algorithms may not be practical in practice, but there are certainly better than O(n^2) algorithms out there.
bdonlan
There is no way to improve THIS algorithm, but he could replace the entire algorithm. So the answer is valid.
John Gaughan
+10  A: 
   for (i = 2; i <= ceiling; i++) {
        if (input % i == 0) {
            factorFound = 1;
            break;
        } 
    }

This is the first improvement to make and still stay within the bounds of "same" algorithm. It doesn't require any math at all to see this one.

Beyond that, once you see that input is not divisible by 2, there is no need to check for 4, 6, 8, etc. If any even number divided into input, then surely 2 would have because it divides all even numbers.

If you want to step outside of the algorithm a little bit, you could use a loop like the one that Sheldon L. Cooper provides in his answer. (This is just easier than having him correct my code from the comments though his efforts are much appreciated)

this takes advantage of the fact that every prime other than 2 and 3 is of the form n*6 + 1 or n*6 - 1 for some positive integer n. To see this, just note that if m = n*6 + 2 or m = n*6 + 4, then n is divisible by 2. if m = n*6 + 3 then it is divisible by 3.

In fact, we can take this further. If p1, p2, .., pk are the first k primes, then all of the integers that are coprime to their product mark out 'slots' that all remaining primes must fit into.

to see this, just let k# be the product of all primes up to pk. then if gcd(k#, m) = g, g divides n*k# + m and so this sum is trivially composite if g != 1. so if you wanted to iterate in terms of 5# = 30, then your coprime integers are 1, 7, 11, 13, 17, 19, 23 and 29.


technically, I didn't prove my last claim. It's not much more difficult

if g = gcd(k#, m), then for any integer, n, g divides n*k# + m because it divides k# so it must also divide n*k#. But it divides m as well so it must divide the sum. Above I only proved it for n = 1. my bad.


Also, I should note that none of this is changing the fundamental complexity of the algoritm, it will still be O(n^1/2). All it is doing is drastically reducing the coefficient that gets used to calculate actual expected run times.

aaronasterling
Good post overall, but it contains a few errors. Your second snippet of code is broken. Also, you wrote `n*5 - 1` instead of `n*6 - 1`. And `m = n*6 + 3` is always divisible by 3, not 6.
Sheldon L. Cooper
@Sheldon. Thanks. All embarrassing mistakes and I'm glad to have them fixed.
aaronasterling
@AaronMcSmooth, your code is still incorrect. It should be `input % (i - 1) == 0` not `(input - 1)`. And the condition of the while loop should be `(i-1) <= ceiling`.
Sheldon L. Cooper
Maybe I'm being pedantic, but now the first snippet of code is wrong. I was referring to the while loop in my previous comment. This code will fail with input = 2. The condition should be just `i <= ceiling` in this case.
Sheldon L. Cooper
@Sheldon I know what you meant. My edit-fu is seriously lacking t oday.
aaronasterling
+2  A: 

A simple improvement would be to change the for loop to break out when it finds a factor:

   for (i = 2; i <= ceiling && !factorFound; i++) {
        if (input % i == 0) {
            factorFound = 1;

Another possibility would be to increment the counter by 2 (after checking 2 itself).

Mark Wilkins
A: 

Even numbers(except 2) cannot be prime numbers. So, once we know that the number is not even, we can just check if odd numbers are it's factors.

for (i = 3; i <= ceiling; i += 2) {
        if (input % i == 0) {
            factorFound = 1;
            break;
        } 
    }
letronje
-1 "even numbers cannot be prime numbers" is a false statement. There is exactly one even prime number, namely 2.
Andreas Rejbrand
@Andreas: fixed it
letronje
And -1 removed! :)
Andreas Rejbrand
+4  A: 

Your code there only has complexity O(sqrt(n)lg(n)). If you assume basic mathematical operations are O(1) (true until you start using bignums), then it's just O(sqrt(n)).

Note that primality testing can be performed in faster-than-O(sqrt(n)lg(n)) time. This site has a number of implementations of the AKS primality test, which has been proven to operate in O((log n)^12) time.

There are also some very, very fast probalistic tests - while fast, they sometimes give an incorrect result. For example, the Fermat primality test:

Given a number p we want to test for primality, pick a random number a, and test whether a^(p-1) mod p = 1. If false, p is definitely not prime. If true, p is probably prime. By repeating the test with different random values of a, the probability of a false positive can be reduced.

Note that this specific test has some flaws to it - see the Wikipedia page for details, and other probabilistic primality tests you can use.

If you want to stick with the current approach, there are a number of minor improvements which can still be made - as others have pointed out, after 2, all further primes are odd, so you can skip two potential factors at a time in the loop. You can also break out immediately when you find a factor. However, this doesn't change the asymptotic worst-case behavior of your algorithm, which remains at O(sqrt(n)lg(n)) - it just changes the best case (to O(lg(n))), and reduces the constant factor by roughly one-half.

bdonlan
+2  A: 

can you suggest an improvement

Here you go ... not for the algorithm, but for the program itself :)

  • If you aren't going to use argc and argv, get rid of them
  • What if I input "fortytwo"? Compare scanf() == 1, not != EOF
  • No need to cast the value of sqrt()
  • returnValue is not needed, you can return a constant: return 0;
  • Instead of having all of the functionality inside the main() function, separate your program in as many functions as you can think of.
pmg
A: 

You can make small cuts to your algorithm without adding too much code complexity. For example, you can skip the even numbers on your verification, and stop the search as soon as you find a factor.

if (input < 2 || (input != 2 && input % 2 == 0))
  factorFound = 1;

if (input > 3)
  for (i = 3; i <= ceiling && !factorFound; i += 2)
    if (input % i == 0)
      factorFound = 1;

Regarding the complexity, if n is your input number, wouldn't the complexity be O(sqrt(n)), as you are doing roughly at most sqrt(n) divisions and comparisons?

Hugo Peixoto
A: 

The time complexity of your program is O(n*m^0.5). With n the number of primes in the input. And m the size of the biggest prime in the input, or MAX_INT if you prefer. So complexity could also be written as O(n) with n the number of primes to check.

With Big-O, n is (usually) the size of the input, in your case that would be the number of primes to check. If I make this list twice as big (for example duplicating it), it would take (+-) exactly twice as long, thus O(n).

Ishtar
+6  A: 

The time complexity for each primality test in your algorithm is O(sqrt(n)).

You can always use the fact that all primes except 2 and 3 are of the form: 6*k+1 or 6*k-1. For example:

int is_prime(int n) {
  if (n <= 1) return 0;
  if (n == 2 || n == 3) return 1;
  if (n % 2 == 0 || n % 3 == 0) return 0;
  int k;
  for (k = 6; (k-1)*(k-1) <= n; k += 6) {
    if (n % (k-1) == 0 || n % (k+1) == 0) return 0;
  }
  return 1;
}

This optimization does not improve the asymptotic complexity.

EDIT

Given that in your code you are testing numbers repeatedly, you might want to pre-calculate a list of primes. There are only 4792 primes less than or equal to the square root of INT_MAX (assuming 32 bit ints).

Furthermore, if the input numbers are relatively small you can try calculating a sieve.

Here's a combination of both ideas:

#define UPPER_BOUND 46340  /* floor(sqrt(INT_MAX)) */
#define PRIME_COUNT 4792  /* number of primes <= UPPER_BOUND */

int prime[PRIME_COUNT];
int is_prime_aux[UPPER_BOUND];

void fill_primes() {
  int p, m, c = 0;
  for (p = 2; p < UPPER_BOUND; p++)
    is_prime_aux[p] = 1;
  for (p = 2; p < UPPER_BOUND; p++) {
    if (is_prime_aux[p]) {
      prime[c++] = p;
      for (m = p*p; m < UPPER_BOUND; m += p)
        is_prime_aux[m] = 0;
    }
  }
}

int is_prime(int n) {
  if (n <= 1) return 0;
  if (n < UPPER_BOUND) return is_prime_aux[n];
  int i;
  for (i = 0; i < PRIME_COUNT && prime[i]*prime[i] <= n; i++)
    if (n % prime[i] == 0)
      return 0;
  return 1;
}

Call fill_primes at the beginning of your program, before starting to process queries. It runs pretty fast.

Sheldon L. Cooper
"The time complexity for each primality test is O(sqrt(n))" -- not nearly! Trial division (the OP's algorithm) is that slow (up to logarithmic factors), but APR or AKS are faster asymptotically.
Charles
@Charles, Huh? I was obviously referring to the OP's algorithm...
Sheldon L. Cooper
Would you edit that in?
Charles
@Charles: OK, clarification added.
Sheldon L. Cooper
A: 

Here's my algorithm, Complexity remains O(n^0.5) but i managed to remove some expensive operations in the code...

The algorithm's slowest part is the modulus operation, i've managed to eliminate sqrt or doing i * i <= n

This way i save precious cycles...its based on the fact that sum of odd numbers is always a perfect square.

Since we are iterating over odd numbers anyway, why not exploit it? :)

int isPrime(int n)
{
    int squares = 1;
    int odd = 3;

    if( ((n & 1) == 0) || (n < 9)) return (n == 2) || ((n > 1) && (n & 1));
    else
    {
        for( ;squares <= n; odd += 2)
        {
            if( n % odd == 0) 
                return 0;
            squares+=odd;
        }
        return 1;
    }
}
st0le