views:

165

answers:

6

I was trying to create this helper function in C# that returns the first n prime numbers. I decided to store the numbers in a dictionary in the <int,bool> format. The key is the number in question and the bool represents whether the int is a prime or not. There are a ton of resources out there calculating/generating the prime numbers(SO included), so I thought of joining the masses by crafting another trivial prime number generator.

My logic goes as follows:

public static Dictionary<int,bool> GetAllPrimes(int number)
    {
        Dictionary<int, bool> numberArray = new Dictionary<int, bool>();


        int current = 2;
        while (current <= number)
        {
            //If current has not been marked as prime in previous iterations,mark it as prime
            if (!numberArray.ContainsKey(current))
                numberArray.Add(current, true);

            int i = 2;
            while (current * i <= number)
            {
                if (!numberArray.ContainsKey(current * i))
                    numberArray.Add(current * i, false);
                else if (numberArray[current * i])//current*i cannot be a prime
                    numberArray[current * i] = false;
                i++;

            }
            current++;
        }
        return numberArray;
    }

It will be great if the wise provide me with suggestions,optimizations, with possible refactorings. I was also wondering if the inclusion of the Dictionary helps with the run-time of this snippet.

+2  A: 

The first thing that bothers is that, why are you storing the number itself ?

Can't you just use the index itself which will represent the number?

PS: I'm not a c# developer so maybe it is not possible with a dictionary, but it can be done with the appropriate structure.

Soufiane Hassou
@Soufiane Hassou - Do you have an example(in the language of your choice) where the numbers are being indexed?
sc_ray
I believe he means you can replace the numbers *with* the indexes - an array of Booleans would suffice in this case (`arr[0] = false, arr[1] = false, arr[2] = true, arr[3] = true, arr[4] = false, etc...`)
djacobson
That means allocating a data structure with a predefined size. A array of Booleans may impose pre-allocating and initializing the array before using it.
sc_ray
What about `List<bool>`? It can use indexes like an array (in fact, I'm pretty sure it uses an array internally), but automatically takes care of resizing and allocating more space, if needed.
Heinzi
@Heinzi - That sounds good. Using List<bool>. Thanks
sc_ray
+1  A: 

I'm pretty sure the Dictionary actually hurts performance, since it doesn't enable you to perform the trial divisions in an optimal order. Traditionally, you would store the known primes so that they could be iterated from smallest to largest, since smaller primes are factors of more composite numbers than larger primes. Additionally, you never need to try division with any prime larger than the square root of the candidate prime.

Many other optimizations are possible (as you yourself point out, this problem has been studied to death) but those are the ones that I can see off the top of my head.

Daniel Pryden
@Daniel - Thanks. If order is not of importance for some weird reason, how else does the Dictionary hurt performance?
sc_ray
+2  A: 

First, you only have to loop untill the square root of the number. Make all numbers false by default and have a simple flag that you set true at the beginning of every iteration.

Further, don't store it in a dictionary. Make it a bool array and have the index be the number you're looking for. Only 0 won't make any sense, but that doesn't matter. You don't have to init either; bools are false by default. Just declare an bool[] of number length.

Then, I would init like this:

primes[2] = true;
for(int i = 3; i < sqrtNumber; i += 2) {

}

So you skip all the even numbers automatically.

By the way, never declare a variable (i) in a loop, it makes it slower.

So that's about it. For more info see this page.

Jouke van der Maas
`never declare a variable (i) in a loop` - What?
Tim Robinson
The variable i is declared inside a while loop in his code. He should declare it outside of the loop and assign it inside.
Jouke van der Maas
+1  A: 

1) From the perspective of the client to this function, wouldn't it be better if the return type was bool[] (from 0 to number perhaps)? Internally, you have three states (KnownPrime, KnownComposite, Unknown), which could be represented by an enumeration. Storing an an array of this enumeration internally, prepopulated with Unknown, will be faster than a dictionary.

2) If you stick with the dictionary, the part of the sieve that marks multiples of the current number as composite could be replaced with a numberArray.TryGetValue() pattern rather than multiple checks for ContainsKey and subsequent retrieval of the value by key.

Ani
@Ani - numberArray.TryGetValue() pattern sounds like a good idea. Do you have any examples of it?
sc_ray
Actually, in this case, it will be faster to simply do:numberArray[current * i] = false; which does not throw even if the key already exists.FYI, the pattern is:bool isMarkedPrime;if(numberArray.TryGetValue(current * i, out isMarkedPrime) || isMarkedPrime) numberArray[current * i] = false;
Ani
That should have been if(!numberArray.TryGetValue(current * i, out isMarkedPrime) || isMarkedPrime) numberArray[current * i] = false;
Ani
A: 

The dictionary really doesn't make sense here -- just store all primes up to a given number in a list. Then follow these steps:

Is given number in the list?  
    Yes - it's prime.  Done.
Not in list
Is given number larger than the list maximum?
    No - it's not prime.  Done.
Bigger than maximum; need to fill list up to maximum.
Run a sieve up to given number.
Repeat.
Austin Salonen
+1  A: 

Storing integers explicitly needs at least 32 bits per prime number, with some overhead for the container structure.

At around 231, the maximal value a signed 32 bit integer can take, about every 21.5th number is prime. Smaller primes are more dense, about 1 in ln(n) numbers is prime around n.

This means it is more memory efficient to use an array of bits than to store numbers explicitly. It will also be much faster to look up if a number is prime, and reasonably fast to iterate through the primes.

It seems this is called a BitArray in C# (in Java it is BitSet).

starblue
I really like the idea of using the BitArray and I also checked out your implementation using Java. I understand the merits of indexing, but the index itself is an integer. Can you shed some light on how indexing works in data structures such as BitArray that lowers the overhead but at the same time promotes faster lookup?
sc_ray