views:

502

answers:

12

I've written an algorithm that I believe to be correct for computing prime numbers up to n with the Sieve of Eratosthenes. Unfortunately, this program hangs on really large values of n (try 10 million). Here is what I've written...

Protected Function Eratosthenes(ByVal n As Integer) As String
    Dim maxValue As Integer = Math.Sqrt(n)
    Dim values As Generic.List(Of Integer) = New Generic.List(Of Integer)
    Dim i As Integer
    ''//create list.
    For i = 2 To n
        values.Add(i)
    Next

    For i = 2 To maxValue
        If values.Contains(i) Then
            Dim k As Integer
            For k = i + 1 To n
                If values.Contains(k) Then
                    If (k Mod i) = 0 Then
                        values.Remove(k)
                    End If
                End If
            Next
        End If
    Next

    Dim result As String = ""
    For i = 0 To values.Count - 1
        result = result & " " & values(i)
    Next

    Return result
End Function

How might I speed this algorithm up? Where are my bottlenecks?

+3  A: 

I am not a VB guy, but I think:

values.Remove(k)

is a waste of time, just set the place to 0, then extract the prime numbers to another list or sort the same list and remove all zeros at once.

AraK
Smart. Thanks for the idea. I was so busy looking for ways to somehow eliminate all of the "for" loops that I didn't even think about the cost of the calculations inside of the loops.
Chris
+6  A: 

Removing elements from a large list is slow.

Why not create an array of Booleans values instead and set a value to "True" when you know that it's non-prime?

When you've found a new prime, you don't need to go through all higher values, just multiple of that one, setting the array element to True.

You can keep a separate list for primes you've found so far, if you want to return them.

Here's a C# implementation which just prints them out as it goes. (In C# if I wanted to return the values I'd return IEnumerable<T> and use an iterator block.)

using System;

public class ShowPrimes
{
    static void Main(string[] args)
    {
        ShowPrimes(10000000);
    }

    static void ShowPrimes(int max)
    {        
        bool[] composite = new bool[max+1];
        int maxFactor = (int) Math.Sqrt(max);

        for (int i=2; i <= maxFactor; i++)
        {
            if (composite[i])
            {
                continue;
            }
            Console.WriteLine("Found {0}", i);
            // This is probably as quick as only
            // multiplying by primes.
            for (int multiples = i * i;
                 multiples <= max;
                 multiples += i)
            {
                composite[multiples] = true;
            }
        }
        // Anything left is a prime, but not
        // worth sieving
        for (int i = maxFactor + 1; i <= max; i++)
        {
            if (composite[i])
            {
                continue;
            }
            Console.WriteLine("Found {0}", i);
        }
    }
}
Jon Skeet
You want to mark composites starting at the square of the prime found, not twice that prime.for (int multiples = i * i; multiples <= max; multiples += i)
hughdbrown
@hughdbrown: Oops, yes of course :)
Jon Skeet
This algorithm is definitely more quick than my original (I can tell with numbers such as One Thousand), however, it is still too slow to process ten million (times out).
Chris
@Chris: It's probably hung up on the console output. On my *netbook* it completes in about a second when the console output is removed.
Jon Skeet
Also, your algorithm tests a lot of numbers you know cannot be prime. It starts at 2 and tries 3, 4, 5, 6, 7, etc. After 2, there are no even prime numbers, so you could have marked 2 as prime and started with 3, incrementing by 2. But it's better than that. Any prime number mod 6 must be either 1 or 5 but not 0, 2, or 4 (even numbers) or 3 (a multiple of 3). So you could mark 2, 3, and 5 as prime and then step up by [2, 4] alternately. That would look at only a third as many numbers so presumably it would be three times faster at finding primes (but not at marking composites).
hughdbrown
@hughdbrown: Yes, I considered that - but left it as it is in order to keep it as simple as possible. Bearing in mind that it can find all the primes under 10 million in less than a second, clearly the "finding the primes" part isn't the bottleneck Chris is running into.
Jon Skeet
+1  A: 

Here is a C# implementation I threw together a while back.

Maybe it'll help!

using System;
using System.Collections.Generic;

namespace sieve050707
{
class MainClass
{
public static void Main(string[] args)
{
Console.WriteLine(”Sieve of Eratosthenes – 05 July 2007″);
int sampleSize = 100;

if (args.Length > 0)
{
sampleSize = Int32.Parse(args[0]);
}

List sampleData = new List();
List primesFound = new List();

for (int counter=2; counter < sampleSize; counter++)
{
sampleData.Add(counter);
}

for (int counter=2; counter < sampleSize; counter++)
{
if(sampleData.Contains(counter))
{
primesFound.Add(counter);
sampleData.Remove(counter);
for (int multiple=counter*2; multiple < sampleSize; multiple = multiple + counter)
{
sampleData.Remove(multiple);
}
}
}
foreach(int prime in primesFound)
{
Console.WriteLine(prime.ToString() + ” is prime”);
}
}
}
}

HTH,

Dan

Daniel Elliott
+1  A: 

use a string builder.

also if its something that you really will use often then you probably want to pass an already known list of primes into it - that way you can reuse computation for existing result sets.

maybe convert the list of int to an array?

are you sure that contains terminates once the item is found?

maybe if you used an ordered prime list you can get a faster search algorithm to test for existing instead of a straight iteration from start to finish (ever back to front when you know its closer to end would be an advantage).

another method would be to multithread the loop so you can use multiple cores using a threadpool or a custom implementation to avoid starting and stopping threads. You would essentially be returning new primes into a pool that the function has a reference to.

John Nicholas
I'm sure this terminates, since it works perfectly with smaller values. I'm doing this as an exercise, and starting with a list of known primes defeats the purpose of this algorithm (this algorithm simply lists primes up to a specific value. What is the purpose of it if I already have a list of known primes?)
Chris
The purpose would be to find larger primes than you already know.
Matthew Scharley
the single largest optomisation that you can do is not to repeat previous work - especially given that you want the 10 millionth prime.
John Nicholas
For sure, I understand that. However, that's not really what I'm looking to do. This is being done as an exercise. Feeding it a file with half of the solution isn't really helping me find a way to improve my algorithm design.
Chris
+2  A: 

Your bottle neck is the use of an List, plain and simple. List.contains is O(n), List.Remove(n).

You'll speed up your application by using a better data structure, namely a BitArray. Assume that items set to True are prime, and those which aren't composite. This means looking up, adding, or removing items from your prime set will be an O(1) operation.

Juliet
Thanks for the heads up! I'm not a vb.net guy, myself. I have to use it at work, however, and this is partially an exercise in familiarizing myself with what's available in the language :)
Chris
A: 

It was proven that it is enough to checks prime only on interval 2..sqrt(maxValue)

Dewfy
That's why I've set MaxValue as Math.Sqrt(n).
Chris
hes talking abotu the sixe of the iterator ... you do not add 1 you add 2 and half the number of vales you have to check (ie do not check even numbers)
John Nicholas
Gotcha, thanks!
Chris
A: 

A big bottleneck in this code is the calling of values.Contains(i) and values.Contains(k) repeatedly on every integer candidate. One simple way to make your current approach faster would be to have a second list alongside the first which stores false/true against each number so that rather than doing a contains, an O(n) algorithm would allow a direct lookup which is O(1). The final loop in this case would then check for true as to whether the value should be included.

There are faster ways to solve the problem however. The traditional Eratosthenes algorithm is to generate numbers from 2 to sqrt(n) and test them against every prime known in a list of primes. If the number divides by any of them perfectly then it is thrown out and the next number is tried, else it is added as a prime in that list. This would likely go faster than the optimisation I suggested above.

Paul Keeble
what he said, my answer is not as clear.
John Nicholas
A: 

Creating an array of all the numbers up to the max number entered is madness; because by definition most of them won't be primes. So you're already holding at least 2x as much data as you need.

There are many well-known methods of looking for primes.

What are you trying to do? Print more than 10 million? How many more? What is the goal here?

If it's just to learn about primes, look into GCD and other fun things. (Chinese Remainder Theorem).

Noon Silk
+1  A: 

Without changing the algorithm, here are some optimizations:

Dim result As String = ""
For i = 0 To values.Count - 1
    result = result & " " & values(i)
Next

This is Schlemiel the painter... Use string.Join instead.


values.Contains(i)
values.Contains(k)

These are expensive - Use HashSet instead of List to make it cheaper.


If values.Contains(k) Then
    If (k Mod i) = 0 Then
        values.Remove(k)
    End If
End If

Mod is way less expensive than Contains (even with HashSet). Do the Mod check first.

David B
Neat suggestions. Thanks
Chris
+1  A: 

If you're using .Net 3.5, you can re-write it like this:

Protected Function Eratosthenes(ByVal n As Integer) As String
    Dim values As List(Of Integer) = Enumerable.Range(0,n).ToList()

    For i = 2 To Math.Sqrt(n)
        If values(i) > 0 Then
            For k As Integer = i + 1 To n
                If values(k) AndAlso (k Mod i) = 0 Then
                    values(k) = 0
                End If
            Next
        End If
    Next

    Return string.Join(" ", values.Where(Function(i) i>1).Select(Function(i) i.ToString()).ToArray())
End Function
Joel Coehoorn
Unfortunately, I'm stuck in 2.0. What portions of this solution are 3.5 dependent?
Chris
Enumerable.Range, the Where(), Select(), ToList(), and ToArray() extension methods, and the lambda expressions are all 3.5 only. The main loop is all 2.0-safe, though.
Joel Coehoorn
Nice, thanks for the tip
Chris
Your inner loop is marking multiples of the prime, i. i was found to be prime when values(i) > 0. So a similar comment applies to your marking phase as I made about Jon Skeet's: you can start at i*i and mark every element you meet as composite, walking up in increments of i, until you reach n. Your code is actually worse than Jon's because you are testing lots of numbers that you *know* cannot be multiples of i. Your inner loop should read: For k As Integer = i*i To n Step i values(k) = 0 Next There is no need to test for non-marking: it hasn't been reached before.
hughdbrown
A: 

I think you are missing the key point of what makes sieving a great algorithmic tool...

The greatness of sieving is that it allows you to avoid doing the costly modulo operation: if you know p is a prime, to eliminate all of its multiples, rather than checking if all numbers in the sieve are divisible by p, you eliminate items 2p, 3p, 4p... Actually, the way the sieve of Erathostenes works, it can be shown that the first item that you will eliminate is p2, so you only need to check if p2, p2+p, p2+2p, p2+3p are there.

With my complete lack of knowledge about Visual Basic, this should take care of your main bottleneck:

Protected Function Eratosthenes(ByVal n As Integer) As String
    Dim maxValue As Integer = Math.Sqrt(n)
    Dim values As Generic.List(Of Integer) = New Generic.List(Of Integer)
    Dim i As Integer
    ''//create list.
    For i = 2 To n
        values.Add(i)
    Next

    For i = 2 To maxValue
        If values.Contains(i) Then
            Dim k As Integer
            For k = i * i To n Step i
                If values.Contains(k) Then
                    values.Remove(k)
                End If
            Next
        End If
    Next

    Dim result As String = ""
    For i = 0 To values.Count - 1
        result = result & " " & values(i)
    Next

    Return result
End Function

You may also want to consider, as has been posted by others, using an array of booleans to hold your sieve, and thus avoid all those Contains checks, which are probably very costly from a computational point of view.

Python is my language of choice, so even though it may not suit your needs, I have blogged about sieving for primes here, although you may also find this and this of interest...

Jaime
A: 

This will do n=10,000,000 in .28 seconds, and n=100,000,000 in about 4 seconds. (VB 2008, Pentium e8500)

The slowest part was the string concatenation. VB is really slow concatenating large strings, so I used an integer array to save the primes.

Instead of doing a mod for each value, you can multiply and you only have to consider a fraction of the values.

VB did not optimize int(i/2) in the "for k" statement, so I moved it to the variable max2.

For large n, redimensioning the answer array became a bottleneck, so I added 100,000 elements each time it needed expanding.

As usual... I didn't check this thoroughly, so there may be bugs.

Protected Function Eratosthenes(ByVal n As Integer) As Integer()

Dim nPrime As Integer
Dim values(n) As Byte
Dim ans(1) As Integer

Dim maxValue As Integer = Math.Sqrt(n)
For i = 0 To n
    values(i) = 1
Next i

Dim max2 As Integer
For i = 2 To maxValue
    If values(i) <> 0 Then
        max2 = Int(n / i)
        For k = 2 To max2
            values(k * i) = 0
        Next k
    End If
Next i

nPrime = 0
For i = 2 To UBound(values)
    If values(i) <> 0 Then
      nPrime = nPrime + 1
      If nPrime > UBound(ans) Then ReDim Preserve ans(nPrime + 100000)
      ans(nPrime) = i
    End If
Next i

ReDim Preserve ans(nPrime)
Eratosthenes = ans

End Function
xpda