tags:

views:

122

answers:

5

I've never really bothered with math programming, but today I've decided to give it a shot.

Here's my code and it's working as intended:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Data;
using System.Windows.Documents;
using System.Windows.Input;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using System.Windows.Navigation;
using System.Windows.Shapes;

namespace PrimeFactorization
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();
        }

        private void btnSubmit_Click(object sender, RoutedEventArgs e)
        {
            List<int> primeFactors = FindPrimeFactors(Convert.ToInt32(txtNumber.Text));
            primeFactors.Sort();
            for (int i = 0; i < primeFactors.Count; i++)
            {
                listBoxFoundNumbers.Items.Add(primeFactors[i]);
            }            
        }

        private List<int> FindPrimeFactors(int number)
        {
            List<int> factors = new List<int>();
            factors.Add(1);
            factors.Add(number);

            for (int i = 2; i < number; i++)
            {

                if (number % i == 0)
                {
                    int holder = number / i;
                    //If the number is in the list, don't add it again.
                    if (!factors.Contains(i))
                    {
                        factors.Add(i);
                    }
                    //If the number is in the list, don't add it again.
                    if (!factors.Contains(holder))
                    {
                        factors.Add(holder);
                    }
                }            
            }

            return factors;
        }
    }
}

The only problem I can see with my code is that it will iterate through to the bitter end, even though there will definitely not be any factors.

For example, imagine I wrote in 35. My loop will go up to 35 and check 24,25,26,27...etc. Not very good.

What do you recommend?

+2  A: 

One thing you can do is avoid checking even numbers after 2, since they will never be prime.

So, check 2 and then declare your loop as such:

for (int i = 3; i < number; i+=2)

Re: stopping at sqrt(n) - this is an effective technique for determining whether a given number is prime, since any n that divides x where x > sqrt(n) also divides n/x which is necessarily less than sqrt(n). But a number may have prime factors larger than its own square root (for example, 1002 = 2 * 3 * 167).

That said, you could implement some kind of recursive solution where, for all prime factors p of n such that p < sqrt(n), you also calculate the prime factors of n / p. My gut feeling is that this will decrease the running time of your algorithm in general, but might increase it for small values of n.

Edit: if you're interested in getting into more sophisticated techniques, the Wikipedia page on Integer Factorization links to all kinds of algorithms.

danben
I'm not sure if he's just looking for prime factors. He seems to want all factors of a number.
ntownsend
My plan was to first find all factors, then just prime. This is a great answer. :)
Serg
Really? Everything in the code suggests otherwise. See: `namespace PrimeFactorization`; `List<int> primeFactors`, etc.
danben
@Sergio Tapia: Ah ok, that explains it.
danben
My mistake. I only skimmed over the code and missed that.
ntownsend
You don't need any recursive solution, simple loops are enough.
IVlad
The same can be said of every problem in the world. But "simple" is in the eye of the beholder. Some would prefer to call recursion elegant.
danben
+1  A: 

You can go only up to and including sqrt(number). Consider x from 2 to sqrt(N). if N % x == 0, then N % (N / x) == 0 as well. Take 10 for example:

sqrt(10) = 3 (integer part).

10 % 2 == 0 => 2 is a divisor of 10. 10 % (10 / 2) == 10 % 5 == 0 => 5 is also a divisor.

10 % 3 != 0.

This is it, the divisors of 10 are 2 and 5. You have to be careful when dealing with perfect squares and that is it.

You also don't need your checks to see if a number is in the list or not if you do it like this. Just make sure you don't add the same number twice in case of perfect squares (if x == N / x, only add x).

This is about as efficient as it gets without considering a lot more complicated algorithms.

Edit: To get PRIME factors only, when you find an x such that N % x == 0, divide N by x while this is possible. For example, consider 20:

sqrt(20) = 4.

20 % 2 == 0 => 2 is a prime factor. Divide it by 2 until the remainder of the division by 2 is no longer 0: 20 / 2 = 10, 10 / 2 = 5.

5 % 3 != 0

5 % 4 != 0

The number you are left with at the end of the algorithm is 5 > 1, so add 5 as well. The prime factors of 20 are therefore 2 and 5.

So basically your algorithm in pseudocode is this:

listPrimes(number)
  N = number;
  for ( i = 2; i <= (int)sqrt(number); ++i )
    if ( N % i == 0 )
      primeFactors.Add(i);
      while ( N % i == 0 )
        N /= i;

  if ( N > 1 )
    primeFactors.Add(N);

You can also use what @danben has suggested in his post, using increments of 2 and starting from 3 and treating 2 separately. The most efficient however is to generate primes using the sieve of Eratosthenes and using those generated primes in combination with this algorithm. Most efficient while staying in the world of basic math anyway.

IVlad
That works nicely for a small input like 10 where pretty much everything smaller than it is prime, but for 1000 your algorithm would add 500 as a prime factor.
danben
Oh, sorry, I missed the prime factors part. This can still work though, I'll edit in a few seconds.
IVlad
I think you might be approaching what I have already written up so please read that first, but if you have something new then by all means post it.
danben
A: 

If you want to find all factors (not just prime), once you find a smallest factor, k, you can update your loop's upper bound to only go up to (and including) n/k. This is because the largest factor l must satisfy p = k * l.

An even better approach may be to use the factors that you've found so far to calculate the factors greater than sqrt(n). So, in the above, you could calculate l directly from k and n.

ntownsend
`n/k` will have a lower bound of `sqrt(n)`, so it is more efficient to always stop at `sqrt(n)` (as noted in other answers).
danben
I was updating my answer to include that fact when you left your comment :)
ntownsend
+1  A: 

One classic algorithm for finding prime factors is the Sieve of Eratosthenes (see this)

Dan Bryant
+1  A: 

First order of business is to use a proper prime-generating algorithm. Most common is the Sieve of Eratosthenes, which I'll post because it's easier, but there's also the Sieve of Atkin which is far more optimized.

I'd use something like this for the sieve, which I've posted before:

public class Primes
{
    public static IEnumerable<int> To(int maxValue)
    {
        if (maxValue < 2)
            return Enumerable.Empty<int>();

        bool[] primes = new bool[maxValue + 1];
        for (int i = 2; i <= maxValue; i++)
            primes[i] = true;

        for (int i = 2; i < Math.Sqrt(maxValue + 1) + 1; i++)
        {
            if (primes[i])
            {
                for (int j = i * i; j <= maxValue; j += i)
                    primes[j] = false;
            }
        }

        return Enumerable.Range(2, maxValue - 1).Where(i => primes[i]);
    }
}

The last step is just to find out which primes are factors, which is easy:

public static IEnumerable<int> GetPrimeFactors(int num)
{
    return Primes.To(num).Where(i => num % i == 0);
}

That's it! This will generate prime factors up to 20 million in less than 1 second, so it's probably good enough for whatever you're doing. Otherwise you can use a better sieve, as I mentioned at the top, or a lazy skip-set algorithm.

One last thing: If you're trying to factor huge numbers then you need to switch to something very different, such as the General Number Field Sieve. This is (as far as I know) currently the fastest factorization algorithm for very large numbers, as part of ongoing research to see if RSA encryption can be cracked.

Aaronaught
You can optimize your sieve further by starting `j` from `i * i` not `2 * i` and ignoring multiples of 2. And isn't this checking all the primes up to `num`?
IVlad
@IVlad: Good point about starting from `i*i` (although in practice the improvement is very marginal, about 1% of the running time). The "multiples of 2" optimization is silly because the sieve already eliminates those on the first step. Why not also ignore multiples of 3, and 5, and 7? These don't actually decrease the order of the algorithm, and not attempting to make such optimizations was very intentional.
Aaronaught
@Aaronaught: no, it isn't silly. Your `i` variable gets incremented by `1` at each step. You can start it from `3` and increment it by `2` at each step, resulting in half the number of incrementations of `i`. A prime number different than `2` is never even, but an odd number might be, so ignoring multiples of `2` is the easiest. You can also use half the memory by ignoring indexes of even numbers in your `primes` array. so `primes[i] = true` if `2*i-1` is prime. The order of the algorithm remains unchanged, but these optimizations can matter a lot in practice. Also: `i * i <= sqrt(maxValue)`.
IVlad
I will definitely look into your solution because if I'm not mistaken you're using Lambdas correct? That's something I've always wanted to look into. Thank you!
Serg
@IVlad: The "optimizations" you're talking about are **exactly what the sieve does**! Incrementing the counter isn't the bottleneck here, crossing out multiples is, and if the index is already known to be composite then it will just get skipped. "Optimizing" for the case of 2 accomplishes precisely nothing. Profile it and see. It's not only silly, it's harmful, it needlessly complicates the algorithm for no tangible benefit. I actually have a profiler right in front of me and can tell you that it barely has any impact on the running time. Please don't make unsupported claims like this.
Aaronaught
@Sergio: The last line of the prime-gen uses lambdas, yes, as is the body of `GetPrimeFactors` - it's not exactly highly-sophisticated functional programming but I suppose it's a good intro to the topic. :)
Aaronaught
One other thing, there actually **is** an optimization that has a measurable impact here, and that's inverting the boolean logic, because then the tail of the array doesn't have to be initialized to `true`, only the head (first two elements). It's about a 20% improvement to the running time - still not worth obfuscating the logic over IMO because the asymptotic time (big-O) hasn't changed. If you need something faster then you need to use a faster algorithm with smaller O.
Aaronaught
@Aaronaught: incrementing `i` by 2 instead of 1 is saving one check for compositeness for every even number, meaning half the numbers. What is the point of letting it run the check when it is so easy to skip it? I have profiled it: 100 runs of `GeneratePrimes(10000000)` as you posted it takes 24.4 seconds and 22.8 seconds by starting `i` from 3 and incrementing it by 2. This is on a high end system, I think the difference is bigger on slower systems. Removing the initialization of the `Primes` array takes it down to 19.0 seconds. On slower systems, you'll want to do both. At least I would.
IVlad
@IVlad: This argument has been hashed out a gazillion times already. O(0.9N) is still O(N). Again, what's special about multiples of 2 - why not do the same for 3, 5, 7, 11? Worrying about such micro-optimizations is pointless; chances are that either the SoE/SoA is already fast enough, or you need a completely different and much more complex *algorithm* like GNFS or skip sets.
Aaronaught
@Aaronaught: the special thing about 2 is that it is the most convenient to implement as a special case. How would you treat the multiples of 2 AND 3 as special cases? You'd need to complicate things a lot more, because you can't go from 6 to 6 or 3 to 3 that easily without missing primes. It's a minor optimization, but in practice you cannot ignore constant factors and all this really requires is changing ++ to +=. You say the complexity itself matters more but that is not always the case. The TSP is always exponential, but there are deterministic algorithms that work for hundreds of cities.
IVlad