tags:

views:

199

answers:

4

Hi

From Uncle Bob's book Clean Code (example was in Java, so this was my first java translation), I've been playing with the refactoring example on prime numbers.

The question is: How would you refactor the following code?

There are 4 versions here: UglyVersion :-) , BobVersion, PeteVersion, PeteVersionClassMember. I'm doing this for fun, and hope you enjoy too :-)

class ProgramBad
{
    static void Main()
    {
        int[] result = GeneratePrimes.generatePrimes(30);
        foreach (var i in result)
            Console.Write(i.ToString() + ", ");
    }
}

/// <summary>
/// Given an array of integers starting at 2, cross out the multiples of 2.  Fine the next
/// uncrossed integer, and cross out all of its multiples.
/// Repeat until you have passed the square root of the maximum value
/// </summary>
public class GeneratePrimes
{
    public static int[] generatePrimes(int maxValue)
    {
        if (maxValue >= 2) // the only valid case
        {
            // declarations
            int s = maxValue + 1; // size of the array
            bool[] f = new bool[s];
            int i;

            // initialize array to be true
            for (i = 0; i < s; i++)
            {
                f[i] = true;
            }

            // get rid of non-primes
            f[0] = f[1] = false;

            //sieve
            int j;
            for (i = 2; i < Math.Sqrt(s) + 1; i++)
            {
                if (f[i]) // if i is uncrossed, cross its multiples
                {
                    for (j = 2 * i; j < s; j += i)
                        f[j] = false; // multiple is not a prime
                }
            }

            // how many primes are there?
            int count = 0;
            for (i = 0; i < s; i++)
            {
                if (f[i])
                    count++; // bump count
            }

            int[] primes = new int[count];

            //move the primes into the result
            for (i = 0, j=0;i<s ; i++)
            {
                if (f[i])
                    primes[j++] = i;
            }

            return primes; // return the primes
        } else // maxvalue < 2
            return new int[0]; // return null array if bad input
    }
}

Bob's refactored version:

class ProgramBob
    {
        static void Main()
        {
            int[] result = PrimeGeneratorBob.generatePrimes(30);
            foreach (var i in result)
                Console.Write(i + ", ");
        }
    }

    /// <summary>
    /// Generates prime number up to a user specified maximum
    /// The algorithm used is the Sieve of Eratosthenes.
    /// Given an array of integers starting at 2:
    /// Find the first uncrossed (eg 3 ) integer, and cross out all its multiples (eg 6,9,12,14)
    /// Repeat until there are no more multipes in the array
    /// </summary>
    class PrimeGeneratorBob
    {
        static bool[] crossedOut;
        static int[] result;

        static public int[] generatePrimes(int maxValue)
        {
            if (maxValue < 2)
                return new int[0];
            else
            {
                uncrossIntegersUpTo(maxValue);
                crossOutMultiples();
                putUncrossedIntegersIntoResult();
                return result;
            }
        }

        static void uncrossIntegersUpTo(int maxValue)
        {
            crossedOut = new bool[maxValue + 1]; // as bool array starts at 0, and if maxvalue = 10, we need an array of length 11
            for (int i = 2; i < crossedOut.Length; i++) // less than as we don't want to reference crossedOut[4] which doesn't exist
                crossedOut[i] = false;
        }

        static void crossOutMultiples()
        {
            int limit = determineIterationLimit();
            for (int i = 2; i <= limit; i++)
                if (notCrossed(i))
                    crossOutMultiplesOf(i);
        }

        static int determineIterationLimit()
        {
            // Every multiple in the array has a prime factor that
            // is less than or equal to the square root of the array size, 
            // which is the largest number we are trying to find primes in.
            // So we don't have to cross out numbers
            // larger than that square root of the maxnumber, as they will have been crossed out already
            double iterationLimit = Math.Sqrt(crossedOut.Length);
            return (int) iterationLimit;
        }

        static void crossOutMultiplesOf(int i)
        {
            for (int multiple = 2*i; multiple < crossedOut.Length; multiple += i)
                crossedOut[multiple] = true;
        }

        static bool notCrossed(int i)
        {
            return crossedOut[i] == false;
        }

        static void putUncrossedIntegersIntoResult()
        {
            result = new int[numberOfUncrossedIntegers()];
            for (int j = 0, i = 2; i < crossedOut.Length; i++)
                if (notCrossed(i))
                    result[j++] = i;
        }

        static int numberOfUncrossedIntegers()
        {
            int count = 0;
            for (int i = 2; i < crossedOut.Length; i++)
                if (notCrossed(i))
                    count++;
            return count;
        }
    }

What I like about this is:

  • How easy it is to get an overall feeling of how the program works eg uncrossIntegersUpTo(maxValue); crossOutMultiples(); putUncrossedIntegersIntoResult();

I paired with a friend over the weekend, and we came up with this version:

class PetesRefactored
{
    static void MainPete()
    {
        PrimeGeneratorPete pg = new PrimeGeneratorPete();
        int[] arrayOfPrimes = pg.generatePrimes(30);
        foreach (int prime in arrayOfPrimes)
            Console.Write(prime + ", ");
    }
}

/// <summary>
/// Generates prime number up to a user specified maximum
/// The algorithm used is the Sieve of Eratosthenes.
/// Given an array of integers starting at 2:
/// Find the first uncrossed (eg 3 ) integer, and cross out all its multiples (eg 6,9,12,14)
/// Repeat until there are no more multipes in the array
/// </summary>
class PrimeGeneratorPete
{
    public int[] generatePrimes(int maxValue)
    {
        bool[] crossedOut;

        if (maxValue < 2)
            return new int[0];
        else
        {
            crossedOut = new bool[maxValue + 1];
            uncrossIntegersUpTo(crossedOut);
            crossOutMultiples(crossedOut);
            return putUncrossedIntegersIntoResult(crossedOut);
        }
    }

    void uncrossIntegersUpTo(bool[] crossedOut)
    {
        for (int i = 2; i < crossedOut.Length; i++) 
            crossedOut[i] = false;
    }

    void crossOutMultiples(bool[] crossedOut)
    {
        int limit = determineIterationLimit(crossedOut);
        for (int i = 2; i <= limit; i++)
            if (!crossedOut[i])
                crossOutMultiplesOf(crossedOut, i);
    }

    int determineIterationLimit(bool[] crossedOut)
    {
        // Every multiple in the array has a prime factor that
        // is less than or equal to the square root of the array size, 
        // which is the largest number we are trying to find primes in.
        // So we don't have to cross out numbers
        // larger than that square root of the maxnumber, as they will have been crossed out already
        double iterationLimit = Math.Sqrt(crossedOut.Length);
        return (int) iterationLimit;
    }

    void crossOutMultiplesOf(bool[] crossedOut, int i)
    {
        for (int multiple = 2*i; multiple < crossedOut.Length; multiple += i)
            crossedOut[multiple] = true;
    }

    int[] putUncrossedIntegersIntoResult(bool[] crossedOut)
    {
        int[] result = new int[numberOfUncrossedIntegers(crossedOut)];
        for (int j = 0, i = 2; i < crossedOut.Length; i++)
            if (!crossedOut[i])
                result[j++] = i;
        return result;
    }

    int numberOfUncrossedIntegers(bool[] crossedOut)
    {
        int count = 0;
        for (int i = 2; i < crossedOut.Length; i++)
            if (!crossedOut[i])
                count++;
        return count;
    }
}

We:

  • Initialised crossedOut in generatePrimes method instead of ‘child’ method
  • Pass in crossedOut as a parameter instead of a class scope variable
  • Took out (defactored), the notCrossed(i) method as !crossedOut[i] is very readable inline.
  • Make everything non static

And thanks to Casey and anon.

Here is the code with crossedOut as a class level variable. I like this as it cuts down on some noise when passing parameters into methods.

class PrimeGeneratorPeteClassMember
{
    bool[] crossedOut;

    public int[] generatePrimes(int maxValue)
    {
        if (maxValue < 2)
            return new int[0];
        else
        {
            crossedOut = new bool[maxValue + 1];
            uncrossIntegersUpTo();
            crossOutMultiples();
            return putUncrossedIntegersIntoResult();
        }
    }

    void uncrossIntegersUpTo()
    {
        for (int i = 2; i < crossedOut.Length; i++)
            crossedOut[i] = false;
    }

    void crossOutMultiples()
    {
        int limit = determineIterationLimit();
        for (int i = 2; i <= limit; i++)
            if (!crossedOut[i])
                crossOutMultiplesOf(i);
    }

    int determineIterationLimit()
    {
        double iterationLimit = Math.Sqrt(crossedOut.Length);
        return (int)iterationLimit;
    }

    void crossOutMultiplesOf(int i)
    {
        for (int multiple = 2 * i; multiple < crossedOut.Length; multiple += i)
            crossedOut[multiple] = true;
    }

    int[] putUncrossedIntegersIntoResult()
    {
        int[] result = new int[numberOfUncrossedIntegers()];
        for (int j = 0, i = 2; i < crossedOut.Length; i++)
            if (!crossedOut[i])
                result[j++] = i;
        return result;
    }

    int numberOfUncrossedIntegers()
    {
        int count = 0;
        for (int i = 2; i < crossedOut.Length; i++)
            if (!crossedOut[i])
                count++;
        return count;
    }
}
A: 

The only thing that I would have done different would have been in your final version to leave the crossedOut boolean array as a class member. The rest of the methods are all private so there would not be much to gain by passing the array in and out of each of the methods.

Casey
Leaving it as a class member also allows it to be cached, rather than regenerated each time.
Anon.
A: 

Just to clarify Maxwells inverted IF comment:

This is bad:

if (maxValue >= 2) 
    { blah }
else
    return new int[];

This is good

if (maxValue < 2) 
    return new int[0];
else { blah }
Dave
Why is one better than the other?
Robert Harvey
The braces, which could encapsulate many lines. It's much more obvious which if statement is paired up with that else when there's only one line between them.
Wallacoloo
It's also kind of nice to have the return near the top. Then if you're tracing through the code, you can instantly say "Alright, if it's less than 2, we just return an empty array" without reading any further. Of course, that goes against the rule of "There should only be one return statement and it should be at the end of the function." I happen to dislike that rule.
MatrixFrog
@MatrixFrog: this is usually called a *guard* and is indeed a very nice way of clarifying control flow. That "single return" rule is just stupid.
Jörg W Mittag
I also really don't like the single return rule. I've followed that in the past, and it usually resulted in deep nesting, and even refactoring it out, wasn't at all straightforward.
Casey
+2  A: 

Honestly? I wouldn't have refactored at all. Maybe it's just because I used to be a math wonk but I find the first version much easier to read.

It often does not make much sense to refactor an algorithm. When you refactor code it means you expect to reuse or change parts of it. In this case the entire block of code is static and immutable - the only thing you might change is to swap out the entire function with a different algorithm. One-line functions like notCrossed seem particularly useless; they merely serve to make the code more verbose without helping to explain anything that isn't already obvious.

Actually, maybe there are two refactorings I would do:

  • Change the class name GeneratePrimes into PrimeGenerator, as you already did. Verb class names always throw me for a loop - is it a class, or a method?

  • Change it to either return an IList<int> or IEnumerable<int>. Returning array types is Considered Harmful.

  • Edit - one more third refactoring would be to remove some of the useless comments and use meaningful variable names instead (where appropriate).

Other than that - stick with the original version!


Edit - actually, the more I look at the original, the more I dislike it. Not because of the way it's organized, just the way it's written. Here's my rewrite:

public class PrimeGenerator
{
    public static IEnumerable<int> GeneratePrimes(int maxValue)
    {
        if (maxValue < 2)
            return Enumerable.Empty<int>();

        bool[] primes = new bool[maxValue + 1];
        for (int i = 2; i <= maxValue; i++)
            primes[i] = true;

        for (int i = 2; i < Math.Sqrt(maxValue + 1) + 1; i++)
        {
            if (primes[i])
            {
                for (int j = 2 * i; j <= maxValue; j += i)
                    primes[j] = false;
            }
        }

        return Enumerable.Range(2, maxValue - 1).Where(i => primes[i]);
    }
}

There, much better!

Aaronaught
I don't agree that you refactor to reuse parts. you refactor for many reasons, one of which might be readability. Extracting methods which describe what they do in their name can make the algorithm much more understandable, as you can focus on the intent of the person who wrote it and not on the implementation of that intent. Its much easier to understand the GeneratesPrimes intent of how the algorithm is intended to work in the PrimeGeneratorPete method than in yours above, and each method would probably be easier to test to see that it does its step correctly.
Sam Holder
@bebop: You don't *need* to test each individual step. It's a simple algorithm. The entire rewritten method is 10 non-whitespace lines of code. Are programmers today so far-gone that they can't read and understand a simple program without a zillion comments and unit tests? Don't get me wrong, if I see a 100-line method I try to refactor, but Pete's version is 3 times as long as this, and therefore 3 times more error-prone and difficult to maintain (not that you would ever need to "maintain" such a thing).
Aaronaught
Nice! I would call the class `Primes` and the method `Upto` for a warm and fuzzy DSL-ish "feel": `Primes.Upto(10);`
Jörg W Mittag
BTW: I just realized that there is no way to get an index counter in query comprehension syntax, which is why your LINQ query cannot expressed in it. Something like `from p counting i in primes select p ? i : 0 where i > 0`.
Jörg W Mittag
@Jörg: Indeed, that's the reason for the lambda kludge, which would have made more sense as a `Where(p => p)` followed by `Select`, but of course the indexes are wrong. If you had a `Sequence` method somewhere that would simply return numbers from 1 to *N*, then you could write it as `from i in Sequence.To(100) where primes[i] select i`. But I'd have to write that extra class and I wanted to keep the solution short. ;)
Aaronaught
I'd replace the array initializer with some LINQ, something like `Enumerable.Range(1, maxValue).Select(i => true).ToArray()`. I might even flip the logic to nonprimes and leave out that step entirely.
Isaac Cambron
Yeah, I'd forgotten about `Enumerable.Range`, I knew there was a method like that somewhere but couldn't place it. Updated to use that at the end. Trouble with using it at the beginning is that elements 0 and 1 have to be set to `false`, which just adds more LOC anyway (or complicates the rest of the logic if the array is made 2 elements smaller).
Aaronaught
Right, I missed that. Take 2: `Enumerable.Range(0, maxValue + 1).Select(i => i > 2).ToArray();`
Isaac Cambron
@Isaac: Neat, but getting a little out there in the readability department, and takes twice as long to initialize. ;)
Aaronaught
Interesting! I like the way the code is now readable on a few lines. Time to investigate IEnumerable, Linq and Lamdas... more coffee required.. Great conversations, great fun, many thanks.
Dave
A: 

Elegant solution using LINQ:

static IEnumerable<int> PrimeNumbers(int maxValue)
{
    if (maxValue > 1 && maxValue < int.MaxValue)
    {
        var integers = Enumerable.Range(2, maxValue);
        for (;;)
        {
            int item = integers.FirstOrDefault();
            if (item == 0)
            {
                break;
            }

            yield return item;
            integers = integers.Where(x => x % item != 0);
        }
    }
}
Oleg I.
Elegant, but not particularly efficient; thinking of all those `Where` delegates piling up on the stack makes my head spin...
Aaronaught
In space, maybe. I measured time though, and it's only slightly less efficient than yours (7.1 seconds vs 6.8 on maxValue= ten million). The original, interesting, is faster (3.8 seconds). Didn't bother to profile to see why though.
Isaac Cambron
@Isaac: How did you profile?? On my machine, mine runs in less than 1 ms, this one hasn't finished after a full minute. It's brutal. Did you remember that enumerables based on `yield return` have deferred execution and that you need to iterate through the entire set of results when profiling?
Aaronaught
On a much less demanding run with max = 5000, mine finishes in 0.005 ms and this one takes 8.6 seconds. The running time on this one grows exponentially with the max value because it's a Schlemiel-the-Painter algorithm, basically.
Aaronaught
@all: I agree it is not quite efficient on large numbers, because after yielding each item we apply additional condition filter to it. Even without profiler you can run it in console and see how it gradually slows. BUT a) it requires less code and b) it provides deferred execution, if your goal is not to iterate over whole integer range
Oleg I.
@Aaronaught, yes on differed execution. You also have to make sure that you do something with it, because the compiler (I think) can optimize away code whose variables you never use. I computed the sum of the primes and timed that.
Isaac Cambron