ansaurus

Question

Answer 1

A:

The only thing I would have done differently is to leave the crossedOut boolean array as a class member in your final version. The rest of the methods are all private, so there is not much to gain by passing the array in and out of each of them.

Casey 2010-02-08 01:11:53

Leaving it as a class member also allows it to be cached, rather than regenerated each time.

Anon. 2010-02-08 01:33:01
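
A minimal sketch of what keeping `crossedOut` as a class member might look like, including the caching mentioned above. Only `crossedOut` comes from the thread; the class name `CachedPrimeSieve`, the method `PrimesUpTo`, and the `SieveRuns` counter are hypothetical names chosen for illustration, and the question's actual class may be organized differently.

```csharp
using System.Collections.Generic;

// Hypothetical sketch: only the crossedOut name comes from the thread.
public class CachedPrimeSieve
{
    // Class member: crossedOut[i] is true once i is known to be composite.
    private bool[] crossedOut = new bool[0];

    // Counts how often the sieve was rebuilt (illustration only).
    public int SieveRuns { get; private set; }

    public List<int> PrimesUpTo(int maxValue)
    {
        var primes = new List<int>();
        if (maxValue < 2)
            return primes;

        // Rebuild only when the cached array is too small for this request.
        if (crossedOut.Length <= maxValue)
            CrossOutMultiples(maxValue);

        for (int i = 2; i <= maxValue; i++)
            if (!crossedOut[i])
                primes.Add(i);
        return primes;
    }

    // Private helper: works on the member directly, no array passed in or out.
    private void CrossOutMultiples(int maxValue)
    {
        crossedOut = new bool[maxValue + 1];
        SieveRuns++;
        for (int i = 2; (long)i * i <= maxValue; i++)
        {
            if (crossedOut[i])
                continue;
            for (long j = (long)i * i; j <= maxValue; j += i)
                crossedOut[j] = true;
        }
    }
}
```

With this shape, a second call such as `PrimesUpTo(10)` after `PrimesUpTo(30)` reuses the array that is already filled in. The trade-off is that the class now carries state, so a single instance should not be shared between threads without extra care.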

Answer 2

A:

Just to clarify Maxwell's inverted IF comment:

This is bad:

if (maxValue >= 2) 
    { blah }
else
    return new int[0];

This is good:

if (maxValue < 2) 
    return new int[0];
else { blah }

Dave 2010-02-08 03:34:56

Why is one better than the other?

Robert Harvey 2010-02-08 03:47:14

The braces, which could encapsulate many lines. It's much more obvious which if statement is paired up with that else when there's only one line between them.

Wallacoloo 2010-02-08 04:02:57

It's also kind of nice to have the return near the top. Then if you're tracing through the code, you can instantly say "Alright, if it's less than 2, we just return an empty array" without reading any further. Of course, that goes against the rule of "There should only be one return statement and it should be at the end of the function." I happen to dislike that rule.

MatrixFrog 2010-02-08 04:27:25

@MatrixFrog: this is usually called a *guard* and is indeed a very nice way of clarifying control flow. That "single return" rule is just stupid.

Jörg W Mittag 2010-02-08 14:52:54

I also really don't like the single return rule. I've followed it in the past, and it usually resulted in deep nesting; even refactoring that out wasn't at all straightforward.

Casey 2010-02-08 17:30:54

Answer 3

+2 A:

Honestly? I wouldn't have refactored at all. Maybe it's just because I used to be a math wonk but I find the first version much easier to read.

It often does not make much sense to refactor an algorithm. When you refactor code it means you expect to reuse or change parts of it. In this case the entire block of code is static and immutable - the only thing you might change is to swap out the entire function with a different algorithm. One-line functions like notCrossed seem particularly useless; they merely serve to make the code more verbose without helping to explain anything that isn't already obvious.

Actually, maybe there are two refactorings I would do:

Change the class name GeneratePrimes into PrimeGenerator, as you already did. Verb class names always throw me for a loop - is it a class, or a method?
Change it to either return an IList<int> or IEnumerable<int>. Returning array types is Considered Harmful.
Edit - a third refactoring would be to remove some of the useless comments and use meaningful variable names instead (where appropriate).

Other than that - stick with the original version!

Edit - actually, the more I look at the original, the more I dislike it. Not because of the way it's organized, just the way it's written. Here's my rewrite:

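using System;
using System.Collections.Generic;
using System.Linq;

public class PrimeGenerator
{
    public static IEnumerable<int> GeneratePrimes(int maxValue)
    {
        if (maxValue < 2)
            return Enumerable.Empty<int>();

        // primes[i] stays true until i is found to be a multiple of a smaller prime.
        bool[] primes = new bool[maxValue + 1];
        for (int i = 2; i <= maxValue; i++)
            primes[i] = true;

        // Only candidates up to about sqrt(maxValue) need their multiples crossed out.
        for (int i = 2; i < Math.Sqrt(maxValue + 1) + 1; i++)
        {
            if (primes[i])
            {
                for (int j = 2 * i; j <= maxValue; j += i)
                    primes[j] = false;
            }
        }

        // The indexes 2..maxValue that are still flagged are the primes.
        return Enumerable.Range(2, maxValue - 1).Where(i => primes[i]);
    }
}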

There, much better!

Aaronaught 2010-02-08 04:04:24

I don't agree that you refactor only to reuse parts. You refactor for many reasons, one of which might be readability. Extracting methods whose names describe what they do can make the algorithm much more understandable, because you can focus on the intent of the person who wrote it and not on the implementation of that intent. It's much easier to understand how the algorithm is intended to work from Pete's PrimeGenerator version than from yours above, and each method would probably be easier to test to see that it does its step correctly.

Sam Holder 2010-02-08 13:00:29

@bebop: You don't *need* to test each individual step. It's a simple algorithm. The entire rewritten method is 10 non-whitespace lines of code. Are programmers today so far-gone that they can't read and understand a simple program without a zillion comments and unit tests? Don't get me wrong, if I see a 100-line method I try to refactor, but Pete's version is 3 times as long as this, and therefore 3 times more error-prone and difficult to maintain (not that you would ever need to "maintain" such a thing).

Aaronaught 2010-02-08 14:10:57

Nice! I would call the class `Primes` and the method `Upto` for a warm and fuzzy DSL-ish "feel": `Primes.Upto(10);`

Jörg W Mittag 2010-02-08 14:47:21

BTW: I just realized that there is no way to get an index counter in query comprehension syntax, which is why your LINQ query cannot be expressed in it. It would need something like `from p counting i in primes select p ? i : 0 where i > 0`.

Jörg W Mittag 2010-02-08 14:51:14

@Jörg: Indeed, that's the reason for the lambda kludge, which would have made more sense as a `Where(p => p)` followed by `Select`, but of course the indexes are wrong. If you had a `Sequence` method somewhere that would simply return numbers from 1 to *N*, then you could write it as `from i in Sequence.To(100) where primes[i] select i`. But I'd have to write that extra class and I wanted to keep the solution short. ;)

Aaronaught 2010-02-08 15:22:11
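
For illustration, here is a sketch of that query-comprehension form in which `Enumerable.Range` plays the role of the hypothetical `Sequence.To` helper. The class name `QuerySyntaxSketch` and the method `PrimesFromFlags` are assumptions made for this example; `primes` is taken to be the sieve's flag array, indexed from 0 to maxValue.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper so the query can be shown in isolation.
public static class QuerySyntaxSketch
{
    // primes[i] is assumed to be true exactly when i is prime.
    public static IEnumerable<int> PrimesFromFlags(bool[] primes)
    {
        // Indexes 0 and 1 are never prime, so arrays this short yield nothing.
        if (primes.Length < 3)
            return Enumerable.Empty<int>();

        // Range(start, count): the indexes 2 .. primes.Length - 1.
        return from i in Enumerable.Range(2, primes.Length - 2)
               where primes[i]
               select i;
    }
}
```

The query ranges over the indexes rather than over the flags themselves, which sidesteps the missing index counter in comprehension syntax.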

I'd replace the array initializer with some LINQ, something like `Enumerable.Range(1, maxValue).Select(i => true).ToArray()`. I might even flip the logic to nonprimes and leave out that step entirely.

Isaac Cambron 2010-02-09 06:12:11

Yeah, I'd forgotten about `Enumerable.Range`, I knew there was a method like that somewhere but couldn't place it. Updated to use that at the end. Trouble with using it at the beginning is that elements 0 and 1 have to be set to `false`, which just adds more LOC anyway (or complicates the rest of the logic if the array is made 2 elements smaller).

Aaronaught 2010-02-09 06:16:26

Right, I missed that. Take 2: `Enumerable.Range(0, maxValue + 1).Select(i => i >= 2).ToArray();`

Isaac Cambron 2010-02-09 06:27:15

@Isaac: Neat, but getting a little out there in the readability department, and takes twice as long to initialize. ;)

Aaronaught 2010-02-09 06:32:25
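
The "flip the logic to nonprimes" idea from a few comments up can be sketched as follows. This is one possible reading of that suggestion, not code posted in the thread; the names `InvertedSieveSketch` and `composite` are assumptions. It relies on the fact that a newly allocated `bool[]` in C# is filled with `false`, so no initialization pass is needed.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of the sieve that tracks composites instead of primes.
public static class InvertedSieveSketch
{
    public static IEnumerable<int> GeneratePrimes(int maxValue)
    {
        if (maxValue < 2)
            return Enumerable.Empty<int>();

        // New bool arrays default to false, i.e. "not known to be composite".
        bool[] composite = new bool[maxValue + 1];

        for (int i = 2; (long)i * i <= maxValue; i++)
        {
            if (composite[i])
                continue;
            // Smaller multiples of i were already marked by smaller primes.
            for (long j = (long)i * i; j <= maxValue; j += i)
                composite[j] = true;
        }

        // Indexes 0 and 1 are skipped by starting the range at 2.
        return Enumerable.Range(2, maxValue - 1).Where(i => !composite[i]);
    }
}
```

Because the final range starts at 2, the values left in elements 0 and 1 never matter, which removes the special case discussed above.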

Interesting! I like the way the code is now readable in a few lines. Time to investigate IEnumerable, LINQ and lambdas... more coffee required. Great conversations, great fun, many thanks.

Dave 2010-02-11 01:30:52

Answer 4

A:

Elegant solution using LINQ:

static IEnumerable<int> PrimeNumbers(int maxValue)
{
    if (maxValue > 1 && maxValue < int.MaxValue)
    {
        // Range takes (start, count), so 2..maxValue needs maxValue - 1 elements.
        var integers = Enumerable.Range(2, maxValue - 1);
        for (;;)
        {
            int item = integers.FirstOrDefault();
            if (item == 0)
            {
                break;
            }

            yield return item;
            integers = integers.Where(x => x % item != 0);
        }
    }
}

Oleg I. 2010-02-08 18:49:04

Elegant, but not particularly efficient; thinking of all those `Where` delegates piling up on the stack makes my head spin...

Aaronaught 2010-02-09 00:50:27

In space, maybe. I measured time though, and it's only slightly less efficient than yours (7.1 seconds vs. 6.8 with maxValue = ten million). The original, interestingly, is faster (3.8 seconds). I didn't bother to profile to see why, though.

Isaac Cambron 2010-02-09 06:10:08

@Isaac: How did you profile?? On my machine, mine runs in less than 1 ms, this one hasn't finished after a full minute. It's brutal. Did you remember that enumerables based on `yield return` have deferred execution and that you need to iterate through the entire set of results when profiling?

Aaronaught 2010-02-09 06:23:13

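On a much less demanding run with max = 5000, mine finishes in 0.005 ms and this one takes 8.6 seconds. The running time on this one grows far faster than linearly with the max value because it's basically a Schlemiel-the-Painter algorithm: every call to `FirstOrDefault` re-enumerates the range from the start through all the `Where` filters added so far.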

Aaronaught 2010-02-09 06:29:12

@all: I agree it is not very efficient on large numbers, because after yielding each item we add another filter condition to the sequence. Even without a profiler you can run it in a console and watch it gradually slow down. BUT a) it requires less code, and b) it provides deferred execution, which helps if your goal is not to iterate over the whole integer range.

Oleg I. 2010-02-09 10:43:07

@Aaronaught, yes on deferred execution. You also have to make sure that you do something with the results, because the compiler (I think) can optimize away code whose variables you never use. I computed the sum of the primes and timed that.

Isaac Cambron 2010-02-09 19:48:18
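
A rough sketch of the kind of measurement described in the last few comments: the sequence is enumerated to the end and its values are summed, so deferred execution cannot hide the work and the result is actually used. The helper name `TimeFullEnumeration` and the class `TimingSketch` are assumptions for this example, and a single `Stopwatch` run like this is only a coarse measurement, not a proper benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical timing helper; not taken from the thread.
public static class TimingSketch
{
    // Runs makePrimes, walks the whole result, and reports sum and elapsed time.
    public static (long Sum, TimeSpan Elapsed) TimeFullEnumeration(
        Func<IEnumerable<int>> makePrimes)
    {
        var stopwatch = Stopwatch.StartNew();

        // The foreach forces every element of a lazy sequence to be produced.
        long sum = 0;
        foreach (int prime in makePrimes())
            sum += prime;

        stopwatch.Stop();
        return (sum, stopwatch.Elapsed);
    }
}
```

A call might look like `TimingSketch.TimeFullEnumeration(() => PrimeGenerator.GeneratePrimes(5000))`, assuming the `PrimeGenerator` class from Answer 3 is in scope. Timing only the call that creates the enumerable, without iterating it, would measure almost nothing for an iterator built on `yield return`.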


Refactoring Fun - Prime Numbers
