ansaurus

Question

Guidelines for applying DRY in Haskell function definitions

Answer 1

+2 A:

I think the way you've done it makes sense.

You should certainly always break repeated computations out into separately defined values if avoiding repeated computation is important, but in this case that doesn't look necessary. Nevertheless, the broken-out values have easy-to-understand names, so they make your code easier to follow. I don't think the fact that your code is a bit longer as a result is a bad thing.

BTW, instead of hardcoding the maximum Int, you can use (maxBound :: Int), which avoids the risk of you making a mistake, or of another implementation with a different maximum Int breaking your code.
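A minimal sketch of the maxBound suggestion. The names intLimit and fitsInInt are hypothetical, chosen only for illustration; they are not from the original code:

```haskell
-- intLimit and fitsInInt are hypothetical names used for illustration.

-- The largest Int on whatever implementation compiles this code.
intLimit :: Int
intLimit = maxBound

-- True when an arbitrary-precision value can be stored in an Int
-- without wrapping around.
fitsInInt :: Integer -> Bool
fitsInInt n =
    n >= toInteger (minBound :: Int) && n <= toInteger intLimit
```

Because the limit comes from the Bounded instance, the check stays correct whether Int is 32 or 64 bits wide.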

Ganesh Sittampalam 2009-05-06 05:02:26

Cool, thanks for the response. I knew there was probably something like maxBound :: Int, but the book I'm using hasn't covered it yet so I just used the interpreter to find the breaking point and hardcoded the max.

Charlie Flowers 2009-05-06 05:08:15

Answer 2

+7 A:

DRY is just as good a principle in Haskell as it is anywhere else :) Much of the terseness you speak of in Haskell comes from idioms being lifted out into libraries, and the examples you look at have often been considered very carefully to make them terse :)

For example, here's an alternate way to implement your string-to-integer algorithm:

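import Data.Char (digitToInt)
import Data.List (foldl')

-- Parse an optionally negated string of decimal digits.
asInt_fold :: String -> Int
asInt_fold ('-':n) = negate (asInt_fold n)
asInt_fold "" = error "Need some actual digits!"
asInt_fold str = foldl' step 0 str
    where
        -- Reject anything that is not an ASCII digit.
        step _ x
            | x < '0' || x > '9'
            = error "Bad character somewhere!"
        -- Shift the running total one decimal place and add the new digit;
        -- a negative result means the Int wrapped around.
        step sum dig =
            case sum * 10 + digitToInt dig of
                n | n < 0 -> error "Overflow!"
                n -> n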

A few things to note:

We detect overflow when it happens, rather than by setting somewhat arbitrary limits on which digits we allow. This significantly simplifies the overflow detection logic, and the same approach works on any integer type from Int8 to Integer [as long as overflow results in wraparound, doesn't occur, or results in an assertion from the addition operator itself]. Note that digitToInt returns Int, so the code as written would need a fromIntegral around the digit to target other types.
By using a different fold, we don't need two separate states.
No repeating ourselves, even without going out of our way to lift things out - it falls naturally out of re-stating what we're trying to say.
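As a rough sketch of how the same fold could target other integer types, here is a polymorphic variant. The name asIntegral is hypothetical, and the overflow check still relies on the assumption that overflow wraps around to a negative value:

```haskell
import Data.Char (digitToInt, isDigit)
import Data.Int (Int8)
import Data.List (foldl')

-- asIntegral is a hypothetical generalisation of asInt_fold.
-- digitToInt returns Int, so fromIntegral converts each digit
-- to whatever integral type the caller asks for.
asIntegral :: Integral a => String -> a
asIntegral ('-':n) = negate (asIntegral n)
asIntegral ""      = error "Need some actual digits!"
asIntegral str     = foldl' step 0 str
  where
    step acc c
        | not (isDigit c) = error "Bad character somewhere!"
        -- Assumes overflow wraps around to a negative number.
        | next < 0        = error "Overflow!"
        | otherwise       = next
      where
        next = acc * 10 + fromIntegral (digitToInt c)

-- The result type is picked by annotation at the call site.
smallValue :: Int8
smallValue = asIntegral "127"

bigValue :: Integer
bigValue = asIntegral "123456789012345678901234567890"
```

With Integer the overflow branch is never taken, since the type has no upper bound.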

Now, it's not always possible to just reword the algorithm and make the duplication go away, but it's always useful to take a step back and reconsider how you've been thinking about the problem :)

bdonlan 2009-05-06 05:10:13

Very helpful. I had already found out from other people's comments online that foldl would be better, because then the accumulator could be a simple Int. I did not know that you could "name" the result of a case expression the way you did with "n". Is there a case where the overflow is so large that you would not get a negative number (even though the number would still be incorrect)?

Charlie Flowers 2009-05-06 06:03:46

In two's complement, if the overflow were so large that you didn't get a negative number, then newPlaceComponent would also wrap around. You really need to check for overflow whilst doing the exponentiation to guard against that. Or use Integer for the intermediate computations and do an overflow check before converting to Int. Obviously there are efficiency trade-offs here.
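A minimal sketch of the Integer-based alternative, with a small Int8 example of a wraparound that lands on a non-negative value. All names here (wrappedStep, parseDigits, narrow, safeAsInt) are hypothetical, and returning Maybe instead of calling error is a choice made for this sketch, not something from the thread:

```haskell
import Data.Char (digitToInt, isDigit)
import Data.Int (Int8)
import Data.List (foldl')

-- All names below are hypothetical, chosen for this sketch.

-- 25 * 10 + 6 is 256, which wraps around in Int8 to a
-- non-negative value, so a "result < 0" test would miss it.
wrappedStep :: Int8
wrappedStep = 25 * 10 + 6

-- Accumulate in Integer, which cannot overflow.
parseDigits :: String -> Maybe Integer
parseDigits "" = Nothing
parseDigits str
    | all isDigit str = Just (foldl' step 0 str)
    | otherwise       = Nothing
  where
    step acc c = acc * 10 + toInteger (digitToInt c)

-- Range-check before narrowing to Int.
narrow :: Integer -> Maybe Int
narrow n
    | n < toInteger (minBound :: Int) = Nothing
    | n > toInteger (maxBound :: Int) = Nothing
    | otherwise                       = Just (fromInteger n)

safeAsInt :: String -> Maybe Int
safeAsInt ('-':ds) = parseDigits ds >>= narrow . negate
safeAsInt ds       = parseDigits ds >>= narrow
```

The trade-off mentioned above still applies: Integer arithmetic is slower than Int arithmetic, in exchange for a range check that cannot be fooled by wraparound.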

Ganesh Sittampalam 2009-05-06 07:08:26

Answer 3

+2 A:

As noted by bdonlan, your algorithm could be cleaner; it's especially useful that the language itself detects overflow. As for your code itself and the style, I think the main tradeoff is that each new name imposes a small cognitive burden on the reader. When to name an intermediate result becomes a judgment call.

I personally would not have chosen to name placeMultiplier, as I think the intent of place ^ 10 is much clearer. And I would look for maxInt in the Prelude, as you run the risk of being terribly wrong if run on 64-bit hardware. Otherwise, the only thing I find objectionable in your code is the redundant parentheses. So what you have is an acceptable style.

(My credentials: At this point I have written on the order of 10,000 to 20,000 lines of Haskell code, and I have read perhaps two or three times that. I also have ten times that much experience with the ML family of languages, which require the programmer to make similar decisions.)

Norman Ramsey 2009-05-07 21:43:42

Thanks. I was surprised that so far no one argued for the other side. Clearly there are drawbacks to large numbers of variables. Haskell in particular seems to value small code size, and I think for good reason. Thanks very much for your feedback.

Charlie Flowers 2009-05-08 01:06:29

So, in this example, you see a kind of trade-off between DRY and terseness, yes? Both good values, and in this particular case, they begin to compete with each other. Is that a fair summary of your point of view?

Charlie Flowers 2009-05-08 01:07:51
