tags:

views:

302

answers:

7

I'm a beginner to functional languages, and I'm trying to get the whole thing down in Haskell. Here's a quick-and-dirty function that finds all the factors of a number:

factors :: (Integral a) => a -> [a]
factors x = filter (\z -> x `mod` z == 0) [2..x `div` 2]

Works fine, but I found it to be unbearably slow for large numbers. So I made myself a better one:

factorcalc :: (Integral a) => a -> a -> [a] -> [a]
factorcalc x y z
    | y `elem` z      = sort z
    | x `mod` y == 0  = factorcalc x (y+1) (z ++ [y] ++ [(x `div` y)])
    | otherwise       = factorcalc x (y+1) z

But here's my problem: Even though the code works, and can cut literally hours off the execution time of my programs, it's hideous!

It reeks of ugly imperative thinking: It constantly updates a counter and a data structure in a loop until it finishes. Since you can't change state in purely functional programming, I cheated by holding the data in the parameters, which the function simply passes to itself over and over again.

I may be wrong, but there simply must be a better way of doing the same thing...

A: 

I don't know much about Haskell, but somehow I think this link is appropriate:

http://www.willamette.edu/~fruehr/haskell/evolution.html

Edit: I'm not entirely sure why people are so aggressive about the downvoting on this. The original poster's real problem was that the code was ugly; while it's funny, the point of the linked article is, to some extent, that advanced Haskell code is, in fact, ugly; the more you learn, the uglier your code gets, to some extent. The point of this answer was to point out to the OP that apparently, the ugliness of the code that he was lamenting is not uncommon.

McWafflestix
That page is about factorials, not factors. It is funny, though :)
Stephan202
Some people have no sense of humor. I wouldn't worry too much about it.
Telemachus
not exactly worried; more just vaguely annoyed... :-)
McWafflestix
+4  A: 

Okay, take a deep breath. It'll be all right.

First of all, why is your first attempt slow? How is it spending its time?

Can you think of a recursive definition for the prime factorization that doesn't have that property?

(Hint.)

Charlie Martin
Note that he wants a list of all factors, not just the primes.
Stephan202
@Stephan202: Given the prime factorization, it is easy to compute a list of all factors.
Greg Hewgill
@Greg: show us the code!
Norman Ramsey
@Norman: You just multiply every combination of the prime factors together and that will give you all the factors. Consider that pseudocode.
Chuck
This should be fun. @Stephan, usually "factor" means to factor to the irreducible pieces, ie (for integers) primes.
Charlie Martin
Well, hell, I shouldn't post just before going to bed. You're right, his "code that works" is getting *all* the factors, isn't it?
Charlie Martin
@Charlie yes, it does. Usually we call these *divisors*, but 'factors' is also a valid term: http://en.wikipedia.org/wiki/Divisibility
Stephan202
yeah, yeah, like I said it was just too late.
Charlie Martin
+1  A: 

Firstly, although factorcalc is "ugly", you could add a wrapper function factors' x = factorscalc x 2 [], add a comment, and move on.

If you want to make a 'beautiful' factors fast, you need to find out why it is slow. Looking at your two functions, factors walks the list about n/2 elements long, but factorcalc stops after around sqrt n iterations.

Here is another factors that also stops after about sqrt n iterations, but uses a fold instead of explicit iteration. It also breaks the problem into three parts: finding the factors (factor); stopping at the square root of x (small) and then computing pairs of factors (factorize):

factors' :: (Integral a) => a -> [a]
factors' x = sort (foldl factorize [] (takeWhile small (filter factor [2..])))
  where
    factor z = x `mod` z == 0
    small z = z <= (x `div` z)
    factorize acc z = z : (if z == y then acc else y : acc)
      where y = x `div` z

This is marginally faster than factorscalc on my machine. You can fuse factor and factorize and it is about twice as fast as factorscalc.

The Profiling and Optimization chapter of Real World Haskell is a good guide to the GHC suite's performance tools for tackling tougher performance problems.

By the way, I have a minor style nitpick with factorscalc: it is much more efficient to prepend single elements to the front of a list O(1) than it is to append to the end of a list of length n O(n). The lists of factors are typically small, so it is not such a big deal, but factorcalc should probably be something like:

factorcalc :: (Integral a) => a -> a -> [a] -> [a]
factorcalc x y z
    | y `elem` z      = sort z
    | x `mod` y == 0  = factorcalc x (y+1) (y : (x `div` y) : z)
    | otherwise       = factorcalc x (y+1) z
Dominic Cooney
+5  A: 

Note that the original question asked for all the factors, not for only the prime factors. There being many fewer prime factors, they can probably be found more quickly. Perhaps that's what the OQ wanted. Perhaps not. But let's solve the original problem and put the "fun" back in "functional"!

Some observations:

  • The two functions don't produce the same output---if x is a perfect square, the second function includes the square root twice.

  • The first function enumerates checks a number of potential factors proportional to the size of x; the second function checks only proportional to the square root of x, then stops (with the bug noted above).

  • The first function (factors) allocates a list of all integers from 2 to n div 2, where the second function never allocates a list but instead visits fewer integers one at a time in a parameter. I ran the optimizer with -O and looked at the output with -ddump-simpl, and GHC just isn't smart enough to optimize away those allocations.

  • factorcalc is tail-recursive, which means it compiles into a tight machine-code loop; filter is not and does not.

Some experiments show that the square root is the killer:

Here's a sample function that produces the factors of x from z down to 2:

factors_from x 1 = []
factors_from x z 
  | x `mod` z == 0 = z : factors_from x (z-1)
  | otherwise      =     factors_from x (z-1)

factors'' x = factors_from x (x `div` 2)

It's a bit faster because it doesn't allocate, but it's still not tail-recursive.

Here's a tail-recursive version that is more faithful to the original:

factors_from' x 1 l = l
factors_from' x z l
  | x `mod` z == 0 = factors_from' x (z-1) (z:l)
  | otherwise      = factors_from' x (z-1) l

factors''' x = factors_from x (x `div` 2)

This is still slower than factorcalc because it enumerates all the integers from 2 to x div 2, whereas factorcalc stops at the square root.

Armed with this knowledge, we can now create a more functional version of factorcalc which replicates both its speed and its bug:

factors'''' x = sort $ uncurry (++) $ unzip $ takeWhile (uncurry (<=)) $ 
                [ (z, x `div` z) | z <- [2..x], x `mod` z == 0 ]

I didn't time it exactly, but given 100 million as an input, both it and factorcalc terminate instantaneously, where the others all take a number of seconds.

How and why the function works is left as an exercise for the reader :-)


ADDENDUM: OK, to mitigate the eyeball bleeding, here's a slightly saner version (and without the bug):

saneFactors x = sort $ concat $ takeWhile small $
                [ pair z | z <- [2..], x `mod` z == 0 ]
    where pair z = if z * z == x then [z] else [z, x `div` z]
          small [z, z'] = z < z'
          small [z]     = True
Norman Ramsey
My eyes hurt.@_@
Rayne
I would really like to start a contest like "Is this Haskell code or did I copy and paste random CS terms from Wikipedia while banging on my punctuation keys?"
Chuck
+1  A: 
Curt Sampson
+1  A: 

This seemed like an interesting problem, and I hadn't coded any real Haskell in a while, so I gave it a crack. I've run both it and Norman's factors'''' against the same values, and it feels like mine's faster, though they're both so close that it's hard to tell.

factors :: Int -> [Int]
factors n = firstFactors ++ reverse [ n `div` i | i <- firstFactors ]
 where
 firstFactors = filter (\i -> n `mod` i == 0) (takeWhile ( \i -> i * i <= n ) [2..n])

Factors can be paired up into those that are greater than sqrt n, and those that are less than or equal to (for simplicity's sake, the exact square root, if n is a perfect square, falls into this category. So if we just take the ones that are less than or equal to, we can calculate the others later by doing div n i. They'll be in reverse order, so we can either reverse firstFactors first or reverse the result later. It doesn't really matter.

Samir Talwar
The square root's the thing. I bet yours does less allocation than mine.
Norman Ramsey
A: 

This is my "functional" approach to the problem. ("Functional" in quotes, because I'd approach this problem the same way even in non-functional languages, but maybe that's because I've been tainted by Haskell.)

{-# LANGUAGE PatternGuards #-}
factors :: (Integral a) => a -> [a]
factors = multiplyFactors . primeFactors primes 0 [] . abs where
    multiplyFactors [] = [1]
    multiplyFactors ((p, n) : factors) =
        [ pn * x
        | pn <- take (succ n) $ iterate (* p) 1
        , x <- multiplyFactors factors ]
    primeFactors _ _ _ 0 = error "Can't factor 0"
    primeFactors (p:primes) n list x
        | (x', 0) <- x `divMod` p
        = primeFactors (p:primes) (succ n) list x'
    primeFactors _ 0 list 1 = list
    primeFactors (_:primes) 0 list x = primeFactors primes 0 list x
    primeFactors (p:primes) n list x
        = primeFactors primes 0 ((p, n) : list) x
    primes = sieve [2..]
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

primes is the naive Sieve of Eratothenes. There's better, but this is the shortest method.

   sieve [2..]
=> 2 : sieve [x | x <- [3..], x `mod` 2 /= 0]
=> 2 : 3 : sieve [x | x <- [4..], x `mod` 2 /= 0, x `mod` 3 /= 0]
=> 2 : 3 : sieve [x | x <- [5..], x `mod` 2 /= 0, x `mod` 3 /= 0]
=> 2 : 3 : 5 : ...

primeFactors is the simple repeated trial-division algorithm: it walks through the list of primes, and tries dividing the given number by each, recording the factors as it goes.

   primeFactors (2:_) 0 [] 50
=> primeFactors (2:_) 1 [] 25
=> primeFactors (3:_) 0 [(2, 1)] 25
=> primeFactors (5:_) 0 [(2, 1)] 25
=> primeFactors (5:_) 1 [(2, 1)] 5
=> primeFactors (5:_) 2 [(2, 1)] 1
=> primeFactors _ 0 [(5, 2), (2, 1)] 1
=> [(5, 2), (2, 1)]

multiplyPrimes takes a list of primes and powers, and explodes it back out to a full list of factors.

   multiplyPrimes [(5, 2), (2, 1)]
=> [ pn * x
   | pn <- take (succ 2) $ iterate (* 5) 1
   , x <- multiplyPrimes [(2, 1)] ]
=> [ pn * x | pn <- [1, 5, 25], x <- [1, 2] ]
=> [1, 2, 5, 10, 25, 50]

factors just strings these two functions together, along with an abs to prevent infinite recursion in case the input is negative.

ephemient