tags:

views:

236

answers:

3

I am solving some problems of Project Euler in Haskell. I wrote a program for a riddle in it and it did not work as i expected. When I looked in the task manager when running the program I saw that i was using >1 gigabyte of RAM on ghc. A friend of me wrote a program with the same meaning in Java and succeeded in 7 seconds.

import Data.List
opl = find vw $ map (\x-> fromDigits (x++[0,0,9]) ) $ sequence [[1],re,[2],re,[3],re,[4],re,[5],re,[6],re,[7],re,[8],re]

vw x = hh^2 == x
    where hh = (round.sqrt.fromIntegral) x

re = [0..9]

fromDigits x = foldl1 (\n m->10*n+m) x

I know this program would output the number I want given enough RAM and time, but there has to be a better-performing way.

A: 

EDIT: I think I'm wrong here - changing the type signature to :: Maybe Word64 (which would be enough bits for this problem I think) also takes forever / has a space leak, so it couldn't be the old Integer bug.

Your problem seems to be an old GHC bug (that I thought was fixed) with Integer causing a space leak. The below code finishes in about 150 ms when compiled with -O2.

import Data.List
import Data.Word

main = print opl

opl :: Maybe Word32
opl = find vw $ map (\x-> fromDigits (x++[0,0,9]) ) $ sequence [[1],re,[2],re,[3],re,[4],re,[5],re,[6],re,[7],re,[8],re]

vw x = hh^2 == x
    where hh = (round.sqrt.fromIntegral) x

re = [0..9]

fromDigits x = foldl1 (\n m->10*n+m) x
TomMD
A: 

Since you're looking for a nineteen-digit number with those characteristics found in vw, I'd try to simplify the construction in the mapped function just say fromDigits x*1000+9 for starters. Appending to a list is O(length-of-the-left-list), so throwing those last three digits on the end hurts the computation time a bunch.

As an aside (to you both), using the strict version of the fold (foldl1') will also help.

BMeph
Sorry BMeph but that isn't right. foldl1' does nothing because -O2 catches the strictness. Also, the *1000+9 doesn't make any measureable difference - I tested that before I posted... premature optimization is the (blah blah blah)...
TomMD
+16  A: 

The main problem here is that sequence has a space leak. It is defined like this:

sequence [] = [[]]
sequence (xs:xss) = [ y:ys | y <- xs, ys <- sequence xss ]

so the problem is that the list produced by the recursive call sequence xss is re-used for each of the elements of xs, so it can't be discarded until the end. A version without the space leak is

myseq :: [[a]] -> [[a]]
myseq xs = go (reverse xs) []
 where
  go [] acc = [acc]
  go (xs:xss) acc = concat [ go xss (x:acc) | x <- xs ]

PS. the answer seems to be Just 1229314359627783009

Edit version avoiding the concat:

seqlists :: [[a]] -> [[a]]
seqlists xss = go (reverse xss) [] []
 where
   go []       acc rest = acc : rest
   go (xs:xss) acc rest = foldr (\y r -> go xss (y:acc) r) rest xs

note that both of these versions generate the results in a different order from the standard sequence, so while they work for this problem we can't use one as a specialised version of sequence.

Simon Marlow
Ahh, I didn't know sequence leaked, thanks Simon.
TomMD
Thanks, this was exactly what I needed.The only thing I'm still wondering is why the space leak is still there when I execute opl in ghci, and it isn't there when I compile it to an executable and run it from there.
Ingdas
How would I find this through profiling? I spent some time profiling with `-hy` to find out that `[]` was using most of the space; but how could I have found that `sequence` was the problem?
Jason Dusek
Without profiling, the new `sequence` allows the program to run in just under 2 minutes on a dual core, 64 bit laptop (1.3-GHz Intel U7300).
Jason Dusek
Profiling would tell you that the leak was in sequence and/or the definition of (>>=) for lists, and that the leak is composed of list cells. Beyond that, you have to reason about how sequence on lists works to understand why it leaks. However, note that this leak is not caused by lazy evaluation - the same function in a strict language would leak in the same way.The leak is not inherent in sequence or the list monad, but in the combination of the two. Maybe we can specialise sequence on lists to a non-leaky version, I'll think about that.
Simon Marlow