views: 290
answers: 3
I can't figure out why m1 is apparently memoized while m2 is not in the following:

m1      = ((filter odd [1..]) !!)

m2 n    = ((filter odd [1..]) !! n)

m1 10000000 takes about 1.5 seconds on the first call, and a fraction of that on subsequent calls (presumably it caches the list), whereas m2 10000000 always takes the same amount of time (rebuilding the list with each call). Any idea what's going on? Are there any rules of thumb as to if and when GHC will memoize a function? Thanks.

+17  A: 

GHC does not memoize functions.

It does, however, compute any given expression in the code at most once per time that its surrounding lambda-expression is entered, or at most once ever if it is at top level. Determining where the lambda-expressions are can be a little tricky when you use syntactic sugar like in your example, so let's convert these to equivalent desugared syntax:

m1' = (!!) (filter odd [1..])              -- NB: See below!
m2' = \n -> (!!) (filter odd [1..]) n

(Note: The Haskell 98 report actually describes a left operator section like (a %) as equivalent to \b -> (%) a b, but GHC desugars it to (%) a. These are technically different because they can be distinguished by seq. I think I might have submitted a GHC Trac ticket about this.)
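The two desugarings really are distinguishable with seq. A small sketch (the observed behavior assumes GHC's actual desugaring of left sections, not the Report's lambda translation):

```haskell
import Control.Exception (SomeException, evaluate, try)

bad :: Int -> Int -> Int
bad = undefined

main :: IO ()
main = do
  -- GHC desugars the left section (1 `bad`) to bad 1, so forcing it
  -- evaluates bad itself and hits undefined; under the Report's
  -- translation \x -> 1 `bad` x it would already be in WHNF.
  r <- try (evaluate (1 `bad`)) :: IO (Either SomeException (Int -> Int))
  putStrLn (either (const "hit bottom (GHC desugaring)")
                   (const "already WHNF (Report desugaring)") r)
```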

Given this, you can see that in m1', the expression filter odd [1..] is not contained in any lambda-expression, so it will only be computed once per run of your program, while in m2', filter odd [1..] will be computed each time the lambda-expression is entered, i.e., on each call of m2'. That explains the difference in timing you are seeing.
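The effect is easy to observe at a small scale with Debug.Trace (a sketch; the trace counts described in the comments hold when compiling without optimizations, since optimization may float the list out of m2 as noted below):

```haskell
import Debug.Trace (trace)

-- Small-scale analogues of m1 and m2; "building" is traced (to stderr)
-- whenever the list thunk is actually constructed.
m1 :: Int -> Int
m1 = (trace "building" (filter odd [1..]) !!)

m2 :: Int -> Int
m2 n = trace "building" (filter odd [1..]) !! n

main :: IO ()
main = do
  print (m1 5)  -- "building" traced here, when the shared thunk is first forced
  print (m1 6)  -- no trace: m1's list is shared across calls
  print (m2 5)  -- traced again: a fresh list on each call
  print (m2 6)
```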


Actually, some versions of GHC, with certain optimization options, will share more values than the above description indicates. This can be problematic in some situations. For example, consider the function

f = \x -> let y = [1..30000000] in foldl' (+) 0 (y ++ [x])

GHC might notice that y does not depend on x and rewrite the function to

f = let y = [1..30000000] in \x -> foldl' (+) 0 (y ++ [x])

In this case, the new version is much less efficient, because it has to read about 1 GB from the memory where y is stored, while the original version runs in constant space and fits in the processor's cache. In fact, under GHC 6.12.1, the function f is almost twice as fast when compiled without optimizations as when compiled with -O2.

Reid Barton
The cost of evaluating the (filter odd [1..]) expression is close to zero anyway; it is a lazy list, after all, so the real cost is in the (x !! 10000000) application, when the list is actually evaluated. Besides, both m1 and m2 seem to be evaluated only once with -O2 and -O1 (on my ghc 6.12.3), at least within the following test: (test = m1 10000000 `seq` m1 10000000). There is a difference, though, when no optimization flag is specified. And both variants of your "f" have a maximum residency of 5356 bytes regardless of optimization, by the way (with less total allocation when -O2 is used).
Ed'ka
@Ed'ka: Try this test program, with the above definition of `f`: `main = interact $ unlines . (show . map f . read) . lines`; compile with or without `-O2`; then `echo 1 | ./main`. If you write a test like `main = print (f 5)`, then `y` can be garbage collected as it is used and there is no difference between the two `f`s.
Reid Barton
er, that should be `map (show . f . read)`, of course. And now that I've downloaded GHC 6.12.3, I see the same results as in GHC 6.12.1. And yes, you are right about the original `m1` and `m2`: versions of GHC that perform this kind of lifting with optimizations enabled will transform `m2` into `m1`.
Reid Barton
@Reid Barton: Yes, now I see the difference (-O2 is definitely slower). Thank you for this example!
Ed'ka
+5  A: 

m1 is computed only once because it is a Constant Applicative Form, while m2 is not a CAF, and so is computed for each evaluation.

See the Haskell wiki page on CAFs: http://www.haskell.org/haskellwiki/Constant_applicative_form
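The CAF distinction is also what makes the classic list-based memoization idiom work. A sketch (names are illustrative): the top-level list is a CAF, so it is computed at most once per program run and shared by every call.

```haskell
-- fibs is a CAF: a top-level expression that is not a lambda abstraction,
-- so GHC computes it at most once and shares it across all calls to fib.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

fib :: Int -> Integer
fib = (fibs !!)  -- point-free, like m1; here "fib n = fibs !! n" would
                 -- still share, because fibs itself is the top-level CAF

main :: IO ()
main = print (fib 30)
```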

sclv
The explanation "m1 is computed only once because it is a Constant Applicative Form" does not make sense to me. Because presumably both m1 and m2 are top-level variables, I think these _functions_ are computed only once, whether they are CAFs or not. The difference is whether the list `[1..]` is computed only once during the execution of the program or once per application of the function, but is that related to CAFs?
Tsuyoshi Ito
From the linked page: "A CAF ... can either be compiled to a piece of graph which will be shared by all uses or to some shared code which will overwrite itself with some graph the first time it is evaluated". Since `m1` is a CAF, the second applies and `filter odd [1..]` (not just `[1..]`!) is computed only once. GHC could also note that `m2` refers to `filter odd [1..]`, and place a link to the same thunk used in `m1`, but that would be a bad idea: it could lead to large memory leaks in some situations.
Alexey Romanov
@Alexey: Thank you for the correction about `[1..]` and `filter odd [1..]`. For the rest, I am still unconvinced. If I am not mistaken, the notion of a CAF is only relevant when we want to argue that a compiler _could_ replace the `filter odd [1..]` in `m2` with a global thunk (which might even be the same thunk as the one used in `m1`). But in the asker's situation, the compiler did _not_ perform that "optimization," and I cannot see its relevance to the question.
Tsuyoshi Ito
It is relevant that it can replace it _in_ `m1`, and it does.
Alexey Romanov
+2  A: 

There is a crucial difference between the two forms: the monomorphism restriction applies to m1 but not to m2, because m2 is defined with an explicit argument. So m2's type is general, but m1's is specific. The types they are assigned are:

m1 :: Int -> Integer
m2 :: (Integral a) => Int -> a

Most Haskell compilers and interpreters (all of them that I know of actually) do not memoize polymorphic structures, so m2's internal list is recreated every time it's called, where m1's is not.
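One way to see this (a sketch; the names are illustrative): a polymorphic signature reintroduces a hidden typeclass-dictionary argument, so the list sits under a lambda again and is, without specialization, rebuilt per use, while a monomorphic signature keeps the section a shared top-level value. Timings aside, both compute the same values:

```haskell
-- Monomorphic: the partial application is a shared top-level value.
m1Mono :: Int -> Integer
m1Mono = (filter odd [1..] !!)

-- Polymorphic: desugars to a function of an Integral dictionary, so
-- (absent specialization) the list is recreated for each use.
m1Poly :: Integral a => Int -> a
m1Poly = (filter odd [1..] !!)

main :: IO ()
main = print (m1Mono 100, m1Poly 100 :: Integer)
```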

mokus
Playing with these in GHCi, it seems this also depends on the let-floating transformation (one of GHC's optimization passes, which is not used in GHCi). And of course, when compiling these simple functions, the optimizer is able to make them behave identically anyway (according to some criterion tests I ran, with the functions in a separate module and marked with NOINLINE pragmas). Presumably that's because the list generation and indexing get fused into a single tight loop.
mokus