The following two Haskell programs for computing the n'th term of the Fibonacci sequence have greatly different performance characteristics:

fib1 n =
  case n of
    0 -> 1
    1 -> 1
    x -> (fib1 (x-1)) + (fib1 (x-2))

fib2 n = fibArr !! n where
  fibArr = 1:1:[a + b | (a, b) <- zip fibArr (tail fibArr)]

They are very close to mathematically identical, but fib2 uses the list notation to memoize its intermediate results, while fib1 has explicit recursion. Despite the potential for the intermediate results to be cached in fib1, the execution time gets to be a problem even for fib1 25, suggesting that the recursive steps are always evaluated. Does referential transparency contribute anything to Haskell's performance? How can I know ahead of time if it will or won't?

This is just an example of the sort of thing I'm worried about. I'd like to hear any thoughts about overcoming the difficulty inherent in reasoning about the performance of a lazily-executed, functional programming language.


Summary: I'm accepting 3lectrologos's answer, because the point that you don't reason so much about the language's performance as about your compiler's optimizations seems to be extremely important in Haskell - more so than in any other language I'm familiar with. I'm inclined to say that the importance of the compiler is the factor that differentiates reasoning about performance in lazy, functional languages from reasoning about the performance of any other type.


Addendum: Anyone happening on this question may want to look at the slides from Johan Tibell's talk about high performance Haskell.

+9  A: 

In your particular Fibonacci example, it's not very hard to see why the second one should run faster (although you haven't specified what f2 is).

It's mainly an algorithmic issue:

  • fib1 implements the purely recursive algorithm and (as far as I know) Haskell has no mechanism for "implicit memoization".
  • fib2 uses explicit memoization (using the fibArr list to store previously computed values).

In general, it's much harder to make performance assumptions for a lazy language like Haskell than for an eager one. Nevertheless, if you understand the underlying mechanisms (especially for laziness) and gather some experience, you will be able to make some "predictions" about performance.

Referential transparency increases (potentially) performance in (at least) two ways:

  • First, you (as a programmer) can be sure that two calls to the same function with the same arguments will always return the same result, so you can exploit this in various places to gain performance.
  • Second (and more importantly), the Haskell compiler can be sure of the above fact, and this enables many optimizations that can't be applied in impure languages (if you've ever written a compiler or have any experience with compiler optimizations, you are probably aware of how important this is); a small sketch follows this list.
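
As a rough illustration of that second point (my own sketch, not from the answer; `expensive` is a made-up stand-in for any costly pure function), purity is what licenses rewriting two identical calls into one shared computation:

-- Illustrative sketch, not from the original answer.
expensive :: Int -> Int
expensive x = sum [1 .. x * 100000]   -- stands in for any costly pure function

-- Written naively, the work may be done twice:
naive :: Int -> Int
naive x = expensive x + expensive x

-- Same result, computed once and shared. GHC may perform this rewrite itself
-- (common-subexpression elimination), precisely because referential
-- transparency guarantees that both calls agree.
shared :: Int -> Int
shared x = let r = expensive x in r + r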

If you want to read more about the reasoning behind the design choices (laziness, pureness) of Haskell, I'd suggest reading this.

3lectrologos
the f2 thing was a bug - I entered the question from a different computer than the one I have Haskell installed on, allowing room for the error to creep in.
Aidan Cully
It's posts like these that make me want to read that Dragon book to understand exactly what all the fuss about Haskell's compiler is all about.
wheaties
I'm starting to work through the History document you posted. I understand the point that understanding the underlying mechanisms of the language is important in predicting performance... That's the question, though - what can be assumed about those underlying mechanisms? Do specifications exist? Or do we assume that these things can change? For example, it's easy to imagine a Haskell implementation that doesn't memoize anything. Would that be a valid implementation? Or can I assume that lists are memoized in all cases?
Aidan Cully
I imagine this is covered in the books, but it occurs to me that Haskell, or any purely functional language, *could* have implicit memoization. The hard part, as a compiler writer, would be knowing when to use it.
Dan
The Haskell specifications refer only to mechanisms that are implementation independent (e.g. referential transparency). Of course a compiler could use that to implement an implicit memoization mechanism, but this wouldn't be specified in a Haskell spec. One should read the specific compiler spec to really learn about the implementation details.
3lectrologos
In a compiled language like Haskell, it's very difficult to use dynamic optimization techniques (always memoizing results is clearly not an option), because you usually don't have enough information at compile time (it's also one of the reasons behind the popularity of virtual machines, which use dynamic profiling).
3lectrologos
So, basically, don't bother trying to predict Haskell's performance, beyond the worst possible case. Rather, concentrate on predicting the behavior of your chosen compiler.
Aidan Cully
Haskell may be "special" because of the wide variety of optimizations and pessimizations available to the compiler, but this lesson is true of all languages.
ephemient
+4  A: 

Reasoning about performance is generally hard in Haskell and lazy languages in general, although not impossible. Some techniques are covered in Chris Okasaki's Purely Functional Data Structures (also available online in a previous version).

Another way to ensure performance is to fix the evaluation order, either using annotations or continuation passing style. That way you get to control when things are evaluated.

In your example you might calculate the numbers "bottom up" and pass the previous two numbers along to each iteration:

fib n = fib_iter(1,1,n)
    where
      fib_iter(a,b,0) = a
      fib_iter(a,b,1) = a
      fib_iter(a,b,n) = fib_iter(a+b,a,n-1)

This results in a linear time algorithm.
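
Since the answer also mentions annotations as a way to fix the evaluation order, here is a sketch (my addition, not part of the original answer) of the same bottom-up loop with the accumulator forced via seq at each step, so no chain of suspended additions builds up for large n:

-- Illustrative sketch, not the answerer's code: same recursion, but the new
-- accumulator is forced with `seq` before the tail call.
fibStrict n = go 1 1 n
    where
      go a _ 0 = a
      go a _ 1 = a
      go a b k = let a' = a + b
                 in a' `seq` go a' a (k - 1)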

Whenever you have a dynamic programming algorithm where each result relies on the N previous results, you can use this technique. Otherwise you might have to use an array or something completely different.

Jørgen Fogh
`fib_iter(1,1,n)` -> `fib_iter 1 1 n` (I guess it would work with extra tupling, but it would be pointless and definitely nonidiomatic.)
Michał Marczyk
I had been intending to read Okasaki's book. I didn't know that a) it was his thesis, and b) that was on-line. It certainly looks promising ("the notion of amortization also provides the first practical techniques for analyzing the time requirements of non-trivial lazy programs"). Thanks!
Aidan Cully
+2  A: 

Your implementation of fib2 uses memoization, but each time you call fib2 it rebuilds the "whole" result. Turn on ghci time and size profiling:

Prelude> :set +s

If it were doing memoisation "between" calls, the subsequent calls would be faster and use no memory. Call fib2 20000 twice and see for yourself.

By comparison, a more idiomatic version, where you define the exact mathematical identity:

-- the infinite list of all fibs numbers.
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)

memoFib n = fibs !! n

actually does use memoisation, explicitly as you can see. If you run memoFib 20000 twice, you'll see the time and space taken the first time, while the second call is instantaneous and takes no memory. No magic and no implicit memoisation, as a comment might have hinted at.

Now about your original question: optimizing and reasoning about performance in Haskell...

I wouldn't call myself an expert in Haskell; I have only been using it for 3 years, 2 of which at my workplace, but I did have to optimize and come to understand how to reason somewhat about its performance.

As mentioned in other posts, laziness is your friend and can help you gain performance; however, YOU have to be in control of what is lazily evaluated and what is strictly evaluated.

Check this comparison of foldl vs foldr

foldl actually stores "how" to compute the value, i.e. it is lazy. In some cases you save time and space by being lazy, as with the "infinite" fibs: the infinite "fibs" doesn't generate all of the numbers, but knows how to. When you know you will need a value, you might as well just get it, "strictly" speaking... That's where strictness annotations are useful, to give you back control.
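
As a minimal sketch of that foldl point (my addition, not part of the original answer): the lazy foldl suspends every addition as a thunk before anything is evaluated, while the strict foldl' from Data.List forces the accumulator at each step:

import Data.List (foldl')

-- Illustrative sketch, not from the original answer.
lazySum :: Integer
lazySum = foldl (+) 0 [1..1000000]    -- builds a chain of ~1,000,000 suspended additions first

strictSum :: Integer
strictSum = foldl' (+) 0 [1..1000000] -- forces the accumulator at each step; constant space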

I recall reading many times that in Lisp you have to "minimize" consing.

Understanding what is strictly evaluated and how to force it is important, but so is understanding how much "trashing" you do to the memory. Remember that Haskell is immutable, which means that updating a "variable" actually creates a copy with the modification. Prepending with (:) is vastly more efficient than appending with (++), because (:) does not copy memory, contrary to (++). Whenever a big atomic block is updated (even for a single char), the whole block needs to be copied to represent the "updated" version. The way you structure data and update it can have a big impact on performance. The ghc profiler is your friend and will help you spot these problems. Sure, the garbage collector is fast, but not having it do anything at all is faster!
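
As a rough illustration of the (:) versus (++) point (my own sketch, not from the answer), building a list by repeatedly appending copies the accumulated prefix on every step (quadratic work overall), while prepending and reversing once allocates a single new cell per element (linear work):

-- Illustrative sketch, not from the original answer.
buildSlow :: Int -> [Int]
buildSlow n = go [] [1..n]
    where go acc []     = acc
          go acc (x:xs) = go (acc ++ [x]) xs   -- (++) copies acc on every step

buildFast :: Int -> [Int]
buildFast n = reverse (go [] [1..n])
    where go acc []     = acc
          go acc (x:xs) = go (x : acc) xs      -- (:) allocates one new cell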

Cheers

JFT
+1  A: 

Since allocation is a major cost in any functional language, an important part of understanding performance is to understand when objects are allocated, how long they live, when they die, and when they are reclaimed. To get this information you need a heap profiler. It's an essential tool, and luckily GHC ships with a good one.
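
A rough sketch of how to produce a heap profile with GHC (my addition, not part of the answer; exact flag names vary a little between GHC versions):

$ ghc -prof -fprof-auto -rtsopts Main.hs
$ ./Main +RTS -hc -RTS    # heap profile broken down by cost centre, written to Main.hp
$ hp2ps -c Main.hp        # render the .hp file to a PostScript graph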

For more information, read Colin Runciman's papers.

Norman Ramsey
+1  A: 

Aside from the memoization issue, fib1 also uses non-tail-call recursion. Tail-call recursion can be refactored automatically into a simple goto and performs very well, but the recursion in fib1 cannot be optimized in this way, because you need the stack frame from each instance of fib1 in order to calculate the result. If you rewrote fib1 to pass a running total as an argument, thus allowing a tail call instead of needing to keep the stack frame for the final addition, the performance would improve immensely. But not as much as the memoized example, of course :)
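
A sketch of the rewrite described above (my addition, not the answerer's code; it mirrors the accumulator approach from Jørgen Fogh's answer):

-- Illustrative sketch: running totals are passed as arguments, so the
-- recursive call is a tail call and no stack frame is kept for an addition.
fib1' n = go 1 1 n
    where
      go a _ 0 = a
      go _ b 1 = b
      go a b k = go b (a + b) (k - 1)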

Justin Smith