ansaurus

Question

Why does this first Haskell function FAIL to handle infinite lists, while this second snippet SUCCEEDS with infinite lists?

Answer 1

+3 A:

The second version does not actually evaluate result until after it has started producing part of its own answer. The first version evaluates result immediately by pattern matching on it.

The key with these infinite lists is that you have to produce something before you start demanding list elements so that the output can always "stay ahead" of the input.
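As a small illustration of "produce something before you demand list elements", here is a sketch using two step functions invented for this example (`eagerStep` and `lazyStep` are hypothetical names, not code from the question):

```haskell
-- Hypothetical step functions, invented for illustration only.

-- Inspects the accumulator before producing anything, so foldr has to
-- finish the rest of the fold before this call can return.
eagerStep :: Int -> [Int] -> [Int]
eagerStep x []       = [x]
eagerStep x (y : ys) = x : y : ys

-- Produces a cons cell first; the accumulator is only evaluated if the
-- consumer asks for more elements.
lazyStep :: Int -> [Int] -> [Int]
lazyStep x acc = x : acc

-- Works on an infinite list: each element is emitted before the rest
-- of the fold is demanded.
firstThree :: [Int]
firstThree = take 3 (foldr lazyStep [] [1 ..])

-- Fine on a finite list. The same expression over [1 ..] would never
-- return, because eagerStep always needs the rest of the fold first.
finiteOnly :: [Int]
finiteOnly = take 3 (foldr eagerStep [] [1 .. 10])
```

Both functions build the same list on finite input; they differ only in whether the output can stay ahead of the input.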

(I feel like this explanation is not very clear, but it's the best I can do.)

Norman Ramsey 2009-05-12 04:14:20

OK, thanks. Follow-up question: It seems that everything that can be produced with (++) can also be produced with cons (at least in this example). However, if I understand correctly, cons will eval its right-most args first, whereas (++) won't. So when dealing with infinite lists, it seems sometimes we would need to choose (++) over cons. Is that correct?

Charlie Flowers 2009-05-12 16:19:47

@Charlie: I don't understand what you are trying to say. `head ((cycle [0])++undefined)` and `head (undefined:undefined)` both work just fine. (Of course, forcing the result of the second expression is undefined...)
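A quick way to check this for yourself, as a sketch (the names `headOfAppend` and `consIsLazy` are made up for the example):

```haskell
-- (++) only walks its left argument as far as the consumer demands, so
-- the undefined right argument is never touched here.
headOfAppend :: Int
headOfAppend = head (cycle [0] ++ undefined)

-- (:) builds a cell without evaluating either field, so matching on the
-- cell succeeds even though both fields are undefined.
consIsLazy :: Bool
consIsLazy = case (undefined : undefined :: [Int]) of
  (_ : _) -> True
  []      -> False
```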

ephemient 2009-05-12 17:13:21

Yes, I see now that you're right @ephemient. There is no material difference between cons and ++ in terms of evaluation time. I had a number of "guesses" of why my code wasn't handling infinite lists, most of which I think I've now debunked. I think the light bulb has come on now.

Charlie Flowers 2009-05-12 18:25:39

@Norman: I have corrected my code. The corrected code is now in the question, along with a list of conclusions I have reached. Would you please take a look and let me know if there are any incorrect conclusions there, or if you have anything to add? Thanks.

Charlie Flowers 2009-05-12 18:55:01

Answer 2

+6 A:

Try expanding the expression by hand:

take 5 (myWords_FailsOnInfiniteList (cycle "why "))
take 5 (foldr step [] (dropWhile charIsSpace (cycle "why ")))
take 5 (foldr step [] (dropWhile charIsSpace ("why " ++ cycle "why ")))
take 5 (foldr step [] ("why " ++ cycle "why "))
take 5 (step 'w' (foldr step [] ("hy " ++ cycle "why ")))
take 5 (step 'w' (step 'h' (foldr step [] ("y " ++ cycle "why "))))

What's the next expansion? You should see that in order to pattern match in step, you need to know whether its second argument is the empty list or not. To find that out, you have to evaluate it, at least a little bit. But that second argument is itself a foldr reduction by the very function you're pattern matching in. In other words, the step function cannot look at its second argument without calling itself, and so you have an infinite recursion.
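The question's original code is not reproduced in this thread, so the following is only a hypothetical reconstruction that is consistent with the expansion above. The name `myWordsEager` and the exact clauses of `step` are assumptions, and `isSpace` from Data.Char stands in for the question's `charIsSpace`:

```haskell
import Data.Char (isSpace)

-- Hypothetical reconstruction; the question's actual clauses may differ.
myWordsEager :: String -> [String]
myWordsEager = foldr step [] . dropWhile isSpace
  where
    -- Every clause pattern matches on the accumulator, so each call to
    -- step needs the result of the rest of the fold before it can return.
    step :: Char -> [String] -> [String]
    step c []
      | isSpace c = []
      | otherwise = [[c]]
    step c acc@(x : xs)
      | isSpace c = if null x then acc else [] : acc
      | otherwise = (c : x) : xs
```

This sketch gives the expected words on finite input, but `take 5 (myWordsEager (cycle "why "))` would never return, for the reason described above.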

Contrast that with an expansion of your second function:

myWords_anotherReader (cycle "why ")
foldr step [""] (cycle "why ")
foldr step [""] ("why " ++ cycle "why ")
step 'w' (foldr step [""] ("hy " ++ cycle "why "))
let result = foldr step [""] ("hy " ++ cycle "why ") in
    ['w':(head result)] ++ tail result
let result = step 'h' (foldr step [""] ("y " ++ cycle "why ")) in
    ['w':(head result)] ++ tail result

You can probably see that this expansion will continue until a space is reached. Once a space is reached, "head result" will obtain a value, and you will have produced the first element of the answer.

I suspect that this second function will overflow for infinite strings that don't contain any spaces. Can you see why?
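For reference, a sketch of a step function with the shape used in the second expansion. This is a reconstruction, not the question's exact code: the name `myWordsHeadTail` is hypothetical, and `isSpace` from Data.Char is assumed to behave like `charIsSpace`.

```haskell
import Data.Char (isSpace)

-- Hypothetical reconstruction of the head/tail-based version.
myWordsHeadTail :: String -> [String]
myWordsHeadTail = foldr step [""]
  where
    step :: Char -> [String] -> [String]
    step c result
      | isSpace c = "" : result
      -- The outer list cell is built before result is inspected; head and
      -- tail are only evaluated when the consumer demands them.
      | otherwise = [c : head result] ++ tail result

-- The first words of an infinite input are available.
firstThreeWords :: [String]
firstThreeWords = take 3 (myWordsHeadTail (cycle "why "))
```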

Apocalisp 2009-05-12 04:52:31

Makes sense, thanks. I had actually broken it down like this, but I was overlooking the fact that pattern matching itself was causing the second arg to be eval'd. And I do see why it would overflow for a string with no spaces. But, is there a way to implement myWords without using head or tail?

Charlie Flowers 2009-05-12 16:26:45

Have a look at the prelude's "span" function and see if you can rewrite that as a foldr.
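One possible answer to this exercise, as a sketch rather than the library's definition (`spanFoldr` is a hypothetical name). It favours clarity over efficiency: the failing case rebuilds the remainder with (++).

```haskell
-- A sketch of span written with foldr; not how the Prelude defines it.
spanFoldr :: (a -> Bool) -> [a] -> ([a], [a])
spanFoldr p = foldr step ([], [])
  where
    -- The lazy pattern lets each step return its pair before the rest
    -- of the fold is evaluated.
    step x ~(ys, zs)
      | p x       = (x : ys, zs)
      -- ys ++ zs is the whole remainder after x, so nothing is lost.
      | otherwise = ([], x : ys ++ zs)
```

For example, `spanFoldr even [2, 4, 5, 6]` gives `([2, 4], [5, 6])`, and the first component can be consumed lazily from an infinite list.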

Apocalisp 2009-05-12 18:41:12

I figured out how to do it without head and tail before I saw your response. But I will do the exercise you suggest. I appreciate the suggestion and am looking forward to it.

Charlie Flowers 2009-05-12 18:50:12

@Apocalisp, would you please take a look at the corrected code and the list of conclusions I've reached? I edited the question and added that information, and would love to find out if you feel I've missed anything or reached incorrect conclusions. Thanks.

Charlie Flowers 2009-05-12 18:53:45

Answer 3

+4 A:

Others have pointed out the problem, which is that step always evaluates its second argument before producing any output at all, yet its second argument will ultimately depend on the result of another invocation of step when the foldr is applied to an infinite list.

It doesn't have to be written this way, but your second version is kind of ugly because it relies on the initial argument to step having a particular format and it's quite hard to see that the head/tail will never go wrong. (I'm not even 100% certain that they won't!)

What you should do is restructure the first version so it produces output without depending on the input list in at least some situations. In particular, we can see that when the character is not a space, there's always at least one element in the output list. So delay the pattern matching on the second argument until after producing that first element. The case where the character is a space will still depend on the list, but that's fine, because the only way that case can recurse infinitely is if you pass in an infinite list of spaces, in which case producing no output and going into a loop is the expected behaviour for words (what else could it do?).
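A minimal sketch of that restructuring, under assumptions: the name `myWordsDelayed` and the exact handling of spaces are invented here, and `isSpace` from Data.Char stands in for `charIsSpace`. The non-space case emits its cons cell first and only looks at the accumulator through a lazy binding.

```haskell
import Data.Char (isSpace)

-- Hypothetical restructuring of the first version.
myWordsDelayed :: String -> [String]
myWordsDelayed = foldr step [] . dropWhile isSpace
  where
    step :: Char -> [String] -> [String]
    step c acc
      | isSpace c = case acc of
          []       -> []          -- trailing space: no further words
          ([] : _) -> acc         -- run of spaces: an empty word is already open
          _        -> [] : acc    -- a space closes the current word
      | otherwise = (c : w) : ws  -- emit the cons cell before looking at acc
      where
        -- Lazy pattern binding: acc is only forced when w or ws is demanded.
        (w, ws) = case acc of
          []       -> ("", [])
          (x : xs) -> (x, xs)
```

With this shape, `take 3 (myWordsDelayed (cycle "why "))` terminates, because the space case only needs the outermost cell of the following word.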

Ganesh Sittampalam 2009-05-12 05:05:11

Excellent. In particular, the guideline seems to be "delay the pattern-matching on the second argument until after producing that first element". I'm going to try that and will report back with the results.

Charlie Flowers 2009-05-12 16:28:33

Alright, Ganesh, I have given it a shot and I think I understand. Would you please look at my revised question and let me know if you think I'm incorrect about anything? Thanks.

Charlie Flowers 2009-05-12 18:51:08

Just to be clear on a point that I'm not sure you've grasped: matching the second argument against either [] or (x:xs) forces its evaluation a little bit (specifically, into either a [] or a cons :-). Your revised code works because it doesn't force the list if the character is a space. Just to add a bit more information into the mix, check out "lazy patterns", which are introduced with a ~. If you use one in your second line of definition, then it won't force the list either despite being a match against (x:xs).

Ganesh Sittampalam 2009-05-12 19:19:01

Ganesh, you're absolutely right. This was a key point for me, and I am grateful for you pointing it out. It is not correct for me to say that it is "perfectly fine" to match the 2nd arg against (x:xs), because there are cases when that would NOT be ok. It is only OK here because there are cases when matching against (x:xs) won't cause infinite recursion because it will lead to a space. Thanks!

Charlie Flowers 2009-06-23 18:07:43

Answer 4

+1 A:

The library function foldr has this implementation (or similar):

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f k (x:xs) = f x (foldr f k xs)
foldr _ k _ = k

The result of myWords_FailsOnInfiniteList depends on the result of foldr, which depends on the result of step, which depends on the result of the inner foldr, which depends on ... and so on. On an infinite list, myWords_FailsOnInfiniteList will use up an infinite amount of space and time before producing its first word.

The step function in myWords_anotherReader does not require the result of the inner foldr until after it has produced the first letter of the first word. Unfortunately, as Apocalisp says, it uses O(length of first word) space before it produces the next word, because as the first word is being produced, the tail thunk keeps growing: tail ([...] ++ tail ([...] ++ tail (...))).

In contrast, compare to

import Data.Char (isSpace)

myWords :: String -> [String]
myWords = myWords' . dropWhile isSpace where
    myWords' [] = []
    myWords' string =
        -- break splits at the first space; part2 is only evaluated on demand
        let (part1, part2) = break isSpace string
        in part1 : myWords part2

using library functions which may be defined as

break :: (a -> Bool) -> [a] -> ([a], [a])
break p = span $ not . p

span :: (a -> Bool) -> [a] -> ([a], [a])
span p xs = (takeWhile p xs, dropWhile p xs)

takeWhile :: (a -> Bool) -> [a] -> [a]
takeWhile p (x:xs) | p x = x : takeWhile p xs
takeWhile _ _ = []

dropWhile :: (a -> Bool) -> [a] -> [a]
dropWhile p (x:xs) | p x = dropWhile p xs
dropWhile _ xs = xs

Notice that producing the intermediate results is never held up by future computation, and only O(1) space is needed as each element of the result is made available for consumption.
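A short usage sketch of the break-based approach on an infinite input. The definition is repeated under the hypothetical name `wordsByBreak` so that it does not clash with the other definitions of myWords in this thread:

```haskell
import Data.Char (isSpace)

-- Same shape as the break-based definition above, renamed for this example.
wordsByBreak :: String -> [String]
wordsByBreak = go . dropWhile isSpace
  where
    go [] = []
    go s  = let (w, rest) = break isSpace s
            in w : wordsByBreak rest

-- Each word is available as soon as its closing space has been read.
firstThreeWords :: [String]
firstThreeWords = take 3 (wordsByBreak (cycle "why "))
```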

Addendum

So, here's the revised code. I usually try to avoid head and tail, merely because they are partial functions, and also because I need practice writing the pattern matching equivalent.

myWords :: String -> [String]
myWords string = foldr step [""] (dropWhile charIsSpace string)
   where 
      step space acc | charIsSpace space = "":acc
      step char (x:xs)                   = (char:x):xs
      step _ []                          = error "this should be impossible"

(Aside: You may not care, but the library's words "" == [], whereas your myWords "" == [""]. There is a similar issue with trailing spaces.)
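To make the aside concrete, here is a sketch that compares the revised foldr-based definition with the library's words on a few inputs. The definition is repeated under the hypothetical name `foldrWords`, with `isSpace` from Data.Char assumed in place of `charIsSpace`:

```haskell
import Data.Char (isSpace)

-- The revised foldr-based definition, renamed for this example.
foldrWords :: String -> [String]
foldrWords = foldr step [""] . dropWhile isSpace
  where
    step c acc | isSpace c = "" : acc
    step c (x : xs)        = (c : x) : xs
    step _ []              = error "unreachable: the accumulator starts non-empty"

-- Each entry pairs an input with the foldr-based result and the library result.
edgeCases :: [(String, [String], [String])]
edgeCases = [ (s, foldrWords s, words s) | s <- ["", "a ", "a b"] ]
```

The two agree on "a b" but differ on the empty string and on input with a trailing space, where `foldrWords` leaves an extra empty word.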

Looks much-improved over myWords_anotherReader, and is pretty good for a foldr-based solution.

\n -> tail $ myWords $ replicate n 'a' ++ " b"

It's not possible to do better than O(n) time, but both myWords_anotherReader and myWords take O(n) space here. This may be inevitable given the use of foldr.

Worse,

\n -> head $ head $ myWords $ replicate n 'a' ++ " b"

Here myWords_anotherReader was O(1), but the new myWords is O(n), because pattern matching on (x:xs) requires the result of the rest of the fold.

You can work around this with

import Data.Char (isSpace)

myWords :: String -> [String]
myWords = foldr step [""] . dropWhile isSpace
   where
      step space acc | isSpace space = "":acc
      -- the ~ defers the match on (x:xs) until x or xs is demanded
      step char ~(x:xs)              = (char:x):xs

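The ~ introduces an "irrefutable pattern" (also called a lazy pattern). An irrefutable pattern always matches without forcing its argument; the argument is only evaluated when one of the variables bound by the pattern is demanded.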
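A small sketch of the difference between an ordinary pattern and an irrefutable one (all names here are invented for the example):

```haskell
-- Ordinary pattern: the argument must be evaluated to a cons or [] first.
strictMatch :: [Int] -> Bool
strictMatch (_ : _) = True
strictMatch []      = False

-- Irrefutable pattern: the match succeeds without evaluating the argument,
-- and no variable is demanded, so the argument is never inspected.
lazyMatch :: [Int] -> Bool
lazyMatch ~(_ : _) = True

-- A step function in the style of the definition above.
lazyCons :: Char -> [String] -> [String]
lazyCons c ~(x : xs) = (c : x) : xs

-- The first character is available even though the input has no spaces
-- and never ends.
firstChar :: Char
firstChar = head (head (foldr lazyCons [""] (repeat 'a')))
```

Here `lazyMatch undefined` is True, whereas `strictMatch undefined` would raise an error. The trade-off is that a mismatch in an irrefutable pattern is only reported when a bound variable is eventually demanded.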

ephemient 2009-05-12 15:35:43

Yes, this makes sense. I was actually working on an exercise that specifically wanted me to implement it using foldr. I had implemented it with break and explicit recursion before that.

Charlie Flowers 2009-05-12 16:33:57

ephemient, I'd love to get your thoughts and any corrections to the revised code and the conclusions I've reached. I edited my question with that info and would love to hear any feedback you have.

Charlie Flowers 2009-05-12 18:52:16
