views:

127

answers:

2
test :: [String] -> [String]

test = foldr step []

  where step x ys

          | elem x ys = x : ys

          | otherwise = ys

I am trying to build a new list consisting of all the distinct strings being input. My test data is:

test ["one", "one", "two", "two", "three"]

expected result:

["one", "two", "three"]

I am new to Haskell, and I am sure that I am missing something very fundamental and obvious, but have run out of ways to explore this. Could you provide pointers to where my thinking is deficient?

The actual response is []. It seems that the first guard condition is never met (if I replace it with True, the original list is replicated), so the output list is never built.

My understanding was that the fold would accumulate the result of step on each item of the list, adding it to the empty list. I anticipated that step would test each item for its inclusion in the output list (the first element tested not being there) and would add anything that was not already included to the output list. Obviously not :-)

+7  A: 

Your reasoning is correct: you just need to switch = x : ys and = ys so that you add the x when it's not an element of ys. Also, Data.List.nub does this exact thing.

Travis Brown
+7  A: 

Think about it: your code is saying "when x is in the remainder, prepend x to the result", i.e. creating a duplicate. You just need to change it to "when x is not in the remainder, prepend x to the result" and you get the correct function.

This function differs from Data.List.nub in an important way: this function is more strict. Thus:

test [1..] = _|_   -- infinite loop (try it)
nub [1..] = [1..]

nub gives the answer correctly for infinite lists -- this means that it doesn't need the whole list to start computing results, and thus it is a nice player in the stream processing game.

The reason it is strict is because elem is strict: it searches the whole list (presuming it doesn't find a match) before it returns a result. You could write that like this:

nub :: (Eq a) => [a] -> [a]
nub = go []
    where
    go seen [] = []
    go seen (x:xs) | x `elem` seen = go seen xs
                   | otherwise     = x : go (x:seen) xs

Notice how seen grows like the output so far, whereas yours grows like the remainder of the output. The former is always finite (starting at [] and adding one at a time), whereas the latter may be infinite (eg. [1..]). So this variant can yield elements more lazily.

This would be faster (O(n log n) instead of O(n^2)) if you used a Data.Set instead of a list for seen. But it adds an Ord constraint.

luqui
I keep coming back to this response - as I continue with my attempts to learn Haskell, it is giving me deeper insights into the power of Haskell.
Chris Walton