views:

213

answers:

5

I am new to both Haskell and programming. My question about binding in pattern-matched, recursive functions. For instance, suppose I have a function which checks whether a given list (x:xs) is a sublist of another list, (y:ys). My initial thought, following the examples in my textbook, was:

sublist [] ys = True
sublist xs [] = False
sublist (x:xs) (y:ys)
   | x == y = sublist xs ys
   | x /= y = sublist (x:xs) ys

This works on test data, e.g.,

sublist [1, 2, 3] [1, 2, 4, 1, 2, 3]

where I expected it to fail. I expect it to fail, since

sublist [1, 2, 3] [1, 2, 4, 1, 2, 3]
   = sublist [2, 3] [2, 4, 1, 2, 3]
   = sublist [3] [4, 1, 2, 3]

at which point, I thought, [3] = 3:[] will be matched with (x:xs) in sublist, and [4, 1, 2, 3] will be matched with (y:ys) in sublist. How, then, is sublist working?

Edit: Thanks to everyone here, I think I've solved my problem. As noted, I was ("subconsciously") wanting sublist to backtrack for me. Using the last answer (BMeph) posted as a guide, I decided to approach the problem differently, in order to solve the "binding problem," i.e., the "backtracking" problem.

subseq :: (Eq a) => [a] -> [a] -> Bool
subseq [] _ = True
subseq _ [] = False
subseq (x:xs) (y:ys) =

-- subseq' decides whether the list bound to (x:xs) = M is a prefix of the list
-- bound to L = (y:ys); it recurses through L and returns a Bool value. subseq
-- recurses through M and L, returning a disjunction of Bool
-- values. Each recursive call to subseq passes M and ys to subseq', which
-- decides whether M is a prefix of the **current list bound to ys**.

   let subseq' :: (Eq a) => [a] -> [a] -> Bool
       subseq' [] _ = True
       subseq' _ [] = False
       subseq' (x:xs) (y:ys) = (x == y) && subseq' xs ys
          in subseq' (x:xs) (y:ys) || subseq (x:xs) ys
+8  A: 
  sublist [1, 2, 3] [1, 2, 4, 1, 2, 3]
= sublist [2, 3] [2, 4, 1, 2, 3]
= sublist [3] [4, 1, 2, 3]
= sublist [3] [4, 1, 2, 3]
= sublist (3:[]) (4:[1,2,3])     -- Since 3 /= 4, we take sublist (x:xs) ys
= sublist (3:[]) [1,2,3]
= sublist (3:[]) (1:[2,3])
= sublist (3:[]) [2,3]
= sublist (3:[]) (2:[3])
= sublist (3:[]) [3]
= sublist [] []
= True

sublist checks if heads of lists are equal. If yes, then it removes them and proceeds (sublist xs ys). If no, it removes head from the second list (sublist (x:xs) ys). This way it "finds" the following association:

 1 2 3
 | | |
 | | \-----\
 | |       |
 1 2 4 1 2 3

In other words, to check sublist [1,2,3] ys for some list ys it pops elements from ys as long as they are not 1. Then it pops elements as long as they are not 2. Then it pops elements as long as they are not 3. If [1,2,3] is exhausted, then it reports True; if ys is exhausted, it reports False.

sdcvvc
Yes, this makes sense. My "sublist" function is operating like a set membership function, e.g., 1, 2, 3 are members of {1, 2, 4, 1, 2, 3} (although lists obviously may contain duplicate elements).
danportin
It's not exactly set membership, for example 1,2 are members of {2,1} but sublist [1,2] [2,1] returns False. See http://en.wikipedia.org/wiki/Subsequence.
sdcvvc
+11  A: 

It is working because:

  • [3] is matched as x:xs as 3:[],
  • [4, 1, 2, 3] is matched as y:ys as 4:[1,2,3]
  • 3/=4 so sublist (x:xs) ys is evaluated, which eventually is True

trace:

sublist [1, 2, 3] [1, 2, 4, 1, 2, 3]
   = sublist [2, 3] [2, 4, 1, 2, 3]
   = sublist [3] [4, 1, 2, 3]
   = sublist [3] [1, 2, 3]
   = sublist [3] [2, 3]
   = sublist [3] [3]
   = sublist [] [] = True
YuppieNetworking
+1  A: 

YuppieNetworking and sdcwc have already explained how the matching works. So your sublist searches for a sublist in the same sense as subsequence (not necessarily the same items in a row, there can be anything in between).

I want to note that you can often avoid explicit recursion to remove unnecessary items from the front of the list with dropWhile. Also, I wanted to give an example how to check if two lists have the same prefixes (you need this to check if the second list contains items of the first one in a row).

The first example is similar to your function, it allows for items in between, but it uses dropWhile to remove items in front of ys:

-- Test:
-- map ("foo" `subListOf`) ["kungfoo", "f-o-o!", "bar"] == [True,True,False]

[] `subListOf` _ = True
(x:xs) `subListOf` ys =
  let ys' = dropWhile (/= x) ys     -- find the first x in ys
  in  case ys' of
      (_:rest) -> xs `subListOf` rest
      [] -> False

The second example looks for a "dense" sublist:

-- Test:
-- map ("foo" `denseSubListOf`) ["kungfoo!", "-f-o-o-"] == [True,False]

[] `denseSubListOf` _ = True
_  `denseSubListOf` [] = False
xs `denseSubListOf` ys =
  let ps = zip xs ys in
  (length ps == length xs && all (uncurry (==)) ps) -- same prefix of xs and ys
  || xs `denseSubListOf` (tail ys)                  -- or search further

Please note that to check that the second list contains all the first one in the beginning I compare elements pair by pair (I zip them together to do it).

It is easier to explain by example:

zip [1,2,3] [1,2,3,4,5]  -- gives [(1,1), (2,2), (3,3)], 4 and 5 are truncated
uncurry (==)             -- an equivalent of (\(a,b) -> a == b)
all                      -- gives True iff (uncurry (==)) is True for all pairs
length ps == length xs   -- this is to ensue that the right list is long enough
jetxee
+3  A: 

Debug.Trace is your friend. With sublist instrumented as

sublist [] ys = trace ("A: [] "           ++ show ys) True
sublist xs [] = trace ("B: " ++ (show xs) ++ " []")   False
sublist (x:xs) (y:ys)
   | x == y = trace (info "C" "==")
              sublist xs ys
   | x /= y = trace (info "D" "/=")
              sublist (x:xs) ys
   where info tag op =
           tag ++ ": " ++ (show x) ++ " " ++ op ++ " " ++ (show y) ++
           "; xs=" ++ (show xs) ++ ", ys=" ++ show ys

you see what's happening, namely that it's repeatedly throwing away the head of the second list until it finds a match:

*Main> sublist [1, 2, 3] [1, 2, 4, 1, 2, 3]
C: 1 == 1; xs=[2,3], ys=[2,4,1,2,3]
C: 2 == 2; xs=[3], ys=[4,1,2,3]
D: 3 /= 4; xs=[], ys=[1,2,3]
D: 3 /= 1; xs=[], ys=[2,3]
D: 3 /= 2; xs=[], ys=[3]
C: 3 == 3; xs=[], ys=[]
A: [] []
True

Another tool that will help you implement sublist correctly is Test.QuickCheck, a library that automatically creates test data for use in verifying properties that you specify.

For example, say you want sublist to treat xs and ys as sets and determine whether the former is a subset of the latter. We can use Data.Set to specify this property:

prop_subsetOf xs ys =
  sublist xs ys == fromList xs `isSubsetOf` fromList ys
  where types = (xs :: [Int], ys :: [Int])

This says sublist should be equivalent to the definition on the right. The prop_ prefix is a popular convention for naming test properties to be used with QuickCheck.

Running it also identifies a failure case:

*Main> quickCheck prop_subsetOf 
*** Failed! Falsifiable (after 6 tests):                  
[0,0]
[0]
Greg Bacon
+2  A: 

I think where you may be misunderstanding, is that (when you first wrote the function) you assumed that in your check x /= y = sublist (x:xs) ys you (sub-consciously?) assumed that the function would backtrack and re-do your function with the tail of the original second list, not the tail of whichever piece of the list you're using when it doesn't match.

One nice (and unsettling) thing about Haskell is just how ridiculously powerful it is.

For an example, here's how what you wanted should have looked:

sublist   []     ys   = True
sublist   xs     []   = False
sublist (x:xs) (y:ys) |   x /= y  = False
                      | otherwise = sublist xs ys || sublist (x:xs) ys

which will check for all of the pieces of the first list. The "official" definition for the function (look up "isInfixOf" in your documentation) has a few extra functions that basically means the same thing.

Here's another way write it, that looks more "explanatory" to my eyes:

sublist [] ys = True
sublist xs [] = False
sublist xs ys =  (xs == take (length xs) ys) || sublist xs (tail ys)
BMeph
This was incredibly helpful. However, in the first piece of code, won't "sublist list_1 list_2" evaluate to False if (x /= y) evaluates to True, i.e., the function won't recurse properly if x and y aren't equivalent?
danportin