views:

448

answers:

5

I'm working through Real World Haskell, and at the moment doing the exercises at the end of Chapter 3.

I'm taking an unusual approach: even though I know there are some language features they haven't covered yet that would help me, I am trying to do these exercises using only things they have explicitly covered. Why? Just kind of for fun really. It feels like it is forcing me to give my brain some extra practice with recursion.

So I just completed the exercise posed as follows: "Create a function that sorts a list of lists based on the length of each sublist. (You may want to look at the sortBy function from the Data.List module.)"

Now, they threw in the hint about the Data.List module. But they didn't say a word about where reference doc can be found, about how to import stuff, etc. So I decided to roll my own sort just to see if I could do it. I used Bubble Sort since it's the simplest algorithm.

The result is below. I'd like to get you Haskell gurus to critique it please ... but with the following caveat in mind: If you suggest improvements, please base them on language features covered through chapter 3 of Real World Haskell (or what you guess those features might be without going to the trouble of looking it up). I know there are tons of awesome language features waiting for me that will let me make this code better, but right now the specific challenge was to do it with the "primitive" features covered so far.

I'm sure there are cases where I'm reaching around my shoulder to scratch my elbow, cases where I'm using explicit control flow when recursion and pattern matching could do more for me, etc. I'm sure the code could be made much shorter and more readable too. I bet there are good idioms I don't know about that can be used with the primitive language features I'm limiting myself to. Those are the kinds of tips I'd love to receive.

This is probably the ugliest code I'm proud of in any language (at least, that I can remember). My first stab, in a functional language, at something beyond "Hello, world" type stuff. And now you are going to beat the crap out of it :) . Be gentle, please, but I'm looking forward to some meaty insight. Thanks.

areListsEqual :: (Eq a) => [a] -> [a] -> Bool

areListsEqual [] [] = True
areListsEqual [] _  = False
areListsEqual _ []  = False

areListsEqual xs ys = (head xs == head ys)  && (areListsEqual (tail xs) (tail ys))

charlieSort :: (Eq a) => [[a]] -> [[a]]

charlieSort [] = []
charlieSort (x:xs) | null xs = [x]
charlieSort xs | (length xs) >= 2 = if(not (areListsEqual xs wip))
        then charlieSort wip 
        else wip
        where
          first = head xs
          second = head (tail xs)
          theRest = drop 2 xs
          swapPairIfNeeded a b = if(length a >= length b) 
              then [second, first]
              else [first, second]
          modifiedPair = swapPairIfNeeded first second
          wip = (take 1 modifiedPair) ++ charlieSort ( (drop 1 modifiedPair) ++ theRest)
+2  A: 

You don't need the areListsEqual function. You can compare lists with the (==) function. And I'd use a quicksort rather than bubblesort. Here's a solution that I think uses only what you should have learned so far.

charlieSort :: (Eq a) => [[a]] -> [[a]]
charlieSort []     = []
charlieSort (x:xs) = charlieSort (filter (cmpLen (>) x) xs) ++ [x] ++
                     charlieSort (filter (cmpLen (<=) x) xs)
   where filter _ [] = []
         filter p (x:xs) = (if (p x) then (x:) else id) (filter p xs)
         cmpLen f x y = f (length x) (length y)
Apocalisp
Thanks. Got this error, and I don't yet grok your solution enough to figure out how to fix: Couldn't match expected type `t1 -> t2 -> t' against inferred type `[a]' In the expression: (x :) In the expression: (if p x then (x :) else id) filter p xs In the definition of `filter': filter p (x : xs) = (if p x then (x :) else id) filter p xs
Charlie Flowers
Also, what is "id" here?
Charlie Flowers
Fixed that, sorry. It was a quick job. The id function is part of the prelude. It's defined as follows:id x = x
Apocalisp
Very interesting. I see that it works. I understand maybe about half of it. Couple of questions: 1) Looks like cmpLen expects three arguments, but you only ever call it with two. Am I reading that right, and if so why are you doing that? HEY! Nevermind ... I just figured it out! This is currying. Very cool. They haven't yet covered it, but I've seen it before. You're comparing the x from charlieSort to the x from filter, via currying, right? 2) I can't understand what id might return in this context, and how it affects the next call to filter.
Charlie Flowers
Yes, you can also write (cmpLen (>) x) as \y -> compLen (>) x y. Or even (\x y -> length x > length y). The `if` clause returns a function. Either a function that prepends x to a list, or the id function. You can also write this: if (p x) then x : filter p xs else filter p xs.
Apocalisp
So, in the case where the if clause returns the id function, is the "result" of the id function used at all? Or is it merely there as a "placeholder" because the if function has to evaluate to something? Does the value that "id" evaluates to have any bearing on the final result?
Charlie Flowers
The result of the id function is always just its argument. It's the function (\x -> x). It "does nothing", in the sense that it just passes values through. I don't know how to make that clearer. Play with it in ghci.
Apocalisp
+1  A: 

I'm on chapter 8, so I'm no old hand, but I'd prefer

areListsEqual x:xs y:ys = (x == y) && (areListsEqual xs ys)
areListsEqual [] [] = True
areListsEqual _ _ = False

It seems a bit more in line with Haskell style.

Similarly,

charlieSort [] = []
charlieSort (x:[]) = [x]
charlieSort (x1:x2:xs) = blah blah

swapPairIfNeed works as is because you only call it with first and second as its arguments (in that order), but you probably meant

swapPairIfNeed a b = if (length a >= length b)
    then [b, a]
    else [a, b]

In fact, I prefer the third case of charlieSort to look like

charlieSort (x1:x2:xs) = if not (areListsEqual x1:x2:xs wip)
                         then charlieSort wip
                         else wip
    where swapPairIfNeeded a b = if (length a >= length b)
                                 then (b, a)
                                 else (a, b)
          wip = f (swapPairIfNeeded first second)
          f (a, b) = a : (charlieSort b:xs)

I think this was all covered by chapter 3.

Now, let's examine the algorithm. Even holding ourselves to bubble sort, there's no need to check the whole list after sorting. Instead, we can swap the first two elements if necessary, then sort the tail of the list. If the head is shorter than the head of the sorted tail, we're done.

charlieSort (x1:x2:xs) = if (length a <= length (head sortedTail))
                         then a : sortedTail
                         else charlieSort (a : sortedTail)
    where sortedTail = charlieSort (b:xs)
          (a, b) = if (length x1 >= length x2)
                   then (x2, x1)
                   else (x1, x2)
ruds
Thanks. I really like "charlieSort (x:[]) = [x]", much cleaner than what I was doing. Also, interesting that you are doing what seems like a "depth-first" approach: I think you are sorting the tail before you sort the head. This makes sense and was something I never thought of. Helps me, because now I realize with more clarity that the whole problem can be simplified to this: figure out how to sort the first 2 entries; then, use that over and over till the whole list is sorted. Cool.
Charlie Flowers
+4  A: 

I would first and foremost start using pattern-matching.

areListsEqual :: Eq a => [a] -> [a] -> Bool
areListsEqual [    ] [    ] = True
areListsEqual [    ] _      = False
areListsEqual _      [    ] = False
areListsEqual (x:xs) (y:ys) = x == y && areListsEqual xs ys

Note how much more readable this is, when head and tail is avoided.

charlieSort :: Eq a => [[a]] -> [[a]]
charlieSort    [                    ] = []
charlieSort    [x                   ] = [x]
charlieSort xs@(first:second:theRest)
  | areListsEqual xs wip              = wip
  | otherwise                         = charlieSort wip
  where
  swapPairIfNeeded a b
    | length a >= length b = [second,first]
    | otherwise            = [first,second]
  modifiedPair = swapPairIfNeeded first second
  wip = take 1 modifiedPair ++ charlieSort (drop 1 modifiedPair ++ theRest)

I changed the if-then-else to a guard for slightly improved readability (YMMV). Instead of checking that the list has at least two elements with a call to length we use pattern-matching, which also allows us to name first,second,theRest directly. The name @ pattern pattern both matches the input against pattern and names the whole input as name.

Now, I want to avoid using take and drop for extracting the two elements of modifiedPair, so the last two lines are changed into

  [shorter,longer] = swapPairIfNeeded first second
  wip = [shorter] ++ charlieSort ([longer] ++ theRest)

where you could write the last line as

  wip = shorter : charlieSort (longer : theRest)

if you preferred. But why should swapPairIfNeeded return the shorter and the longer of the first and second list in a list ? Why not use a pair like

  swapPairIfNeeded a b
    | length a >= length b = (second,first)
    | otherwise            = (first,second)
  (shorter,longer) = swapPairIfNeeded first second

? In most circumstances, it is better to use tuples for fixed number of values (possibly of differing types) and to use lists for variable number of values (necessarily of same type). But it seems strange that swapPairIfNeeded compares its arguments a and b, but then returns first and second anyway. In this case, instead of letting it return a and b in a pair, I will remove swapPairIfNeeded altogether instead :

  (shorter,longer)
    | length first >= length second = (second,first)
    | otherwise                     = (first,second)

"unfolding" the body of swapPairIfNeeded into the definition of (shorter,longer).

So now the code of charlieSort looks like

charlieSort :: Eq a => [[a]] -> [[a]]
charlieSort    [                    ] = []
charlieSort    [x                   ] = [x]
charlieSort xs@(first:second:theRest)
  | areListsEqual xs wip              = wip
  | otherwise                         = charlieSort wip
  where
  (shorter,longer)
    | length first >= length second = (second,first)
    | otherwise                     = (first,second)
  wip = shorter : charlieSort (longer : theRest)

Finally, I should remark that charlieSort doesn't really implement bubble-sort, because the recursive call to charlieSort will not only make one "bubble-up" pass along the list, but also fully sort the list longer : theRest, so that all that has to be done after this recursive call (before returning one "level up") is to possibly percolate shorter to its rightful place.

Excellent points, thanks. Yes, I have some "bad coupling" in swapPairIfNeeded: there's no reason for it to "know about" both a and b, and first and second. Also, it doesn't need to return a list. Both of those were "left over" from an earlier attempt at the construction of "wip", in which I thought I could take the "modified pair" and stick it on the front of wip. Can you help me see why the "charlieSort [x] = [x]" pattern won't match every single list regardless of how many elements it contains?
Charlie Flowers
+1  A: 

You claim that bubble sort is the easiest sorting algorithm, but that's not quite the case here. Bubble sort is great for arrays, where you index into them linearly. For Haskell's linked lists, insertion sort is actually much prettier to look at.

Let's start with the insert function:

winsert :: [a] -> [[a]] -> [[a]]
winsert x [] = [x]
winsert x (y:ys)
    | length x < length y = x : y : ys
    | otherwise = y : winsert x ys
  • If the list is empty, put x into it
  • If the list isn't empty:
    • If x < y, then x belongs at the front of the list
    • Otherwise, the head of the list is y, and the tail is made of up x being inserted somewhere into ys.

Next, we have the actual sorting function:

wsort :: [[a]] -> [[a]]
wsort [] = []
wsort [x] = [x]
wsort (x:xs) = winsert x (wsort xs)
  • If the list is empty, return it
  • If the list has only one item, it doesn't need to be sorted
  • If the list is longer than that, sort xs, then insert x into the now sorted xs

Interestingly, by modifying winsert to take a function as an argument (in place of length), wsort could be used to sort based on all kinds of criteria. Try making one that sort a list of lists based on the sum of each sublist.

Wesley
Excellent, I like it. I agree it is very clean. Same question for you I asked Norman: I thought "wsort [x]" would be an irrefutable pattern (or at least, it would match any list of any length from 0 to infinity). Clearly it's not, since you're both using it. So why not? Thanks.
Charlie Flowers
On the contrary, it matches only a list with a single element. "wsort (x:xs)" would match any list with at least one element. You may be confusing pattern matching with the type signature. The *type* "[x]" indicates a list whose elements are of an arbitrary type, but the *pattern* "[x]" indicates a list with a single element. The type only comes up after the "::".
ruds
[x] matches any list with only one element. However, x can be any type, and even a list itself. If x is such a sublist, that sublist can be any length. [x] does not match [] or [1, 2]. [x] does match [1] (x is 1), [[]] (x is []), [[1]] (x is [1]), and [[1, 2]] (x is [1, 2]).
Wesley
+1  A: 

Charlie: I'll confine myself to a single critique: never use head, tail, or length if you can use pattern matching:

areListsEqual [] [] = True
areListsEqual (x:xs) (y:ys) = x == y && areListsEqual xs ys
areListsEqual _ _   = False

I can't follow your sort algorithm (and it would be courteous of you to reformat your question so as to eliminate the horizontal scroll bar), but I'd rewrite the first three lines like this:

charlieSort []  = []
charlieSort [x] = x
charlieSort (x1:x2:xs) = if ...

(P.S. All uses of head and tail can be rewritten using pattern matching, and beginners should do so. Not all uses of length can be replaced by pattern matching, but common noob codes like length xs == 0 or length xs >= 2 can and should be rewritten.)

(P.P.S. Even experienced Haskell programmers seldom use 'head'. Less than two-tenths of one percent of source lines in the Glasgow Haskell compiler mention 'head', and eyeballing those mentions, roughly half are in string literals or comments. That's approximately one use of 'head' for every 1500 lines of code.)

Norman Ramsey
Excellent, thanks for the advice. I will definitely try to replace head / length with pattern matching. Question: your second charlieSort pattern is "charlieSort [x] = x". I thought that would be an irrefutable pattern. Is it not for some reason?
Charlie Flowers
@Charlie: an irrefutable pattern is one that always matches. The pattern `[x]` matches only single-element lists, so it is refutable by the empty list or by a list of two or more elements.
Norman Ramsey