views:

361

answers:

5

Sorry I don't quite get FP yet, I want to split a sequence of lines into a sequence of sequences of lines, assuming an empty line as paragraph division, I could do it in python like this:

def get_paraghraps(lines):
    paragraphs = []
    paragraph = []
    for line in lines:
        if line == "": # I know it could also be  "if line:"
            paragraphs.append(paragraph)
            paragraph = []
        else:
            paragraph.append(line)
    return paragraphs

How would you go about doing it in Erlang or Haskell?

+3  A: 

Think recursively.

get_paragraphs []      paras para = paras ++ [para]
get_paragraphs ("":ls) paras para = get_paragraphs ls (paras ++ [para]) []
get_paragraphs (l:ls)  paras para = get_paragraphs ls paras (para ++ [l])
outis
+3  A: 

I'm also trying to learn Haskell. A solution for this question could be:

paragraphs :: [String] -> [[String]]
paragraphs [] = []
paragraphs lines = p : (paragraphs rest)
    where (p, rest) = span (/= "") (dropWhile (== "") lines)

where I'm using the functions from Data.List. The ones I'm using are already available from the Prelude, but you can find their documentation in the link.

The idea is to find the first paragraph using span (/= ""). This will return the paragraph, and the lines following. We then recurse on the smaller list of lines which I call rest.

Before splitting out the first paragraph, we drop any empty lines using dropWhile (== ""). This is important to eat the empty line(s) separating the paragraphs. My first attempt was this:

paragraphs :: [String] -> [[String]]
paragraphs [] = []
paragraphs lines = p : (paragraphs $ tail rest)
    where (p, rest) = span (/= "") lines

but this fails when we reach the final paragraph since rest is then the empty string:

*Main> paragraphs ["foo", "bar", "", "hehe", "", "bla", "bla"]
[["foo","bar"],["hehe"],["bla","bla"]*** Exception: Prelude.tail: empty list

Dropping empty lines solves this, and it also makes the code treat any number of empty lines as a paragraph separator, which is what I would expect as a user.

Martin Geisler
`null` is marginally more efficient and definitely more general than testing for (in)equality with empty lists and strings. `span (/= "")` => `break null`, `dropWhile (== "")` => `dropWhile null`, both improve readability too.
ephemient
Yes and no -- when searching for an empty string in a list of strings, I think it's more readable to search for "". At least for at beginner like me :-)
Martin Geisler
+3  A: 

I'm only a beginning Haskell programmer (and the little Haskell I learnt was 5 years ago), but for a start, I'd write the natural translation of your function, with the accumulator ("the current paragraph") being passed around (I've added types, just for clarity):

type Line = String
type Para = [Line]

-- Takes a list of lines, and returns a list of paragraphs
paragraphs :: [Line] -> [Para]
paragraphs ls = paragraphs2 ls []

-- Helper function: takes a list of lines, and the "current paragraph"
paragraphs2 :: [Line] -> Para -> [Para]
paragraphs2 [] para = [para]
paragraphs2 ("":ls) para = para : (paragraphs2 ls [])
paragraphs2 (l:ls)  para = paragraphs2 ls (para++[l])

This works:

*Main> paragraphs ["Line 1", "Line 2", "", "Line 3", "Line 4"]
[["Line 1","Line 2"],["Line 3","Line 4"]]


So that's a solution. But then, Haskell experience suggests that there are almost always library functions for doing things like this :) One related function is called groupBy, and it almost works:

paragraphs3 :: [Line] -> [Para]
paragraphs3 ls = groupBy (\x y -> y /= "") ls

*Main> paragraphs3 ["Line 1", "Line 2", "", "Line 3", "Line 4"]
[["Line 1","Line 2"],["","Line 3","Line 4"]]

Oops. What we really need is a "splitBy", and it's not in the libraries, but we can filter out the bad ones ourselves:

paragraphs4 :: [Line] -> [Para]
paragraphs4 ls = map (filter (/= "")) (groupBy (\x y -> y /= "") ls)

or, if you want to be cool, you can get rid of the argument and do it the pointless way:

paragraphs5 = map (filter (/= "")) . groupBy (\x y -> y /= "")

I'm sure there is an even shorter way. :-)

Edit: ephemient points out that (not . null) is cleaner than (/= ""). So we can write

paragraphs = map (filter $ not . null) . groupBy (const $ not . null)

The repeated (not . null) is a strong indication that we really should abstract this out into a function, and this is what the Data.List.Split module does, as pointed out in the answer below.

ShreevatsaR
groupBy shouldn't be used this way. The function supplied to groupBy is supposed to be an equality predicate, which is supposed to be symmetric, while your function is not. While the particular implementation of groupBy in GHC may happen to use the predicate by comparing the first element of the group to the remaining ones (and thus your function, which disregards the first arg and tests on the second, works), another implementation could equally pass the arguments in the opposite order, and your code wouldn't work.
newacct
Do you have a reference for that? The Haskell 98 library report ( http://www.cs.auckland.ac.nz/references/haskell/haskell-library-1.4-html/list.html ) *says* it's an equality predicate, but gives an explicit implementation of the groupBy function. Note that the type signature of groupBy doesn't have an "Eq" constraint, suggesting it's supposed to work with an arbitrary (transitive?) predicate function... It seems common to use it this way: http://www.haskell.org/haskellwiki/List_function_suggestions#Generalize_groupBy_and_friends
ShreevatsaR
Hmm... okay, I take it back. It does seem to be common to use it that way.
newacct
I'd prefer using `not . null` to using `(/= "")`, which would lead to the even further point-free `paragraphs = map (filter $ not . null) . groupBy (const $ not . null)`
ephemient
+2  A: 

You want to group the lines, so groupBy from Data.List seems like a good candidate. It uses a custom function to determine which lines are "equal" so one can supply something that makes lines in the same paragraph "equal". For example:

import Data.List( groupBy )

inpara :: String -> String -> Bool
inpara _ "" = False
inpara _ _  = True

paragraphs :: [String] -> [[String]]
paragraphs = groupBy inpara

This has some limitations, since inpara can only compare two adjacent lines and more complex logic doesn't fit into the framework given by groupBy. A more elemental solution if is more flexible. Using basic recursion one can write:

paragraphs [] = []
paragraphs as = para : paragraphs (dropWhile null reminder)
  where (para, reminder) = span (not . null) as
                           -- splits list at the first empty line

span splits a list at the point the supplied function becomes false (the first empty line), dropWhile removes leading elements for which the supplied function is true (any leading empty lines).

sth
+3  A: 

The cleanest solution would be to use something appropriate from the split package.

You'll need to install that first, but then Data.List.Split.splitWhen null should do the job perfectly.

Ganesh Sittampalam