I'm using Text.ParserCombinators.Parsec and Text.XHtml to parse an input like this:

This is the first paragraph example\n
with two lines\n
\n
And this is the second paragraph\n

And my output should be:

<p>This is the first paragraph example\n with two lines\n</p> <p>And this is the second paragraph\n</p>

I defined:


line = do
    t <- manyTill anyChar newline
    return t

paragraph = do
    t <- many1 line
    return (p << t)

But it returns:

<p>This is the first paragraph example\n with two lines\n\n And this is the second paragraph\n</p>

What is wrong? Any ideas?

Thanks!

A:

According to the documentation, the manyTill combinator matches zero or more occurrences of its first argument, so line happily accepts a blank line (zero characters followed by a newline). That means many1 line consumes every line up to the final newline in the input, rather than stopping at the double newline as you seem to intend.
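To see this concretely, here is a minimal sketch that reproduces the behaviour with the question's `line` parser. It assumes Parsec 3's compatibility module `Text.ParserCombinators.Parsec`, drops the Text.XHtml wrapping, and uses `parseOrNothing` and `sample` as hypothetical helpers added only for illustration.

```haskell
import Text.ParserCombinators.Parsec

-- The question's line parser: any characters up to (and consuming) a newline.
line :: Parser String
line = manyTill anyChar newline

-- Hypothetical helper: run a parser on a string and discard the error value.
parseOrNothing :: Parser a -> String -> Maybe a
parseOrNothing p = either (const Nothing) Just . parse p ""

-- Hypothetical sample input mirroring the question.
sample :: String
sample = "This is the first paragraph example\nwith two lines\n\nAnd this is the second paragraph\n"

main :: IO ()
main = do
  -- A blank line succeeds: zero characters, then the newline.
  print (parseOrNothing line "\n")
  -- So many1 line does not stop at the blank line; it returns all four lines.
  print (parseOrNothing (many1 line) sample)
```

The blank line comes back as an empty string instead of a failure, which is why `many1 line` keeps going into the second paragraph.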

camccann
A:

According to the documentation, manyTill runs its first argument zero or more times, so two newlines in a row are still valid input and your line parser will not fail on the blank line.
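One possible fix, offered as a sketch under stated assumptions rather than a definitive recipe: require at least one non-newline character per line, so a blank line fails without consuming input, and then split the input on runs of blank lines with `sepEndBy`. The names `nonBlankLine`, `document`, `parseOrNothing`, and `sample` are hypothetical, the trailing newline of each line is not kept, and the Text.XHtml rendering step is left out.

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical: a non-blank line is at least one non-newline character,
-- followed by a newline (which is consumed but not kept).
nonBlankLine :: Parser String
nonBlankLine = many1 (noneOf "\n") <* newline

-- A paragraph is one or more non-blank lines.
paragraph :: Parser [String]
paragraph = many1 nonBlankLine

-- Paragraphs are separated, and optionally followed, by runs of newlines.
document :: Parser [[String]]
document = sepEndBy paragraph (many1 newline) <* eof

-- Hypothetical helper: run a parser on a string and discard the error value.
parseOrNothing :: Parser a -> String -> Maybe a
parseOrNothing p = either (const Nothing) Just . parse p ""

sample :: String
sample = "This is the first paragraph example\nwith two lines\n\nAnd this is the second paragraph\n"

main :: IO ()
main = do
  print (parseOrNothing nonBlankLine "\n")
  print (parseOrNothing document sample)
```

Because `many1 (noneOf "\n")` fails without consuming anything on a blank line, `many1 nonBlankLine` stops cleanly there instead of raising an error, and `sepEndBy` also tolerates trailing blank lines at the end of the input.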

You're probably looking for something like a many1Till (the many1 counterpart of manyTill), but it doesn't seem to exist in the Parsec library, so you may need to roll your own. (Warning: I don't have GHC on this machine, so this is completely untested.)

many1Till p end = do
    first <- p
    rest  <- p `manyTill` end
    return (first : rest)

Or, more tersely:

import Control.Monad (liftM2)

many1Till p end = liftM2 (:) p (p `manyTill` end)
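A compilable version of both definitions above may help confirm that they behave the same. The type signatures are an assumption (the answer gives none), and the `digit` / `char ';'` inputs and the `parseOrNothing` helper are purely illustrative.

```haskell
import Control.Monad (liftM2)
import Text.ParserCombinators.Parsec

-- do-notation version: run p once, then manyTill for the rest.
many1Till :: Parser a -> Parser b -> Parser [a]
many1Till p end = do
  first <- p
  rest  <- p `manyTill` end
  return (first : rest)

-- Terser liftM2 version; expected to behave identically.
many1Till' :: Parser a -> Parser b -> Parser [a]
many1Till' p end = liftM2 (:) p (p `manyTill` end)

-- Hypothetical helper: run a parser on a string and discard the error value.
parseOrNothing :: Parser a -> String -> Maybe a
parseOrNothing p = either (const Nothing) Just . parse p ""

main :: IO ()
main = do
  -- Both versions read digits up to the terminator.
  print (parseOrNothing (many1Till  digit (char ';')) "123;")
  print (parseOrNothing (many1Till' digit (char ';')) "123;")
  -- Unlike manyTill, many1Till rejects an empty run before the terminator.
  print (parseOrNothing (many1Till digit (char ';')) ";")
  print (parseOrNothing (manyTill  digit (char ';')) ";")
```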
hzap
The problem is that if you use this with `anyChar` as `p`, it still matches two newlines, because `first <- p` consumes the first newline.
sepp2k
As a matter of personal preference, I'd write that as `many1Till p end = (:) <$> p <*> manyTill p end`. To my eye, `do` notation rarely improves Parsec-based code. (Oops, I didn't see your edit; the `liftM2` version is equivalent to mine, of course.)
camccann
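Both comment points can be checked with a short sketch. The applicative `many1Till` still accepts two newlines when `p` is `anyChar`, because the first `anyChar` consumes a newline. Restricting `p` to `noneOf "\n"` (a swapped-in choice, not something proposed in the thread) makes a blank line fail as intended. `parseOrNothing` is again a hypothetical helper.

```haskell
import Text.ParserCombinators.Parsec

-- Applicative formulation from the comments; relies on Parsec 3's
-- Applicative instance for its parser type.
many1Till :: Parser a -> Parser b -> Parser [a]
many1Till p end = (:) <$> p <*> manyTill p end

-- Hypothetical helper: run a parser on a string and discard the error value.
parseOrNothing :: Parser a -> String -> Maybe a
parseOrNothing p = either (const Nothing) Just . parse p ""

main :: IO ()
main = do
  -- anyChar eats the first newline, so "\n\n" still parses.
  print (parseOrNothing (many1Till anyChar newline) "\n\n")
  -- Excluding newlines from p makes a blank line fail...
  print (parseOrNothing (many1Till (noneOf "\n") newline) "\n")
  -- ...while an ordinary line still parses.
  print (parseOrNothing (many1Till (noneOf "\n") newline) "with two lines\n")
```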