views:

239

answers:

2

I've been considering using Haskell's Parsec parsing library to parse a subset of Java as a recursive descent parser as an alternative to more traditional parser-generator solutions like Happy. Parsec seems very easy to use, and parse speed is definitely not a factor for me. I'm wondering, though, if it's possible to implement "backup" with Parsec, a technique which finds the correct production to use by trying each one in turn. For a simple example, consider the very start of the JLS Java grammar:

Literal:
    IntegerLiteral  
    FloatingPointLiteral

I'd like a way to not have to figure out how I should order these two rules to get the parse to succeed. As it stands, a naive implementation like this:

literal = do {
    x <- try (do { v <- integer; return (IntLiteral v)}) <|>
         (do { v <- float; return (FPLiteral v)});
    return(Literal x)
}

Will not work... inputs like "15.2" will cause the integer parser to succeed first, and then the whole thing will choke on the "." symbol. In this case, of course, it's obvious that you can solve the problem by re-ordering the two productions. In the general case, though, finding things like this is going to be a nightmare, and it's very likely that I'll miss some cases. Ideally, I'd like a way to have Parsec figure out stuff like this for me. Is this possible, or am I simply trying to do too much with the library? The Parsec documentation claims that it can "parse context-sensitive, infinite look-ahead grammars", so it seems like something like I should be able to do something here.

+1  A: 

Either use Parsec's notFollowedBy to ensure that integer consumed everything up to some token separator (this approach will scale to arbitrary scenario most of the time), or take a look at parser combinators that explore all possible parsing alternatives. First to come to mind is UU_Parsing library.

ADEpt
+1  A: 

One way you can do this is to use the try combinator, which allows a parser to consume input and fail without failing the whole parse.

Another is to use Text.ParserCombinators.ReadP, which implements a symmetric choice operator, in which it is proven that a +++ b = b +++ a, so it really doesn't matter which order. I'm rather partial to ReadP, since it is minimal but provides what you need to build up a really powerful parser.

luqui