views:

325

answers:

2

I've been writing a little monadic parser-combinator library in F# (somewhat similar to FParsec) and now tried to implement a small parser for a programming language.

I first implemented the code in Haskell (with Parsec) which ran perfectly well. The parsers for infix expressions are designed mutually recursive.

parseInfixOp :: Parser String -> Parser Expression -> Parser Expression
parseInfixOp operatorParser subParser = ignoreSpaces $ do
                                          x <- ignoreSpaces $ subParser
                                          do
                                            op <- ignoreSpaces $ operatorParser
                                            y  <- parseInfixOp operatorParser subParser
                                            return $ BinaryOp op x y
                                           <|> return x

parseInfix :: [String] -> Parser Expression -> Parser Expression
parseInfix list = parseInfixOp (foldl1 (<|>) $ map string list)

parseExpr :: Parser Expression
parseExpr = parseInfix0

parseInfix0 = parseInfix ["==", "<>", "And", "Or", "Xor", "<", ">", "<=", ">="] parseInfix1
parseInfix1 = parseInfix ["+", "-", "Mod"] parseInfix2
parseInfix2 = parseInfix ["*", "/", "\\"] parseInfix3
parseInfix3 = parseInfix ["^"] parseInfix4
parseInfix4 = parseFactor

parseFactor :: Parser Expression
parseFactor = parseFactor' <|> (betweenChars '(' ')' parseExpr)

parseFactor' :: Parser Expression
parseFactor' = parseString
           <|> parseBool
           <|> parseNumber
           <|> parseVariable
           <|> (try parseFunCall) <|> parseIdentifier

Since the order of functions doesn't matter and Haskell is evaluating in a non-strict way, this is OK, but F# is evaluating strictly.

let rec parseExpr = parseInfix0
and parseFactor = (parseFactor') <|> (betweenChars '(' ')' parseExpr)
and parseInfix2 = parseInfix ["^"] parseFactor BinaryOp
and parseInfix1 = parseInfix ["*"; "/"] parseInfix2 BinaryOp
and parseInfix0 = parseInfix ["+"; "-"] parseInfix1 BinaryOp
and parseFunCall = parser {
        let! first = letter
        let! rest = many (letter <|> digit)
        let  funcName = toStr $ first::rest

        do! ignoreSpace
        let! args = betweenChars '(' ')' $ sepBy (parseExpr) ","

        return FunCall(funcName, args)
    }
and parseFactor' =
    parseNumber
<|> parseString
<|> parseBool
<|> parseFunCall
<|> parseIdentifier

F# now either complains about recursive objects and just throws a StackOverflowException at runtime due to an infinite loop or it even doesn't compile the source because "a value would be part of its own definion".

What's the best way to prevent this errors. The debugger advices me to make use functions or lazys instead but what should I make lazy here?

+6  A: 

What is the warning about recursive objects? Show the text; there's one such warning that is ignorable (indeed, in a sense desirable) for this case.

If it doesn't compile because of recursive values, you simply need to turn the 'syntactic values' into 'syntactic functions'. That is, rather than

...
and parseInfix2 = body
...

use

...
and parseInfix2 x = (body) x
...

even though the type of 'parseInfix2' is the same function type either way... F# (unlike Haskell) will sometimes require you to be explicit (do eta-conversion as above).

I'd ignore suggestions about inserting 'lazy', parsers are indeed functions, not values, so the eta-conversion will cover the same issue (none of this will be evaluated eagerly at all, it all needs to 'wait' until you pass the string to be parsed before anything starts to 'run').

Regarding StackOverflowExceptions, if you post a looping snippet of the stack it may help, but you maybe can see for yourself e.g. if you have a left-recursive portion of the grammar that doesn't consume any input and gets caught in a loop. I think that's an easy pitfall for most parsing technologies in most languages.

Brian
Perfect - explicit η-conversion was my solution! Thanks
Dario
+1  A: 

η-conversion is not necessarily a great solution - if you do this, you'll have to prove that the delayed function is run at most once, or pay a lot of overhead for calling it during parsing.

You want something like this:

let rec p1 = lazy (...)
and p2 = ... p1.Value ..

A better way, if you have a workflow builder, is to define the Delay member to do this for you:

type Builder() =
    member this.Delay(f: unit -> Parser<_>) : Parser<_> = 
        let promise = Lazy.Create f
        Return () |> Bind (fun () -> promise.Value)

let parse = new Builder()


let rec p1 =
    parse {
        ...
    }

and p2 =
    parse {
        ...
    }
toyvo
Thank you. I recently came up with a similar approach.
Dario