views:

737

answers:

6

I'm wondering if Scalas/Haskells parser combinators are sufficient for parsing a programming language. More specifically the language MiniJava. I'm currently reading compiller construction and jflex and java cup is quite painful to work with so I'm wondering if I could/should use parser combinators instead. The MiniJava syntax is very small. MiniJavas BNF: http://www.cambridge.org/us/features/052182060X/grammar.html

A: 

I haven't dealt with the Scala or Haskell parser combinator libraries, but it looks like the grammar should be fine.

Jay Kominek
+1  A: 

Programming in Scala (p. 647) says:

It [Scala's parser combinator framework] is much easier to understand and to adapt than a parser generator, and the difference in speed would often not matter in practice, unless you want to parse very large inputs.

As I would not classify source code as very large input (ideally), it should be sufficient.

Fabian Steeg
+11  A: 

I've never used Scala, but the existence of a definitive BNF makes this easy.

Trivially translated into Haskell's Text.ParserCombinators.Parsec:

goal = do c <- mainClass
          cs <- many classDeclaration
          eof
          return $ c:cs
mainClass = do token "class"
               name <- identifier
               ...

etc. The PArrows translation is pretty trivial too. You'll probably find it easier to have a distinct lexing phase before the parser, but you can do without too.

ephemient
+5  A: 

At least Parsec has built-in lexer for Java-like languages:

lexer = makeTokenParser javaStyle

You have to define the reserved words yourself.

mattiast
+5  A: 

I'm using Scala's parser combinators to parse PL/SQL code, it works like a charm.

Germán
any source / links?
oluies
found it here http://defmain.blogspot.com/search/label/scala
oluies
Ah yes, sorry the blog is in Spanish. I've also added support for C/Java parsing to the app.
Germán
+2  A: 

Scala's parser is a backtracking parser, so it can deal with pretty much any BNF or EBNF. It also means, though, that there are edge cases where input can be painfully slow to be read.

If the grammar can be changed into an LL(1) grammar, you can use the ~! operator to keep backtracking to a minimum.

The grammar probably CAN be turned into LL(1), but, as written, it is not. See, for instance, that Expression and Statement have First/First conflicts (look this up at the end of the linked article).

Anyway, for an academic project, it is enough. For real life compiler stuff, you'll need faster parsers.

Daniel