views:

1037

answers:

6

I recently wrote a parser in Python using Ply (it's a python reimplementation of yacc). When I was almost done with the parser I discovered that the grammar I need to parse requires me to do some look up during parsing to inform the lexer. Without doing a look up to inform the lexer I cannot correctly parse the strings in the language.

Given than I can control the state of the lexer from the grammar rules I think I'll be solving my use case using a look up table in the parser module, but it may become too difficult to maintain/test. So I want to know about some of the other options.

In Haskell I would use Parsec, a library of parsing functions (known as combinators). Is there a Python implementation of Parsec? Or perhaps some other production quality library full of parsing functionality so I can build a context sensitive parser in Python?

EDIT: All my attempts at context free parsing have failed. For this reason, I don't expect ANTLR to be useful here.

+2  A: 

An option you may consider, if an LL parser is ok to you, is to give ANTLR a try, it can generate python too (actually it is LL(*) as they name it, * stands for the quantity of lookahead it can cope with).

PW
In my case I need more than traditional parsing allows it seems. All my attempts at writing a traditional context free parser have fallen on their face for theoretical reasons. I'm fairly confident at this point that I need conditional lexing at a minimum. Would ANTLR still apply?
Jason Dagit
You have * lookahead with ANTLR and if you need you can add syntactics and predicates to your grammar (semantics predicates exists also).use antlrwork, it's really helpful for designing/debugging grammar (http://www.antlr.org/works/index.html). There are ready made grammars on the ANTLR site too.
PW
+1  A: 

There's ANTLR, which is LL(*), there's PyParsing, which is more object friendly and is sort of like a DSL, and then there's Parsing which is like OCaml's Menhir.

nt
I'm investigating PyParsing now, it looks the closest to Parsec that I've seen. I'm accepting your answer for now. Let's hope PyParsing works :) Thanks!
Jason Dagit
I'm concerned about reading this in the PyParsing documentation: Performance of pyparsing may be slow for complex grammars and/or large input strings.
Jason Dagit
A: 

ANTLR is great and has the added benefit of working across multiple languages.

Joe Skora
+4  A: 

PySec is another monadic parser, I don't know much about it, but it's worth looking at.

http://www.valuedlessons.com/2008/02/pysec-monadic-combinatoric-parsing-in.html

rcreswick
+3  A: 

I believe that pyparsing is based on the same principles as parsec.

Peter Hart
+1  A: 

Nothing prevents you for diverting your parser from the "context free" path using PLY. You can pass information to the lexer during parsing, and in this way achieve full flexibility. I'm pretty sure that you can parse anything you want with PLY this way.

For a hands-on example, consider http://code.google.com/p/pycparser/ - it is a parser for ANSI C written in Python with PLY. It solves the classic C typedef - identifier problem (that makes C's grammar non context-sensitive) by populating a symbol table in the parser that is being used in the lexer to resolve symbol names as either types or not.

Eli Bendersky