tags:

views:

631

answers:

8

Hi I'd like to make my own 'parser', e.g: computing (4+(3-4^2))*2 or parsing java,jsf,html code.

In fact I did something like this but I feel it's not good.

Is there anything good for me? I've tried to read more, but I'm bit confused, LL, LR, AST,BNF,javacc yacc etc :). I'm not sure which way to go, when I would like to compute 4+...

or if I'd like to parse java,jsf code and produce something from this(another java code)

Is there anything generaly good enough like ast? or something which I can use for both?

thank you for help.

+1  A: 

Parsers can be pretty intense to write. The standard tools are bison or yacc for the grammar, and flex for the syntax. These all output code in C or C++.

Peter
+1  A: 

ANTLR is probably the way to go for java. It is a little intense, the book is apparently very good (I have only struggled with the online docs).

If you can stretch to other languages, then lex/yacc (or flex/bison) is the standard for C although I wouldn't particularly recommend either of those combinations (steep learning curve, showing their age a little now).

Python has about a million parsers available (SimpleParse, Yapps) or there is TreeTop for Ruby - the developer even has a demo that does simple calculations as in your question - but note that this won't do everything that a LALR parser can accomplish.

Martin Carpenter
A: 

You might want to check out http://antlr.org/. It will output java code. If I recall, one of their samples is pretty much what you want.

Jake Pearson
+1  A: 

ANTLR, but make sure you read The Definitive ANTLR Reference, which will walk you through the creation of parsers. ANTLR does top-down, LL parsers, so the book doesn't address LALR and other types.

JavaCC, Yacc, are SableCC are more traditional lexer/parser generators, and you'll find that they're a little more primitive and have steeper learning curves. ANTLR is equally powerful, but you don't have to learn it all at once. Wikipedia offers a comprehensive comparison of parser generators.

BNF is a syntax for specifying the grammar; ANTLR uses its own, which I find more aesthetic but which others often don't.

Nikhil Chelliah
A: 

You might want to check out Building Parsers With Java by Steven John Metsker. The book seems to cover exactly what you are looking to do.

Mr. Will
A: 

Using tools which generate Lexers and Parsers is generally far easier than writing your own from scratch.

In addition to whats already been listed, you could use things like JLex with CUP to create a simple interpreter for things like arithmetic expressions very easily.

Pete
Or JFlex (http://jflex.de), which is still maintained.
Michael Myers
+1  A: 

If it is a learning exercise, try starting with a top-down parser -- they are simple to write and don't require including/learning any other tools. Best place to research the basics is probably wikipedia or code-project.

Jeff Kotula
+1  A: 

Before anything else, you have to understand that everything about parsing is based on grammars.

Grammars describe the language you want to implement in terms of how to decompose the text in basic units and how to stack those units in some meaning ful way. You may also want to look for the token, non-terminal, terminal concepts.

Differences between LL and LR can be of two kinds: implementation differences, and grammar writing differences. If you use a standard tool you only need to understand the second part.

I usually use LL (top-down) grammars. They are simpler to write and to implement even using custom code. LR grammars theoretically cover more kinds of languages but in a normal situation they are just a hindrance when you need some correct error detection.

Some random pointers:

  • javacc (java, LL),
  • antlr (java, LL),
  • yepp (smarteiffel, LL),
  • bison (C, LR, GNU version of the venerable yacc)
cadrian
ANTLR's not LR by any means. v3 is LL(*), v2 was LL(k), and v1 is too old to even think about.
Nikhil Chelliah
oops- sorry, my mistake
cadrian