views:

500

answers:

1

I have a general idea of what an AST is, but I want to know how to construct one.

If you're given a grammar and a parse tree, how do you construct the AST?

How do you do it if you're given a grammar and an expression?

+4  A: 

Well, first off, the grammar is used to construct a parse tree from an expression.

So if you already have a parse tree, you don't need the grammar.

Depending on how much work your parser does, the resulting tree that is formed from parsing an expression could already be an abstract syntax tree. Or it could be a simple parse tree which requires a second pass to construct the ast.

To construct the parse tree from a grammar and an expression, you would first have to convert your grammar into working code.

Typically, you would split the work into a tokenizer which splits the input stream representing the expression into a list of tokens, and a parser which takes the list of tokens and constructs a parse tree\ast from it.

So the expression "1 + 2*(3+4)" might be split into a list of tokens like this:

1 - int
+ - add_operator
2 - int
* - mul_operator
( - lparen
3 - int
+ - add_operator
4 - int
) - rparen

The first column is the actual text value. The second represents the token type.

These tokens are fed into the parser, which is built from your grammar and recognizes the tokens and builds the parse tree.

So, how does one write the lexical tokenizer and the actual parser? You could roll your own by hand.

Or, more commonly, use a parser generator like coco or antlr or lex\yacc. These tools take a description of your grammar and generate the code for a tokenzier and parser.

Code generators exist for most popular languages and some unpopular ones as well.

How you construct your parser depends heavily on what language you use.

How you would write a parser in Haskell is entirely different from how you would in, say, C.

Here's a tutorial that shows you how to build your own recursive descent parser. Its an old one but still good: Let's build a compiler

Coco is a parser generator for various languages. That also comes with documentation on how to get started: Coco

If Python's your thing, then pyparsing maybe for you.

HS