views:

69

answers:

1

Suppose I have a text that I can easily parse. It consists of text and special identifiers. After parsing I get a list of tokens that correspond to text and special identifiers in the text.

The problem I am having is how do I transform it from this token list into some other form?

I can't understand how to approach this problem.

I tried to build a parse tree but I don't understand what to do next.

Please advise on this general problem of parsing.

Thanks, Boda Cydo.

+2  A: 

Once you have a token stream, you can transform it into a parse tree by using a parser generator and specifying the grammar of your language.

Depending on the programming language you'd like to use, you might want to look into the following parser generators:

C/C++ - Yacc

Java - ANTLR (also JavaCC, SableCC)

Python - PLY (Python Lex / Yacc)

OCaml - ocamlyacc

If you don't know about grammars, the documentation for the parser generator you choose should give you enough to get you going.

When your parser is done, it will take the token stream and construct a tree using an intermediate representation - types that you define to represent the various pieces of your language (like the text and special identifiers that you mentioned). You can then manipulate the tree as you like.

Edit: in response to your comment - I'm not quite sure what level of answer to give you, as I can't tell exactly what problem you are having. First, are you familiar with tree data structures? If so, would you know how to write a simple recursive algorithm to find the height of a tree, or run a depth-first search? Remember that a tree is just a way to organize information - it is entirely up to you what you do with that information.

A common design pattern for applying an algorithm to a heterogenous tree (ie a tree whose nodes are of different types) is the Visitor pattern. If you are already familiar with trees, you can look up an example of the Visitor pattern in your favorite language; however, if the concept is new to you I would strongly recommend starting off with the simpler algorithms for practice.

danben
Thanks for your great answer. This has moved me one step closer to achieving my goal. One thing that is still very mysterious to me is how do you "manipulate the tree?" Can you expand on this topic a little? As I understand, the parser generator produces a tree (right?), but how do I know how to manipulate the tree after that? This is where I just get stuck. My mind can't wrap around "manipulating the tree."
bodacydo