tags:

views:

191

answers:

4

I have a minimal toy language similar to javascript. I generate an AST to try out some optimization techniques like escape analysis, type inference. I tried a few approaches like generalizing operator tokens instead of a class/function for each one, keeping type information on every node... But I still don't feel like I am going anywhere. It quickly becomes unwieldy to work on...

I studied lua5, neko, v8 but.. well... I am sure not one of the brightest one around.

Does anybody have experience designing AST and implementing transformations on a AST, which is easy to work on? I would appreciate tips and tricks that made life easier for you?

(Please don't tell me to go read dragon book. I have it already.)

A: 

I use ANLTR.org which I found very easy.

kenny
A: 

Okay, I won't recommend the Dragon book, since you already have it. Can I recommend the Appel books? There is even some source code demonstrating the concepts, among them using the Visitor pattern for translating Abstract Syntax Trees to Concrete Syntax Trees. Good luck, and enjoy!

Alan
+1  A: 

As Alan mentioned, the Appel books are great. I had Modern Compiler Implementation in ML for an undergraduate course on compilers.

I would personally avoid doing many transformations on an AST simply because of the number of different constructs you can have and the number of ways the same thing can be expressed. You will frequently have to write code that handles a large number of cases and sub-cases, and as you said, it gets unwieldly very quickly.

It is better to transform the AST to a more minimal representation such as basic blocks in a control flow graph. Each operation can then be written as a simple statement in a basic block. The set of possible operations should be kept small. Just make sure to keep enough information that you can still do all the transformations you want (in particular, don't throw away types). You can also use Single Static Assignment form, where each program variable gets assigned only once. This provides an invariant which simplifies a lot of transformations and analyses.

Jay Conrod
A: 

ASTs represent the structure of program. For complex languages, your AST will necessarily be complicated. So don't assume this should be "easy".

Many folks assume that having an AST, life will be easier. But parsing is just the foothills of the Himalayas. ASTs don't represent the common inferences, such as the meaning of an identifier, what statement executes next, where this data is consumed. Unless you have all these available, you're not going to be able to do much with a real language, let alone do it easily.

It is best to make those inference results cached or explicit: symbol tables, control flows, data flows, ...

One can add pattern matching langauges to enable the recognition of syntax structures, or even to write transformation rules:

optimize_increment(x:exp):statement
= " \x=\x+1; " -->  " \x++ " if no_side_effects(x);

Such rules need to draw on the cached inferences (eg., "side_effects").

Trying to build all this is really hard. One way to make this practical is to amortize the infrastructure cost across many lanuages and transformation applications. Our DMS Software Reengineering Toolkit has all this machinery to varying degree for a wide variety of langauges, including C, C++, Java, C# and COBOL.

Ira Baxter