tags:

views:

87

answers:

2

Hi All,

Is it possible to create an AST for any arbitrary programming language or IR using C or C++ alone (without the help of tools like YACC and LEX )?

If so, how to implement the lexical and syntactic analysis ?

If not, what are the tools that have to augmented to C or C++ to successfully create an AST ?

Hope I made my doubt clear. If My question looks vague or out of context, please indicate the required.

P.S : I Am actually trying to create the AST for the LLVM's .ll format of IR representation. I do know that .ll is derived from AST. But I am trying out static analysis practices. So I am looking at creating the AST.

+1  A: 

The most straight-forward methodology for creating the parser without a parser-generator is recursive descent. It is very well documented - the standard book in the field is The Dragon Book.

A scanner, that takes text as input and produces a string of tokens as output, can be written using standard string manipulation techniques.

fmark
@fmark : I will look into it and probably be back with question if any.
bsoundra
A: 

I doubt there's a one-to-one mapping between your arbitrary langauge and LLVM's ASTs. That means it is likely that you really want to do this in two stages:

  • Parse your 'arbitrary language' using the best parsing tools you can get to simplify the problem of parsing your language. Use that to build an AST for your language, following standard methods for parser generators producing ASTs. LEX/YACC are OK but there are plenty of good alternatives out there. Its pretty likely you'll need to build a symbol table.

  • Walk the AST of the your parsed langauge to build your LLVM AST. This won't be one-to-one, but the ability to look around the tree near a tree node in your AST to collect information need to generate the LLVM code will likely be extremely helpful.

This is a classic style for a simple compiler.

I suggest you read the Aho/Ullman Dragon book on syntax directed translation. A day's worth of education will save you months of wasted engineering time.

Ira Baxter
@Ira : I read that book some few years back. I think almost 3 years back ( the first edition). But I bought a second edition, have not gone through the book yet. I guess the SDT was not discussed in detail in first edition, isnt ?Second edition has so many other things too. I need to read the book again I guess
bsoundra
I answered the question before you clarified you want to manipulate the LLVM IL. In this case, you can pretty much design a one-for-one text syntax for each element of the LLVM IL (e.g, your notion of using XML). In that case, you can likely build a parser that folds the "walk the AST step" into the parsing actions themselves; this is called "on the fly" code generation and is what really simple SDTs do.
Ira Baxter