views:

217

answers:

3

Given a LL(1) grammar what is an appropriate data structure or algorithm for producing an immutable concrete syntax tree in a functionally pure manner? Please feel free to write example code in whatever language you prefer.

My Idea

symbol : either a token or a node

result : success or failure

token : a lexical token from source text
    value -> string : the value of the token
    type -> integer : the named type code of the token
    next -> token : reads the next token and keeps position of the previous token
    back -> token : moves back to the previous position and re-reads the token

node : a node in the syntax tree 
    type -> integer : the named type code of the node
    symbols -> linkedList : the symbols found at this node
    append -> symbol -> node : appends the new symbol  to a new copy of the node

Here is an idea I have been thinking about. The main issue here is handling syntax errors. I mean I could stop at the first error but that doesn't seem right.

let program token =
    sourceElements (node nodeType.program) token        

let sourceElements node token =
    let (n, r) = sourceElement (node.append (node nodeType.sourceElements)) token
    match r with
    | success -> (n, r) 
    | failure -> // ???     

let sourceElement node token =
    match token.value with
    | "function" -> 
        functionDeclaration (node.append (node nodeType.sourceElement)) token
    | _ -> 
        statement (node.append (node nodeType.sourceElement)) token 

Please Note

I will be offering up a nice bounty to the best answer so don't feel rushed. Answers that simply post a link will have less weight over answers that show code or contain detailed explanations.

Final Note

I am really new to this kind of stuff so don't be afraid to call me a dimwit.

A: 

Eric Lippert's blog series on immutable binary trees may be helpful. Obviously, you need a tree which is not binary, but it will give you the general idea.

Craig Stuntz
+9  A: 

You want to parse something into an abstract syntax tree.

In the purely functional programming language Haskell, you can use parser combinators to express your grammar. Here an example that parses a tiny expression language:

EDIT Use monadic style to match Graham Hutton's book

    -- import a library of *parser combinators*
import Parsimony
import Parsimony.Char
import Parsimony.Error
(+++) = (<|>)

    -- abstract syntax tree
data Expr = I Int
          | Add Expr Expr
          | Mul Expr Expr
          deriving (Eq,Show)

    -- parse an expression
parseExpr :: String -> Either ParseError Expr
parseExpr = Parsimony.parse pExpr
    where

        -- grammar
    pExpr :: Parser String Expr
    pExpr = do
        a <- pNumber +++ parentheses pExpr  -- first argument
        do
            f <- pOp                        -- operation symbol
            b <- pExpr                      -- second argument
            return (f a b)
         +++ return a                       -- or just the first argument

    parentheses parser = do                 -- parse inside parentheses
        string "("
        x <- parser
        string ")"
        return x

    pNumber = do                            -- parse an integer
        digits <- many1 digit
        return . I . read $ digits

    pOp =                                   -- parse an operation symbol
         do string "+"
            return Add
        +++ 
         do string "*"
            return Mul

Here an example run:

*Main> parseExpr "(5*3)+1"
Right (Add (Mul (I 5) (I 3)) (I 1))

To learn more about parser combinators, see for example chapter 8 of Graham Hutton's book "Programming in Haskell" or chapter 16 of "Real World Haskell".

Many parser combinator library can be used with different token types, as you intend to do. Token streams are usually represented as lists of tokens [Token].

Heinrich Apfelmus
I have to admit, this is rough for me to understand. In my mind I am imagining a state machine passing along continuations but it will definitely take some time for me to understand this.
ChaosPandion
The key about here is that you don't need to implement a state machine or something anymore, it's already done for you in the library of parser combinators an its `Parser tokens a` type. You can just specify the grammar abstractly and let the library figure out what to do with it.
Heinrich Apfelmus
@Heinrich - There are two ways to understand a topic (grammars/parser/compiler in this case). The first way is to simply be smart enough to understand it with little effort (This is how I worked before I decided to learn advanced computer science.). Once I found out I am not that smart, I decided that the only way I will be able to learn this is through the second method. The second method is simple dedication. Since then I have written 4 ECMAScript parsers, each getting progressively better as I learned new techniques. However they have all been imperative.
ChaosPandion
@Heinrich - If I can complete a functionally pure parser from start to finish I will look at that as the capstone of my learning experience. After that I will begin a detailed study of the tools and libraries available to me. If I jump straight into libraries I feel like I may be cheating myself.
ChaosPandion
Your outstanding dedication to learning is best complemented by a good book. :-) In your case, I strongly recommend getting Graham Hutton's book, perhaps from a library or electronically. It introduces general tree types in Haskell and also features a very clear explanation of parser combinators and their implementation. There are video lectures based on the book, too, check out its website. I'll change my code (I'm using an additional "complication") to match the style in the book.
Heinrich Apfelmus
Man I was hoping for a few more answers but it looks like you are the winner. *(You did convince me to learn Haskell so that makes sense.)*Have you looked at the Leksah IDE? It is only version 0.8 but I really like it.
ChaosPandion
Thanks. :-) By the way, the [email protected] and [email protected] mailing lists as well as the #haskell IRC channel are great resources to learn Haskell, don't hesitate to ask questions there, too. Concerning Lekash, I haven't used it due to compilation problems with Gtk+ on MacOS X. Most people, including me, use a standard text editor for Haskell; works like a charm for me.
Heinrich Apfelmus
+2  A: 

Definitely check out the monadic parser combinator approach; I've blogged about it in C# and in F#.

Brian