views:

106

answers:

2

Hi, I'm reading this introductory book on parsing (which is pretty good btw) and one of the exercice is to "build a parser for your favorite language." Since I don't want to die today, I thought I could do a parser for something relatively simple, ie a simplified CSS.

Note: This book teach you how to write a LL(1) parser using the recursive-descent algorithm.

So, as a sub-exercice, I am building the grammar from what I know of CSS. But I'm stuck on a production that I can't transform in LL(1) :

//EBNF
block = "{", declaration, {";", declaration}, [";"], "}"

//BNF
<block> =:: "{" <declaration> "}"
<declaration> =:: <single-declaration> <opt-end> | <single-declaration> ";" <declaration>
<opt-end> =:: "" | ";"

This describe a CSS block. Valid block can have the form :

{ property : value }
{ property : value; }
{ property : value; property : value }
{ property : value; property : value; }
...

The problem is with the optional ";" at the end, because it overlap with the starting character of {";", declaration}, so when my parser meet a semicolon in this context, it doesn't know what to do.

The book talk about this problem, but in its example, the semicolon is obligatory, so the rule can be modified like this :

block = "{", declaration, ";", {declaration, ";"}, "}"

So, Is it possible to achieve what I'm trying to do using a LL(1) parser?

+1  A: 
outis
Ok. But How do you deal (code-wise, in a recursive-descent parser) with production like end-declaration = "" | declaration ?
Subb
outis
Thanks. I need a little more study to understand those First and Follow sets!
Subb
Understanding them isn't too hard. `FIRST(T)` are the leading terminals in anything that `T` can produce (e.g. if T is a C identifier, `FIRST(T)` is the set of letters and '_'). `FOLLOWS(T)` is the set of terminals that can follow whatever `T` produces. Generating them from production rules is a little trickier. Check your book for the topic of "pushdown automata", or see: http://research.microsoft.com/en-us/um/people/abegel/cs164/ll1.html, http://en.wikipedia.org/wiki/LL_parser#Constructing_an_LL.281.29_parsing_table, http://www.cs.uky.edu/~lewis/essays/compilers/ll-lang.html.
outis
I will read those! Thanks!
Subb
+1  A: 

I think I got it:

//EBNF 
block = "{", decl, "}"
decl = simple-decl, [";", [decl]]
simple-decl = ...

//BNF
<block> =:: "{" <decl> "}"
<decl> =:: <simple-decl> <decl-end>
<decl-end> =:: ";" <decl-tail> | ""
<decl-tail> =:: <decl> | e

This produce the following code :

private function parseBlock():void {
    accept(Token.LBRACE);
    parseDecl();
    accept(Token.RBRACE);
}


//Token.IDENTIFIER is the starting token of a declaration
private function parseDecl():void {
    accept(Token.IDENTIFIER);
    if(_currentToken.kind == Token.SEMICOLON){
        accept(Token.SEMICOLON);
        if(_currentToken.kind == Token.IDENTIFIER){
            parseDecl();
        }
    }
}

I'm I right?

Subb
Looks good. That should do it.
outis