views:

304

answers:

2

If I am making a grammar for a c-like language that has a sequence of statements, what is the most standard way for defining the grammar?

My thought is to do something like this:

<program> ::= <statement>
<statement> ::= <statement-head><statement-tail>
<statement-head> ::= <if-statement> | <var-declaration> | <assignment> | <whatever>
<statement-tail> ::= ; | ;<statement>

but that feels a little clunky to me. I've also considered making

<program> ::= <statement>*

or

<statement> ::= <statement-head> ; | <sequence>
<sequence>  ::= <statement> <statement>

type productions.

Is there a standard or accepted way to do this. I want my AST to be as clean as possible.

+2  A: 

You can find the BNF for C here and I think it was taken from K&R, which you could check out. You could also check out the SQL BNF here which may provide more information on formulating good sequences.

This will provide some convention information.

In terms of AST generation, it really doesn't matter how 'clunky' your definition is providing it parses the source correctly for all permutations. Then just add the actions to build your AST.

Just make sure you are constructing your grammer for the right parser generator such as an LL or LR parser as you may run into problems with reduction, which will mean some rules need rewriting in a new way. See this on eliminating left recursion.

You may also want to check out Bison/Yacc examples such as these or these. Also check out the Dragon Book and a book called "Modern Compiler Implementation in C"

Aiden Bell
+6  A: 

A very common way would be:

<block-statement> ::= '{' <statement-list> '}' ;
<statement-list> ::= /* empty */ | <statement-list> <statement> ;
<statement> ::= <whatever> ';' ;

And then you define actual statements instead of typing <whatever>. It seems much cleaner to include trailing semicolons as part of individual statements rather than putting them in the definition for the list non-terminal.

Arthur Reutenauer
I like this. The only issue I have is I'm not sure if most parser generators (I'm using TinyPG) support the /* empty */ production. I was under the impression it wasn't so kosher.
CaptnCraig
Nevermind. After looking at the C grammar Aiden posted it can be:<st-list> ::= <st-list><statement> | <statement>
CaptnCraig
I use that all the time in my Bison grammar :-) Actually, I took it from O'Reilly's book *lex and yacc*, where the authors systematically use /* empty */ to emphasize that the empty rule really is there for something. If your parser generator doesn't support this kind of comment, you can of course leave it out.
Arthur Reutenauer
Great answer +1
Aiden Bell