views:

90

answers:

4

For example,

EBNF

A ::= B c;

B ::= T1 | T2 | ε

T1 ::= a

T2 ::= b

parseA()
{
switch(currentToken.kind){
case Token.a :
    parseT1();
case Token.b :
    parseT2();
break;

case <epsilon> :

break;
default:
    // report error
break;
}
}

How to write the parser parse the epsilon ( set of empty string ) in Java?

+3  A: 

epsilon is just a marker for "empty string allowed here", so you don't need to parse anything; the descent is finished. It seems unlikely that you would actually have a token for that though; you need to detect if there's no token available or if the next token can be used in another production.

For example, you might get the input c. You need to realize that this matches the production A ::= B c;, because B can be reduced to epsilon -- there is no epsilon token, you just need to realize that the B rule there is optional and in that case needs to be skipped over to reduce c to A

Michael Mrozek
Yeah, I know you point. So parse epsilon it will be "case Token.c:", Right?. And in that case do noting just dummy statement, Right? But my teacher tell to implement parse epsilon method but I know If it does't parse epsilon, it will go to default case and report error, that is not true. Right ?
Atom Skaa ska Hic
+2  A: 

As it stands, this is a bit on the simplistic side to make much sense as a parser. The whole thing can be written as a simple regular expression: "[ab]?c". If you really insist on writing it as a parser, the code could be something like:

Boolean parseA() { 
    // an A is a B followed by a `c`:
    return parseB() && (get_token() == Token.c);
}

Boolean parseB  {
    // Get a token. 
    //     If it's an `a` or a `b`, consume it. 
    //     Otherwise, throw it back (match null string by consuming nothing).
    // Either way, return true (successful match).
    //
    token current_token = get_token();
    if (token != Token.a && token != Token.b)
        push_back(current_token);
    return true;
}

Edit (in response to comment on another answer): No, when you're matching B, you should not look for a Token.c. As far as B cares, there are three possibilities: match 'a', match 'b', or match nothing at all. It's then up to the part that parses A to check that it has a B followed by a Token.c.

For example, if you were to change the grammar to something like:

A ::= B C

B ::= a | b | ε

C ::= c | d

Since 'B' still has the same definition, you should not have to change it just because some other definition changed. Likewise, you might add to the grammar to allow (for example) an arbitrary string of B's followed by a C.

Jerry Coffin
Ok, I get your point. But some how to implement with switch-case, it is possible ? because it's must have default case to tell the error.
Atom Skaa ska Hic
@Atom: No. You *never* detect an error while parsing `B` -- since it's allowed to match nothing, it simply matches `a`, `b`, or nothing. At that point, there is *no* input that it should see as an error. When you parse `A`, you look for `B` followed by `c`, and if you don't get it, **then** you signal an error (which I've signified above by returning `false`.
Jerry Coffin
+2  A: 

Whether it's Java or any language, the point is you have a stream of tokens. You don't "get" a token and decide what to do. Rather you "look at" the next token, and if you can use it, you "accept" it.

See the difference?

Do not "get" and then "decide".
Do "look and decide" and then "accept".
This is the heart of the concept of "lookahead".

So, ParseA calls ParseB.

Parse B looks at the next token in the stream.
If it is "a" it accepts it and returns.
If it is "b" it accepts it and returns.
Otherwise it simply returns to ParseA (without accepting anything).

Then ParseA looks at the next token.
If it is "c" it accepts it and returns.
Otherwise it fails.

Make sense?

To do this, you should add a sentinel token to the end of the stream, that is never accepted. At the end, that should be the only token left in the stream. If not, you've got a syntax error consisting of extra garbage at the end.

Mike Dunlavey
+1 Good explanation; this example uses `StreamTokenizer.TT_EOF` as the sentinel token: http://sites.google.com/site/drjohnbmatthews/enumerated-functions
trashgod
A: 

You 'parse' an epsilon when you see anything in the FOLLOW set of the current non-terminal. So since FOLLOW(B) = { 'c' }, you use case Token.c: in the switch in parseB for it

Chris Dodd
What is this concept name that use the follow set ? It's a recursive-descent parsing, or not ?
Atom Skaa ska Hic