views:

136

answers:

2

Hi guys, I am helping my nephew for his C lab homework, it is a string manipulation assignment and applying Wang's algorithm.

Here is the BNF representation for the input.

<s> ::= <l> # <r>

<l> ::= <list>| ε
<r> ::= <list>| ε
<list> ::= <f>|<f> , <list>
<f> ::= <letter>| - <f>| (<f><op><f>)
<op> ::= & | | | >
<letter> ::= A |... | Z

What is the best practice to handle and parse this kind of input in C? How can I parse this structure without using struct? Thanks in advance.

A: 

As you already have your BNF, the simplest way to parse this kind of input would be to use a parser generator. But due to this being homework, I'm not sure wether using a generator is allowed.

Nevertheless, you can also use a hand-written parser. Just do a search for recursive descent parsers...

milan1612
+3  A: 

The most straightforward approach is to make every rule (or "production") a function. This is called a "recursive descent" parser.

Write two routine that will peek at and get the next character as well.

This will give you some code that looks something like this (in pseudocode):

// <sequent> ::= <lhs> # <rhs>
sequent()
    lhs()
    if peekchar() != '#' then error
    else poundsign = nextchar()
    rhs()


// <lhs> ::= <formulalist>| ε
lhs()
    if peekchar() == EOF
        return
    else
       formula()

// <rhs> ::= <formulalist>| ε
rhs()
    if peekchar() == EOF
        return
    else
       formulalist()

// <formulalist> ::= <formula>|<formula> , <formulalist>
formulalist()
   formula()
   if peekchar() is ','
       comma = nextchar()
       return formulalist()

// <formula> ::= <letter>| - <formula>| (<formula><infixop><formula>)
formula()
    next = peekchar()
    if next in A..Z
        letter
    else if next is -
        minus_sign = nextchar()
        return formula()
    else
        formula()
        infixop()
        formula()

// <infixop> ::= & | | | >
infixop()
    c = nextchar()
    if c not in &,|,> then error

// <letter> ::= A | B | ... | Z
letter()
    c = nextchar()
    if c not A..Z then error

and so forth, for each rule.

The general idea:

  • each rule is a function
  • at certain points the function peeks ahead to see what to do. for example, formula() checks to see if the first character is a minus sign.
Mark Harrison
ε means "empty" not end of file. Your function for LHS needs to be modified accordingly. Also, lhs calls fiormula rather than formulalist which , I guess, is a typo.
JeremyP
Thanks for the reply, with some advancements, algorithm works.
baris_a