I've been trying to create a parser using SimpleParse. I've defined the grammar like this:

<w> := [ \n]*
statement_list := statement,(w,statement)?
statement := "MOVE",w,word,w,"TO",w,(word,w)+
word := [A-Za-z],[A-Za-z0-9]*,([-]+,[A-Za-z0-9]+)*

Now if I try to parse this string:

MOVE ABC-DEF TO ABC
MOVE DDD TO XXX

The second statement gets interpreted as parameters of the first one, which is obviously not what I want. I have been able to get this working using pyparsing like this:

from pyparsing import Word, OneOrMore, alphas, alphanums

word = Word(alphas,alphanums+'-')
statement = "MOVE"+word+"TO"+word
statement_list = OneOrMore(statement.setResultsName('statement',True))

Is there any way to get this working in simpleparse as well?

EDIT: clarification below

I am not trying to achieve a line-based grammar. What I would like to see being parsed is:

Simple case

MOVE AA TO BB

More complex case

MOVE AA TO BB 
           CC DD 
           EE FF

Several of the above statements

MOVE AA TO BB 
           CC
MOVE CC TO EE
MOVE EE TO FF 
           GG 
           HH IIJJK
A:

The grammar is currently ambiguous. On paper, you cannot tell whether "MOVE A TO B MOVE C TO D" is two statements or one statement with particularly badly named destinations.

You have two options. You may like neither.

  1. You explicitly make your WORD not match any reserved word. That is, you specifically disallow matching MOVE or TO. This is equivalent to saying "MOVE is not a valid parameter name". This makes "MOVE TL TO TM TN TO" an error.

  2. You modify your grammar so that you can tell where the statement ends. You could add commas "MOVE AA TO BB, CC MOVE TM TO TN, TO, TP". You could add semi-colons or blank lines at the end of statements. You could require that MOVE be the least indented, like Python.
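To make option 1 concrete, here is a minimal sketch in plain Python rather than SimpleParse syntax. It assumes whitespace-separated tokens and treats MOVE and TO as reserved; the names RESERVED, WORD_RE, parse_moves and _word are made up for the illustration and are not part of either library.

```python
import re

# Assumption: these are the only reserved words in the language.
RESERVED = {"MOVE", "TO"}

# Same shape as the question's `word` rule: a letter, then alphanumerics,
# then optional hyphen-separated alphanumeric groups.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*(?:-+[A-Za-z0-9]+)*")


def _word(tokens, i):
    """Return tokens[i] if it is a non-reserved word, else raise ValueError."""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[i]
    if tok in RESERVED or not WORD_RE.fullmatch(tok):
        raise ValueError(f"not a valid word: {tok!r}")
    return tok


def parse_moves(text):
    """Parse 'MOVE src TO dst [dst ...]' statements into (src, [dst, ...]) pairs."""
    tokens = text.split()  # any run of spaces or newlines separates tokens
    statements = []
    i = 0
    while i < len(tokens):
        if tokens[i] != "MOVE":
            raise ValueError(f"expected MOVE, got {tokens[i]!r}")
        source = _word(tokens, i + 1)
        if i + 2 >= len(tokens) or tokens[i + 2] != "TO":
            raise ValueError("expected TO after the source word")
        i += 3
        targets = []
        # Destinations run until the next reserved word or the end of input.
        while i < len(tokens) and tokens[i] not in RESERVED:
            targets.append(_word(tokens, i))
            i += 1
        if not targets:
            raise ValueError("MOVE needs at least one destination")
        statements.append((source, targets))
    return statements
```

Because a destination can never be a reserved word, the next MOVE ends the current statement, and a stray trailing TO (as in "MOVE TL TO TM TN TO") is rejected instead of being swallowed as a destination.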

Charles Merriam
Thanks for the clarification, it helped me out a lot. The parser is now working wonderfully.
Bartosz Radaczyński