views:

314

answers:

1

I'm building a parser for an imaginary programming language called C-- (not the actual C-- language). I've gotten to the stage where I need to translate the language's grammar into something Pyparsing can accept. Unfortunatly when I come to parse my input string (which is correct and should not cause Pyparsing to error) it's not parsing correctly. I fear this is due to errors in my grammar, but as I'm starting Pyparsing for the first time, I can't seem to see where I'm going wrong.

I've uploaded the grammar that I'm translating from here for people to have a read through.

EDIT: Updated with the advice from Paul.

This is the grammer I've currently got (the two top lines of Syntax definition are terribly bad of me I know):

# Lexical structure definition
ifS = Keyword('if')
elseS = Keyword('else')
whileS = Keyword('while')
returnS = Keyword('return')
intVar = Keyword('int')
voidKeyword = Keyword('void')
sumdiff = Literal('+') | Literal('-')
prodquot = Literal('*') | Literal('/')
relation = Literal('<=') | Literal('<') | Literal('==') | \
           Literal('!=') | Literal('>') | Literal('=>')
lbrace = Literal('{')
rbrace = Literal('}')
lparn = Literal('(')
rparn = Literal(')')
semi = Literal(';')
comma = Literal(',')
number = Word(nums)
identifier = Word(alphas, alphanums)

# Syntax definition
term = ''
statement = ''
variable    =   intVar + identifier + semi
locals      =   ZeroOrMore(variable)
expr        =   term | OneOrMore(Group(sumdiff + term))
args        =   ZeroOrMore(OneOrMore(Group(expr + comma)) | expr)
funccall    =   Group(identifier + lparn + args + rparn)
factor      =   Group(lparn + expr + rparn) | identifier | funccall | number
term        =   factor | OneOrMore(prodquot + factor)
cond        =   Group(lparn + expr + relation + expr + rparn)
returnState =   Group(returnS + semi) | Combine(returnS + expr + semi)
assignment  =   Group(identifier + '=' + expr + semi)
proccall    =   Group(identifier + lparn + args + rparn + semi)
block       =   Group(lbrace + locals + statement + rbrace)
iteration   =   Group(whileS + cond + block)
selection   =   Group(ifS + cond + block) | Group(ifS + cond + block + elseS + block)
statement   =   OneOrMore(proccall | assignment | selection | iteration | returnState)
param       =   Group(intVar + identifier)
paramlist   =   OneOrMore(Combine(param + comma)) | param
params      =   paramlist | voidKeyword
procedure   =   Group(voidKeyword + identifier + lparn + params + rparn + block)
function    =   Group(intVar + identifier + lparn + params + rparn + block)
declaration =   variable | function | procedure
program     =   OneOrMore(declaration)

I'd like to know if there are any mistakes I've made in translating the grammar across and what improvements I could do to make it simplified whilst adhering to the grammar I've been given.

EDIT 2: Updated to include the new error.

Here is the input string I am parsing:

int larger ( int first , int second ) { 
if ( first > second ) { 
return first ; 
} else { 
return second ; 
} 
} 

void main ( void ) { 
int count ; 
int sum ; 
int max ; 
int x ; 

x = input ( ) ; 
max = x ; 
sum = 0 ; 
count = 0 ; 

while ( x != 0 ) { 
count = count + 1 ; 
sum = sum + x ; 
max = larger ( max , x ) ; 
x = input ( ) ; 
} 

output ( count ) ; 
output ( sum ) ; 
output ( max ) ; 
}

And this is the error message I get when running my program from Terminal:

/Users/Joe/Documents/Eclipse Projects/Parser/src/pyparsing.py:1156: SyntaxWarning: null string passed to Literal; use Empty() instead
other = Literal( other )
/Users/Joe/Documents/Eclipse Projects/Parser/src/pyparsing.py:1258: SyntaxWarning: null string passed to Literal; use Empty() instead
other = Literal( other )
Expected ")" (at char 30), (line:6, col:26)
None
+7  A: 

1) Change Literal("if") to Keyword("if") (and so on, down to Literal("void")), to prevent matching the leading "if" of a variable named "ifactor".

2) nums, alphas, and alphanums are not expressions, they are strings, that can be used with the Word class to define some typical sets of characters when defining "words" like "a number is a word made up of nums", or "an identifier is a word that starts with an alpha, followed by zero or more alphanums." So instead of:

number = nums
identifier = alphas + OneOrMore(alphanums)

you want

number = Word(nums)
identifier = Word(alphas, alphanums)

3) Instead of Combine, I think you want Group. Use Combine when you want the matched tokens to be contiguous with no intervening whitespace, and will concatenate the tokens and return them as a single string. Combine is often used in cases like this:

realnum = Combine(Word(nums) + "." + Word(nums))

Without Combine, parsing "3.14" would return the list of strings ['3', '.', '14'], so we add Combine so that the parsed result for realnum is '3.14' (which you could then pass to a parse action to convert to the actual floating value 3.14). Combines enforcement of no intervening whitespace also keeps us from accidentally parsing 'The answer is 3. 10 is too much.' and thinking the "3. 10" represents a real number.

4) This should not cause your error, but your input string has lots of extra spaces. If you get your grammar working, you should be able to parse "int x;" just as well as "int x ;".

Hope some of these hints get you going. Have you read any online pyparsing articles or tutorials? And please look through the online examples. You'll need to get a good grasp of how Word, Literal, Combine, etc. perform their individual parsing tasks.

5) You have mis-implemented the recursive definitions for term and statement. Instead of assigning '' to them, write:

term = Forward()
statement = Forward()

Then when you go to actually define them with their recursive definitions, use the << operator (and be sure to enclose the RHS in ()'s).

term << (... term definition ...)
statement << (... statement definition ...)

You can find an example of a recursive parser here, and a presentation on basic pyparsing usage here - see the section titled "Parsing Lists" for more step-by-step on how the recursion is handled.

Paul McGuire
Thanks Paul. I've replaced all instances of Combine with Group and now I'm getting some different errors which I think are related to how my rules are written rather than the syntax I've used. I've updated my original post to reflect those changes you suggested and the new error I have.
greenie
Excellent Paul, thanks very much. I wish I could +1 you more :)
greenie