ansaurus

Question

Parsing expressions with an undefined number of arguments

Answer 1

+2 A:

I couldn't fully understand your question, but it seems that what you want is a grammar definition and a parser generator. I suggest you take a look at ANTLR; with it, it should be fairly straightforward to define a grammar for either your original syntax or the RPN.

Edit (after exercising some self-criticism and making an effort to understand the details of the question): Actually, the grammar of the language is unclear from your example. However, it seems to me that the advantages of prefix/postfix notation (i.e. that you need neither parentheses nor a precedence-aware parser) stem from the fact that you know the number of arguments every time you encounter an operator, so you know exactly how many elements to read (for prefix notation) or to pop from the stack (for postfix notation). On the other hand, I believe that operators with a variable number of arguments make prefix/postfix notation not merely difficult to parse but outright ambiguous. Take the following expression, for example:

# a * b c d

Which of the following three is the canonical form?

#(a, *(b, c, d))
#(a, *(b, c), d)
#(a, *(b), c, d)

Without knowing more about the operators, it is impossible to tell. Of course, you could define some sort of greediness for the operators, e.g. * is greedier than #, so it gobbles up all the remaining arguments. But this would defeat the purpose of prefix notation, because you simply could not write down the second of the three variants above, at least not without additional syntactic elements.
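The ambiguity can be checked mechanically. Below is a minimal Python sketch, assuming (hypothetically) that both # and * are variadic and take at least one argument. all_parses enumerates every way to read the whole token list, and parse_greedy implements the "greedy" rule in which each operator swallows everything to its right. The names OPERATORS, all_parses, parse_greedy and show are illustrative only.

```python
from typing import Iterator, Union

# Hypothetical: both operators are variadic and take at least one argument.
OPERATORS = {"#", "*"}

Tree = Union[str, tuple]  # an atom, or (operator, [arguments])


def parses(tokens: list[str], start: int) -> Iterator[tuple[Tree, int]]:
    """Yield (tree, next_position) for every way to read one expression at start."""
    tok = tokens[start]
    if tok not in OPERATORS:
        yield tok, start + 1
        return
    # A variadic operator may take 1, 2, 3, ... arguments: try every count.
    for args, end in arg_lists(tokens, start + 1):
        yield (tok, args), end


def arg_lists(tokens: list[str], start: int) -> Iterator[tuple[list[Tree], int]]:
    """Yield (arguments, next_position) for every non-empty run of expressions."""
    if start >= len(tokens):
        return
    for first, mid in parses(tokens, start):
        yield [first], mid
        for rest, end in arg_lists(tokens, mid):
            yield [first] + rest, end


def all_parses(tokens: list[str]) -> list[Tree]:
    """Every reading that consumes the whole token list."""
    if not tokens:
        return []
    return [tree for tree, end in parses(tokens, 0) if end == len(tokens)]


def parse_greedy(tokens: list[str], start: int = 0) -> tuple[Tree, int]:
    """Greedy rule: every operator swallows all remaining expressions."""
    tok = tokens[start]
    if tok not in OPERATORS:
        return tok, start + 1
    args, pos = [], start + 1
    while pos < len(tokens):
        arg, pos = parse_greedy(tokens, pos)
        args.append(arg)
    return (tok, args), pos


def show(tree: Tree) -> str:
    """Render a tree in the #(a, *(b, c), d) style used above."""
    if isinstance(tree, str):
        return tree
    op, args = tree
    return f"{op}({', '.join(show(a) for a in args)})"


for tree in all_parses("# a * b c d".split()):
    print(show(tree))
print("greedy:", show(parse_greedy("# a * b c d".split())[0]))
```

Under these assumptions, all_parses finds exactly the three canonical forms listed above, while the greedy rule can only ever produce #(a, *(b, c, d)), which is why the other two become unwritable without extra syntax.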

Now that I think of it, it is probably no coincidence that none of the programming languages I know support operators with a variable number of arguments, only functions/procedures.
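For contrast, the property the edit above relies on (a known arity tells the parser exactly how many elements to read or to pop) can be sketched in Python as follows. The arity table ARITY, in which # takes three arguments and * takes two, is hypothetical and chosen only so that # a * b c d has a single reading; the function names are likewise illustrative.

```python
from typing import Union

# Hypothetical fixed arities, chosen only for illustration.
ARITY = {"#": 3, "*": 2}

Tree = Union[str, tuple]  # an atom, or (operator, [arguments])


def parse_prefix(tokens: list[str]) -> Tree:
    """Prefix: each operator reads exactly ARITY[op] sub-expressions."""
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok not in ARITY:
            return tok  # an atom
        return (tok, [parse() for _ in range(ARITY[tok])])

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return tree


def parse_postfix(tokens: list[str]) -> Tree:
    """Postfix: each operator pops exactly ARITY[op] items off the stack."""
    stack: list[Tree] = []
    for tok in tokens:
        if tok in ARITY:
            n = ARITY[tok]
            if len(stack) < n:
                raise ValueError(f"not enough operands for {tok}")
            cut = len(stack) - n
            args = stack[cut:]  # the n most recent items, in source order
            del stack[cut:]
            stack.append((tok, args))
        else:
            stack.append(tok)
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


print(parse_prefix("# a * b c d".split()))
print(parse_postfix("a b c * d #".split()))
```

Both calls build the same tree, ('#', ['a', ('*', ['b', 'c']), 'd']), with no parentheses and no precedence rules; the arity table alone settles every boundary.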

David Hanak 2009-03-18 11:12:25

Dear David, thanks for your time and for the ANTLR link. What I'm actually building is not a programming language, and I've probably misled you by using the term "operator". The real purpose of the language is a human-friendly serialization of a tree. The canonical form is the first one, but I may introduce "end" lexemes like /*

2009-03-18 14:46:48

ctd.: So # a * b c /* d will result in #(a, *(b, c), d). I am also happy to report that the approach using artificial marker lexemes seems to be working so far.

2009-03-18 14:48:28
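One possible reading of the end-marker idea is sketched below in Python. It assumes (an interpretation, not necessarily the commenter's exact scheme) that operators are greedy by default, that a token "/X" explicitly closes the innermost open operator X, and that the end of input closes every operator still open. OPERATORS, parse_with_end_markers and show are hypothetical names.

```python
from typing import Union

OPERATORS = {"#", "*"}  # hypothetical operator set

Tree = Union[str, tuple]  # an atom, or (operator, [arguments])


def parse_with_end_markers(tokens: list[str]) -> Tree:
    """Greedy variadic prefix parser where "/X" explicitly closes operator X."""
    # Each frame is (operator, arguments collected so far); the bottom frame is the root.
    stack: list[tuple[str, list[Tree]]] = [("<root>", [])]
    for tok in tokens:
        if tok.startswith("/") and tok[1:] in OPERATORS:
            # An end marker must close the innermost operator that is still open.
            if len(stack) == 1 or stack[-1][0] != tok[1:]:
                raise ValueError(f"unmatched end marker {tok}")
            op, args = stack.pop()
            stack[-1][1].append((op, args))
        elif tok in OPERATORS:
            stack.append((tok, []))  # open a new operator; it collects what follows
        else:
            stack[-1][1].append(tok)  # an atom belongs to the innermost open operator
    # End of input implicitly closes every operator that is still open.
    while len(stack) > 1:
        op, args = stack.pop()
        stack[-1][1].append((op, args))
    root = stack[0][1]
    if len(root) != 1:
        raise ValueError("expected exactly one top-level expression")
    return root[0]


def show(tree: Tree) -> str:
    """Render a tree in the #(a, *(b, c), d) style."""
    if isinstance(tree, str):
        return tree
    op, args = tree
    return f"{op}({', '.join(show(a) for a in args)})"


print(show(parse_with_end_markers("# a * b c /* d".split())))
print(show(parse_with_end_markers("# a * b c d".split())))
```

With the marker, the first call yields #(a, *(b, c), d); without it, the greedy default gives #(a, *(b, c, d)), the canonical form. Markers are only needed where the greedy reading is not the intended one.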

# a * b c /* d would then be RPN'ed into: M a M c b * d #

2009-03-18 14:49:25
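The marker-based RPN can be sketched in Python as follows, assuming that M (as in the comment) is a reserved token never used as an atom, that "/X" closes operator X, and that arguments are emitted in source order. to_rpn pushes M where an operator's arguments begin, and from_rpn rebuilds the tree by popping back to the nearest M whenever it meets an operator. These function names are hypothetical.

```python
OPERATORS = {"#", "*"}  # hypothetical operator set
MARK = "M"  # marker token from the comment; assumed never to appear as an atom


def to_rpn(tokens: list[str]) -> list[str]:
    """Convert marker-delimited prefix input to RPN with explicit argument markers."""
    out: list[str] = []
    open_ops: list[str] = []
    for tok in tokens:
        if tok.startswith("/") and tok[1:] in OPERATORS:
            if not open_ops or open_ops[-1] != tok[1:]:
                raise ValueError(f"unmatched end marker {tok}")
            out.append(open_ops.pop())  # closing an operator emits it
        elif tok in OPERATORS:
            out.append(MARK)  # marks where this operator's arguments begin
            open_ops.append(tok)
        else:
            out.append(tok)
    while open_ops:  # end of input closes everything still open
        out.append(open_ops.pop())
    return out


def from_rpn(rpn: list[str]):
    """Rebuild the tree: an operator pops everything down to the nearest marker."""
    stack: list = []
    for tok in rpn:
        if tok == MARK:
            stack.append(MARK)
        elif tok in OPERATORS:
            args = []
            while stack and stack[-1] != MARK:
                args.append(stack.pop())
            if not stack:
                raise ValueError(f"no marker for {tok}")
            stack.pop()  # discard the marker itself
            args.reverse()  # restore source order
            stack.append((tok, args))
        else:
            stack.append(tok)
    if len(stack) != 1:
        raise ValueError("malformed RPN")
    return stack[0]


rpn = to_rpn("# a * b c /* d".split())
print(" ".join(rpn))
print(from_rpn(rpn))
```

Here to_rpn produces M a M b c * d # (the commenter's sequence lists c before b; this sketch keeps source order), and from_rpn turns it back into ('#', ['a', ('*', ['b', 'c']), 'd']). The markers play the role that a fixed arity plays in ordinary postfix notation.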

And the whole thing is starting to resemble TeX.

2009-03-18 14:51:48
