I'm trying to get this to work:

  def emptyCond: Parser[Cond] = ("if" ~ "(") ~> regularStr <~ ")" ^^ { case s => Cond("",Nil,Nil) }

where regularStr is defined to accept a number of things, including ")". Of course, I want inputs like if(foo()) to be accepted. But for any if(x), the ")" is consumed as part of regularStr, so this parser never succeeds.

What am I missing?

Edit:

regularStr is not a regular expression. It is defined thus:

  def regularStr = rep(ident | numericLit | decimalLit | stringLit | stmtSymbol) ^^ { case s => s.mkString(" ") }

and the symbols are:

  val stmtSymbol = "*" | "&" | "." | "::" | "(" | ")" | ">=" | "<=" | "=" |
               "<" | ">" | "|" | "-" | "," | "^" | "[" | "]" | "?" | ":" | "+" |
               "-=" | "+=" | "*=" | "/=" | "&&" | "||" | "&=" | "|="
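The failure can be reproduced without the parser-combinator library. The plain-Scala sketch below works on a token list pre-split on spaces; `GreedyDemo`, `symbols`, `isRegular` and `greedyRep` are hypothetical names, and `symbols` is only a small subset of stmtSymbol. `greedyRep` mimics how `rep` behaves here: it keeps consuming while the next token is acceptable and does not back off to a shorter match when the following `<~ ")"` fails. Because ")" is itself acceptable, nothing is left for `<~ ")"` to match.

```scala
object GreedyDemo {
  // Hypothetical subset of stmtSymbol; ")" is included, as in the question.
  val symbols = Set("(", ")", ".", ",", "*", "=")

  // Stand-in for ident | numericLit | decimalLit | stringLit | stmtSymbol.
  def isRegular(t: String): Boolean =
    symbols(t) || (t.nonEmpty && t.forall(c => c.isLetterOrDigit || c == '_'))

  // Like rep(...): consume while the element matches; never give a token back.
  def greedyRep(ts: Vector[String], pos: Int): Int = {
    var i = pos
    while (i < ts.length && isRegular(ts(i))) i += 1
    i
  }

  // Mirrors ("if" ~ "(") ~> regularStr <~ ")" with no backtracking into the rep.
  def emptyCond(ts: Vector[String]): Option[String] =
    if (ts.length >= 2 && ts(0) == "if" && ts(1) == "(") {
      val end = greedyRep(ts, 2)
      if (end < ts.length && ts(end) == ")") Some(ts.slice(2, end).mkString(" "))
      else None // the repetition already swallowed every ")", so <~ ")" fails
    } else None
}
```

For `if ( x )` the repetition starts at index 2 and stops only at the end of input (index 4), so `emptyCond` returns `None`, which is exactly the symptom described in the question.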

I don't need an exhaustive language check, just the control structures. So I don't really care what's inside the parentheses of if(); I want to accept any sequence of identifiers, symbols, etc. For my purposes, even if())) should be valid, where "))" is the if's "condition".

A:

A regular expression cannot recognize a language that has nested, balanced constructs such as (...), [...], {...}, etc. So you're going to need to use further context-free productions (not regular expressions) to match the regularStr portions.

Randall Schulz
It may have sounded as if regularStr is a regular expression, but it returns a Parser[String]. I really don't want to require balanced constructs; I just want to catch control structures without checking exhaustively. I would take "if()))" as valid, meaning an if() with the condition "))". Is that possible?
Germán
The example you gave as something you want to accept, `if( )) )`, is going to be a problem. How do you know where to stop consuming tokens if there are no rules about what may appear within? If the entire input were `if ( ... )` (where `...` is anything), then you're OK, but if you need to go on to parse other constructs, the ambiguity is considerable. My recommendation is not to try to make a cheap / loose parser using a real parsing tool.
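The ambiguity can be made concrete with a tiny plain-Scala sketch (hypothetical name `candidateConditions`, no combinator library). Given the tokens that follow `if(`, it lists every condition a parser could pick if any ")" were allowed to close the `if`, which is the situation when the condition may contain anything.

```scala
object AmbiguityDemo {
  // Every ")" after "if(" is a possible closer when the condition is
  // unconstrained; each choice yields a different condition string.
  def candidateConditions(rest: Vector[String]): Vector[String] =
    rest.indices.collect {
      case i if rest(i) == ")" => rest.take(i).mkString(" ")
    }.toVector
}
```

For `if()))` the tokens after `if(` are three ")" and the candidates are `""`, `")"` and `") )"`. With trailing code, `candidateConditions(Vector("a", ")", "b", "(", "c", ")"))` yields `Vector("a", "a ) b ( c")`, so the grammar alone does not say which reading is meant.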
Randall Schulz
+1 for making me think. Yes, if())) is a problem, but it is not a requirement, so if it makes things more difficult, I really don't have to accept it.
Germán
A: 

OK, accepting if())) was not really a requirement, just an example of what I would be willing to accept in order to make my parsing as cheap as possible, so that I only have to worry about capturing control structures.

However, it appears I can't be that cheap and still have it work. Since the if() construct has parentheses, all I have to do is require what's inside to have well-balanced parentheses: a closing ")" where one isn't expected cannot be part of the condition.

I did this:

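  // Any single token except parentheses ("(" and ")" were removed from stmtSymbol)
  val regularNoParens = ident | numericLit | decimalLit | stringLit | stmtSymbol
  // A parenthesized group, which may itself contain nested groups
  def regularParens: Parser[String] = "(" ~ rep(regularNoParens | regularParens) ~ ")" ^^ { case l ~ s ~ r => l + s.mkString(" ") + r }
  // Any sequence of plain tokens and balanced groups; stops at an unmatched ")"
  def regularStr = rep(regularNoParens | regularParens) ^^ { case s => s.mkString(" ") }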

And I took out "(" and ")" from stmtSymbol. Works!

Edit: the first version didn't support nesting; fixed now.
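For readers without the combinator library at hand, here is a dependency-free plain-Scala sketch of the same balanced-parentheses idea, working on a pre-split token list. The method names mirror the parsers above, but the token classes (`symbols`, `isRegularNoParens`) are hypothetical simplifications of `ident | numericLit | decimalLit | stringLit | stmtSymbol`. `emptyCond` returns the condition together with the index just past its closing ")", so any leftover tokens remain for whatever is parsed next.

```scala
object BalancedDemo {
  // Hypothetical subset of stmtSymbol with "(" and ")" removed, as in the answer.
  val symbols = Set(".", ",", "*", "&", "=", "<", ">", "+", "-", "::")

  def isRegularNoParens(t: String): Boolean =
    symbols(t) || (t.nonEmpty && t.forall(c => c.isLetterOrDigit || c == '_'))

  // regularParens: "(" then plain tokens / nested groups, then ")".
  // Returns the rendered group and the index after it.
  def regularParens(ts: Vector[String], pos: Int): Option[(String, Int)] =
    if (pos < ts.length && ts(pos) == "(") {
      val (parts, end) = regularStr(ts, pos + 1)
      if (end < ts.length && ts(end) == ")") Some(("(" + parts.mkString(" ") + ")", end + 1))
      else None // unbalanced: no matching ")"
    } else None

  // regularStr: rep(regularNoParens | regularParens). Stops at the first
  // token it cannot use, in particular an unmatched ")".
  def regularStr(ts: Vector[String], pos: Int): (List[String], Int) = {
    @annotation.tailrec
    def loop(i: Int, acc: List[String]): (List[String], Int) =
      if (i < ts.length && isRegularNoParens(ts(i))) loop(i + 1, ts(i) :: acc)
      else regularParens(ts, i) match {
        case Some((group, next)) => loop(next, group :: acc)
        case None                => (acc.reverse, i)
      }
    loop(pos, Nil)
  }

  // Mirrors ("if" ~ "(") ~> regularStr <~ ")": the condition plus the index
  // just past the closing ")". Tokens after that are left for later rules.
  def emptyCond(ts: Vector[String]): Option[(String, Int)] =
    if (ts.length >= 2 && ts(0) == "if" && ts(1) == "(") {
      val (parts, end) = regularStr(ts, 2)
      if (end < ts.length && ts(end) == ")") Some((parts.mkString(" "), end + 1))
      else None
    } else None
}
```

With this version, `emptyCond("if ( foo ( ) )".split(" ").toVector)` gives `Some(("foo ()", 6))`, while for `if ( ) ) )` it gives `Some(("", 3))`: the condition is empty and the two stray ")" tokens are left unconsumed rather than being treated as the condition.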

Germán