views:

279

answers:

3

I'm trying to contruct a parser in scala which can parse simple SQL-like strings. I've got the basics working and can parse something like:

select id from users where name = "peter" and age = 30 order by lastname

But now I wondered how to parse nested and classes, i.e.

select name from users where name = "peter" and (age = 29 or age = 30)

The current production of my 'combinedPredicate' looks like this:

def combinedPredicate = predicate ~ ("and"|"or") ~ predicate ^^ {
  case l ~ "and" ~ r => And(l,r)
  case l ~ "or" ~ r => Or(l,r)
}

I tried recursively referencing the combinedPredicate production within itself but that results in a stackoverflow.

btw, I'm just experimenting here... not implementing the entire ansi-99 spec ;)

+4  A: 

Well, recursion has to be delimited somehow. In this case, you could do this:

def combinedPredicate = predicate ~ rep( ("and" | "or" ) ~ predicate )
def predicate = "(" ~ combinedPredicate ~ ")" | simplePredicate
def simplePredicate = ...

So it will never stack overflow because, to recurse, it first has to accept a character. This is the important part -- if you always ensure recursion won't happen without first accepting a character, you'll never get into an infinite recursion. Unless, of course, you have infinite input. :-)

Daniel
A: 

After reading about solutions for operator precedence and came up with the following:

  def clause:Parser[Clause] = (predicate|parens) * (
            "and" ^^^ { (a:Clause, b:Clause) => And(a,b) } |
            "or" ^^^ { (a:Clause, b:Clause) => Or(a,b) } )

  def parens:Parser[Clause] = "(" ~> clause  <~ ")"

Wich is probably just another way writing what @Daniel wrote ;)

p3t0r
+3  A: 

The stack overflow you're experiencing is probably the result of a left-recursive language:

def combinedPredicate = predicate ~ ...
def predicate = combinedPrediacate | ...

The parser combinators in Scala 2.7 are recursive descent parsers. Recursive descent parsers have problems with these because they have no idea how the terminal symbol is when they first encounter it. Other kinds of parsers like Scala 2.8's packrat parser combinators will have no problem with this, though you'll need to define the grammar using lazy vals instead of methods, like so:

lazy val combinedPredicate = predicate ~ ...
lazy val predicate = combinedPrediacate | ...

Alternatively, you could refactor the language to avoid left recursion. From the example you're giving me, requiring parentheses in this language could solve the problem effectively.

def combinedPredicate = predicate ~ ...
def predicate = "(" ~> combinedPrediacate <~ ")" | ...

Now each deeper level of recursion corresponds to another parentheses parsed. You know you don't have to recurse deeper when you run out of parentheses.

Ken Bloom
regarding "lazy val" please rember also to change the explicit type declarations from ":Parser[Any]" to ":PackratParser[Any]" to use the new packrat capabilities. (As you pointed out in my question http://stackoverflow.com/questions/3343697/scala-parser-combinators-tricks-for-recursive-bnf)
svrist