views:

134

answers:

2

I am experimenting with parser combinators and I often run into what seems like infinite recursions. Here is the first one I ran into:

import util.parsing.combinator.Parsers
import util.parsing.input.CharSequenceReader

class CombinatorParserTest extends Parsers {

  type Elem = Char

  def notComma = elem("not comma", _ != ',')

  def notEndLine = elem("not end line", x => x != '\r' && x != '\n')

  def text = rep(notComma | notEndLine)

}

object CombinatorParserTest {

  def main(args:Array[String]): Unit = {
    val p = new CombinatorParserTest()
    val r = p.text(new CharSequenceReader(","))
    // does not get here
    println(r)
  }

}

How can I print what is going on? And why does this not finish?

+2  A: 

Logging the attempts to parse notComma and notEndLine show that it is the end-of-file (shown as a CTRL-Z in the log(...)("mesg") output) that is being repeatedly parsed. Here's how I modified your parser for this purpose:

def text = rep(log(notComma)("notComma") | log(notEndLine)("notEndLine"))

I'm not entirely sure what's going on (I tried many variations on your grammar), but I think it's something like this: The EOF is not really a character artificially introduced into the input stream, but rather a sort of perpetual condition at the end of the input. Thus this never-consumed EOF pseudo-character is repeatedly parsed as "either not a comma or not an end-of-line."

Randall Schulz
I do think that EOF is artificially introduced, but you're right in saying that it is repeatedly parsed at it seems repeatedly provided when requesting an additional character when the input is already at the end of the sequence.
huynhjl
+2  A: 

Ok, I think I've figured this out. `CharSequenceReader returns '\032' as a marker for the end of the input. So if I modify my input like this, it works:

import util.parsing.combinator.Parsers
import util.parsing.input.CharSequenceReader

class CombinatorParserTest extends Parsers {

  type Elem = Char

  import CharSequenceReader.EofCh

  def notComma = elem("not comma", x => x != ',' && x!=EofCh)

  def notEndLine = elem("not end line", x => x != '\r' && x != '\n' && x!=EofCh)

  //def text = rep(notComma | notEndLine)
  def text = rep(log(notComma)("notComma") | log(notEndLine)("notEndLine"))

}

object CombinatorParserTest {

  def main(args:Array[String]): Unit = {
    val p = new CombinatorParserTest()
    val r = p.text(new CharSequenceReader(","))
    println(r)
  }

}

See source code for CharSequenceReader here. If the scaladoc mentioned it, it would have saved me a lot of time.

huynhjl
Figure out where it should be mentioned, and open a doc ticket. If you can provide a patch with the modified scaladoc, so much the better.
Daniel
Submitted https://lampsvn.epfl.ch/trac/scala/ticket/3147. There are multiple files using `EofCh`, so I'm not sure where the best place is.
huynhjl