I am creating a DSL, and using Scala's parser combinator library to parse the DSL. The DSL follows a simple, Ruby-like syntax. A source file can contain a series of blocks that look like this:

create_model do
  at 0,0,0
end

Line endings are significant in the DSL, as they are effectively used as statement terminators.

I wrote a Scala parser that looks like this:

import scala.util.parsing.combinator.JavaTokenParsers

class ML3D extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r

  def model: Parser[Any] = commandList
  def commandList: Parser[Any] = rep(commandBlock)
  def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"
  def eol: Parser[Any] = """(\r?\n)+""".r
  def command: Parser[Any] = commandName~opt(commandLabel)
  def commandName: Parser[Any] = ident
  def commandLabel: Parser[Any] = stringLiteral
  def statementList: Parser[Any] = rep(statement)
  def statement: Parser[Any] = functionName~argumentList~eol
  def functionName: Parser[Any] = ident
  def argumentList: Parser[Any] = repsep(argument, ",")
  def argument: Parser[Any] = stringLiteral | constant
  def constant: Parser[Any] = wholeNumber | floatingPointNumber
}

Since line endings matter, I overrode whiteSpace so that it'll only treat spaces and tabs as whitespace (instead of treating new lines as whitespace, and thus ignoring them).

This works, except for the "end" statement for commandBlock. Since my source file contains a trailing new line, the parser complains that it was expecting just an end but got a new line after the end keyword.

So I changed commandBlock's definition to this:

def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"~opt(eol)

(That is, I added an optional new line after "end").

But now, when parsing the source file, I get the following error:

[4.1] failure: `end' expected but `' found

I think this is because, after it sucks in the trailing new line, the parser is encountering an empty string which it thinks is invalid, but I'm not sure why it's doing this.

Any tips on how to fix this? I might be extending the wrong parser from Scala's parser combinator library, so any suggestions on how to create a language definition with significant new line characters are also welcome.

A: 

You can either override the protected val whiteSpace (a Regex), whose default is """\s+""".r, or override the protected def handleWhiteSpace(...) method if you need more control than is readily achieved with a regular expression. Both of these members originate in RegexParsers, which is the base class for JavaTokenParsers.
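For the second option, here is a minimal sketch of what overriding handleWhiteSpace might look like. The object names and the tiny `line` rule are made up for illustration; the override skips spaces and tabs by hand and leaves new lines in the input, which has the same effect as the `[ \t]+` regex.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical parser that skips spaces and tabs by hand instead of via a regex.
object ManualWhitespace extends JavaTokenParsers {
  // Returns the offset of the first character that is not a space or a tab.
  override protected def handleWhiteSpace(source: java.lang.CharSequence, offset: Int): Int = {
    var i = offset
    while (i < source.length && (source.charAt(i) == ' ' || source.charAt(i) == '\t')) i += 1
    i
  }

  def eol: Parser[Any] = """(\r?\n)+""".r
  def line: Parser[Any] = ident~eol
}

object ManualWhitespaceDemo {
  // Leading tabs and spaces are skipped before the identifier.
  val skipsBlanks: Boolean =
    ManualWhitespace.parseAll(ManualWhitespace.line, "\t at\n").successful
  // A leading new line is not skipped, so a bare identifier does not match here.
  val keepsNewlines: Boolean =
    !ManualWhitespace.parseAll(ManualWhitespace.ident, "\nat").successful
}
```

This route is probably only worth it when the skipping logic depends on something a regex cannot express; for plain spaces and tabs the whiteSpace override is shorter.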

Randall Schulz
I am overriding `whiteSpace` (see the code above), but that still results in an error.
mipadi
Yes, I see. Try changing that `opt(eol)` to `eol *` (or, equally, `rep(eol)`).
Randall Schulz
Didn't work. It resulted in the same error.
mipadi
+2  A: 

I get the same error either way, but I think you are misinterpreting it. What it's saying is that it is expecting an end, but it already reached the end of the input.

And the reason that is happening is that end is being read as a statement: functionName is just ident, which accepts end, and argumentList may be empty, so end followed by a new line is a perfectly valid statement. statementList therefore consumes it, and nothing is left when commandBlock looks for "end". Now, I'm sure there's a nice way to solve this, but I'm not experienced enough with Scala parsers. It seems the way to go would be to use token parsers with a scanning part, but I couldn't figure out a way to make the standard token parser not treat newlines as whitespace.
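This reading of the error can be checked directly. The sketch below copies the statement-level rules from the question unchanged (the object names are made up for the example) and asks the parser whether end on a line by itself counts as a statement.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// The statement-level rules from the question, unchanged.
object OriginalRules extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r

  def eol: Parser[Any] = """(\r?\n)+""".r
  def statementList: Parser[Any] = rep(statement)
  def statement: Parser[Any] = functionName~argumentList~eol
  def functionName: Parser[Any] = ident
  def argumentList: Parser[Any] = repsep(argument, ",")
  def argument: Parser[Any] = stringLiteral | constant
  def constant: Parser[Any] = wholeNumber | floatingPointNumber
}

object StatementDemo {
  import OriginalRules._

  // `end` plus a new line is accepted: a function name with an empty argument list.
  val endAsStatement: Boolean = parseAll(statement, "end\n").successful
  // So the statement list consumes the block terminator along with the real statement.
  val bodySwallowsEnd: Boolean = parseAll(statementList, "  at 0,0,0\nend\n").successful
}
```

Both values come out true, which is why commandBlock finds only the end of the input where it wants "end".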

So, here's an alternative:

import scala.util.parsing.combinator.JavaTokenParsers

class ML3D extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r
  def keywords: Parser[Any] = "do" | "end"
  def identifier: Parser[Any] = not(keywords)~ident

  def model: Parser[Any] = commandList
  def commandList: Parser[Any] = rep(commandBlock)
  def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"~opt(eol)
  def eol: Parser[Any] = """(\r?\n)+""".r
  def command: Parser[Any] = commandName~opt(commandLabel)
  def commandName: Parser[Any] = identifier
  def commandLabel: Parser[Any] = stringLiteral
  def statementList: Parser[Any] = rep(statement)
  def statement: Parser[Any] = functionName~argumentList~eol
  def functionName: Parser[Any] = identifier
  def argumentList: Parser[Any] = repsep(argument, ",")
  def argument: Parser[Any] = stringLiteral | constant
  // floatingPointNumber goes first; wholeNumber would match only the "1" of "1.5"
  def constant: Parser[Any] = floatingPointNumber | wholeNumber
}
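One caveat, in case the DSL should allow names that merely start with a keyword: a string literal such as "end" matches any input beginning with those characters, so not(keywords) would also reject a function called endpoint. Below is a hedged sketch of a variant that uses a regex with a word boundary instead. The name ML3DStrict, the condensed rules, and the endpoint statement are my own assumptions for illustration, and constant tries floatingPointNumber first so that 1.5 is read whole.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical variant of the grammar above; mainly `keywords` and `constant` differ.
object ML3DStrict extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r

  // \b requires the keyword to stop at a word boundary, so "endpoint" is not taken for "end".
  def keywords: Parser[Any] = """(do|end)\b""".r
  // not(...) consumes nothing; ~> keeps only the identifier as the result.
  def identifier: Parser[Any] = not(keywords)~>ident

  def model: Parser[Any] = rep(commandBlock)
  def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"~opt(eol)
  def eol: Parser[Any] = """(\r?\n)+""".r
  def command: Parser[Any] = identifier~opt(stringLiteral)
  def statementList: Parser[Any] = rep(statement)
  def statement: Parser[Any] = identifier~argumentList~eol
  def argumentList: Parser[Any] = repsep(argument, ",")
  def argument: Parser[Any] = stringLiteral | constant
  // floatingPointNumber first: wholeNumber would otherwise match only the "1" of "1.5".
  def constant: Parser[Any] = floatingPointNumber | wholeNumber
}

object StrictDemo {
  // The block from the question plus one made-up statement whose name starts with "end".
  val source = "create_model do\n  at 0,0,0\n  endpoint 1.5\nend\n"
  val parsed: Boolean = ML3DStrict.parseAll(ML3DStrict.model, source).successful
}
```

If no function name in the DSL can ever start with do or end, the plain string literals are fine as they are.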
Daniel
I like your interpretation of the error message. I wonder if there is a way to have the parser print what it's trying to match as it goes along. That would make troubleshooting easier.
huynhjl
You can wrap any reference to a production appearing in another production's right-hand side in `log(...)(...)` and you'll get trace output whenever the parser attempts to match that non-terminal. `log` takes the parser plus, in a second argument list, a name to print. E.g., to log attempts to match `model`, replace that non-terminal reference in a rule with `log(model)("model")`.
Randall Schulz
Ah, yes, I see the issue now -- `end` was being read under `functionName`, since it *was* a valid function name. I implemented your changes and it works fine now, thanks a lot.
mipadi
@Randall I have created a separate question for logging parse attempts http://stackoverflow.com/questions/2387892/parser-combinator-not-terminating-how-to-log-what-is-going-on. The tip is really helpful.
huynhjl
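For anyone landing here later, the log tip from the comments might look roughly like this in practice. The two-rule grammar and the object names are made up to keep the trace short; log itself is the method defined in Parsers, which takes the parser and then a label.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical two-rule grammar, just large enough to show the trace output.
object LoggedRules extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r

  def eol: Parser[Any] = """(\r?\n)+""".r
  // log takes the parser and, in a second argument list, the label to print.
  def statement: Parser[Any] = log(ident)("functionName")~repsep(wholeNumber, ",")~eol
  def statementList: Parser[Any] = rep(log(statement)("statement"))
}

object LogDemo {
  // Each attempt prints one line when it starts and one line with its result.
  val result = LoggedRules.parseAll(LoggedRules.statementList, "at 0,0,0\nend\n")
}
```

Running it prints the trace to standard output, and with this input the trace includes an attempt on functionName that succeeds on end, which is exactly the problem described in the accepted answer.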