I am creating a DSL and using Scala's parser combinator library to parse it. The DSL follows a simple, Ruby-like syntax. A source file can contain a series of blocks that look like this:
```
create_model do
  at 0,0,0
end
```
Line endings are significant in the DSL, as they are effectively used as statement terminators.
I wrote a Scala parser that looks like this:
```scala
import scala.util.parsing.combinator.JavaTokenParsers

class ML3D extends JavaTokenParsers {
  // Only spaces and tabs are skipped; newlines stay significant.
  override val whiteSpace = """[ \t]+""".r

  def model: Parser[Any] = commandList
  def commandList: Parser[Any] = rep(commandBlock)
  def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"
  def eol: Parser[Any] = """(\r?\n)+""".r
  def command: Parser[Any] = commandName~opt(commandLabel)
  def commandName: Parser[Any] = ident
  def commandLabel: Parser[Any] = stringLiteral
  def statementList: Parser[Any] = rep(statement)
  def statement: Parser[Any] = functionName~argumentList~eol
  def functionName: Parser[Any] = ident
  def argumentList: Parser[Any] = repsep(argument, ",")
  def argument: Parser[Any] = stringLiteral | constant
  def constant: Parser[Any] = wholeNumber | floatingPointNumber
}
```
Since line endings matter, I overrode `whiteSpace` so that it only treats spaces and tabs as whitespace (instead of treating newlines as whitespace and thus ignoring them).
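To see the effect of this override in isolation, here is a minimal sketch; the object `NewlineDemo` and its `sameLine`/`twoLines` parsers are hypothetical and not part of the DSL. It assumes the `scala-parser-combinators` module is on the classpath (it has been a separate module since Scala 2.11). With `whiteSpace` restricted to spaces and tabs, `ident ~ ident` no longer matches across a line break, so every newline has to be consumed by an explicit token.

```scala
import scala.util.matching.Regex
import scala.util.parsing.combinator.JavaTokenParsers

// Demo: with whiteSpace limited to spaces and tabs, literal, regex and
// ident parsers no longer skip over newlines.
object NewlineDemo extends JavaTokenParsers {
  override protected val whiteSpace: Regex = """[ \t]+""".r

  // Two identifiers on the same line, separated by spaces/tabs.
  def sameLine: Parser[String ~ String] = ident ~ ident

  // Two identifiers separated by an explicit end-of-line token.
  def twoLines: Parser[String ~ String] = ident ~ ("""\r?\n""".r ~> ident)

  def main(args: Array[String]): Unit = {
    println(parseAll(sameLine, "at \t origin").successful) // true
    println(parseAll(sameLine, "at\norigin").successful)   // false: '\n' is not skipped
    println(parseAll(twoLines, "at\norigin").successful)   // true
  }
}
```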
This works, except for the `"end"` keyword in `commandBlock`. Since my source file ends with a trailing newline, the parser complains that it expected just `end` but got a newline after the `end` keyword.
So I changed `commandBlock`'s definition to this:

```scala
def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"~opt(eol)
```

(That is, I added an optional newline after `"end"`.)
But now, when parsing the source file, I get the following error:
```
[4.1] failure: `end' expected but `' found
```
I think this is because, after it sucks in the trailing newline, the parser encounters an empty string that it considers invalid, but I'm not sure why it's doing this.
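One way to test that hypothesis is to feed the grammar's own `statement` parser the text `end` followed by a newline. The sketch below reproduces the grammar above (with `opt(eol)` after `"end"`) plus a hypothetical `Diagnose` runner. Because `ident` accepts any Java identifier, including `end`, and `argumentList` accepts zero arguments, `end` plus its trailing newline appears to be consumed as an ordinary statement. `statementList` then stops at end of input, where the `"end"` literal can no longer match. Without the trailing newline, `eol` fails after `end`, so that statement is rejected and the keyword is matched as intended. The exact failure text varies between library versions, so only success or failure is checked here.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// The grammar from the question, with opt(eol) after "end".
class ML3D extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r
  def model: Parser[Any] = commandList
  def commandList: Parser[Any] = rep(commandBlock)
  def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"~opt(eol)
  def eol: Parser[Any] = """(\r?\n)+""".r
  def command: Parser[Any] = commandName~opt(commandLabel)
  def commandName: Parser[Any] = ident
  def commandLabel: Parser[Any] = stringLiteral
  def statementList: Parser[Any] = rep(statement)
  def statement: Parser[Any] = functionName~argumentList~eol
  def functionName: Parser[Any] = ident
  def argumentList: Parser[Any] = repsep(argument, ",")
  def argument: Parser[Any] = stringLiteral | constant
  def constant: Parser[Any] = wholeNumber | floatingPointNumber
}

object Diagnose {
  def main(args: Array[String]): Unit = {
    val p = new ML3D
    // `end` is a valid identifier, argumentList matches zero arguments,
    // and eol eats the newline, so this is a complete "statement".
    println(p.parseAll(p.statement, "end\n").successful) // true

    // With a trailing newline the closing `end` is swallowed by statementList.
    println(p.parseAll(p.model, "create_model do\n  at 0,0,0\nend\n").successful) // false

    // Without it, eol fails after `end`, rep(statement) backtracks,
    // and the "end" literal matches.
    println(p.parseAll(p.model, "create_model do\n  at 0,0,0\nend").successful) // true
  }
}
```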
Any tips on how to fix this? I might be extending the wrong parser from Scala's parser combinator library, so any suggestions on how to create a language definition with significant newline characters are also welcome.
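One possible fix, sketched under the assumption that `do` and `end` are the only reserved words: make `functionName` reject keywords, and let the top-level rule absorb blank lines before, between and after blocks instead of attaching `opt(eol)` to `commandBlock`. The names `ML3DFixed`, `reserved` and `ML3DFixedDemo` are hypothetical. The sketch also uses `floatingPointNumber` alone in `constant`: `|` commits to the first alternative that succeeds, so `wholeNumber | floatingPointNumber` stops after the `1` in `1.5` and the statement then fails.

```scala
import scala.util.matching.Regex
import scala.util.parsing.combinator.JavaTokenParsers

// Sketch: assumes "do" and "end" are the only reserved words in the DSL.
class ML3DFixed extends JavaTokenParsers {
  // Spaces and tabs are insignificant; newlines are real tokens.
  override protected val whiteSpace: Regex = """[ \t]+""".r

  // Hypothetical keyword set; extend it if the DSL grows more keywords.
  val reserved: Set[String] = Set("do", "end")

  // One or more line breaks form a single statement terminator.
  def eol: Parser[String] = """(\r?\n)+""".r

  // Blank lines are allowed before, between and after blocks.
  def model: Parser[List[Any]] =
    opt(eol) ~> repsep(commandBlock, eol) <~ opt(eol)

  def commandBlock: Parser[Any] = command ~ "do" ~ eol ~ statementList ~ "end"
  def command: Parser[Any] = commandName ~ opt(commandLabel)
  def commandName: Parser[String] = ident
  def commandLabel: Parser[String] = stringLiteral
  def statementList: Parser[List[Any]] = rep(statement)
  def statement: Parser[Any] = functionName ~ argumentList ~ eol

  // Keywords are rejected here, so `end` can no longer be read as a call
  // to a function named "end"; rep(statement) then stops right before it.
  def functionName: Parser[String] =
    ident.flatMap { name =>
      if (reserved(name)) failure(s"'$name' is a reserved word") else success(name)
    }

  def argumentList: Parser[List[Any]] = repsep(argument, ",")
  def argument: Parser[String] = stringLiteral | constant
  // floatingPointNumber also matches integers such as 0 or -2.
  def constant: Parser[String] = floatingPointNumber
}

object ML3DFixedDemo {
  def main(args: Array[String]): Unit = {
    val p = new ML3DFixed
    val src = "create_model do\n  at 0,0,0\nend\n"
    println(p.parseAll(p.model, src).successful) // true
  }
}
```

`JavaTokenParsers` itself seems fine for this: it is `RegexParsers` plus ready-made `ident`, `stringLiteral` and number parsers, and the significant-newline behaviour comes from the narrowed `whiteSpace` together with the explicit `eol` token.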