views: 63
answers: 1

I've seen two approaches to building parsers in Scala.

The first is to extend RegexParsers and define your own lexical patterns. The issue I see with this is that I don't really understand how it deals with keyword ambiguities. For example, if my keywords match the same pattern as idents, then the keywords get processed as idents.

To counter that, I've seen posts like this one that show how to use StandardTokenParsers to specify keywords. But then I don't understand how to specify the regexp patterns! Yes, StandardTokenParsers comes with "ident", but it doesn't come with the other ones I need (complex floating-point number representations, specific string literal patterns and escaping rules, etc.).
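
For concreteness, registering keywords with StandardTokenParsers looks roughly like this (my sketch of that approach; the object and rule names are just illustrative):

import scala.util.parsing.combinator.syntactical.StandardTokenParsers

object KeywordDemo extends StandardTokenParsers {
  // Words registered as reserved are tokenized as keywords, never as identifiers.
  lexical.reserved ++= List("if", "for", "while")
  lexical.delimiters ++= List("(", ")")

  // `ident` now rejects reserved words; a bare "if" in a rule matches the keyword token.
  def stmt: Parser[String] =
    "if" ~ "(" ~> ident <~ ")" ^^ ("if on " + _)

  def parse(s: String) = phrase(stmt)(new lexical.Scanner(s))
}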

How do you get both the ability to specify keywords and the ability to specify token patterns with regular expressions?

+4  A: 

I've written only RegexParsers-derived parsers, but what I do is something like this:

import scala.util.parsing.combinator.RegexParsers

// A minimal enclosing parser so the snippet stands alone; the object name is arbitrary.
object MyParsers extends RegexParsers {
  val name: Parser[String] = "[A-Z_a-z][A-Z_a-z0-9]*".r

  val kwIf: Parser[String]    = "if\\b".r
  val kwFor: Parser[String]   = "for\\b".r
  val kwWhile: Parser[String] = "while\\b".r

  val reserved: Parser[String] = ( kwIf | kwFor | kwWhile )

  // An identifier is any name that is not a reserved word.
  val identifier: Parser[String] = not(reserved) ~> name
}
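
A quick check in the REPL, for instance:

MyParsers.parseAll(MyParsers.identifier, "iffy")   // Success: "iffy" is an ordinary identifier
MyParsers.parseAll(MyParsers.identifier, "if")     // Failure: not(reserved) rejects the bare keyword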
Randall Schulz
I had seen this suggestion before and tried it, but ran into problems where it seemed to consume the token qualified by the not(...). But I just tried it again and it does work. Thanks!
Michael Tiller
What is the point of the "\b" in the regexps? Surely you don't encode backspaces in your input language?!?
Michael Tiller
Corrected. I meant to specify a word boundary. Otherwise you match (pseudo-) keywords that appear as the prefix of legitimate identifiers.
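Concretely, with the definitions above:

MyParsers.parseAll(MyParsers.kwIf, "if")     // Success: \b matches at the end of the input
MyParsers.parseAll(MyParsers.kwIf, "iffy")   // Failure: a word character follows "if", so there is no boundary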
Randall Schulz
OK, here is another update: based on my testing, the definition of "reserved" isn't even necessary! It seems as though just defining the parsers for keywords (e.g. kwIf) does *something* (probably inside the implicit def) to change the tokenizing?!? Odd, but I've confirmed this quite explicitly. Can anybody explain this?
Michael Tiller
You'll have to be more explicit. Perhaps start a new question with the code that illustrates the phenomenon you're seeing. Or edit this one, if you think that makes more sense. But keep in mind that everything in a combinator parser is top-down. There's no state machine built from a spec, either at the lexical / regular level or at the level of the CFG productions.
Randall Schulz