I was reading Bernie Pope's slides on "Parser combinators in Scala". He quotes the type signature of the "alternative" combinator |:

def | [U >: T](q: => Parser[U]): Parser[U]

and asks, "Homework: why doesn’t | have this type instead?"

def | [U](q: => Parser[U]): Parser[Either[T,U]]
+2  A: 
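import scala.util.parsing.combinator.RegexParsers

case class Stooge(name: String)

object StoogeParser extends RegexParsers {
  // String literals become Parser[String] via RegexParsers' implicit `literal` conversion.
  val moe: Parser[String] = "Moe"
  val larry: Parser[String] = "Larry"
  val curly: Parser[String] = "Curly"
  val shemp: Parser[String] = "Shemp"

  // Every alternative is a Parser[String], so | yields a Parser[String] as well.
  val stooge: Parser[Stooge] = (moe | larry | curly | shemp) ^^ { s => Stooge(s) }
}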

Now, imagine the code you would have to write instead of { s => Stooge(s) } if you were working with an s: Either[Either[Either[String,String],String],String] instead of an s: String.

Mitch Blevins
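
To make the answer's point concrete, here is a minimal sketch that uses only the standard library. The `Parser` class below is a toy stand-in invented for illustration, not the library's `scala.util.parsing.combinator` type, and `orEither`, `lit`, and `map` are hypothetical names for the Either-returning alternative from the question, a literal matcher, and a substitute for `^^`.

```scala
object EitherDemo {
  // Toy parser (an assumption for illustration): consume a prefix, return (result, rest).
  final case class Parser[+T](run: String => Option[(T, String)]) {
    // Same shape as the quoted signature: the result widens to a common supertype U.
    def |[U >: T](q: => Parser[U]): Parser[U] =
      Parser[U](in => run(in).orElse(q.run(in)))

    // Hypothetical alternative from the homework question: tag each side with Either.
    def orEither[U](q: => Parser[U]): Parser[Either[T, U]] =
      Parser[Either[T, U]] { in =>
        run(in) match {
          case Some((t, rest)) => Some((Left[T, U](t), rest))
          case None            => q.run(in).map { case (u, rest) => (Right[T, U](u), rest) }
        }
      }

    // Stand-in for ^^: transform the parsed value.
    def map[U](f: T => U): Parser[U] =
      Parser[U](in => run(in).map { case (t, rest) => (f(t), rest) })
  }

  // Hypothetical literal matcher.
  def lit(s: String): Parser[String] =
    Parser[String](in => if (in.startsWith(s)) Some((s, in.drop(s.length))) else None)

  final case class Stooge(name: String)

  val moe: Parser[String]   = lit("Moe")
  val larry: Parser[String] = lit("Larry")
  val curly: Parser[String] = lit("Curly")
  val shemp: Parser[String] = lit("Shemp")

  // With |, every alternative yields a String, so the action stays trivial.
  val stooge: Parser[Stooge] = (moe | larry | curly | shemp).map(s => Stooge(s))

  // With the Either-returning variant, the nesting has to be unpacked by hand.
  val stoogeE: Parser[Stooge] =
    moe.orEither(larry).orEither(curly).orEither(shemp).map {
      case Left(Left(Left(s)))  => Stooge(s) // moe
      case Left(Left(Right(s))) => Stooge(s) // larry
      case Left(Right(s))       => Stooge(s) // curly
      case Right(s)             => Stooge(s) // shemp
    }

  def main(args: Array[String]): Unit = {
    println(stooge.run("Curly"))
    println(stoogeE.run("Curly"))
  }
}
```

Both parsers accept the same input, but the second needs one pattern per alternative, and every new alternative adds another layer of `Left`.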
If your tokens are fixed, as each is in this example, you needn't turn them into REs. Just drop the ".r" bits and your parser will be semantically identical but won't bother creating fixed-string RE patterns and matchers.
Randall Schulz
Thanks, yes, that's rather obvious in retrospect. I think what hung me up was the idea that, with "|" as implemented, you end up losing type information which could then be awkward to get back -- the fear being that, for example, a Parser[Any] is somewhat less useful than a Parser[Either[Int, String]]. On reflection this isn't really an issue, because you just arrange it so that the alternatives share a useful common result supertype.
Matt R
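
The "common result supertype" idea might look like the following sketch. The toy `Parser`, the `Token` hierarchy, and the `span` helper are all hypothetical names made up for this example. The point is only that `|` can be typed at a sealed supertype, which keeps the information a `Parser[Either[Int, String]]` would carry without falling back to `Parser[Any]`.

```scala
object SupertypeDemo {
  // Toy parser (hypothetical, not the library's Parser).
  final case class Parser[+T](run: String => Option[(T, String)]) {
    def |[U >: T](q: => Parser[U]): Parser[U] =
      Parser[U](in => run(in).orElse(q.run(in)))
  }

  // A shared result supertype for the alternatives.
  sealed trait Token
  final case class Num(value: Int) extends Token
  final case class Word(text: String) extends Token

  // Hypothetical helper: parse a non-empty run of characters satisfying `p`.
  def span[T](p: Char => Boolean)(build: String => T): Parser[T] =
    Parser[T] { in =>
      val prefix = in.takeWhile(p)
      if (prefix.isEmpty) None else Some((build(prefix), in.drop(prefix.length)))
    }

  // toInt is fine for a demo; very long digit runs would overflow.
  val num: Parser[Num]   = span[Num](_.isDigit)(s => Num(s.toInt))
  val word: Parser[Word] = span[Word](_.isLetter)(s => Word(s))

  // The alternative is typed at the common supertype Token, not Any.
  val token: Parser[Token] = num | word

  // Pattern matching on the sealed hierarchy recovers which alternative matched.
  def describe(t: Token): String = t match {
    case Num(n)  => s"number $n"
    case Word(w) => s"word $w"
  }

  def main(args: Array[String]): Unit =
    println(token.run("42abc").map { case (t, rest) => (describe(t), rest) })
}
```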
Thanks, Randall. I had originally used "[Mm]oe".r, but thought I was just clouding the point. The regex conversion was vestigial.
Mitch Blevins
Perhaps more to the point, keywords written this way will also match as prefixes of longer words such as "Moesha" or "Shemphill". The best thing I've come up with to handle this in RegexParsers instances is to put a word boundary marker at the end: "Moe\\b".r (or """Moe\b""".r) etc.
Randall Schulz
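
The prefix problem and the word-boundary fix can be checked with the standard library alone. This sketch uses `Regex.findPrefixOf` to imitate matching at the start of the remaining input. It is not the RegexParsers API itself, and the object and value names are made up for the example.

```scala
import scala.util.matching.Regex

object WordBoundaryDemo {
  // Without a boundary, the keyword pattern also matches the start of a longer word.
  val loose: Regex = "Moe".r
  // \b must be escaped in an ordinary string literal ("\b" alone is a backspace character).
  val strict: Regex = "Moe\\b".r

  def main(args: Array[String]): Unit = {
    println(loose.findPrefixOf("Moesha"))      // Some(Moe): the keyword matches a prefix
    println(strict.findPrefixOf("Moesha"))     // None: 'e' is followed by a word character
    println(strict.findPrefixOf("Moe Howard")) // Some(Moe): a space ends the word
  }
}
```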
I had never considered having the word boundary in the regex, which would simplify some aspects of my solution: http://gist.github.com/259855
Mitch Blevins