views: 103
answers: 2

How do I filter the sequence of tokens coming from my Lexer to my Parser when using Scala parser combinators?

Let me explain - suppose I have the fairly standard pattern of a Lexer (extending StdLexical) and a Parser (extending StdTokenParsers). The lexer turns a sequence of characters to a sequence of tokens, then the parser turns the sequence of tokens to an abstract syntax tree (of type Expr).

I decide that some tokens, which could occur anywhere in the stream, I would like to have the option of filtering out, so I would like a function that would fit between the Lexer and Parser to remove these tokens. For example, I might want the lexer to tokenise comments, and then filter out these comments later.

What is the best way of writing this filter? This could use the parser combinator idiom, but doesn't have to.

Sample current code:

 // `input` here stands for the underlying java.io.Reader being parsed
 val reader = new PagedSeqReader(PagedSeq.fromReader(input))
 val tokens = new MyParser.lexical.Scanner(reader)
 val parse = MyParser.phrase(parser)(tokens)

I would like to be able to write something like this:

 // `input` here stands for the underlying java.io.Reader being parsed
 val reader = new PagedSeqReader(PagedSeq.fromReader(input))
 val tokens = new MyParser.lexical.Scanner(reader)
 val parse = MyParser.phrase(parser)(filter(tokens))
+1  A: 

Have you considered using RegexParsers to remove whitespace and comments?

EDIT

You can write a simple filter:

import scala.util.parsing.input._

object ReaderFilter {
  // Wraps a reader and skips entries for which `check` returns false,
  // i.e. `check` is a "keep this token" predicate.
  def filter[T](reader: Reader[T], check: T => Boolean): Reader[T] = {
    new Reader[T] {
      var orig = reader
      def first = { trim; orig.first }
      def atEnd = { trim; orig.atEnd }
      def rest: Reader[T] = { trim; ReaderFilter.filter(orig.rest, check) }
      def pos = { trim; orig.pos } // trim first, or pos may point at a skipped token
      // Advance past unwanted entries before delegating.
      private def trim = {
        while (!orig.atEnd && !check(orig.first))
          orig = orig.rest
      }
    }
  }
}

and use it in this manner (to remove tokens that are "#"):

val tokens = ReaderFilter.filter(new MyParser.lexical.Scanner(reader),
          {t: MyParser.lexical.Token => t.chars != "#"})
Magnus
I'm already doing that. I want to write the filter as described in the question; the application isn't actually removing comments, that was just the easiest way to explain the problem.
Nick Fortescue
I wrote it myself, then came back and realised you'd edited! You can have the accepted answer, interesting how the code is so similar. I used recursion and you used a while loop, but apart from that they are almost the same :-)
Nick Fortescue
+1  A: 

I've done it now; here are the results. The key insight is that a Parser from the parser combinator library takes a scala.util.parsing.input.Reader as input. So we need a class that wraps a Reader and is itself a Reader, filtering out entries that match some condition.

I wrote the Reader so that, on construction, it skips all unwanted entries and stops at either the first good entry or the end. Every call is then delegated to the original reader, except for rest, which constructs another Filter in turn.

import scala.util.parsing.input._

class Filter[T](parent: Reader[T], exclude: T => Boolean) extends Reader[T] {
  // Skip excluded entries up front; `start` is the first kept entry (or the end).
  private val start = nextOk(parent)
  private def nextOk(r: Reader[T]): Reader[T] =
    if (r.atEnd || !exclude(r.first)) r else nextOk(r.rest)

  override def source = start.source
  override def offset: Int = start.offset
  override def first: T = start.first
  override def rest: Reader[T] = new Filter(start.rest, exclude)
  override def pos: Position = start.pos
  override def atEnd = start.atEnd
}
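To see the skipping behaviour in isolation, here is a minimal sketch that works without the parser-combinator library on the classpath. The names SimpleReader, ListReader, TokenFilter, and drain are all made up for this demo; SimpleReader is a cut-down stand-in for scala.util.parsing.input.Reader, and TokenFilter follows the same shape as the Filter class above:

```scala
object FilterDemo {
  // Cut-down stand-in for scala.util.parsing.input.Reader.
  trait SimpleReader[T] {
    def first: T
    def rest: SimpleReader[T]
    def atEnd: Boolean
  }

  // A reader over a plain List, standing in for the lexer's token stream.
  case class ListReader[T](elems: List[T]) extends SimpleReader[T] {
    def first: T = elems.head
    def rest: SimpleReader[T] = ListReader(elems.tail)
    def atEnd: Boolean = elems.isEmpty
  }

  // Same technique as the Filter class above: skip excluded entries once
  // on construction, then delegate, re-wrapping the tail in rest.
  class TokenFilter[T](parent: SimpleReader[T], exclude: T => Boolean)
      extends SimpleReader[T] {
    private val start = nextOk(parent)
    private def nextOk(r: SimpleReader[T]): SimpleReader[T] =
      if (r.atEnd || !exclude(r.first)) r else nextOk(r.rest)

    def first: T = start.first
    def rest: SimpleReader[T] = new TokenFilter(start.rest, exclude)
    def atEnd: Boolean = start.atEnd
  }

  // Drain a reader into a List, for inspection.
  def drain[T](r: SimpleReader[T]): List[T] =
    if (r.atEnd) Nil else r.first :: drain(r.rest)

  def main(args: Array[String]): Unit = {
    val tokens   = ListReader(List("a", "#", "b", "#", "#", "c"))
    val filtered = new TokenFilter[String](tokens, _ == "#")
    println(drain(filtered)) // List(a, b, c)
  }
}
```

The eager skip on construction means atEnd, first, and pos all agree without each needing its own trim step, which is the main difference from the while-loop version in the other answer.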
Nick Fortescue