I am using Scala's parser combinators by extending scala.util.parsing.combinator.syntactical.StandardTokenParsers. This class provides the following methods:

def ident : Parser[String] for parsing identifiers and

def numericLit : Parser[String] for parsing a number (decimal I suppose)

I am using scala.util.parsing.combinator.lexical.Scanners from scala.util.parsing.combinator.lexical.StdLexical for lexing.

My requirement is to parse a hexadecimal number (without the 0x prefix) which can be of any length. Basically a grammar like: ([0-9]|[a-f])+

I tried integrating a regex parser, but that runs into type issues. Other approaches, such as extending the lexer's delimiters and the grammar rules, lead to "token not found" errors.

+2  A: 

You can use the RegexParsers with an action associated to the token in question.

import scala.util.parsing.combinator._

object HexParser extends RegexParsers {
  val hexNum: Parser[Int] = """[0-9a-f]+""".r ^^ 
           { case s:String => Integer.parseInt(s,16) } 

  def seq: Parser[Any] = repsep(hexNum, ",")

}

This defines a parser that reads comma-separated hex numbers with no leading 0x, and each number is actually returned as an Int.

val result = HexParser.parse(HexParser.seq, "1, 2, f, 10, 1a2b34d")
scala> println(result)
[1.21] parsed: List(1, 2, 15, 16, 27439949)

Note that there is no way to distinguish these from numbers in decimal notation. Also, I'm using Integer.parseInt, which is limited to the range of an Int. To handle any length you may have to write your own parser and use BigInteger or arrays.
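For the arbitrary-length case, here is a minimal sketch using Scala's BigInt (a wrapper around java.math.BigInteger) instead of Integer.parseInt. The object name HexValue and the helper parseHex are made up for illustration; the check accepts only the question's grammar, ([0-9]|[a-f])+, so uppercase digits and a 0x prefix are rejected.

```scala
object HexValue {
  // Hypothetical helper, not part of any library: returns the value of a
  // lowercase hex string of any length, or None if the string is empty or
  // contains a character outside [0-9a-f].
  def parseHex(s: String): Option[BigInt] = {
    def isHexDigit(c: Char): Boolean =
      (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')

    if (s.nonEmpty && s.forall(isHexDigit)) Some(BigInt(s, 16)) // radix 16
    else None
  }
}
```

In the RegexParsers example, the action could then call BigInt(s, 16) in place of Integer.parseInt(s, 16), with the parser typed as Parser[BigInt].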

Thomas
This would suit me if I were parsing just hex numbers, but the overall implementation is much bigger than the regex parsing. My goal is to use StandardTokenParsers, and using the regex parser inside it gives the type error "found : HexParser.Parser[Any] required: scratch.Parser[?]" when invoking parse(HexParser.seq) from within a StandardTokenParsers.
thequark
I think the solution would involve changing the lexer being used.
thequark
Probably, especially if it should be context sensitive. I just provided a way to get a hex string into an Int. What kind of file are you parsing? Maybe you could pre-process it to make it more lexer friendly.
Thomas
+1  A: 

As I thought, the problem can be solved by extending the behavior of the lexer rather than the parser. The standard lexer accepts only decimal digits, so I created a new lexer:

class MyLexer extends StdLexical {
  override type Elem = Char
  override def digit = ( super.digit | hexDigit )
  lazy val hexDigits = Set[Char]() ++ "0123456789abcdefABCDEF".toArray
  lazy val hexDigit = elem("hex digit", hexDigits.contains(_))
}

And my parser (which has to be a StandardTokenParsers) can be extended as follows:

object ParseAST extends StandardTokenParsers{

  override val lexical:MyLexer = new MyLexer()
  lexical.delimiters += ( "(" , ")" , "," , "@")
  ...
 }

The construction of the "number" from digits is taken care of by the StdLexical class:

class StdLexical {
...

def token: Parser[Token] = 
    ...
| digit~rep(digit)^^{case first ~ rest => NumericLit(first :: rest mkString "")}
}

Since StdLexical just returns the parsed number as a String, length is not a problem for me, and I am not interested in the numeric value anyway.
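To illustrate what the digit ~ rep(digit) rule collects once digit also accepts hex digits, here is a rough sketch using only the standard library. It is not StdLexical itself: the names HexRunSketch and leadingHexRun are made up, and it models only this one rule, not the other alternatives in token (such as the identifier rule).

```scala
object HexRunSketch {
  // Same character set as hexDigits in MyLexer above.
  val hexDigits: Set[Char] = "0123456789abcdefABCDEF".toSet

  // Mimics digit ~ rep(digit): one or more hex digits taken from the front
  // of the input. Returns the digit run (kept as a String, as NumericLit
  // does) together with the unconsumed remainder, or None if the input
  // does not start with a hex digit.
  def leadingHexRun(input: String): Option[(String, String)] = {
    val (run, rest) = input.span(hexDigits.contains)
    if (run.isEmpty) None else Some((run, rest))
  }
}
```

Because the run stays a String, its length is bounded only by the input, which is why the Int range limit does not come up here.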

thequark