I'm currently making a scanner for a basic compiler I'm writing in Haskell. One of the requirements is that any character enclosed in single quotes (') is translated into a character literal token (type T_Char), and this includes escape sequences such as '\n' and '\t'. I've defined this part of the scanner function which works okay for most cases:

scanner ('\'':cs)
    | length cs == 0            = error "Illegal character!"
    | head cs == '\\'           = mkEscape (head (drop 1 cs)) : scanner (drop 3 cs)
    | head (drop 1 cs) == '\''  = T_Char (head cs) : scanner (drop 2 cs)
    where
      mkEscape      :: Char -> Token
      mkEscape 'n'  = T_Char '\n'
      mkEscape 'r'  = T_Char '\r'
      mkEscape 't'  = T_Char '\t'
      mkEscape '\\' = T_Char '\\'
      mkEscape '\'' = T_Char '\''

However, this is what I get when I run it in GHCi:

Main> scanner "abc '\\' def"
[T_Id "abc", T_Char '\'', T_Id "def"]

It can recognise everything else but gets escaped backslashes confused with escaped single quotes. Is this something to do with character encodings?

A:

I don't think there's anything wrong with your scanner itself. To Haskell, the string literal you typed reads as

abc '\' def

because Haskell string literals have their own escape sequences, and \\ is the escape for a single backslash. So when the scanner reaches the first single quote, cs contains the characters \' def. head cs is a backslash, so it calls mkEscape.

The argument it receives is head (drop 1 cs), which is ', so mkEscape returns T_Char '\'', which is exactly what you saw.
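To see what Haskell actually hands to the scanner, a quick check like the following can help. It is a small sketch, not part of the original answer: it prints the literal from the question and the text after the first single quote, i.e. what cs is bound to when scanner matches ('\'':cs).

```haskell
main :: IO ()
main = do
  let s = "abc '\\' def"
  putStrLn s                          -- abc '\' def
  print (length s)                    -- 11: "\\" is one backslash character
  -- The text after the first single quote, i.e. what cs is bound to
  -- when scanner matches ('\'':cs):
  let cs = drop 1 (dropWhile (/= '\'') s)
  print cs                            -- "\\' def"
  print (head cs, head (drop 1 cs))   -- ('\\','\'')
```

The last line shows the pair the scanner branches on: a backslash followed by a single quote, which mkEscape maps to T_Char '\''.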


Perhaps you should call

scanner "abc '\\\\' def"

The first level of \ escaping is consumed by Haskell's own string-literal syntax, and the second level is what scanner sees.
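A self-contained sketch for trying both calls side by side. The Token type, the whitespace and identifier clauses, and the mkEscape fallback are assumptions added only so the file compiles; the character-literal guards follow the question's code.

```haskell
import Data.Char (isAlpha, isAlphaNum, isSpace)

-- Assumed token type: the question only shows the T_Id and T_Char constructors.
data Token = T_Id String | T_Char Char
  deriving (Show, Eq)

scanner :: String -> [Token]
scanner [] = []
-- Character literals: same guards as the question's scanner. Like the
-- original, the escape branch does not verify the closing quote.
scanner ('\'':cs)
  | null cs                  = error "Illegal character!"
  | head cs == '\\'          = mkEscape (head (drop 1 cs)) : scanner (drop 3 cs)
  | head (drop 1 cs) == '\'' = T_Char (head cs) : scanner (drop 2 cs)
  where
    mkEscape :: Char -> Token
    mkEscape 'n'  = T_Char '\n'
    mkEscape 'r'  = T_Char '\r'
    mkEscape 't'  = T_Char '\t'
    mkEscape '\\' = T_Char '\\'
    mkEscape '\'' = T_Char '\''
    mkEscape c    = error ("Unknown escape: \\" ++ [c])  -- assumed fallback
-- Assumed clauses for whitespace and identifiers, just enough for the demo.
-- A quote that matched none of the guards above also ends up here.
scanner (c:cs)
  | isSpace c = scanner cs
  | isAlpha c = let (name, rest) = span isAlphaNum (c:cs)
                in T_Id name : scanner rest
  | otherwise = error ("Illegal character: " ++ [c])

main :: IO ()
main = do
  -- Haskell turns "\\" into one backslash, so the scanner sees '\'
  print (scanner "abc '\\' def")    -- [T_Id "abc",T_Char '\'',T_Id "def"]
  -- Four backslashes in the literal give the scanner the two it expects: '\\'
  print (scanner "abc '\\\\' def")  -- [T_Id "abc",T_Char '\\',T_Id "def"]
```

The first call reproduces the output from the question; the second yields the escaped backslash the scanner was written to recognise.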

KennyTM
I see. Does this mean it will work okay when reading a file in rather than using the interpreter?
benwad
@benwad: Yes.
KennyTM
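Why file input needs no extra escaping can be sketched as follows: readFile returns the file's characters as they are, with no escape processing. The file name scanner-input.txt is hypothetical. Note that the literal used to write the file still needs four backslashes, because it is itself Haskell source.

```haskell
main :: IO ()
main = do
  let path = "scanner-input.txt"  -- hypothetical file name
  -- This literal is Haskell source, so it still needs four backslashes
  -- to put two backslash characters into the file.
  writeFile path "abc '\\\\' def"
  -- readFile returns the file's characters unchanged; no escape
  -- processing happens on data read at runtime.
  contents <- readFile path
  putStrLn contents          -- abc '\\' def
  print (length contents)    -- 12
```

Feeding contents to scanner should therefore produce T_Char '\\' for the middle token, since the scanner sees '\\' exactly as it appears in the file.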