I'm currently making a scanner for a basic compiler I'm writing in Haskell. One of the requirements is that any character enclosed in single quotes (') is translated into a character literal token (type T_Char), and this includes escape sequences such as '\n' and '\t'. I've defined this part of the scanner function which works okay for most cases:
scanner ('\'':cs) | (length cs) == 0 = error "Illegal character!"
| head cs == '\\' = mkEscape (head (drop 1 cs)) : scanner (drop 3 cs)
| head (drop 1 cs) == '\'' = T_Char (head cs) : scanner (drop 2 cs)
where
mkEscape :: Char -> Token
mkEscape 'n' = T_Char '\n'
mkEscape 'r' = T_Char '\r'
mkEscape 't' = T_Char '\t'
mkEscape '\\' = T_Char '\\'
mkEscape '\'' = T_Char '\''
However, this comes up when I run it in GHCi:
Main> scanner "abc '\\' def"
[T_Id "abc", T_Char '\'', T_Id "def"]
It can recognise everything else but gets escaped backslashes confused with escaped single quotes. Is this something to do with character encodings?