I'm writing a lexer in Haskell. Here's the code:

lexer :: String -> [Token]
lexer s
    | s =~ whitespace :: Bool =
     let token = s =~ whitespace :: String in
      lex (drop (length token) s)
    | s =~ number :: Bool =
     let token = s =~ number :: String in
      Val (read token) : lex (drop (length token) s)
    | s =~ operator :: Bool =
     let token = s =~ operator :: String in
      Oper token : lex (drop (length token) s)
    | otherwise = error "unrecognized character"
    where
     whitespace = "^[ \t\n]"
     number = "^[0-9]*(\.[0-9]+)?"
     operator = "^[+-*/()]"

data Token = Val Int | Oper String

I'm having two problems. First, the number regex "^[0-9]*(\.[0-9]+)?" produces this error:

lexical error in string/character literal at character '['

Second, when I comment out the line containing it and the part of the function that uses it, I get this error:

Couldn't match expected type `Token'
           against inferred type `(String, String)'
      Expected type: [Token]
      Inferred type: [(String, String)]
    In the expression: lex (drop (length token) s)
    In the expression:
        let token = s =~ whitespace :: String
        in lex (drop (length token) s)

I have no idea why I'm getting either of these errors. Can someone help me?

+6  A: 

Backslashes are used as escape characters in string literals, as in "\n" for a string containing a newline. If you want a literal backslash, you need to escape it as "\\". That's the problem in the regex "^[0-9]*(\.[0-9]+)?": the Haskell compiler tries to interpret "\." as a string escape and rejects it, because no such escape exists. If you write the regex as "^[0-9]*(\\.[0-9]+)?", the error goes away.
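
To make the escaping concrete, here is a minimal sketch. The names backslashDot and numberPattern are made up for illustration and are not part of the question's code.

```haskell
-- "\\" in Haskell source denotes a single backslash character at runtime.
-- (backslashDot and numberPattern are illustrative names only.)
backslashDot :: String
backslashDot = "\\."                    -- two characters: '\' and '.'

-- The number regex from the question, with the backslash escaped.
numberPattern :: String
numberPattern = "^[0-9]*(\\.[0-9]+)?"
```

In GHCi, putStrLn numberPattern prints ^[0-9]*(\.[0-9]+)?, which is the text the regex engine actually receives, and length backslashDot is 2.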

The type error arises because lex (drop (length token) s) calls lex from the standard Prelude, which has type String -> [(String, String)]. You probably meant to make a recursive call to your own function, lexer, instead.
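
As a rough sketch of what the recursive version could look like, the code below calls lexer on the rest of the input instead of Prelude's lex. Note that it is not the question's code: to stay within base it swaps the regexes for span and Data.Char.isDigit, it adds a base case for the empty string, it only handles integers (since Val holds an Int), and the deriving clause is an addition so the result can be printed and compared.

```haskell
import Data.Char (isDigit)

-- Token type from the question; "deriving" is added here for printing/comparison.
data Token = Val Int | Oper String
  deriving (Show, Eq)

-- Sketch only: uses span/isDigit in place of the question's regexes.
lexer :: String -> [Token]
lexer [] = []                                  -- nothing left to tokenize
lexer s@(c:rest)
  | c `elem` " \t\n" = lexer rest              -- skip whitespace; recurse into lexer, not lex
  | isDigit c =
      let (digits, remaining) = span isDigit s -- take the longest run of digits
      in Val (read digits) : lexer remaining
  | c `elem` "+-*/()" = Oper [c] : lexer rest  -- single-character operators
  | otherwise = error ("unrecognized character: " ++ [c])
```

With these definitions, lexer "12 + (3*4)" evaluates to [Val 12,Oper "+",Oper "(",Val 3,Oper "*",Val 4,Oper ")"].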

sth
Thanks. :D On to the parser!
Micah