I'm getting started with Haskell and I'm trying to use the Alex tool to create regular expressions and I'm a little bit lost; my first inconvenience was the compile part. How I have to do to compile a file with Alex?. Then, I think that I have to import into my code the modules that alex generates, but not sure. If someone can help me, I would be very greatful!
Why do you want to use alex to create regular expressions? If all you want is to do some regex matching etc, you should look at the regex-base package.
If it is plain Regex you want, the API is specified in text.regex.base. Then there are the implementations text.regex.Posix , text.regex.pcre and several others. The Haddoc documentation is a bit slim, however the basics are described in Real World Haskell, chapter 8. Some more indepth stuff is descriped in this SO question.
You can specify regular expression functions in Alex.
Here for example, a regex in Alex to match floating point numbers:
$space = [\ \t\xa0]
$digit = 0-9
$octit = 0-7
$hexit = [$digit A-F a-f]
@sign = [\-\+]
@decimal = $digit+
@octal = $octit+
@hexadecimal = $hexit+
@exponent = [eE] [\-\+]? @decimal
@number = @decimal
| @decimal \. @decimal @exponent?
| @decimal @exponent
| 0[oO] @octal
| 0[xX] @hexadecimal
lex :-
@sign? @number { strtod }
When we match the floating point number, we dispatch to a parsing function to operate on that captured string, which we can then wrap and expose to the user as a parsing function:
readDouble :: ByteString -> Maybe (Double, ByteString)
readDouble str = case alexScan (AlexInput '\n' str) 0 of
AlexEOF -> Nothing
AlexError _ -> Nothing
AlexToken (AlexInput _ rest) n _ ->
case strtod (B.unsafeTake n str) of d -> d `seq` Just $! (d , rest)
A nice consequence of using Alex for this regex matching is that the performance is good, as the regex engine is compiled statically. It can also be exposed as a regular Haskell library built with cabal. For the full implementation, see bytestring-lexing.
The general advice on when to use a lexer instead of a regex matcher would be that, if you have a grammar for the lexemes you're trying to match, as I did for floating point, use Alex. If you don't, and the structure is more ad hoc, use a regex engine.