ansaurus

Question

Regular expressions versus lexical analyzers in Haskell

Answer 1

+3 A:

Why do you want to use Alex to create regular expressions? If all you want is some regex matching, you should look at the regex-base package.

augustss 2010-06-21 22:35:38

What I really want to do is write a lexical and syntactic analyzer, which is why I'm working with Alex :)

Anny 2010-06-22 01:10:07

Answer 2

+1 A:

If it is plain regex you want, the API is specified in Text.Regex.Base. Then there are the implementations Text.Regex.Posix, Text.Regex.PCRE and several others. The Haddock documentation is a bit slim; however, the basics are described in Real World Haskell, chapter 8. Some more in-depth material is described in this SO question.
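As a minimal sketch of what that API looks like in use, assuming the regex-posix package is installed: the `=~` operator from Text.Regex.Posix picks its behaviour from the result type you ask for. The names `hasNumber` and `firstNumber` are made up for this illustration.

```haskell
import Text.Regex.Posix ((=~))

-- Hypothetical helper: True when the input contains at least one run of digits.
-- Asking for a Bool result makes (=~) behave as a plain "does it match?" test.
hasNumber :: String -> Bool
hasNumber s = s =~ "[0-9]+"

-- Hypothetical helper: the first decimal literal in the input,
-- or the empty string when nothing matches.
-- Asking for a String result makes (=~) return the matched text.
firstNumber :: String -> String
firstNumber s = s =~ "[0-9]+(\\.[0-9]+)?"
```

In GHCi, `hasNumber "abc 42"` and `firstNumber "pi is 3.14 roughly"` are quick ways to try it out.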

HaskellElephant 2010-06-21 23:11:56

What I really want to do is write a lexical and syntactic analyzer, which is why I'm working with Alex :)

Anny 2010-06-22 01:09:51

Answer 3

+3 A:

You can specify regular expression functions in Alex.

Here, for example, is a regex in Alex to match floating point numbers:

$space       = [\ \t\xa0]
$digit       = 0-9
$octit       = 0-7
$hexit       = [$digit A-F a-f]

@sign        = [\-\+]
@decimal     = $digit+
@octal       = $octit+
@hexadecimal = $hexit+
@exponent    = [eE] [\-\+]? @decimal

@number      = @decimal
             | @decimal \. @decimal @exponent?
             | @decimal @exponent
             | 0[oO] @octal
             | 0[xX] @hexadecimal

lex :-

   @sign? @number { strtod }

When we match the floating point number, we dispatch to a parsing function to operate on that captured string, which we can then wrap and expose to the user as a parsing function:

readDouble :: ByteString -> Maybe (Double, ByteString)
readDouble str = case alexScan (AlexInput '\n' str) 0 of
    AlexEOF            -> Nothing
    AlexError _        -> Nothing
    AlexToken (AlexInput _ rest) n _ ->
       case strtod (B.unsafeTake n str) of d -> d `seq` Just $! (d , rest)

A nice consequence of using Alex for this regex matching is that the performance is good, as the regex engine is compiled statically. It can also be exposed as a regular Haskell library built with cabal. For the full implementation, see bytestring-lexing.

The general advice on when to use a lexer instead of a regex matcher would be that, if you have a grammar for the lexemes you're trying to match, as I did for floating point, use Alex. If you don't, and the structure is more ad hoc, use a regex engine.
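To make "a grammar for the lexemes" concrete, here is a rough hand-written sketch in plain Haskell of what the decimal alternatives of @number describe. It is not what Alex generates and it is not the bytestring-lexing code: it works on String, it skips the octal and hexadecimal alternatives, and every function name in it is invented for this illustration.

```haskell
import Data.Char (isDigit)
import Data.Maybe (fromMaybe)

-- Each recognizer returns (lexeme, remaining input), or Nothing on failure.
type Recognizer = String -> Maybe (String, String)

-- @decimal = $digit+
decimal :: Recognizer
decimal s = case span isDigit s of
    ([], _)    -> Nothing
    (ds, rest) -> Just (ds, rest)

-- @sign = [\-\+]
sign :: Recognizer
sign (c : s) | c `elem` "-+" = Just ([c], s)
sign _ = Nothing

-- The '?' operator: on failure, consume nothing.
optionally :: Recognizer -> String -> (String, String)
optionally p s = fromMaybe ("", s) (p s)

-- \. @decimal
fraction :: Recognizer
fraction ('.' : s) = do
    (ds, rest) <- decimal s
    Just ('.' : ds, rest)
fraction _ = Nothing

-- @exponent = [eE] [\-\+]? @decimal
exponentPart :: Recognizer
exponentPart (e : s)
    | e `elem` "eE" = do
        let (sg, s') = optionally sign s
        (ds, rest) <- decimal s'
        Just (e : sg ++ ds, rest)
exponentPart _ = Nothing

-- @sign? @decimal (\. @decimal)? @exponent?   (decimal alternatives only)
lexNumber :: Recognizer
lexNumber s0 = do
    let (sg, s1) = optionally sign s0
    (ds, s2) <- decimal s1
    let (fr, s3) = optionally fraction s2
        (ex, s4) = optionally exponentPart s3
    Just (sg ++ ds ++ fr ++ ex, s4)
```

Even this small subset needs a handful of cooperating functions, which is roughly the point of the advice: once the lexemes have a grammar, writing it down as an Alex specification is shorter and the generated matcher does this bookkeeping for you.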

Don Stewart 2010-06-21 23:42:27

Thanks, Don Stewart, that is what I was looking for. What I really wanted to do is create a lexical and syntactic analyzer, which is why I was trying to create the regex. Thanks, everybody, you all helped a lot :)

Anny 2010-06-22 01:08:57

Sorry if I'm bothering you, but could you explain the last line, "case strtod (B.unsafeTake n str) of d -> d `seq` Just $! (d , rest)"? I don't get it.

Anny 2010-06-22 01:21:49

Oh, that's just running 'strtod' on the lexeme, then returning the result strictly.

Don Stewart 2010-06-22 01:25:41
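For anyone else puzzled by that line, here is a small sketch of the same pattern using only base. It assumes String instead of ByteString, and `parseLexeme` is a hypothetical stand-in for `strtod` (it simply calls `read`, which fails at runtime on malformed input).

```haskell
-- Hypothetical stand-in for strtod: convert a lexeme to a Double.
-- Uses read, so it only works for well-formed input such as "1.5".
parseLexeme :: String -> Double
parseLexeme = read

-- Same shape as the last line of readDouble.
-- 'case ... of d ->' only gives the unevaluated result a name;
-- 'd `seq` ...' forces the Double before the Just is handed back;
-- '$!' forces the pair constructor as well.
strictResult :: String -> String -> Maybe (Double, String)
strictResult lexeme rest =
    case parseLexeme lexeme of
        d -> d `seq` Just $! (d, rest)
```

The effect is that the caller gets an already-computed number instead of a thunk that still holds on to the input; the value returned is the same as the lazy `Just (parseLexeme lexeme, rest)`.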

I took your code to see what the result was, but it gives me an error. That's why I'd like to understand it, to see what is wrong.

Anny 2010-06-22 01:45:35

Grab the code from the Alex examples directory, available in the source tarball for Alex.

Don Stewart 2010-06-22 01:48:31

OK! Thanks a lot! :)

Anny 2010-06-22 01:55:01
