What's the best way to use regular expressions with options (flags) in Haskell?
I use Text.Regex.PCRE. The documentation lists a few interesting options like compCaseless, compUTF8, ... but I don't know how to use them with (=~).
I believe you cannot use (=~) if you want a compOpt other than defaultCompOpt.
Something like this works:
match (makeRegexOpts compCaseless defaultExecOpt "(Foo)" :: Regex) "foo" :: Bool
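As a sketch (assuming the regex-pcre package and the PCRE C library are installed), here is that expression as a complete program. The names caselessFoo and matchesLower are only illustrative.

```haskell
import Text.Regex.PCRE

-- Compile "(Foo)" case-insensitively: compCaseless is used as the compile
-- option, and defaultExecOpt keeps the default execution options.
caselessFoo :: Regex
caselessFoo = makeRegexOpts compCaseless defaultExecOpt "(Foo)"

-- match chooses its result type from the annotation, just like (=~) does
matchesLower :: Bool
matchesLower = match caselessFoo "foo"

main :: IO ()
main = print matchesLower
```

The regex is compiled once and can be passed to match as many times as needed, which (=~) cannot do.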
The following two articles should assist you:
All the Text.Regex.* modules make heavy use of typeclasses, which are there for extensibility and "overloading"-like behavior, but make usage less obvious from just seeing types.
Now, you've probably started off with the basic =~ matcher.
(=~) ::
    ( RegexMaker Regex CompOption ExecOption source
    , RegexContext Regex source1 target )
    => source1 -> source -> target
(=~~) ::
    ( RegexMaker Regex CompOption ExecOption source
    , RegexContext Regex source1 target, Monad m )
    => source1 -> source -> m target
To use =~, there must exist an instance of RegexMaker ... for the RHS (the pattern), and RegexContext ... for the LHS (the text being searched) and the result.
class RegexOptions regex compOpt execOpt
    | regex -> compOpt execOpt
    , compOpt -> regex execOpt
    , execOpt -> regex compOpt
class RegexOptions regex compOpt execOpt
    => RegexMaker regex compOpt execOpt source
    | regex -> compOpt execOpt
    , compOpt -> regex execOpt
    , execOpt -> regex compOpt
  where
    makeRegex :: source -> regex
    makeRegexOpts :: compOpt -> execOpt -> source -> regex
A valid instance of all these classes (for example, regex=Regex, compOpt=CompOption, execOpt=ExecOption, and source=String) means it's possible to compile a regex with compOpt and execOpt options from some form of source. (Also, given some regex type, there is exactly one compOpt, execOpt pair that goes along with it. Lots of different source types are okay, though.)
class Extract source
class Extract source
    => RegexLike regex source
class RegexLike regex source
    => RegexContext regex source target
  where
    match :: regex -> source -> target
    matchM :: Monad m => regex -> source -> m target
A valid instance of all these classes (for example, regex=Regex, source=String, target=Bool) means it's possible to match a source and a regex to yield a target. (Other valid targets given these specific regex and source are Int, MatchResult String, MatchArray, etc.)
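To make the choice of target concrete, here is a small sketch that runs one compiled Regex against the same String and asks for several result types. fooRe and the as* names are invented for the example, and the comments describe regex-base's instances as I understand them.

```haskell
import Text.Regex.PCRE

fooRe :: Regex
fooRe = makeRegex "fo+"

asBool :: Bool
asBool = match fooRe "foo bar fooo"   -- whether there is any match

asCount :: Int
asCount = match fooRe "foo bar fooo"  -- number of non-overlapping matches

asText :: String
asText = match fooRe "foo bar fooo"   -- text of the first match

-- matchM fails in the monad (here Maybe) when there is no match
asMaybe :: Maybe String
asMaybe = matchM fooRe "bar baz"

main :: IO ()
main = print (asBool, asCount, asText, asMaybe)
```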
Put these together and it's pretty obvious that =~ and =~~ are simply convenience functions:
source1 =~ source
    = match (makeRegex source) source1
source1 =~~ source
    = matchM (makeRegex source) source1
and also that =~ and =~~ leave no room to pass various options to makeRegexOpts.
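A quick way to convince yourself of this equivalence is the sketch below: with the default options (which are not case-insensitive), both spellings give the same answer. The two binding names are only for illustration.

```haskell
import Text.Regex.PCRE

-- (=~) builds its regex with makeRegex, i.e. with defaultCompOpt and
-- defaultExecOpt; there is no argument through which options could be passed.
viaOperator :: Bool
viaOperator = "Hello" =~ "hello"

viaMatch :: Bool
viaMatch = match (makeRegex "hello" :: Regex) "Hello"

main :: IO ()
main = print (viaOperator, viaMatch)
```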
You could make your own:
(=~+) ::
    ( RegexMaker regex compOpt execOpt source
    , RegexContext regex source1 target )
    => source1 -> (source, compOpt, execOpt) -> target
source1 =~+ (source, compOpt, execOpt)
    = match (makeRegexOpts compOpt execOpt source) source1
(=~~+) ::
    ( RegexMaker regex compOpt execOpt source
    , RegexContext regex source1 target, Monad m )
    => source1 -> (source, compOpt, execOpt) -> m target
source1 =~~+ (source, compOpt, execOpt)
    = matchM (makeRegexOpts compOpt execOpt source) source1
which could be used like
"string" =~+ ("regex", compCaseless + compUTF8, execBlank) :: Bool
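For anyone who wants to try this operator, here is a self-contained sketch of =~+ with a small usage check. caselessHit and plainMiss are hypothetical names, and combining flags with .|. assumes CompOption's Bits instance in regex-pcre.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Data.Bits ((.|.))
import Text.Regex.PCRE

-- Like (=~), but the right-hand side also carries compile and exec options.
-- The fundep compOpt -> regex lets GHC work out the regex type from the flags.
(=~+) :: ( RegexMaker regex compOpt execOpt source
         , RegexContext regex source1 target )
      => source1 -> (source, compOpt, execOpt) -> target
source1 =~+ (source, compOpt, execOpt) =
  match (makeRegexOpts compOpt execOpt source) source1

caselessHit :: Bool
caselessHit = "Hello World" =~+ ("hello", compCaseless .|. compUTF8, execBlank)

-- compBlank sets no flags, so the match is case-sensitive
plainMiss :: Bool
plainMiss = "Hello World" =~+ ("hello", compBlank, execBlank)

main :: IO ()
main = print (caselessHit, plainMiss)
```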
Or you could replace =~ and =~~ with versions that can accept options:
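{-# LANGUAGE FlexibleContexts, FlexibleInstances, MultiParamTypeClasses, UndecidableInstances #-}
import Text.Regex.PCRE hiding ((=~), (=~~))

-- Anything a regex can be built from: a bare pattern,
-- or a (pattern, compOpt, execOpt) triple
class RegexSourceLike regex source where
    makeRegexWith :: source -> regex

-- A bare pattern is compiled with the default options
instance {-# OVERLAPPABLE #-} RegexMaker regex compOpt execOpt source
    => RegexSourceLike regex source where
    makeRegexWith = makeRegex

-- A triple is compiled with the options it carries
instance {-# OVERLAPPING #-} RegexMaker regex compOpt execOpt source
    => RegexSourceLike regex (source, compOpt, execOpt) where
    makeRegexWith (source, compOpt, execOpt)
        = makeRegexOpts compOpt execOpt source

-- Same shape as the library operators, but routed through makeRegexWith;
-- the annotation pins the regex type so it is not ambiguous
source1 =~ source
    = match (makeRegexWith source :: Regex) source1
source1 =~~ source
    = matchM (makeRegexWith source :: Regex) source1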
Or you could just use match, makeRegexOpts, etc. directly where needed.
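Using match and makeRegexOpts directly can stay tidy with a tiny wrapper. This is a sketch only; caseless and hits are hypothetical helper names, not part of regex-pcre.

```haskell
import Text.Regex.PCRE

-- Hypothetical helper: compile a pattern case-insensitively
caseless :: String -> Regex
caseless = makeRegexOpts compCaseless defaultExecOpt

-- The compiled regex is shared by every element of the map
hits :: [Bool]
hits = map (match (caseless "^ab+c$")) ["ABC", "abbbc", "abd"]

main :: IO ()
main = print hits
```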
I don't know anything about Haskell, but if you're using a regex library based on PCRE, then you can use mode modifiers inside the regular expression. To match "caseless" in a case-insensitive fashion, you can use this regex in PCRE:
(?i)caseless
The mode modifier (?i) overrides any case sensitivity or case insensitivity option that was set outside the regular expression. It also works with operators that don't allow you to set any options.
Similarly, (?s) turns on "single line mode" which makes the dot match line breaks, (?m) turns on "multi line mode" which makes ^ and $ match at line breaks, and (?x) turns on free-spacing mode (unescaped spaces and line breaks outside character classes are insignificant). You can combine the letters: (?ismx) turns on everything. A hyphen turns off options: (?-i) makes the regex case sensitive, and (?x-i) starts a free-spacing, case-sensitive regex.
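Since Text.Regex.PCRE hands the pattern to PCRE, these inline modifiers should work with the plain (=~) operator and no extra options at all. A small sketch (binding names are illustrative):

```haskell
import Text.Regex.PCRE

-- (?i) inside the pattern: case-insensitive without compCaseless
caselessInline :: Bool
caselessInline = "CaSeLeSs" =~ "(?i)caseless"

-- (?s): the dot also matches a line break
dotAllInline :: Bool
dotAllInline = "a\nb" =~ "(?s)a.b"

-- Without (?s), the dot does not match the line break
dotDefault :: Bool
dotDefault = "a\nb" =~ "a.b"

main :: IO ()
main = print (caselessInline, dotAllInline, dotDefault)
```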