When I write Erlang programs which do text parsing, I frequently run into situations where I would love to do a pattern match using a regular expression.

For example, I wish I could do something like this, where ~ is a "made up" regular expression matching operator:

my_function(String ~ ["^[A-Za-z]+[A-Za-z0-9]*$"]) ->
    ....

I know about the regular expression module (re) but AFAIK you cannot call functions when pattern matching or in guards.

Also, I wish string matching could be done in a case-insensitive way. This is handy, for example, when parsing HTTP headers. I would love to do something like this, where "Str ~ {Pattern, Options}" means "match Str against pattern Pattern using options Options":

handle_accept_language_header(Header ~ {"Accept-Language", [case_insensitive]}) ->
    ...

Two questions:

  1. How do you typically handle this using just standard Erlang? Is there some mechanism or coding style that comes close to this in terms of conciseness and readability?

  2. Is there any work (an EEP?) going on in Erlang to address this?

+2  A: 

You can use the re module:

re:run(String, "^[A-Za-z]+[A-Za-z0-9]*$").
re:run(String, "^[A-Za-z]+[A-Za-z0-9]*$", [caseless]).
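Since guards cannot call re:run/3, the usual workaround is to run the match first and branch on the result. Below is a minimal sketch of that idea. The module name re_match and the helpers is_match/2,3 and classify/1 are made up for illustration; only re:run/3 and its {capture, none} and caseless options come from the standard library.

```erlang
-module(re_match).
-export([is_match/2, is_match/3, classify/1]).

%% Hypothetical helper: true if String matches Regexp.
is_match(String, Regexp) ->
    is_match(String, Regexp, []).

%% With {capture, none}, re:run/3 returns the bare atoms match | nomatch,
%% so the result can be compared directly.
is_match(String, Regexp, Opts) ->
    re:run(String, Regexp, [{capture, none} | Opts]) =:= match.

%% A guard cannot call is_match/3, so compute the result first
%% and branch on it in a case expression.
classify(String) ->
    case is_match(String, "^[A-Za-z]+[A-Za-z0-9]*$") of
        true  -> identifier;
        false -> other
    end.
```

This is not as concise as a regexp in the function head, but it keeps the regexp and the branching in one place.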

EDIT:

match(String, Regexps) -> 
  case lists:dropwhile(
               fun({Regexp, Opts}) -> re:run(String, Regexp, Opts) =:= nomatch;
                  (Regexp) -> re:run(String, Regexp) =:= nomatch end,
               Regexps) of
    [R|_] -> R;
    _     -> nomatch
  end.

example(String) ->
  Regexps = ["$RE1^", {"$RE2^", [caseless]}, "$RE3"],
  case match(String, Regexps) of
    nomatch -> handle_error();
    Regexp -> handle_regexp(String, Regexp)
    ...
Zed
Yes, the re module does a great job with regular expressions, but AFAIK you cannot call functions while pattern matching or in guards.
Cayle Spandon
If I only understood what you mean by pattern matching... should Erlang make up a regular expression for you that matches the string, or what?
Zed
I think what he would like is something like an is_match(RegExp, S) BIF for use in guards, so:

foo(X) when is_match(RE1, X) -> one_thing();
foo(X) when is_match(RE2, X) -> another_thing().

etc.
Rob Charlton
OK, I added an example of what could be done, if that's the case.
Zed
+3  A: 
  1. For strings, you could use the 're' module and then iterate over the result set. AFAIK there isn't another way to do it; that's what regexes are for.

  2. For the HTTP headers, since there can be many, I would consider iterating over the result set a better option than writing one potentially very long expression.

  3. EEP work : I do not know.
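For the case-insensitive header names specifically, one alternative that avoids regexes altogether is to normalise the case once and then rely on ordinary pattern matching in function heads. The sketch below assumes OTP 20 or later for string:lowercase/1; the module name header_demo, the function names, and the returned tuples are invented for illustration.

```erlang
-module(header_demo).
-export([handle_header/2]).

%% Normalise the header name once, then let plain pattern matching
%% in the function heads do the dispatch.
handle_header(Name, Value) ->
    handle_lower(string:lowercase(Name), Value).

%% Hypothetical handlers: each clause matches one lower-cased header name.
handle_lower("accept-language", Value) -> {accept_language, Value};
handle_lower("content-type", Value)    -> {content_type, Value};
handle_lower(_Other, _Value)           -> ignored.
```

This only covers exact names in any letter case; anything that needs a real pattern still has to go through re.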

jldupont
Assigning a "down-vote" without a reason is non-productive.
jldupont
+1  A: 

You can't pattern match on regular expressions, sorry. So you have to do

my_function(String) -> Matches = re:run(String, "^[A-Za-z]+[A-Za-z0-9]*$"),
                       ...
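To make the "match first, then pattern match on the result" step concrete, here is a small sketch that also extracts captured groups. The capturing parentheses, the module name re_capture, and the returned tuples are additions for illustration and are not part of the original question.

```erlang
-module(re_capture).
-export([my_function/1]).

my_function(String) ->
    %% {capture, all_but_first, list} returns only the parenthesised
    %% groups, each as a plain string (list).
    case re:run(String, "^([A-Za-z]+)([A-Za-z0-9]*)$",
                [{capture, all_but_first, list}]) of
        {match, [Letters, Rest]} -> {identifier, Letters, Rest};
        nomatch                  -> not_an_identifier
    end.
```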
Alexey Romanov
+3  A: 

You really don't have much choice other than to run your regexps in advance and then pattern match on the results. Here's a very simple example that approaches what I think you're after, but it does suffer from the flaw that you need to write each regexp twice. You could make this less painful by using a macro to define each regexp in one place.

-module(multire).

-compile(export_all).

multire([],_) ->
    nomatch;
multire([RE|RegExps],String) ->
    case re:run(String,RE,[{capture,none}]) of
        match ->
            RE;
        nomatch ->
            multire(RegExps,String)
    end.


test(Foo) ->
    test2(multire(["^Hello","world$","^....$"],Foo),Foo).

test2("^Hello",Foo) ->
    io:format("~p matched the hello pattern~n",[Foo]);
test2("world$",Foo) ->
    io:format("~p matched the world pattern~n",[Foo]);
test2("^....$",Foo) ->
    io:format("~p matched the four chars pattern~n",[Foo]);
test2(nomatch,Foo) ->
    io:format("~p failed to match~n",[Foo]).
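The macro idea mentioned above could look roughly like this. It is a sketch only: the macro names (RE_HELLO and so on), the module name multire_macro, and the returned atoms are invented here. It works because each macro expands to a string literal, and string literals are allowed in patterns.

```erlang
-module(multire_macro).
-export([classify/1]).

%% Each regexp is written exactly once.
-define(RE_HELLO, "^Hello").
-define(RE_WORLD, "world$").
-define(RE_FOUR,  "^....$").

%% Returns the first regexp in the list that matches String, or nomatch.
first_match([], _String) ->
    nomatch;
first_match([RE | Rest], String) ->
    case re:run(String, RE, [{capture, none}]) of
        match   -> RE;
        nomatch -> first_match(Rest, String)
    end.

classify(String) ->
    describe(first_match([?RE_HELLO, ?RE_WORLD, ?RE_FOUR], String)).

%% The macros expand to string literals, so they can appear in patterns.
describe(?RE_HELLO) -> hello;
describe(?RE_WORLD) -> world;
describe(?RE_FOUR)  -> four_chars;
describe(nomatch)   -> nomatch.
```

The regexps are still tried in list order, so a string matching several of them is reported under the first one.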
Rob Charlton
+5  A: 

A possibility could be to use Erlang Web-style annotations (macros) combined with the re Erlang module. An example is probably the best way to illustrate this.

This is what your final code will look like:

[...]
?MATCH({Regexp, Options}).
foo(_Args) ->
  ok.
[...]

The MATCH macro would be executed just before your foo function. The flow of execution will fail if the regexp pattern is not matched.

Your match function will be declared as follows:

?BEFORE.
match({Regexp, Options}, TgtMod, TgtFun, TgtFunArgs) ->
  String = proplists:get_value(string, TgtFunArgs),
  case re:run(String, Regexp, Options) of
    nomatch ->
      {error, {TgtMod, match_error, []}};
    {match, _Captured} ->
      {proceed, TgtFunArgs}
  end.

Please note that:

  • BEFORE says that the macro will be executed before your target function (an AFTER macro is also available).
  • match_error is your error handler, specified in your module; it contains the code you want to execute if a match fails (maybe nothing, just blocking the execution flow).
  • This approach has the advantage of keeping the regexp syntax and options uniform with the re module (avoid confusion).

More information about the Erlang Web annotations here:

http://wiki.erlang-web.org/Annotations

and here:

http://wiki.erlang-web.org/HowTo/CreateAnnotation

The software is open source, so you might want to reuse their annotation engine.
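If pulling in the annotation engine is too much, the same "check before calling" idea can be approximated with a plain higher-order function. This is not the Erlang Web API, just a rough sketch in standard Erlang; the module name match_before, the function call_if_match/4, and the {error, nomatch} return value are all made up for illustration.

```erlang
-module(match_before).
-export([call_if_match/4]).

%% Calls Fun(String) only when String matches Regexp under Options;
%% otherwise returns {error, nomatch} without calling Fun.
call_if_match(String, Regexp, Options, Fun) when is_function(Fun, 1) ->
    case re:run(String, Regexp, [{capture, none} | Options]) of
        match   -> Fun(String);
        nomatch -> {error, nomatch}
    end.
```

For example, match_before:call_if_match(Header, "^accept-language$", [caseless], fun handle/1) would call a hypothetical handle/1 only for a matching header name.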

Roberto Aloi
+1  A: 
  1. Erlang does not handle regular expressions in patterns.
  2. No.
rvirding