I want to accept an arbitrary regular expression from the user and anchor it on both sides to enforce a full match (^<user's-regex>$). However, I don't know whether I have to account for the possibility that the user has already anchored their regex.

It looks like Perl, C++, .NET and JavaScript all allow repeated anchors.

"hello" =~ /^h/ # true
"hello" =~ /^^h/ # true
"hello" =~ /^^^h/ # true
"hello" =~ /e/ # true
"hello" =~ /^e/ # false
"hello" =~ /^^e/ # false

Does anyone know if this is specified to work this way? Can I depend on this behaviour or is it an accident that is liable to change in the future?


Edit: The reason we need this is that we're using VBScript's regexes (via COM). We're using match, but this returns all matches, so it's much slower to match the string abc against .*a.* than against ^.*a.*$. By anchoring as suggested by @Tim, we speed up matching (for long strings) by more than a factor of 12.

+4  A: 

You can depend on this behavior. The regex engine doesn't mind asserting the same thing once, twice, or a hundred times in a row.
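As a quick check of the claim above, here is a minimal sketch in Python's re module (a different flavor from the Perl in the question, chosen only for illustration). The helper name matches is hypothetical.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

# Anchors are zero-width assertions, so repeating one at the same
# position asserts the same condition again without consuming input.
patterns = ["^h", "^^h", "^^^h", "e", "^e", "^^e", "o$", "o$$"]
results = {p: matches(p, "hello") for p in patterns}

for p in patterns:
    print(f"{p!r:8} -> {results[p]}")
```

The repeated-anchor variants give the same answer as their single-anchor counterparts; other engines should be checked against their own documentation.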

However, instead of simply adding anchors around the regex, you should also add a non-capturing group around it:

^(?: - user regex - )$ or, preferably, if your regex flavor allows it: \A(?: - user regex - )\Z

Otherwise, you'll trip up if the user uses alternation in their regex. Compare:

user regex:         hello|bye
anchored regex:     ^hello|bye$      // alternation now affects anchors
correctly anchored: ^(?:hello|bye)$
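
The comparison above can be sketched in Python as follows. This is an illustration, not the code used in the question (which is VBScript); the function name anchor_full is hypothetical. Note that in Python \Z matches only at the very end of the string, whereas in some other flavors (Perl, for example) \Z also matches before a final newline, so check your engine's documentation.

```python
import re

def anchor_full(user_regex: str) -> re.Pattern:
    """Wrap a user-supplied regex so it must match the whole string.

    The non-capturing group keeps a top-level alternation inside the
    anchors and does not shift the user's capture-group numbers.
    """
    return re.compile(r"\A(?:" + user_regex + r")\Z")

user_regex = "hello|bye"

# Naive anchoring: parsed as (^hello) | (bye$), so partial matches slip through.
naive = re.compile("^" + user_regex + "$")
naive_hit = naive.search("hello world") is not None

# Grouped anchoring: the whole alternation sits between the anchors.
safe = anchor_full(user_regex)
safe_hit = safe.search("hello world") is not None
safe_full = safe.search("bye") is not None

# A regex the user already anchored still works after wrapping.
pre_anchored = anchor_full("^hello$").search("hello") is not None

print(naive_hit, safe_hit, safe_full, pre_anchored)
```

This sketch does no validation of the user's pattern; some inputs (for instance, a pattern that begins with global inline flags) may need extra handling depending on the engine.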
Tim Pietzcker
I'm updating the question with the motivation. Thanks, your answer was very helpful.
Motti