Assuming a Perl script that allows users to specify several text filter expressions in a config file, is there a safe way to let them enter regular expressions as well, without the possibility of unintended side effects or code execution? Without actually parsing the regexes and checking them for problematic constructs, that is. There won't be any substitution, only matching.

As an aside, is there a way to test if the specified regex is valid before actually using it? I'd like to issue warnings if something like /foo (bar/ was entered.

Thanks, Z.


EDIT:
Thanks for the very interesting answers. I've since found out that the following dangerous constructs will only be evaluated in interpolated regexes when the `use re 'eval'` pragma is in effect:

(?{code})
(??{code})
${code}
@{code}

The default is no re 'eval'; so unless I'm missing something, it should be safe to read regular expressions from a file, with the only check being the eval/catch posted by Axeman. At least I haven't been able to hide anything evil in them in my tests.
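One way to check this is to interpolate a pattern containing a code block and confirm that compilation fails. This is only a minimal sketch, assuming a stock perl with no `use re 'eval'` in scope; the variable names and the pattern are made up for illustration.

```perl
use strict;
use warnings;

our $side_effect = 0;

# A pattern as it might arrive from a config file (hypothetical content).
my $from_config = '(?{ $main::side_effect = 1 })foo';

# Without "use re 'eval'", interpolating a code block is a fatal error,
# so the eval block returns undef and $@ holds the message.
my $compiled = eval { qr/$from_config/ };
my $error    = $@;

print defined $compiled ? "compiled\n" : "rejected: $error";
print "side effect ran: $side_effect\n";
```

The embedded code never runs because the pattern is rejected at compile time, before any matching takes place.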

Thanks again. Z.

+5  A: 

You will probably have to do some level of sanitisation. For example, the perlre man page describes the following construct:

(?{ code })

which allows executable code inside a pattern match.

Greg Hewgill
You're close. That construct is not allowed when doing variable interpolation in a regex, unless you `use re 'eval';`, so this will be safe.
Leon Timmermans
+10  A: 

This

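my $compiled = eval { qr/$re/ };
if ( !defined $compiled ) {
    # $@ holds the compile error; warn and skip this expression
    warn "Invalid regex '$re': $@";
}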

compiles an expression, and lets you recover from an error.
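For the "warn about /foo (bar/" part of the question, the same idea can be wrapped in a small helper. This is a sketch only; `compile_user_regex` is a hypothetical name, not something from a module.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (compiled regex, undef) on success
# or (undef, error message) on failure.
sub compile_user_regex {
    my ($source) = @_;
    my $compiled = eval { qr/$source/ };
    return ( $compiled, undef ) if defined $compiled;
    return ( undef, $@ );
}

# Try one broken and one valid expression.
for my $source ( 'foo (bar', 'foo (bar)' ) {
    my ( $re, $error ) = compile_user_regex($source);
    if ( defined $re ) {
        print "ok: $source\n";
    }
    else {
        warn "skipping invalid filter '$source': $error";
    }
}
```

Returning the compiled object means each expression is compiled once and can be reused for every line you filter.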

Since you're only going to do matching, you can watch for malicious expressions by looking for these patterns, which would allow arbitrary code to be run:

(?: \( \?{1,2} \{  # '(' followed by '?' or '??', and then '{'
|   \@ \{ \s* \[   # a dereference of a literal array, which may be arbitrary code.
)

Make sure you compile this with the /x flag, since it relies on embedded whitespace and comments.
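Put together, the screening step might look like the sketch below. The function name `looks_suspicious` is invented for the example, and the check is only a textual screen on the user's input, not a parser.

```perl
use strict;
use warnings;

# The screening pattern from above; /x makes perl ignore the layout
# whitespace and the trailing comments.
my $suspicious = qr/
    (?: \( \?{1,2} \{  # '(' followed by '?' or '??', and then '{'
    |   \@ \{ \s* \[   # a dereference of a literal array
    )
/x;

# Hypothetical helper: true if the user's text contains one of the constructs.
sub looks_suspicious {
    my ($source) = @_;
    return $source =~ $suspicious ? 1 : 0;
}

for my $source ( '(?{ exit })', 'foo(?:bar)+' ) {
    printf "%-14s %s\n", $source, looks_suspicious($source) ? 'flagged' : 'passed';
}
```

Note that an ordinary non-capturing group such as `(?:bar)` passes, because the screen only fires when `(?` or `(??` is followed directly by `{`.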

Axeman
This doesn't protect you from denial of service attacks. Depending on the context, that may be a non issue. If the user running the program could as well do "perl -e 'while(1){}'", then it doesn't matter. If it's a server, well...
tsee
I'd chuck the regex inside an alarm() to kill anything that takes more than a second or two.
Schwern
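One caveat with a plain alarm(): perl normally defers signal handlers until the current opcode finishes, and a single regex match is one opcode, so the handler may not fire until the slow match is already done (see "Deferred Signals" in perlipc). A sketch that sidesteps this runs the match in a child process and lets the parent enforce the limit. It assumes a Unix-like system with a real fork(); the name `match_with_timeout` and the exit-status protocol are made up for illustration.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: returns 1 (match), 0 (no match) or undef (timed out).
sub match_with_timeout {
    my ( $re, $text, $seconds ) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {
        # Child: do the potentially slow match and report via exit status.
        POSIX::_exit( $text =~ $re ? 0 : 1 );
    }

    # Parent: wait for the child, but give up after $seconds.
    my $finished = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        waitpid( $pid, 0 );
        alarm 0;
        1;
    };

    if ( !$finished ) {
        alarm 0;
        kill 'KILL', $pid;
        waitpid( $pid, 0 );    # reap the killed child
        return undef;
    }
    return ( $? >> 8 ) == 0 ? 1 : 0;
}

my $result = match_with_timeout( qr/foo/, 'a foo b', 2 );
print defined $result ? "result: $result\n" : "timed out\n";
```

Forking per match is expensive, so this only makes sense when the patterns really are untrusted and the input volume is modest.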
+12  A: 

Depending on what you're matching against, and the version of Perl you're running, there might be some regexes that act as an effective denial of service attack by using excessive lookaheads, lookbehinds, and other assertions.

You're best off allowing only a small, well-known subset of regex patterns, and expanding it cautiously as you and your users learn how to use the system, in the same way that many blog commenting systems allow only a small subset of HTML tags.
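As a rough illustration of the allowlist idea, the sketch below accepts only literals, `.`, anchors, alternation, simple character classes and one-character quantifiers, and rejects everything else (groups, backslashes, braces). The particular subset and the name `is_allowed_pattern` are assumptions made for the example, not a recommendation of where to draw the line.

```perl
use strict;
use warnings;

# Hypothetical allowlist: plain characters and a few metacharacters, or a
# simple bracketed class. Anything outside this set is refused outright.
my $allowed = qr/
    \A
    (?: [A-Za-z0-9 _.^\$|*+?-]           # literal or simple metacharacter
    |   \[ \^? [A-Za-z0-9 _-]+ \]        # character class such as [a-z] or [^0-9]
    )*
    \z
/x;

sub is_allowed_pattern {
    my ($source) = @_;
    return 0 unless $source =~ $allowed;
    my $compiled = eval { qr/$source/ };    # it still has to compile
    return defined $compiled ? 1 : 0;
}

for my $source ( 'foo.*bar', '[a-z]+', '(a+)+$', '*foo' ) {
    printf "%-10s %s\n", $source, is_allowed_pattern($source) ? 'allowed' : 'refused';
}
```

Because parentheses are refused, nested quantifiers like `(a+)+` cannot be expressed, though a permitted pattern can still be slow on long input.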

Eventually Parse::RecDescent might become useful, if you need to do complex analysis of regexes.

Sam Kington
+5  A: 

I would suggest not trusting any regular expressions from users. If you are actually determined to do so, please run perl in taint (-T) mode. In that case, you'll need some form of validation. Instead of using Parse::RecDescent for writing your own regular expression parser as another answer suggests, you should use the existing YAPE::Regex regexp parser which is probably faster, was written by an expert and works like a charm.

Finally, since perl 5.10.0, you can plug different regular expression engines into perl (lexically!). You could check whether there's a less powerful regular expression engine available whose syntax is more easily verifiable. If you want to go down that route, read the API description, Avar's re::engine::Plugin, or in general check out any of Avar's plugin engines.

tsee
PS: YAPE::Regex has mostly been superseded by Regexp::Parser. Neither of them handles the perl 5.10 extensions to the regexp engine yet. Jeff Pinyan, the author of both modules, said he plans to extend the modules in the near future. See also: PPIx::Regexp.
tsee
A: 

Would the Safe module be of any use with regard to compiling/executing untrusted regular expressions?

Danny
A: 

How does this work in C# (ASP.NET), where there is no `re 'eval'`?

Kid