ansaurus

Question

Answer 1

+2 A:

No I don't think so, but you can use these guidelines:

If it contains two quantifiers that are open-ended at the high end and they are nested then it might be O(2^n).
If it does not contain two such quantifiers then I think it cannot be O(2^n).

Quantifiers that can cause this are: *, + and {k,}.

Also note that the worst case complexity of evaluating a regular expression might be very different from the complexity on typical strings and that the complexity depends on the specific regular expression engine.

Mark Byers 2010-07-31 10:18:09

Yeap but you said "might be O(2^n)" are there a way to be sure? Is there a way like transforming the regexp so that it can be shown to be non exponential?

mathk 2010-08-01 18:22:07

Answer 2

A:

I think if we use regex to examine not very long strings, the backtracking will not be a problem. If we take just a line of text and check it against a pattern, it will not take too much time! Another point is avoiding to have too many "+" and "*" in patterns. Every "+ or *" indicates a loop.

Govan 2010-07-31 10:35:42

Answer 3

+6 A:

If your regexp engine exposes runtime exponential behavior for (x+x+)+y ,then it is broken because a DFA or NFA can recognize this pattern in linear time:

echo "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" | egrep "(x+x+)+y"
echo "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxy" | egrep "(x+x+)+y"

both answer immediately.

In fact, there are only a few cases (like backreferences) where backtracking is really needed (mainly, because a regexp with a backreference is not a regular expression in the language theoretic sense anymore). A capable implementation should switch to backtracking only when these corner cases are given.

In fairness, DFA's have a dark side too, because some regexp's have exponential size requirements, but a size contraints is easier to enforce than a time constraint and the huge DFA runs linear on the input, so it's a better bargain than a small backtracker choking on a couple of X's.

You should really read Russ Cox excellent article series about the implementation of regexp (and the pathological behavior of backtracking): http://swtch.com/~rsc/regexp/

To answer your question about decidability: You can't. Because there is not the one backtracking for regexpr. Every implementation has its own strategies to deal with exponential growth in their algorithm for certain cases and does not cover others. One rule might be fit for here and catastrophic for there.

UPDATE:

For example, one implementation could contain an optimizer which could use algebraic transformations to simplify regexps before executing them: (x+x+)+y is the same a xxx*y, which shouldn't be a problem for any backtracker. But the same optimizer wouldn't recognize the next expression and the problem is there again. Here someone described how to craft a regexpr which fools Perl's optimizer:

http://perlgeek.de/blog-en/perl-tips/in-search-of-an-exponetial-regexp.html

Luther Blissett 2010-07-31 11:04:04

Answer 4

+1 A:

Any regex without backreferences can be matched in linear time, though many regex engines out there in the real world don't do it that way (at least many regex engines that are plugged into programming language runtime environments support backreferences, and don't switch to a more efficient execution model when no backreferences are present).

There's no easy way to find out how much time a regex with backreferences is going to consume.

moritz 2010-08-09 15:03:08

ansaurus

tags:

views:

answers:

Detect if a regexp is exponential

related questions