tags:

views:

472

answers:

1

This interesting question http://stackoverflow.com/questions/2837267/ concerned how to do a negative look-ahead in MySQL. The poster wanted to get the effect of

Kansas(?! State)

because MySQL doesn't implement look-ahead assertions, a number of answers came up the equivalent

Kansas($|[^ ]| ($|[^S])| S($|[^t])| St($|[^a])| Sta($|[^t])| Stat($|[^e]))

The poster pointed out that's a PITA to do for potentially lots of expressions.

Is there a script/utility/mode of PCRE (or some other package) that will convert a PCRE (if possible) to an equivalent regex that doesn't use Perl's snazzy features? I'm fully aware that some Perl-style regexes cannot be stated as an ordinary regex, so I would not expect the tool to do the impossible, of course!

+2  A: 

You don't want to do this. It isn't actually mindbogglingly difficult to translate the advanced features to basic features - it's just another flavor of compiler, and compiler writers are pretty clever people - but most of the things that the snazzy features solve are (a) impossible to do with a standard regex because they recognize non-regular languages, so you'd have to approximate them so that at least they work for a limited-length text or (b) possible, but only with a regex of exponential size. And 'exponential' is compsci-speak for "don't go there". You will get swamped in OutOfMemory errors and seemingly-infinite loops if you try to use an exponential solution on anything you would actually want to process.

In other words, Abandon all hope, ye who enter here. It is virtually always better to let the regex do what it's good at and do the rest with other tools. Even such a simple thing as inverting a regex is much, much easier solved with the original regex in combination with the negation operator than with the monstrosity that would result from an accurate regex inverter.

Kilian Foth