Perl has Regexp::Optimizer, is there anything similar in PHP? My searches haven't turned up anything.

For example, Regexp::Optimizer would turn

/foobar|fooxar|foozap/

into

/foo(?:[bx]ar|zap)/

Update: I wanted to keep the question really short so that people would not over-interpret it, but that had the opposite effect. I am looking for something that takes a regular expression and outputs a functionally equivalent but more efficient regular expression. I have found such a tool for Perl but none for PHP, and I am wondering whether one exists. I expect a yes/no answer, accompanied by a link if applicable. Thanks, and sorry for the confusion.
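To make the kind of transformation concrete, here is a minimal sketch in PHP that factors common prefixes out of a list of literal alternatives using a trie. It is not Regexp::Optimizer's algorithm and not an existing PHP library; the function names `factor_prefixes` and `trie_to_regex` are hypothetical. It assumes the alternatives are plain, non-empty single-byte strings (no regex syntax), and it does not merge suffixes into character classes, so it produces `(?:bar|xar|zap)` rather than `(?:[bx]ar|zap)`.

```php
<?php
// Build a trie from literal words, then emit a regex that shares prefixes.
// Hypothetical helper names; assumes non-empty, single-byte literal words.
function factor_prefixes(array $words): string
{
    $trie = [];
    foreach ($words as $word) {
        $node = &$trie;
        foreach (str_split($word) as $ch) {
            if (!isset($node[$ch])) {
                $node[$ch] = [];
            }
            $node = &$node[$ch];
        }
        $node[''] = true; // end-of-word marker
        unset($node);     // drop the reference before the next word
    }
    return trie_to_regex($trie);
}

function trie_to_regex(array $node): string
{
    $isEnd = isset($node['']);
    unset($node['']);

    $alternatives = [];
    foreach ($node as $ch => $child) {
        // Keys that look like integers come back as ints, hence the cast.
        $alternatives[] = preg_quote((string) $ch, '/') . trie_to_regex($child);
    }

    if (count($alternatives) === 0) {
        return '';
    }
    if (count($alternatives) === 1 && !$isEnd) {
        return $alternatives[0];
    }
    $group = '(?:' . implode('|', $alternatives) . ')';
    // A word that is a prefix of another word makes the rest optional.
    return $isEnd ? $group . '?' : $group;
}

echo factor_prefixes(['foobar', 'fooxar', 'foozap']), "\n";
```

One caveat on equivalence: when one alternative is a prefix of another (say `foo|foobar`), the factored form prefers the longer match, which can differ from the original leftmost-first behaviour in unanchored searches.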

+1  A: 

No (AFAIK), but consider the ctype functions for basic checks where people often reach for a regex.
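As a small illustration of that suggestion, the sketch below compares a digits-only regex with `ctype_digit()`. The two wrapper functions and the sample inputs are made up for the example; note that the ctype functions expect a string argument.

```php
<?php
// Hypothetical wrappers, for illustration only.
function is_digits_regex(string $value): bool
{
    // /D keeps $ from matching before a trailing newline.
    return preg_match('/^[0-9]+$/D', $value) === 1;
}

function is_digits_ctype(string $value): bool
{
    // Same check without going through the regex engine.
    return ctype_digit($value);
}

foreach (['12345', '12a45', ''] as $value) {
    var_dump(is_digits_regex($value) === is_digits_ctype($value));
}
```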

nathan
+1  A: 

Since PCRE supports Perl syntax, just use the Perl module. There is an extension in PHP to instantly invoke Perl code. But you could just exec it and cache the result, unless you need a 'live' conversion:

 $re = exec('perl -MRegexp::Optimizer -e ' . escapeshellarg('$p = shift; print Regexp::Optimizer->new->optimize(qr/$p/)') . ' ' . escapeshellarg($re));
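If the result is to be cached rather than recomputed on every request, something along these lines might do. This is only a sketch: `cached_pattern`, the cache directory, and the `.re` file naming are assumptions, and the optimizer is passed in as a callable so that the exec() call above (or anything else) can be plugged in.

```php
<?php
/**
 * Return an optimized version of $pattern, computing it at most once.
 * $optimize is any callable mapping a pattern to an equivalent pattern.
 * Hypothetical helper; $cacheDir must exist and be writable.
 */
function cached_pattern(string $pattern, string $cacheDir, callable $optimize): string
{
    // One cache file per distinct input pattern.
    $file = $cacheDir . '/' . sha1($pattern) . '.re';

    if (is_file($file)) {
        $cached = file_get_contents($file);
        if ($cached !== false && $cached !== '') {
            return $cached;
        }
    }

    $optimized = $optimize($pattern);
    file_put_contents($file, $optimized, LOCK_EX);
    return $optimized;
}
```

The expensive step then runs once per distinct pattern, for instance at deploy time, and later requests only read a small file.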
mario
tehehe. However, I wonder if the exec is worth the gain in performance beyond the optimisations that PCRE makes when it compiles.
p00ya
True, but why do you need to optimize the regex each time you run the program? Why not just optimize whenever you move from development to test?
ircmaxell
As I was saying, better to do once and cache. exec() is probably not THAT much slower than dl(perl). But might be optimizable by compiling it into a parrot bytecode. Anyway, it's not worthwhile to try to reimplement a regexp optimizer in PHP. In PHP. PHP!
mario
I've actually tested that approach out of curiosity and it turns out that Regexp::Optimizer chokes on complicated expressions. Works well on simple stuff though. However, it's not really a practical solution as it requires access to exec() or the ability to compile PECL extensions.
Josh Davis
@Josh Davis: I've been wondering how it works anyway. Regular expressions aren't exactly a simple language to parse. And this would explain the note in the man page: "This module does, ahem, attempts to, optimize regular expressions.". It mentions that it's only really good for alternate lists, otherwise Regexp::List should be tried (but that's only for corner cases too).
mario
+1  A: 

You can pass the S modifier to the regex, which makes PCRE study the pattern when it is first compiled (the compiled regex is then cached until the script ends).

http://php.net/manual/en/reference.pcre.pattern.modifiers.php

This is detailed fairly well in the PHP chapter in Mastering Regular Expressions Ed. 3 and in the particular example you give, it would optimize in a useful way.

Edit: Actually, after thinking about this a bit more, it wouldn't even need the S option, as PCRE will optimize that particular example on its own. A better example would be something that looks for just a few starting characters, like

/foobar|barfoo|helloworld/S

since it would then only attempt a match at positions starting with one of [fbh].
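A minimal usage sketch, with made-up subject strings and a hypothetical `count_matches` helper. Whether the modifier still changes anything depends on the PCRE version behind your PHP build; as far as I know, newer builds based on PCRE2 accept it but may ignore it, so check the manual page linked above.

```php
<?php
// Count how many subjects match; the pattern is compiled once per request
// and reused from PHP's internal regex cache on later calls.
function count_matches(string $pattern, array $subjects): int
{
    $hits = 0;
    foreach ($subjects as $subject) {
        if (preg_match($pattern, $subject) === 1) {
            $hits++;
        }
    }
    return $hits;
}

// Hypothetical subjects, for illustration only.
$subjects = ['xx foobar xx', 'say helloworld', 'nothing here'];
echo count_matches('/foobar|barfoo|helloworld/S', $subjects), "\n";
```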

tsgrasser
This isn't what I'm looking for but yeah, the S modifier can help in a limited number of cases (namely, non-anchored patterns with no fixed starting character). It doesn't replace actually optimizing the regexp, though.
Josh Davis