I currently use three different regular expressions in one preg_match call, separated by the alternation operator |. This works perfectly. However, the first and second regexes produce the same output layout, i.e. [0] Source Text, [1] Number Amount, [2] Name, while the last one, because its source text is arranged differently, results in [0] Source Text, [1] Name, [2] Number Amount.

    preg_match('/^Guo (\d+) Cars @(\w+)|^AV (\d+) Cars @(\w+)|^@(\w+) (\d+) [#]?av/i', $source, $output);

Since Name can itself be numeric, I can't simply check which value is numeric. Is there a way I can either switch the order in the regex or identify which regex it matched? Speed is of the essence here, so I didn't want to use 3 separate preg_match statements (and there are more to come).

+3  A: 

You could use named capture groups:

    preg_match('/^Guo (?P<number_amount>\d+) Cars @(?P<name>\w+)|^AV (?P<number_amount>\d+) Cars @(?P<name>\w+)|^@(?P<name>\w+) (?P<number_amount>\d+) [#]?av/i', $source, $output);

Greg
+3  A: 

Three separate calls don't have to be slower. One big pattern can mean a lot of backtracking for the regex engine. The key to regex optimization is to make the engine fail as soon as possible. Have you benchmarked them pulled apart?
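
A rough way to check this is to time both variants on representative input. The sketch below is only an illustration: the helper names matchCombined, matchSeparate and timeIt, the sample strings and the round count are all made up for the example, and the timings will depend on your PHP version and your real data.

```php
<?php
// Hypothetical helper: one combined pattern (the one from the question).
function matchCombined(string $source): ?array
{
    $pattern = '/^Guo (\d+) Cars @(\w+)|^AV (\d+) Cars @(\w+)|^@(\w+) (\d+) [#]?av/i';
    if (!preg_match($pattern, $source, $m)) {
        return null;
    }
    // Groups after the last matched one are absent, so the highest
    // group index that is set tells us which alternative matched.
    if (isset($m[5])) {
        return ['number' => $m[6], 'name' => $m[5]];
    }
    if (isset($m[3])) {
        return ['number' => $m[3], 'name' => $m[4]];
    }
    return ['number' => $m[1], 'name' => $m[2]];
}

// Hypothetical helper: the same three patterns tried one after another.
function matchSeparate(string $source): ?array
{
    if (preg_match('/^Guo (\d+) Cars @(\w+)/i', $source, $m)
        || preg_match('/^AV (\d+) Cars @(\w+)/i', $source, $m)) {
        return ['number' => $m[1], 'name' => $m[2]];
    }
    if (preg_match('/^@(\w+) (\d+) [#]?av/i', $source, $m)) {
        return ['number' => $m[2], 'name' => $m[1]];
    }
    return null;
}

// Runs $fn over all samples $rounds times and returns the elapsed seconds.
function timeIt(callable $fn, array $samples, int $rounds): float
{
    $start = microtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        foreach ($samples as $s) {
            $fn($s);
        }
    }
    return microtime(true) - $start;
}

// Invented sample input, including one line that should fail to match.
$samples = ['Guo 12 Cars @alice', 'AV 7 Cars @bob', '@carol 99 #av', 'no match here'];

printf("combined: %.4fs\n", timeIt('matchCombined', $samples, 20000));
printf("separate: %.4fs\n", timeIt('matchSeparate', $samples, 20000));
```

Both helpers return the number and the name under fixed keys, so the calling code does not need to care which pattern matched. Make sure the samples include plenty of non-matching lines if that is what your real input looks like, since failing fast is where the two variants are most likely to differ.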

In your case you can make use of PCRE's named captures, (?<name>match something here), and replace with ${name} instead of \1. I'm not 100% certain this works for preg_replace; I do know for certain that preg_match stores named captures correctly.

For that to be useful in your case, PCRE needs to be compiled with the PCRE_DUPNAMES option (as in RoBorg's post). I'm not sure whether PHP's compiled PCRE DLL has that option set.
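
As far as I know, PCRE_DUPNAMES is an option applied when a pattern is compiled, and PCRE lets a pattern switch it on inline with (?J). A minimal sketch of that idea, assuming the PCRE bundled with your PHP accepts (?J); the sample strings are invented for the example:

```php
<?php
// (?J) at the start of the pattern allows duplicate group names.
$pattern = '/(?J)^Guo (?P<number_amount>\d+) Cars @(?P<name>\w+)'
         . '|^AV (?P<number_amount>\d+) Cars @(?P<name>\w+)'
         . '|^@(?P<name>\w+) (?P<number_amount>\d+) [#]?av/i';

$results = [];
foreach (['Guo 12 Cars @alice', 'AV 7 Cars @bob', '@carol 99 #av'] as $source) {
    if (preg_match($pattern, $source, $output)) {
        // Read the values by name, regardless of which alternative matched.
        $results[$source] = [$output['number_amount'], $output['name']];
    }
}
var_dump($results);
```

If preg_match emits a compilation warning about two subpatterns having the same name, the (?J) setting is not being honoured on that installation and this approach will not work there.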

Martijn Laarman
Hi Martijn, thank you for your answer. You are right: PCRE is not compiled with the DUPNAMES option here, meaning I can't use the same group names. I wasn't aware that separate regexes might be faster. I haven't done any benchmarking there yet.
Ice
A: 

I don’t know in which version PCRE started supporting the duplicate subpattern numbers syntax (?| … ), but try this regular expression:

    /^(?|Guo (\d+) Cars @(\w+)|AV (\d+) Cars @(\w+)|@(\w+) (\d+) #?av)/i

So:

    $source = '@abc 123 av';
    preg_match('/^(?|Guo (\\d+) Cars @(\\w+)|AV (\\d+) Cars @(\\w+)|@(\\w+) (\\d+) #?av)/i', $source, $output);
    var_dump($output);
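
Note that (?| … ) only makes the alternatives share group numbers. Within the third alternative the name is still captured before the number, so [1] and [2] remain swapped for that branch. One possible way to get the same order everywhere is to capture the number first inside a lookahead. This variant is an assumption layered on the pattern above, and the sample strings are invented:

```php
<?php
// In the third alternative the lookahead (?=\w+ (\d+) ) captures the number
// as group 1 without consuming anything; the name is then captured as group 2.
// Every branch therefore yields [1] number, [2] name.
$pattern = '/^(?|Guo (\d+) Cars @(\w+)|AV (\d+) Cars @(\w+)|@(?=\w+ (\d+) )(\w+) \d+ #?av)/i';

$results = [];
foreach (['Guo 12 Cars @alice', 'AV 7 Cars @bob', '@carol 99 #av', '@42 7 av'] as $source) {
    if (preg_match($pattern, $source, $output)) {
        $results[$source] = ['number' => $output[1], 'name' => $output[2]];
    }
}
var_dump($results);
```

The last sample uses a numeric name on purpose: the order of the groups no longer depends on guessing which value looks like a number.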
Gumbo