Hi,

I'm trying to devise a regex pattern (in PHP) which will allow for any alternation of two subpatterns. So if pattern A matches a group of three letters, and B matches a group of 2 numerals, all of these would be OK:

aaa
aaa66bbb
66
67abc
12abc34def56ghi78jkl

I don't mind which subpattern starts or ends the sequence, just that after the first match, the subpatterns must alternate. I'm totally stumped by this - any advice will be gratefully received!

+2  A: 
"/^(?:$A(?:$B$A)*$B?|$B(?:$A$B)*$A?)\$/"

will match either pattern A followed by however many alternating pattern B's and pattern A's, and maybe a final B...or a B followed by however many A-B pairs plus an A if it's there.

I've made this a string (and escaped the final $) because you're going to have some interpolation to do. Make sure $A and $B are each wrapped in some kind of grouping (like parentheses) if you want the ?'s and *'s to apply to the whole subpattern. In your examples, $A might be '([a-zA-Z]{3})' and $B might be '(\d\d)'.
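As a minimal sketch of that interpolation, the snippet below builds the pattern from two subpattern strings and tries it on the question's examples. The function name `buildAlternatingPattern` is hypothetical, and the sketch assumes neither subpattern contains an unescaped `/` (the delimiter used here).

```php
<?php
// Hypothetical helper (not from the answer): combines two subpatterns into
// one anchored pattern in which A and B must alternate.
function buildAlternatingPattern(string $A, string $B): string
{
    // Wrap each subpattern in a non-capturing group so that the ? and *
    // quantifiers apply to the whole subpattern, not just its last token.
    $A = "(?:$A)";
    $B = "(?:$B)";

    // Same shape as the pattern above; \$ keeps the end anchor literal
    // inside the double-quoted string.
    return "/^(?:$A(?:$B$A)*$B?|$B(?:$A$B)*$A?)\$/";
}

// Assumed subpatterns taken from the question: three letters, two digits.
$pattern = buildAlternatingPattern('[a-zA-Z]{3}', '\d\d');

$samples = ['aaa', 'aaa66bbb', '66', '67abc', '12abc34def56ghi78jkl', 'abcdef', '1234'];
foreach ($samples as $s) {
    // preg_match returns 1 on a match and 0 otherwise.
    echo str_pad($s, 22), preg_match($pattern, $s) ? 'match' : 'no match', "\n";
}
```

The first five samples are the ones from the question and should match; `abcdef` and `1234` repeat the same subpattern twice in a row, so they should not.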

Note, if you want to match some number of the same letter or digit, or instances of the same set of letters or digits, you'll need to do some magic with backreferences -- probably named ones, since any numbered backreference will depend on the number of capture groups before the one you want (or between the one you want and where you are), but that number gets complicated if the subpatterns have parentheses in them.

cHao
([a-zA-Z]{3}) this will match 'aXu'. And (\d\d) will match '10'
jigfox
@jigfox: The patterns match "a group of three letters" and "a group of two numerals", which is exactly what the OP said.
cHao
yeah, but the examples suggest a more specific group
jigfox
The examples are *examples*. They're not the only possible patterns, and no example was given of what *shouldn't* match.
cHao
I think this is perfect. Sorry about the confusion - I've improved my examples above. You should understand that the eventual patterns, A and B, are much more complex than just `([a-zA-Z]{3})` and `(\d\d)`. However, the interpolation cHao describes was exactly what I planned to do eventually. I just couldn't get my head round the syntax for the alternation of the patterns.
George Crawford
A: 

Take a look at this (and check conditional subpatterns). I've personally never used them, but they seem to be what you're looking for.

kirbuchi
A: 
/\b(?:(([a-z])\2\2)(?:(([0-9])\4)\1)*(?:([0-9])\5)?|(([0-9])\7)(?:(([a-z])\9\9)\6)*(?:([a-z])\10\10)?)\b/

or if you want to allow any non digit char in the group of three:

/\b(?:((\D)\2\2)(?:((\d)\4)\1)*(?:(\d)\5)?|((\d)\7)(?:((\D)\9\9)\6)*(?:(\D)\10\10)?)\b/

This will match any pattern that consists of two alternating groups: one group is the same character repeated 3 times, and the other is the same digit repeated 2 times.

This regex will match:

aaa
11
bbb22
33ccc
ddd44ddd
55eee55
fff66fff66
77ggg77ggg

But not

aaa11bbb
jigfox
This does not allow for "any alternation of two subpatterns". It will only match the example subpatterns, when what was asked for was a more general solution.
cHao
I would say this is a matter of interpretation!
jigfox
Oh, so NOW it's "matter of interpretation"! While you were downvoting me, it wasn't so subjective...
cHao
Sorry about that, perhaps it was a bit quick, but with the examples given, I'm pretty sure he doesn't want mixed characters or digits. So I believe you're wrong. And we're even now!
jigfox
I was not clear enough. I've edited my original post: `aaa11bbb` is a valid match, so cHao is correct I'm afraid. But blame me!
George Crawford
+1  A: 

Here's a general solution:

^(?:[a-z]{3}(?![a-z]{3})|[0-9]{2}(?![0-9]{2}))+$

It's a simple alternation--three letters or two digits--but the negative lookaheads ensure that the same alternative is never matched twice in a row. Here's a slightly more elegant solution just for PHP:

/^(?:([a-z]{3})(?!(?1))|([0-9]{2})(?!(?2)))+$/

Instead of typing the same subpatterns multiple times, you can put them in capturing groups and use (?1), (?2), etc. to apply them again wherever else you want--in this case, in the lookaheads.
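A quick way to check the subroutine-call version is to run it through `preg_match` on the question's examples. This is only a sketch: the helper name `alternates` is hypothetical, and the subpatterns are the simple lowercase-letter and digit ones from this answer rather than the more complex ones the question mentions.

```php
<?php
// The pattern from this answer: (?1) and (?2) re-apply the subpatterns of
// capture groups 1 and 2 inside the negative lookaheads.
$pattern = '/^(?:([a-z]{3})(?!(?1))|([0-9]{2})(?!(?2)))+$/';

// Hypothetical helper, used only for this illustration.
function alternates(string $pattern, string $subject): bool
{
    // preg_match returns 1 on a match, 0 on no match.
    return preg_match($pattern, $subject) === 1;
}

$samples = ['aaa', 'aaa66bbb', '66', '67abc', '12abc34def56ghi78jkl', 'abcdef', '1234'];
foreach ($samples as $s) {
    echo str_pad($s, 22), alternates($pattern, $s) ? 'match' : 'no match', "\n";
}
```

Note that each lookahead only forbids an immediate repeat of the same subpattern, so `abcdef` and `1234` are rejected while the alternating samples are accepted.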

Alan Moore
I like this option too - similar to cHao's answer, but perhaps a bit more stylish? Thanks, Alan.
George Crawford