Having the following regular expression:

([a-z])([0-9])\1

It matches a5a. Is there any way for it to also match a5b, a5c, a5d and so on?


EDIT: Okay, I understand that I could just use ([a-z])([0-9])([a-z]), but I have a very long and complicated regular expression (matching sub-sub-sub-...-domains or an IPv4 address) that would really benefit from the behavior described above. Is that somehow possible to achieve with backreferences or anything else?


Anon.'s answer is what I need, but it seems to be erroneous.

+2  A: 

You don't need back references if the second letter is independent of the first, right?

([a-z])([0-9])([a-z])+

EDIT

If you just don't want to repeat the last part over and over again, then:

([a-z])([0-9])([a-z])

Just taking away the '+'.

BranTheMan
Thanks Bran, but please check my edit.
Alix Axel
No, I want to have the effect of the first regex you provided `([a-z])([0-9])([a-z])+` but without having to repeat the last part over and over again.
Alix Axel
A: 

I don't follow your question?

[a-z][0-9][a-z]    exactly one
[a-z][0-9][a-z]?   zero or one
[a-z][0-9][a-z]+   one or more
[a-z][0-9][a-z]*   zero or more
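
To make the difference between the four quantifiers concrete, here is a small Python sketch. The `patterns` dict and the `matches` helper are names made up for this illustration, and `re.fullmatch` is used so that the whole string has to match rather than just a prefix.

```python
import re

# The four variants differ only in the quantifier on the trailing letter class.
patterns = {
    "exactly one": r"[a-z][0-9][a-z]",
    "zero or one": r"[a-z][0-9][a-z]?",
    "one or more": r"[a-z][0-9][a-z]+",
    "zero or more": r"[a-z][0-9][a-z]*",
}

def matches(pattern, text):
    # True only if the entire string is matched by the pattern.
    return re.fullmatch(pattern, text) is not None

for label, pattern in patterns.items():
    results = {text: matches(pattern, text) for text in ("a5", "a5b", "a5bc")}
    print(label, results)
```

Only the `*` variant accepts all three test strings; the plain variant accepts just `a5b`.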
DevDevDev
Can you please check my edit? Thanks.
Alix Axel
+2  A: 

The whole point of a back-reference in a regular expression is to match the same thing as the indicated sub-expression, so there's no way to disable that behavior.
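
A quick way to see this, sketched in Python (the variable names and the `full` helper are just for illustration): `\1` only accepts the exact text that group 1 captured, while writing the sub-pattern out again accepts any letter.

```python
import re

# \1 refers to the *text* captured by group 1, not to the pattern [a-z].
backref = re.compile(r"([a-z])([0-9])\1")
# Writing the sub-pattern a second time matches any letter in that position.
repeated = re.compile(r"([a-z])([0-9])([a-z])")

def full(pattern, text):
    # True only if the whole string matches the compiled pattern.
    return pattern.fullmatch(text) is not None

for text in ("a5a", "a5b"):
    print(text, full(backref, text), full(repeated, text))
```

Here `a5a` passes both patterns, but `a5b` passes only the one where `[a-z]` is written out again.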

To get the behavior you want, of being able to reuse a part of a regular expression later, you could just define the parts of the regular expression you wish to reuse in a separate string, and (depending on the language you're working in) use string interpolation or concatenation to build the regular expression from the pieces.

For instance, in Ruby:

>> letter = '([a-z])'
=> "([a-z])"
>> /#{letter}([0-9])#{letter}+/ =~ "a5b"
=> 0
>> /#{letter}([0-9])#{letter}+/ =~ "a51"
=> nil

Or in JavaScript:

var letter = '([a-z])';
var re = new RegExp(letter + '([0-9])' + letter + '+');
"a5b".match(re)
Brian Campbell
+1  A: 

I suspect you're wanting something similar to the Perl (?PARNO) construct (it's not just for recursion ;).

/([a-z])([0-9])(?1)+/

will match what you want - and any changes to the first capture group will be reflected in what the (?1) matches.

Anon.
Seems to be what I'm looking for; however, the regex you provided gives me errors in RegexBuddy (in PCRE and Perl mode).
Alix Axel
Works in my version of Perl.
Anon.
The `(?1)` part of the regex gives me the following error in RegexBuddy in Perl mode: **Erroneous characters (possibly incomplete regex token or unescaped metacharacters)**, thanks anyway. =)
Alix Axel
Then I guess RegexBuddy doesn't handle that feature of Perl regexes. Try it in Perl itself and you'll see that it works.
Anon.
I don't doubt you, but I actually need this regex for a PHP project. =\ It's good to know, nonetheless.
Alix Axel
Just trying it now, it works on my version of PHP too. Give it a try in the real world, rather than just in RegexBuddy, before dismissing it.
Anon.
+2  A: 

The answer is not with backreferences

A backreference means "match the value that was previously matched". It does not mean "match the previous expression again". But if your language allows it, you can substitute a variable holding the sub-pattern into your expression string before compiling it.

Tcl:

# Braces and \[ \] stop Tcl from treating [...] as command substitution
set exp1 {([a-z])}
regexp "${exp1}(\[0-9\])${exp1}+" $string

JavaScript:

var exp1 = '([a-z])';
var regexp = new RegExp(exp1 + '([0-9])' + exp1 + '+');
string.match(regexp);

Perl:

my $exp1 = '([a-z])';
$string =~ /${exp1}([0-9])${exp1}+/;
slebetman
A: 

Backreferences are for retrieving data matched earlier in the regex and using it later on. They aren't for fixing stylistic issues. A regex that uses backreferences will not behave like one that doesn't. You might just need to get used to regexes being repetitive and ugly.

Maybe try Python, which makes it easy to build regexes up from smaller blocks. Not clear if you're allowed to change your environment… you're lucky to have backreferences in the first place.
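
As a rough sketch of what that could look like for the IPv4 case mentioned in the question: the names `OCTET`, `IPV4` and `is_ipv4` are made up here, and the octet pattern is just one common way to write "0 to 255", not something taken from the asker's actual regex.

```python
import re

# One fragment, defined once: a decimal number from 0 to 255 (illustrative).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"

# The fragment is interpolated wherever it is needed; {{3}} becomes the
# regex quantifier {3} after f-string formatting.
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")

def is_ipv4(text):
    # True only if the whole string is a dotted quad of valid octets.
    return IPV4.fullmatch(text) is not None

print(is_ipv4("192.168.0.1"))
print(is_ipv4("256.1.1.1"))
```

Changing `OCTET` in one place changes every position where it is used, which is the effect the question is after, only done at the string level instead of inside the regex engine.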

Potatoswatter