When using the negation operator ^ in a character class together with a backreference, why do I need a lazy match? It seems like the negation should stop the match.

For example:

<?php
preg_match('/(t)[^\1]*\1/', 'is this test ok', $matches);
echo $matches[0];
?>

This outputs "this test" instead of "this t", even though the middle t should not match [^\1]. I need to use /(t)[^\1]*?\1/ to match "this t".

Furthermore,

preg_match('/t[^t]*t/', 'is this test ok', $matches);

does match only "this t".

What is going on, and what am I misunderstanding?

+2  A: 

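You cannot use backreferences inside character classes. Inside a class, \1 is read as an octal escape, so [^\1] means "any character other than the character with code 1", not "any character other than the captured t".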

Instead, use /(t)(?:(?!\1).)*\1/.

(?:...) is a non-capturing group

(?!...) is a "negative look-ahead", asserting that the subexpression doesn't match

(?!\1)., when \1 is a single character, means "any character that does not match \1"
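
As a minimal sketch of how those pieces fit together, the snippet below runs the suggested pattern against the string from the question. The variable names ($pattern, $subject) are only illustrative and are not part of the original question.

```php
<?php
// Pattern from the answer: capture "t", then consume one character at a
// time, but only while the captured text does not start at that position.
$pattern = '/(t)(?:(?!\1).)*\1/';
$subject = 'is this test ok';

preg_match($pattern, $subject, $matches);

// The repeat stops in front of the next "t", so no lazy quantifier is needed.
echo $matches[0], "\n"; // prints "this t"
```

The quantifier here is still greedy. The match ends early because the look-ahead refuses to step over the next occurrence of the captured text.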

Ben Blank
+5  A: 

It doesn't work because inside a character class, \1 is not a backreference. There it is interpreted as the character with ASCII value 1.
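
A quick way to check this, sketched under the assumption that the pattern is written in a single-quoted PHP string as in the question (so PCRE receives the backslash and the digit unchanged). The variable names are made up for the illustration.

```php
<?php
// Inside a character class, \1 is an octal escape for the byte with code 1,
// so [^\1] excludes only that control character.
$class = '/^[^\1]$/';

$matchesDigit   = preg_match($class, '1');    // the digit "1" is still allowed
$matchesLetterT = preg_match($class, 't');    // so is "t", captured or not
$matchesByte1   = preg_match($class, "\x01"); // only byte 0x01 is rejected

var_dump($matchesDigit, $matchesLetterT, $matchesByte1); // int(1) int(1) int(0)
```

Since "t" is not excluded, the greedy [^\1]* in the question's pattern runs to the end of the string and backtracks to the last "t", which is why "this test" is returned.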

You could use a negative lookahead instead to get the effect you want:

'/(t)(?:(?!\1).)*\1/'
Mark Byers
I had just added that exact example to my own answer. I guess great minds think alike. ;-)
Ben Blank
Thanks, regexes get me every time!
Peter Ajtai