-edit- NOTE the ?
at the end of .{2,}?
I found out you can write
.{2,}?
Isn't that exactly the same as the pattern below?
.{2}
No. {2,}
means two times or more while {2}
means exactly two times. Quantifiers are greedy by default, so given the string foo
you would get foo
if you use .{2,}
, but fo
if you use .{2,}?
because you made it lazy. However, the latter is allowed to match more than two times if necessary, but .{2}
always means exactly two characters.
So if you have the string test123
and the pattern .{2,}?\d
, you would get test1
because the lazy .{2,}? has to expand to four characters (test) before the \d
can also match.
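The behaviour described above can be checked with a short sketch. It uses Python's re module, which is an assumption since the thread does not name a regex engine; the helper first_match is a hypothetical name introduced only for this illustration.

```python
import re


def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# Pattern on its own: greedy takes everything, lazy and exact take two.
greedy = first_match(r".{2,}", "foo")
lazy = first_match(r".{2,}?", "foo")
exact = first_match(r".{2}", "foo")

# Pattern followed by \d: the lazy version must grow until a digit follows.
lazy_digit = first_match(r".{2,}?\d", "test123")
exact_digit = first_match(r".{2}\d", "test123")

print(greedy, lazy, exact)      # foo fo fo
print(lazy_digit, exact_digit)  # test1 st1
```

Note that .{2}\d still finds a match in test123, but only by starting later in the string (st1), because it cannot take more than two characters.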
Not exactly. Using PHP to run both patterns and dump the matches:
$string = 'aaabbaabbbaaa';
// Exactly two b's followed by an a
preg_match_all('/b{2}a/', $string, $matches, PREG_SET_ORDER);
echo '<pre>';
var_dump($matches);
echo '</pre>';
// Two or more b's (lazy) followed by an a
preg_match_all('/b{2,}?a/', $string, $matches, PREG_SET_ORDER);
echo '<pre>';
var_dump($matches);
echo '</pre>';
First result gives:
array(2) {
[0]=>
array(1) {
[0]=>
string(3) "bba"
}
[1]=>
array(1) {
[0]=>
string(3) "bba"
}
}
The second gives:
array(2) {
[0]=>
array(1) {
[0]=>
string(3) "bba"
}
[1]=>
array(1) {
[0]=>
string(4) "bbba"
}
}
With b{2} each match contains exactly two b's; with b{2,}? it contains two or more, as many as are needed for the following a to match.
No, they are different. ^.{2,}?$
matches strings whose length is at least 2 (as seen on rubular.com):
12
123
1234
By contrast, ^.{2}$
only matches strings whose length is exactly 2 (as seen on rubular.com).
It's correct that being reluctant, .{2,}?
will first attempt to match only two characters. But for the overall pattern to match, it can take more. This is not the case with .{2}
, which can only match exactly 2 characters.
In isolation they probably behave identically, but not inside larger expressions, because the lazy version is allowed to match more than two characters.
             abx      abcx
^.{2,}?x$    match    match
^.{2}x$      match    no match
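The table can be reproduced with a small sketch, again assuming Python's re module (the thread does not specify an engine). The results dictionary is just an illustrative name.

```python
import re

patterns = [r"^.{2,}?x$", r"^.{2}x$"]
subjects = ["abx", "abcx"]

# Map each (pattern, subject) pair to whether the pattern matches.
results = {
    (p, s): re.search(p, s) is not None
    for p in patterns
    for s in subjects
}

for p in patterns:
    row = ["match" if results[(p, s)] else "no match" for s in subjects]
    print(p, row)
```

The only "no match" cell is ^.{2}x$ against abcx, where three characters precede the x.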
x.{2,}?x
matches "xasdfx"
in "xasdfxbx"
but x.{2}x
does not match at all.
Without the trailing ?
, the first one will match the whole string.
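A quick way to see all three cases side by side, sketched here with Python's re module as an assumed engine:

```python
import re

text = "xasdfxbx"

lazy = re.search(r"x.{2,}?x", text)    # stops at the first x it can reach
exact = re.search(r"x.{2}x", text)     # needs an x exactly three positions on
greedy = re.search(r"x.{2,}x", text)   # runs to the last x in the string

print(lazy.group(0) if lazy else None)      # xasdfx
print(exact.group(0) if exact else None)    # None
print(greedy.group(0) if greedy else None)  # xasdfxbx
```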
No, they are different:
.{2,}?
: Any character, at least 2 repetitions, as few as possible
.{2}
: Any character, exactly 2 repetitions
What makes this question especially interesting is that there are times when .{2,}?
is equivalent to .{2}
, but such a pattern should never be written. Others have already pointed out that a reluctant quantifier at the very end of a regex always matches the minimum number of characters, because there's nothing after it to force it to consume more.
The other place they shouldn't be used is at the end of a subexpression inside an atomic group. For example, suppose you try to match foo bar
with
f(?>.+?) bar
The subexpression initially consumes the first 'o' and hands off to the next part, which tries unsuccessfully to match a space. Without the atomic group, it would backtrack and let the .+?
consume another character. But it can't backtrack into the atomic group, and there's no wiggle room before the group, so the match attempt fails.
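The failure can be demonstrated with a short sketch in Python (an assumed engine). Python's re module accepts the (?>...) syntax only from version 3.11 onward, so to stay portable this sketch emulates the atomic group with the common lookahead-plus-backreference idiom, (?=(...))\1, rather than using the pattern above verbatim.

```python
import re

text = "foo bar"

# Without an atomic group, backtracking lets the lazy .+? grow to "oo".
plain = re.search(r"f.+? bar", text)

# Emulated atomic group: the lookahead captures what .+? matches first
# (a single "o"), and \1 then consumes exactly that text. The engine does
# not backtrack into a lookahead once it has succeeded, so .+? is stuck
# with one character and the following " bar" cannot match.
emulated = re.search(r"f(?=(.+?))\1 bar", text)

print(plain.group(0) if plain else None)  # foo bar
print(emulated)                           # None
```

On Python 3.11 or later, the same outcome should be observable with the original pattern f(?>.+?) bar.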
A reluctant quantifier at the end of a regex or at the end of an atomic subexpression is a definite code smell.