An answer from another question piqued my curiosity.

Consider:

$string = "asfasdfasdfasdfasdf[[sometextomatch]]asfkjasdfjaskldfj";

$regex = "/\[\[(.+?)\]\]/";
preg_match($regex, $string, $matches);

$regex = "/\[\[(.*)\]\]/"; 
preg_match($regex, $string, $matches);

I asked what the difference between the two regexes is. The answer I got was that ".*" matches any character 0 or more times, as many times as possible, and ".+?" matches any character 1 or more times, as few times as possible.

I read those regexes differently, so I did some experimenting on my own but didn't reach any conclusions. PHP.net says "?" is equivalent to {0,1}, so you could rewrite

"/\[\[(.+?)\]\]/"

as

"/\[\[((.+){0,1})\]\]/"

or as

"/\[\[(.{0,})\]\]/"

or as

"/\[\[(.*)\]\]/"

Will they capture different text? Is the difference that one is less expensive? Am I being anal?

A:

Stand-alone, ? does mean {0,1}. However, when it follows a quantifier such as *, +, ?, or {3,6}, ? means something else entirely: it makes that quantifier match minimally (lazily). So, no, you can't rewrite /\[\[(.+?)\]\]/ as /\[\[((.+){0,1})\]\]/. :-)
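A minimal PHP sketch to check this, assuming an illustrative subject string with two [[...]] pairs (the same one Gumbo uses below) so that greedy and lazy matching can actually diverge. The labels in $patterns are made up for this example:

```php
<?php
// Compare the lazy pattern with the rewrites proposed in the question.
// Assumption: an illustrative subject with two [[...]] pairs, so the
// difference between lazy and greedy matching becomes visible.
$subject = 'foo [[bar]] baz [[quux]]';

// Hypothetical labels, chosen only for this comparison.
$patterns = [
    'lazy'        => '/\[\[(.+?)\]\]/',
    'plus_0_or_1' => '/\[\[((.+){0,1})\]\]/',
    'brace_0_inf' => '/\[\[(.{0,})\]\]/',
    'star'        => '/\[\[(.*)\]\]/',
];

$captured = [];
foreach ($patterns as $name => $pattern) {
    preg_match($pattern, $subject, $m);
    $captured[$name] = $m[1]; // text captured by the first (outermost) group
    echo str_pad($name, 12), ' => ', $m[1], PHP_EOL;
}
```

Only the lazy pattern captures bar; the (.+){0,1}, .{0,} and .* versions all run on to the last ]] and capture bar]] baz [[quux. On the question's original string, which has only one [[...]] pair, all four happen to capture the same text.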

Chris Jester-Young
A: 

The ? will only capture it one time (the {0,1} means 0 to 1 times), whereas the * will capture it as many times as it occurs in the string.

From this page:

If you take <.+> and use it on "The <em>Big</em> Dog.", it will match <em>Big</em>, whereas <.+?> will only match <em>.
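A minimal PHP sketch of that comparison, using preg_match, which reports only the first (leftmost) match:

```php
<?php
// Greedy vs. lazy on the HTML example quoted above.
$subject = 'The <em>Big</em> Dog.';

// .+ grabs as much as possible, then backtracks to the LAST '>' it can use.
preg_match('/<.+>/', $subject, $greedy);

// .+? grabs as little as possible, stopping at the FIRST '>' that works.
preg_match('/<.+?>/', $subject, $lazy);

echo $greedy[0], PHP_EOL;
echo $lazy[0], PHP_EOL;
```

Here $greedy[0] is <em>Big</em> and $lazy[0] is <em>.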

Chacha102
A:

Just take an example where you get different results:

foo [[bar]] baz [[quux]]

When searching for all matches (for example with preg_match_all), your first regular expression will match [[bar]] and [[quux]] separately, while the second will match only the single string [[bar]] baz [[quux]].
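To see both matches, the search has to continue across the whole string. A minimal PHP sketch using preg_match_all:

```php
<?php
// Two [[...]] blocks on one line.
$subject = 'foo [[bar]] baz [[quux]]';

// preg_match_all resumes searching after each match, so it collects every match.
preg_match_all('/\[\[(.+?)\]\]/', $subject, $lazy);   // lazy: each match stops at the first ]]
preg_match_all('/\[\[(.*)\]\]/', $subject, $greedy);  // greedy: one match running to the last ]]

print_r($lazy[0]);   // full matches of the lazy pattern
print_r($greedy[0]); // full matches of the greedy pattern
```

$lazy[0] holds [[bar]] and [[quux]] (with bar and quux in $lazy[1]), while $greedy[0] holds the single match [[bar]] baz [[quux]].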

The reason is that a lazy quantifier (one suffixed with ?) matches the minimum possible number of repetitions, whereas the normal greedy mode matches the maximum possible number of repetitions:

However, if a quantifier is followed by a question mark, then it ceases to be greedy, and instead matches the minimum number of times possible, so the pattern /\*.*?\*/ does the right thing with the C comments. The meaning of the various quantifiers is not otherwise changed, just the preferred number of matches. Do not confuse this use of question mark with its use as a quantifier in its own right. Because it has two uses, it can sometimes appear doubled, as in \d??\d which matches one digit by preference, but can match two if that is the only way the rest of the pattern matches.
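A short PHP sketch of both cases from the quoted text. The C-like input string is an assumption chosen for illustration, and ~ is used as the PHP pattern delimiter so that the / characters of the comment markers need no escaping:

```php
<?php
// C-comment case: greedy .* overshoots to the last */, lazy .*? stops at the first.
// Assumption: an illustrative line of C-like code containing two comments.
$code = 'a = 1; /* first */ b = 2; /* second */';

preg_match('~/\*.*\*/~', $code, $greedy);
preg_match('~/\*.*?\*/~', $code, $lazy);

// \d??\d: the lazy ? prefers to match zero digits,
// but matches one if that is the only way the rest of the pattern succeeds.
preg_match('/\d??\d/', '123', $short);    // optional digit skipped: one digit matched
preg_match('/^\d??\d$/', '12', $forced);  // anchors force the optional digit to be used
```

$greedy[0] spans from the first /* to the last */, $lazy[0] is just /* first */, $short[0] is 1, and $forced[0] is 12.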

Gumbo
A:

Normally, ? means "match the preceding thing 0 or 1 times". However, when used after a * or +, a ? modifies the meaning of the * or +. Normally, * and + mean "match 0 (1 for +) or more times, and match as many as possible". Adding the ? changes that meaning to "match 0 (1 for +) or more times, but match as few as possible". By default those quantifiers are "greedy"; the ? makes them non-greedy.
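A minimal PHP sketch, assuming a subject of aaa chosen for illustration, showing what each of the four quantifier forms matches on its own:

```php
<?php
// ? after * or + switches the quantifier from greedy to lazy.
$subject = 'aaa';

preg_match('/a*/',  $subject, $starGreedy);  // 0 or more, as many as possible
preg_match('/a*?/', $subject, $starLazy);    // 0 or more, as few as possible
preg_match('/a+/',  $subject, $plusGreedy);  // 1 or more, as many as possible
preg_match('/a+?/', $subject, $plusLazy);    // 1 or more, as few as possible

var_dump($starGreedy[0], $starLazy[0], $plusGreedy[0], $plusLazy[0]);
```

This yields aaa, an empty string, aaa and a. On their own, the lazy forms stop at their minimum; inside a longer pattern such as /\[\[(.+?)\]\]/ they grow only as far as the rest of the pattern requires.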

Adam Batkin
+1: This definitely needed to be noted.
R. Bemrose
A: 
Brad Gilbert