tags:

views:

1111

answers:

5

From the Pattern javadocs:

Greedy quantifiers:
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times

Reluctant quantifiers:
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n but not more than m times

The description of what they do is the same...so, what is the difference?

I would really appreciate some examples.

I am coding in Java, but I hear this concept is the same for most modern regex implementations.

+2  A: 

A greedy quantifier will match as much as possible and still get a match A reluctant quantifier will match the smallest amount possible.

for example given the string

abcdef

the greedy qualifier

ab[a-z]*[a-z] would match abcdef

the reluctant qualifier

ab[a-z]*?[a-z] would match abc

David Waters
Actually, the greedy character class would match "cde", since a will match a, b will match b and the last [a-z] will match f. The reluctant character group will match exactly the same thing
PatrikAkerstrand
@Machine: you are wrong, [a-z]*?[a-z] would _always_ match the _first_ [a-z] character! 1. [a-z]*? first jumps to the next rule: [a-z], if that doesn't match then [a-z]*? would not match either, and that's the end of the story. But 2. if [a-z] matches, everybody is happy...
Vili
+16  A: 

A greedy operator always try to "grab" as much of the input as possible, while a reluctant quantifier will match as little of the input as possible and still create a match.

Example:

"The red fox jumped over the red fence"
/(.*)red/ => \1 = "The red fox jumped over the "
/(.*?)red/ => \1 = "The "

"aaa"
/a?a*/ => \1 = "a", \2 = "aa"
/a??a*/ => \1 = "", \2 = "aaa"

"Mr. Doe, John"
/^(?:Mrs?.)?.*\b(.*)$/ => \1 = "John"
/^(?:Mrs?.)?.*?\b(.*)$/ => \1 = "Doe, John"
PatrikAkerstrand
That's a really good example.
Salty
+2  A: 

say you have a regex "a\w*b", and use it on "abab" Greedy matching will match "abab" (it looks for an a, as much occurrences of \w as possible, and a b) and reluctant matching will match just "ab" (as little \w as possible)

Jorn
+3  A: 

From this link, where the tutorial author acknowledges the spirit of your question:

At first glance it may appear that the quantifiers X?, X?? and X?+ do exactly the same thing, since they all promise to match "X, once or not at all". There are subtle implementation differences which will be explained near the end of this section.

They go on to put together examples and offer the explanation:

Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from. Depending on the quantifier used in the expression, the last thing it will try matching against is 1 or 0 characters.

The reluctant quantifiers, however, take the opposite approach: They start at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input string.

And for extra credit, the possessive explanation:

Finally, the possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.

akf
Thanks for the added info about possessive too.
jjnguy
+2  A: 

There is documentation on how Perl handles these quantifiers perldoc perlre.

By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a "?". Note that the meanings don't change, just the "greediness":
    *?     Match 0 or more times, not greedily
    +?     Match 1 or more times, not greedily
    ??     Match 0 or 1 time, not greedily
    {n}?   Match exactly n times, not greedily
    {n,}?  Match at least n times, not greedily
    {n,m}? Match at least n but not more than m times, not greedily
By default, when a quantified subpattern does not allow the rest of the overall pattern to match, Perl will backtrack. However, this behaviour is sometimes undesirable. Thus Perl provides the "possessive" quantifier form as well.
    *+     Match 0 or more times and give nothing back
    ++     Match 1 or more times and give nothing back
    ?+     Match 0 or 1 time and give nothing back
    {n}+   Match exactly n times and give nothing back (redundant)
    {n,}+  Match at least n times and give nothing back
    {n,m}+ Match at least n but not more than m times and give nothing back
For instance,
   'aaaa' =~ /a++a/
will never match, as the a++ will gobble up all the a 's in the string and won't leave any for the remaining part of the pattern. This feature can be extremely useful to give perl hints about where it shouldn't backtrack. For instance, the typical "match a double-quoted string" problem can be most efficiently performed when written as:
   /"(?:[^"\\]++|\\.)*+"/
as we know that if the final quote does not match, backtracking will not help. See the independent subexpression (?>...) for more details; possessive quantifiers are just syntactic sugar for that construct. For instance the above example could also be written as follows:
   /"(?>(?:(?>[^"\\]+)|\\.)*)"/
Brad Gilbert