




Why is this expression not following the greedy approach?

string input = @"cool  man! your  dog can walk on water ";
string pattern = @"cool (?<cool>(.*))    (?<h>((dog)*)) (?(h)(?<dog>(.*))) ";

MatchCollection matches = Regex.Matches(input, pattern, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace);

foreach (Match match in matches)
{
    // Both writes belong inside the loop; without braces, only the first is in scope.
    Console.WriteLine("cool=" + match.Groups["cool"].Value);
    Console.WriteLine("dog=" + match.Groups["dog"].Value);
}


cool=  man! your  dog can walk on water

As you can observe, the (dog) group is matched 0 times. But since * is greedy, why doesn't it try to find the maximum number of matches of (dog), which is 1?

Any clues?

+7  A: 

The first .* initially matches the whole string. Then the regex engine determines whether it needs to back off to match the rest of the regex. But (?<h>((dog)*)) and (?(h)(?<dog>(.*))) can both legally match zero characters, so no backtracking is needed (as far as the .* is concerned). Try using a non-greedy .*? in that part.
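
As a rough sketch of the behavior described above, the following C# program runs the pattern from the question and reports what each named group captured. The class name GreedyDemo and the helper Capture are made up for illustration; the input, pattern, and options are copied from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyDemo
{
    // Hypothetical helper: returns the text captured by each named group.
    public static (string Cool, string H, string Dog) Capture(string input)
    {
        string pattern = @"cool (?<cool>(.*))    (?<h>((dog)*)) (?(h)(?<dog>(.*))) ";
        RegexOptions options = RegexOptions.IgnoreCase
                             | RegexOptions.ExplicitCapture
                             | RegexOptions.IgnorePatternWhitespace;
        Match m = Regex.Match(input, pattern, options);
        return (m.Groups["cool"].Value, m.Groups["h"].Value, m.Groups["dog"].Value);
    }

    static void Main()
    {
        var r = Capture(@"cool  man! your  dog can walk on water ");
        // Brackets make the captured whitespace visible.
        Console.WriteLine("cool=[" + r.Cool + "]"); // everything after "cool"
        Console.WriteLine("h=[" + r.H + "]");       // empty: (dog)* matched zero times
        Console.WriteLine("dog=[" + r.Dog + "]");   // empty: nothing was left to consume
    }
}
```

The greedy .* has already consumed the rest of the input by the time (dog)* is tried, and since the remaining parts are satisfied with an empty match, the engine never has a reason to give any characters back.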

EDIT (in response to the additional info posted in the answer below): Okay, replacing the first .* with a non-greedy .*? does have an effect, just not the one you want. Where everything after the word "cool" was being captured in group <cool> before, now it's being captured in group <dog>. Here's what's happening:

After the word "cool" is matched, (?<cool>(.*?)) initially matches nothing (the opposite of the greedy behavior), and (?<h>((dog)*)) tries to match. This part will always succeed no matter where it's tried, because it can match either "dog" or an empty string. That means the conditional expression in (?(h)...) will always evaluate to true, so it goes ahead and matches the rest of the input with (?<dog>(.*)).
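
To make that concrete, here is a small sketch that only swaps the first .* for .*? and keeps everything else from the question. The class name LazyDemo and the helper Capture are illustrative names, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

static class LazyDemo
{
    // Hypothetical helper: same pattern as the question, but with a lazy first quantifier.
    public static (string Cool, string Dog) Capture(string input)
    {
        string pattern = @"cool (?<cool>(.*?))    (?<h>((dog)*)) (?(h)(?<dog>(.*))) ";
        RegexOptions options = RegexOptions.IgnoreCase
                             | RegexOptions.ExplicitCapture
                             | RegexOptions.IgnorePatternWhitespace;
        Match m = Regex.Match(input, pattern, options);
        return (m.Groups["cool"].Value, m.Groups["dog"].Value);
    }

    static void Main()
    {
        var r = Capture(@"cool  man! your  dog can walk on water ");
        Console.WriteLine("cool=[" + r.Cool + "]"); // empty: the lazy .*? stops immediately
        Console.WriteLine("dog=[" + r.Dog + "]");   // everything after "cool"
    }
}
```

The captured text simply moves from one group to the other: (dog)* succeeds with an empty match right after "cool", so the conditional takes its "yes" branch and the greedy .* inside <dog> takes the rest.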

As I understand it, you want to match everything after "cool" in named group <cool>, unless the string contains the word "dog"; then you want to capture everything after "dog" in named group <dog>. You're trying to use a conditional for that, but it's not really the right tool. Just do this:

string pattern = @"cool (?<cool>.*?) (dog (?<dog>.*))?$";

The key here is the $ at the end; it forces the non-greedy .*? to keep matching until it reaches the end of the string. Because it's non-greedy, it tries to match the next part of the regex, (dog (?<dog>.*)), before consuming each character. If the word "dog" is there, the rest of the string will be consumed by (?<dog>.*); if not, the regex still succeeds because the ? makes that whole part optional.
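
A minimal sketch of how the suggested pattern behaves, assuming it is run with the same RegexOptions as in the question (the answer does not say which options to use, so that is an assumption). The class name FixedPatternDemo, the helper Capture, and the second input without "dog" are hypothetical and only there for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class FixedPatternDemo
{
    // Hypothetical helper: applies the suggested pattern and returns both named groups.
    public static (string Cool, string Dog) Capture(string input)
    {
        string pattern = @"cool (?<cool>.*?) (dog (?<dog>.*))?$";
        // Assumption: same options as the question, so pattern whitespace is ignored
        // and the unnamed (dog ...) group does not capture.
        RegexOptions options = RegexOptions.IgnoreCase
                             | RegexOptions.ExplicitCapture
                             | RegexOptions.IgnorePatternWhitespace;
        Match m = Regex.Match(input, pattern, options);
        return (m.Groups["cool"].Value, m.Groups["dog"].Value);
    }

    static void Main()
    {
        var withDog = Capture(@"cool  man! your  dog can walk on water ");
        Console.WriteLine("cool=[" + withDog.Cool + "]"); // text between "cool" and "dog"
        Console.WriteLine("dog=[" + withDog.Dog + "]");   // text after "dog"

        // Made-up input without the word "dog".
        var withoutDog = Capture(@"cool  man! your  cat can walk on water ");
        Console.WriteLine("cool=[" + withoutDog.Cool + "]"); // everything after "cool"
        Console.WriteLine("dog=[" + withoutDog.Dog + "]");   // empty: optional part skipped
    }
}
```

Group values keep their surrounding whitespace here; calling Trim() on them is one way to tidy that up if needed.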

Alan Moore

I did try the non-greedy (.*?), but it has no effect, which seems obvious since non-greedy (.*?) stands for {0,1}; since even zero characters match here, nothing changes.

Any ideas how I can correct it? I mean, I want to capture the string that follows (dog) if it's present; otherwise the previous group, (cool(.*)), should capture the string.

The problem is that (dog) is optional, and if it's present, we need the string following it.

Using (dog)? doesn't have any effect, as it again matches zero characters.

Thanks.

I think you've got the wrong idea about non-greedy quantifiers; see http://www.regular-expressions.info/repeat.html for an explanation. For the rest, see my edit to my original answer.
Alan Moore
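
To illustrate the point about non-greedy quantifiers made in the comment above, here is a small sketch comparing .*, .*?, and .{0,1} on two made-up strings. The class name QuantifierDemo and the helper FirstMatch are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuantifierDemo
{
    // Hypothetical helper: returns the first match, or a marker when there is none.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : "(no match)";
    }

    static void Main()
    {
        // Greedy: takes as much as possible, then backs off only as far as needed.
        Console.WriteLine(FirstMatch("abcbc", "a.*c"));     // abcbc
        // Lazy: takes as little as possible, expanding only as far as needed.
        Console.WriteLine(FirstMatch("abcbc", "a.*?c"));    // abc
        // Lazy is not the same as {0,1}: it can still consume several characters.
        Console.WriteLine(FirstMatch("abbbc", "a.*?c"));    // abbbc
        Console.WriteLine(FirstMatch("abbbc", "a.{0,1}c")); // (no match)
    }
}
```

Both .* and .*? may match any number of characters; the ? only changes which lengths the engine tries first.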