The first .* initially matches the whole string. Then the regex engine determines whether it needs to back off to match the rest of the regex. But (?<h>((dog)*)) and (?(h)(?<dog>(.*))) can both legally match zero characters, so no backtracking is needed (as far as the .* is concerned). Try using a non-greedy .*? in that part.
EDIT (in response to the additional info posted in the answer below): Okay, replacing the first .* with a non-greedy .*? does have an effect, just not the one you want. Where everything after the word "cool" was being captured in group <cool> before, now it's being captured in group <dog>. Here's what's happening:
After the word "cool" is matched, (?<cool>(.*?)) initially matches nothing (the opposite of the greedy behavior), and (?<h>((dog)*)) tries to match. This part will always succeed no matter where it's tried, because it can match either one or more repetitions of "dog" or an empty string. The conditional in (?(h)...) only tests whether group <h> took part in the match, not whether it captured any text, so it will always evaluate to true, and the regex goes ahead and matches the rest of the input with (?<dog>(.*)).
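To make the difference concrete, here is a minimal sketch. The full patterns are an assumption pieced together from the fragments quoted above (the exact regex and input in the question may differ), and the sample input is made up.

```csharp
using System;
using System.Text.RegularExpressions;

class GreedyVsLazyDemo
{
    static void Main()
    {
        // Hypothetical input and patterns, reconstructed for illustration only.
        string input = "cool cats and dog food";
        string greedy = @"cool (?<cool>(.*))(?<h>((dog)*))(?(h)(?<dog>(.*)))";
        string lazy   = @"cool (?<cool>(.*?))(?<h>((dog)*))(?(h)(?<dog>(.*)))";

        foreach (string pattern in new[] { greedy, lazy })
        {
            Match m = Regex.Match(input, pattern);
            // <h> succeeds with an empty capture in both cases,
            // so the conditional always takes its "yes" branch.
            Console.WriteLine("cool=[{0}] dog=[{1}] h matched={2}",
                m.Groups["cool"].Value,
                m.Groups["dog"].Value,
                m.Groups["h"].Success);
        }
        // Greedy: cool=[cats and dog food] dog=[] h matched=True
        // Lazy:   cool=[] dog=[cats and dog food] h matched=True
    }
}
```

In both runs <h> matches an empty string, so switching between greedy and lazy only changes which of the two .* groups swallows the text.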
As I understand it, you want to match everything after "cool" in named group <cool>, unless the string contains the word "dog"; then you want to capture everything after "dog" in named group <dog>. You're trying to use a conditional for that, but it's not really the right tool. Just do this:
string pattern = @"cool (?<cool>.*?)( dog (?<dog>.*))?$";
The key here is the $ at the end; it forces the non-greedy .*? to keep consuming characters until the rest of the regex can reach the end of the string. Because it's non-greedy, it tries to match the next part of the regex, ( dog (?<dog>.*))?, before consuming each character. If the word "dog" is there, the rest of the string will be consumed by (?<dog>.*); if not, the regex still succeeds because the ? makes that whole part optional. Note that the space before "dog" sits inside the optional group: if it were outside, a string without "dog" would match only if it ended in a space.
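A quick way to check this is to run the pattern against one input with "dog" and one without. The sketch below uses the form of the pattern with the separating space inside the optional group, so that input without "dog" and without a trailing space still matches. The two sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class OptionalDogDemo
{
    static void Main()
    {
        // The space before "dog" is part of the optional group.
        string pattern = @"cool (?<cool>.*?)( dog (?<dog>.*))?$";

        // Hypothetical sample inputs: one with "dog", one without.
        foreach (string input in new[] { "cool cats and dog food", "cool cats" })
        {
            Match m = Regex.Match(input, pattern);
            Group dog = m.Groups["dog"];
            // Group.Success tells an unmatched optional group apart
            // from one that matched an empty string.
            Console.WriteLine("cool=[{0}] dog matched={1} dog=[{2}]",
                m.Groups["cool"].Value, dog.Success, dog.Value);
        }
        // cool=[cats and] dog matched=True dog=[food]
        // cool=[cats] dog matched=False dog=[]
    }
}
```

Checking Groups["dog"].Success rather than comparing Value to an empty string is the safer test here, since an optional group that never participated also reports an empty Value.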