As a generically brewed example for the purpose of this question, my intent is to match some number of a
's, then an equal number of b
's, plus one more b
.
Examine the two patterns exhibited in this snippet (also on ideone.com):
var r1 = new Regex(@"(?xn)
(?<A> a)+ (?<B-A> b)+ (?(A)(?!)) b
");
var r2 = new Regex(@"(?xn)
(?<A> a)+ (?<B-A> b)+? (?(A)(?!)) b
");
Console.WriteLine(r1.Match("aaabbb"));
// aaabbb
Console.WriteLine(r2.Match("aaabbb"));
// aabbb
Note that there is a difference in the matches of the two patterns. r1
, which uses a greedy repetition on the balancing group construct, matches 3 a
's and 3 b
's, which is NOT as intended. r2
, which uses a reluctant repetition, gives me 2 a
's and 3 b
's, which IS as intended.
The only way I can explain this is that when (?<B-A> b)+
backtracks to match one less b
, it pops from the B
stack but DOES NOT push back what was correspondingly popped from the A
stack. Thus, even though one less b
is now matched due to backtracking, the A
stack remains empty. This is the only way I can explain how r1
can match aaabbb
.
Note that using reluctant +?
in r2
doesn't cause this problem. The way I see it, this is because unlike greedy repetition, a reluctant repetition doesn't have to "undo the damage" to the A
stack, so-to-speak. By contrast, the greedy repetition causes as much "damage" as possible, but the backtracking fails to "leave things as they were" to the A
stack.
Is this a correct analysis of what happened? And if so, is this behavior by design? Because what it basically looks like to me is that backtracking a balancing group in a greedy repetition may cause imbalance, and thus this could potentially be categorized as a bug (or at the very least a somewhat astonishing behavior that is inadequately documented).