views:

101

answers:

0

As a generically brewed example for the purpose of this question, my intent is to match some number of a's, then an equal number of b's, plus one more b.

Examine the two patterns exhibited in this snippet (also on ideone.com):

var r1 = new Regex(@"(?xn)
   (?<A> a)+   (?<B-A> b)+    (?(A)(?!))   b
");
var r2 = new Regex(@"(?xn)
   (?<A> a)+   (?<B-A> b)+?   (?(A)(?!))   b
");

Console.WriteLine(r1.Match("aaabbb"));
// aaabbb

Console.WriteLine(r2.Match("aaabbb"));
// aabbb

Note that there is a difference in the matches of the two patterns. r1, which uses a greedy repetition on the balancing group construct, matches 3 a's and 3 b's, which is NOT as intended. r2, which uses a reluctant repetition, gives me 2 a's and 3 b's, which IS as intended.

The only way I can explain this is that when (?<B-A> b)+ backtracks to match one less b, it pops from the B stack but DOES NOT push back what was correspondingly popped from the A stack. Thus, even though one less b is now matched due to backtracking, the A stack remains empty. This is the only way I can explain how r1 can match aaabbb.

Note that using reluctant +? in r2 doesn't cause this problem. The way I see it, this is because unlike greedy repetition, a reluctant repetition doesn't have to "undo the damage" to the A stack, so-to-speak. By contrast, the greedy repetition causes as much "damage" as possible, but the backtracking fails to "leave things as they were" to the A stack.

Is this a correct analysis of what happened? And if so, is this behavior by design? Because what it basically looks like to me is that backtracking a balancing group in a greedy repetition may cause imbalance, and thus this could potentially be categorized as a bug (or at the very least a somewhat astonishing behavior that is inadequately documented).