I have found the following resources on balanced matching for .NET regexes:
- http://weblogs.asp.net/whaggard/archive/2005/02/20/377025.aspx
- http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx
- http://msdn.microsoft.com/en-us/library/bs2twtah%28VS.85%29.aspx#BalancingGroupDefinitionExample
From what I have read in these, the following example should work:
This regex should find an "a" anywhere within an angle-bracket group, no matter how deep. It should match the "a" in "<a>", "<<a>>", "<a<>>", "<<>a>", "<<><a>>", etc.
    (?<=
        ^
        (
            (
                <(?<Depth>)
                |
                >(?<-Depth>)
            )
            [^<>]*?
        )+?
    )
    (?(Depth)a|(?!))
I am trying to match the "a" in the string "<<>a>".
While the regex works for the strings "<a<>>" and "<<a>>", I can't get it to match an "a" that follows a ">".
According to the explanations I have read, the first two "<"s should push onto Depth twice, then the first ">" should pop it once, leaving one capture on the stack. At this point, (?(Depth)a|(?!)) should take the "yes" branch, but the regex never even gets that far.
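To make that expectation concrete, here is a minimal Python sketch of the left-to-right counting I have in mind. It only models the depth bookkeeping I expect. It is not how the .NET engine actually evaluates the lookbehind, and it ignores the detail that the prefix must start with a bracket. The function name `positions_of_a_inside_brackets` is made up for illustration.

```python
def positions_of_a_inside_brackets(text):
    """Return the indexes of every 'a' that sits at bracket depth > 0.

    Hypothetical model of the expected behaviour, scanning left to right.
    """
    depth = 0
    hits = []
    for index, char in enumerate(text):
        if char == "<":
            depth += 1              # plays the role of <(?<Depth>): push
        elif char == ">":
            if depth == 0:
                break               # >(?<-Depth>) cannot pop an empty stack
            depth -= 1              # plays the role of >(?<-Depth>): pop
        elif char == "a" and depth > 0:
            hits.append(index)      # plays the role of (?(Depth)a|(?!))
    return hits


# The string I cannot get the regex to match:
print(positions_of_a_inside_brackets("<<>a>"))  # [3]
```

Under this model the "a" at index 3 of "<<>a>" is at depth 1, which is why I expect the regex to match it.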
Consider the following regex, which makes no such check and still fails to match the string in question:
    (?<=
        ^
        (
            (
                <(?<Depth>)
                |
                >(?<-Depth>)
            )
            [^<>]*?
        )+?
    )
    a
Am I missing something, or is the regex engine working incorrectly?