views:

182

answers:

1

PCRE has a feature called recursive pattern, which can be used to match nested subgroups. For example, consider the "grammar"

Q -> \w | '[' A ';' Q* ','? Q* ']' | '<' A '>'
A -> (Q | ',')*
// to match ^A$.

It can be done in PCRE with the pattern

^((?:,|(\w|\[(?1);(?2)*,?(?2)*\]|<(?1)>))*)$

(Example test case: http://www.ideone.com/L4lHE)

There is no recursive pattern in .NET. Instead, it provides balancing groups for stack-based manipulation for matching simple nested patterns.

Is it possible to convert the above PCRE pattern into .NET Regex style?

(Yes I know it's better not to use regex in for this. It's just a theoretical question.)

References

+5  A: 

The answer is (probably) Yes.

The technique is much more complex than the (?1) recursive call, but the result is almost 1-to-1 with the rules of the grammar - I worked in a such methodical way I can easily see it scripted. Basically, you match block-by-block, and use the stack to keep track of where you are. This is an almost working solution:

^(?:
    (\w(?<Q>)) # Q1
    |
    (<(?<Angle>))  #Q2 - start <
    |
    (\>(?<-Angle>)(?<-A>)?(?<Q>))  #Q2 - end >, match Q
    |
    (\[(?<Block>))  # Q3 start - [
    |
    (;(?<Semi-Block>)(?<-A>)?)  #Q3 - ; after [
    |
    (\](?<-Semi>)(?<-Q>)*(?<Q>))  #Q3 ] after ;, match Q
    |
    ((,|(?<-Q>))*(?<A>))   #Match an A group
)*$
# Post Conditions
(?(Angle)(?!))
(?(Block)(?!))
(?(Semi)(?!))

It is missing the part of allowing commas in Q->[A;Q*,?Q*], and for some reason allows [A;A], so it matches [;,,] and [abc;d,e,f]. Rest of the strings match the same as the test cases.
Another minor point is an issue with pushing to the stack with an empty capture - it doesn't. A accepts Ø, so I had to use (?<-A>)? to check if it captured.

The whole regex should look like this, but again, it is useless with the bug there.

Kobi
As a side note, I'm surprised I found so little material on the subject. I learned as I go, an it took a great deal of time. Anyway, if anyone finds the flaw in my code I'll be happy to hear.
Kobi