
I need to extract the content of an unbalanced paren construction. In the PCRE manual I found a solution for matching balanced parens:

<\[ ( (?>[^(<\[|\]>)]+) | (?R) )* \]>

For my test string

<[<[ab<[cd]>]><[ef]>

it extracts:

0.0: <[ab<[cd]>]>
0.1: <[ef]>

But I want to extract the same content without the outermost parens:

0.0: ab<[cd]>
0.1: ef

Could anybody point me to a solution?

A: 

Well, from the look of your regex, the content inside the outermost enclosure (they're not parentheses in any normal usage of the term) is already being captured by a capturing group. I don't know what context you're using the PCRE library in, but the extractions you want should be present in "match #1" (where the entire pattern match is match #0). That is, your data should already look like:

0.0: <[ab<[cd]>]>
0.1: <[ef]>
1.0: ab<[cd]>
1.1: ef
chaos
The target system is PHP 5.2, and for this regexp I receive: 0.0: (<[ab<[cd]>]>) 0.1: (<[ef]>) 1.0: (<[cd]>) 1.1: (ef)
mou
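The output in that comment is what PCRE does with a capturing group inside a `*` repetition. The group is overwritten on every iteration, so group 1 keeps only the last one. In `<[ab<[cd]>]>` the iterations are `ab` and then the recursive `<[cd]>`, which is why 1.0 is `<[cd]>`. `<[ef]>` has a single iteration, `ef`. Wrapping the whole repetition in its own group should make group 1 hold `ab<[cd]>` and `ef`. For example, `<\[ ( (?: (?>[^(<\[|\]>)]+) | (?R) )* ) \]>` would work this way, compiled with the `x` modifier so the spaces are ignored. This is untested against PHP 5.2's bundled PCRE.

To check the expected result independently of any regex engine, here is a minimal Python sketch. Python's standard `re` module has no `(?R)` recursion, so the sketch swaps the regex for a hand-written depth-counting scanner. `extract_inner` is a hypothetical name. The sketch assumes that only the two-character tokens `<[` and `]>` act as delimiters. The regex's character class is stricter: it also rejects a lone `<`, `>`, `[`, `]`, `(`, `)` or `|` inside the content.

```python
from typing import List

OPEN, CLOSE = "<[", "]>"


def extract_inner(text: str) -> List[str]:
    """Return the contents of each top-level <[ ... ]> group, without the
    outermost delimiters (hypothetical helper, not a PCRE/PHP API)."""
    results = []
    i = 0
    while i < len(text):
        if not text.startswith(OPEN, i):
            i += 1
            continue
        # Walk forward from this OPEN, tracking nesting depth, until the
        # CLOSE that brings the depth back to zero.
        depth, j, end = 0, i, -1
        while j < len(text):
            if text.startswith(OPEN, j):
                depth += 1
                j += len(OPEN)
            elif text.startswith(CLOSE, j):
                depth -= 1
                j += len(CLOSE)
                if depth == 0:
                    end = j
                    break
            else:
                j += 1
        if end == -1:
            # No balancing CLOSE: skip this OPEN and retry one character
            # later, like a failed regex match attempt.
            i += 1
        else:
            # Keep only what lies between the outermost delimiters.
            results.append(text[i + len(OPEN):end - len(CLOSE)])
            i = end
    return results


print(extract_inner("<[<[ab<[cd]>]><[ef]>"))  # ['ab<[cd]>', 'ef']
```

A regex engine retries at the next position after a failed attempt, and the scanner does the same with an unmatched `<[`, such as the leading one in the test string. The scan then restarts one character later, which makes the worst case quadratic in the input length.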