Hi,

Using some regex extensions provided by .NET, one can match groups of balanced parentheses with something like this:

^(\w+)\(((?>[^()]+|\((?<D>)|\)(?<-D>))*(?(D)(?!)))\)(.*)$

This will match the following:

Func(innerfunction(arg)).DoSomething()

With the following groups:

  • Group 1: Func
  • Group 2: innerfunction(arg)
  • Group 3: .DoSomething()

My question is: how do I match commas while taking into account whether or not they are inside a parenthesis group? For example, a regex applied to:

Func(innerFunction(arg1, arg2), arg3).DoSomething()

Should yield:

  • Group 1: Func
  • Group 2: innerFunction(arg1, arg2)
  • Group 3: arg3
  • Group 4: .DoSomething()

Thanks.

A: 

I think I found it. Does anyone have a counter-example?

^([^()]*?|.*\((?>[^()]+|\((?<D>)|\)(?<-D>))*(?(D)(?!))\).*?),(.*)$

This will match this expression:

func1(arg2, func3(arg3, arg4)), func2(arg5, arg6).property

as:

  • Group1: func1(arg2, func3(arg3, arg4))
  • Group2: func2(arg5, arg6).property

This solution only looks for one comma, but it deals with an arbitrary depth of parentheses.

UPDATE: Gumbo has provided a counter-example:

func1((arg1), arg2), func2(arg3).property

This gets split up into:

  • Group1: func1((arg1)
  • Group2: arg2), func2(arg3).property

HOWEVER: Making the first "any match" non-greedy solves it:

^([^()]*?|.*?\((?>[^()]+|\((?<D>)|\)(?<-D>))*(?(D)(?!))\).*?)\s*,\s*(.+)$

Any other counter-example?

Hugo S Ferreira
Here’s an example of what doesn’t match: “func1((arg1), arg2), func2(arg3).property” gets split up into “func1((arg1)” and “arg2), func2(arg3).property” (according to <http://regexlib.com/RETester.aspx>).
Gumbo
What about now? ;-)
Hugo S Ferreira
What about something like this: “func1(")", arg1), func2().property”?
Gumbo
Treating " was not in the scope of the question; there's no meaning to it. Nonetheless, one could treat them the same way as ()s.
Hugo S Ferreira
A: 

While it's not impossible, I advise against using regular expressions for this.

The problem is that expressions tend to be either completely greedy or completely un-greedy. For example, take the following input:

(a,)b,(c,(d,)e,)

A greedy expression would match as much as possible. It will see everything as inside the parentheses, and therefore return nothing.

An ungreedy expression would correctly match comma b, but it would also match comma e, because it would see (c,(d,) as one complete group.
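
The two behaviours described above can be reproduced with plain quantifiers. This is only a sketch using Python's `re` module rather than .NET, with no balancing groups, and the function names are made up for illustration. It removes whatever each pattern considers a parenthesised group and shows which commas are left exposed:

```python
import re

TEXT = "(a,)b,(c,(d,)e,)"

def strip_groups_greedy(text):
    # Greedy: from the first "(" to the LAST ")" in the string.
    return re.sub(r"\(.*\)", "", text)

def strip_groups_lazy(text):
    # Lazy: from each "(" to the NEAREST following ")".
    return re.sub(r"\(.*?\)", "", text)

print(repr(strip_groups_greedy(TEXT)))  # '' : everything counted as inside
print(repr(strip_groups_lazy(TEXT)))    # 'b,e,)' : comma e wrongly exposed
```

Neither quantifier tracks how deep the nesting currently is, which is the information the comma match needs.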

Now, it sounds like you already understand those issues, and the .NET regular expression engine does have a feature that will let you get past this to some extent. But the resulting expression will be ugly, unmaintainable, not very portable, and easy to get wrong. Unless you really know what you're doing, it's probably best to look for another solution.

Joel Coehoorn
A: 

If you don’t want to limit the nesting depth, it is impossible with regular expressions alone.

So I recommend building a parser that tracks the nesting level. Read the input character by character: when it’s a “(”, increase the level; when it’s a “)”, decrease the level; when it’s a “,” at the level you are splitting on, split there.
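
A minimal sketch of that character-by-character approach, written in Python for brevity (the same loop is straightforward in C#). The function name `split_top_level` and the choice to raise an error on unbalanced input are assumptions for illustration. It splits only at depth 0, so pass it the text of the argument list itself:

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts = []
    current = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
        if ch == sep and depth == 0:
            # Separator outside every group: close the current piece.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '('")
    parts.append("".join(current).strip())
    return parts


print(split_top_level("innerFunction(arg1, arg2), arg3"))
print(split_top_level("func1((arg1), arg2), func2(arg3).property"))
```

Quoted strings such as `")"` are not handled here; supporting them would need an extra "inside quotes" flag alongside the depth counter.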

Gumbo
Thanks for the advice. Can you take a look at my solution and see if there's any counter-example?
Hugo S Ferreira
+1. Regex can't deal with arbitrary-depth nesting; some other kind of parser is needed.
bobince
I continue to assert that the presented solution solves the problem. And arbitrary-depth nesting is solved by the <D> and <-D> counting strategy :) Any counter-example, please?
Hugo S Ferreira