views:

84

answers:

4

I'm trying to split a string on commas, but only where the comma is not inside parentheses.

Example String:

`table1`.`lname`,`table1`.`fname`,if(foo is not null,foo,if(bar is not null,bar,table3.baz)),`table3`.`shu`

I want to split it into an array looking like

(
  0=>`table1`.`lname`
  1=>`table1`.`fname`
  2=>if(foo is not null,foo,if(bar is not null,bar,table3.baz))
  3=>`table3`.`shu`
)

Any ideas on how to tackle this problem?

-- Dave

+4  A: 

In general, you can't do it with a regex. You typically need a recursive descent parser (or something similar) to match up parentheses that may be nested to arbitrary depth.

I think there have been similar questions here before, but I had a hard time finding them. This answer, however, should help explain.
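For a string like the one in the question, a full parser may be more than is needed: a single pass that tracks parenthesis depth is often enough. The following is only a minimal sketch in Python, not part of the original answer. The function name `split_top_level` is hypothetical, and the sketch assumes that parentheses and commas never appear inside quoted or backticked identifiers.

```python
def split_top_level(text, sep=",", opener="(", closer=")"):
    """Split text on sep, skipping separators nested inside parentheses.

    Assumption: parentheses are never quoted or escaped in the input.
    """
    parts = []
    current = []
    depth = 0  # how many unclosed opening parentheses we are inside
    for ch in text:
        if ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing parenthesis")
        if ch == sep and depth == 0:
            # A separator at depth 0 ends the current piece.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced opening parenthesis")
    parts.append("".join(current))
    return parts


example = ("`table1`.`lname`,`table1`.`fname`,"
           "if(foo is not null,foo,if(bar is not null,bar,table3.baz)),"
           "`table3`.`shu`")
for index, part in enumerate(split_top_level(example)):
    print(index, "=>", part)
```

Run on the example string from the question, this should print the four pieces the asker listed.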

Adam Bellaire
A: 

I'd look into your favorite language to see if there is a specific module for handling CSV files. Ruby has CSV (replaced by FasterCSV in recent versions) that would handle your problem just fine.

It is more complicated than a single regex but will get the job done.

Perl has this Parse::CSV module.

Keltia
A: 

Regex isn't very good at this. Consider the following snippet:

(a)b(c(d)e)

Where each letter represents a comma (your search target). Based on your question, you would only want to match comma b. The trick is that expressions are generally either greedy or not greedy, with no middle ground.

A greedy expression would see the ( at the very beginning of the segment and the ) at the very end and take everything between them, regardless of the closing parentheses in between. Nothing would be matched.

An ungreedy expression would take only the smallest set possible, starting from the beginning. It would match comma b, but it would also treat the segment (c(d) as one unit and then go on to match comma e as well.
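The two behaviours can be seen directly on the snippet above. This is a small illustration using Python's `re` module (chosen here only for the demonstration; the patterns `\(.*\)` and `\(.*?\)` are assumed stand-ins for "a greedy expression" and "an ungreedy expression"):

```python
import re

snippet = "(a)b(c(d)e)"

# Greedy: .* runs to the end, then backs up to the LAST closing parenthesis.
greedy = re.findall(r"\(.*\)", snippet)

# Ungreedy: .*? stops at the FIRST closing parenthesis it can reach.
lazy = re.findall(r"\(.*?\)", snippet)

print(greedy)  # ['(a)b(c(d)e)']
print(lazy)    # ['(a)', '(c(d)']
```

The greedy pattern swallows b along with everything else, while the ungreedy one cuts the nested group short at (c(d and leaves e outside any match.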

There are some engines that allow you to count nesting levels, but the expressions are usually ugly and hard to maintain: best to just avoid the feature unless you really understand it well.

Joel Coehoorn
A: 

If you know that you are only going to receive one pair of parentheses, then this might work:

/(([^,]*\(.*\))|[^,]*)/g

Just remember that this will fail if you have a stray ) somewhere or if you have more than one set of parentheses that need to be parsed out.
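To see where the pattern holds up and where it breaks, here is a rough check in Python. The `/g` flag is emulated with `finditer`, empty matches are filtered out, and the helper name `pieces` and both test strings are made up for illustration:

```python
import re

# Same pattern as above, written as a Python raw string.
pattern = re.compile(r"(([^,]*\(.*\))|[^,]*)")


def pieces(text):
    """Return every non-empty match of the pattern, in order."""
    return [m.group(0) for m in pattern.finditer(text) if m.group(0)]


nested = "a,f(b,g(c,d)),e"      # one outer pair with a nested pair inside
separate = "f(a,b),x,g(c,d)"    # two independent pairs

print(pieces(nested))    # ['a', 'f(b,g(c,d))', 'e']
print(pieces(separate))  # ['f(a,b),x,g(c,d)']
```

Because `.*` is greedy, it runs to the last ) in the string, so a single outer group is kept whole, but two separate groups are merged together with everything between them.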

Mike
The example he uses has two pairs of parentheses.
Chris Lutz