views: 842

answers: 2

Hi folks - I'm not quite sure if this is possible, so I turn to you.

I would like to find a regex that will pick out all commas that fall outside quoted strings.

For example:

'foo' => 'bar',
'foofoo' => 'bar,bar'

This would pick out only the comma at the end of line 1, after 'bar'.

I don't really care about single vs double quotes.

Has anyone got any thoughts? I feel like this should be possible with lookaheads, but my regex-fu is too weak.

+5  A: 

This will match any string up to and including the first non-quoted ",". Is that what you want?

/^([^"]|"[^"]*")*?(,)/
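As a quick check of what this first pattern captures, here is a minimal Ruby sketch. The sample string and the names FIRST_UNQUOTED_COMMA and sample are invented for illustration; like the pattern itself, it only knows about double quotes.

```ruby
# Pattern from the answer: lazily skip unquoted characters and complete
# double-quoted runs, then capture the first comma that is left over.
FIRST_UNQUOTED_COMMA = /^([^"]|"[^"]*")*?(,)/

# Hypothetical sample input; the comma at index 4 sits inside double quotes.
sample = 'a "x,y" b, c, d'

m = FIRST_UNQUOTED_COMMA.match(sample)
first_comma_index = m.begin(2)  # offset of the captured comma
matched_prefix = m[0]           # everything up to and including that comma

puts first_comma_index
puts matched_prefix
```

The second capture group is the comma itself, so m.begin(2) gives its position without any further string searching.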

If you want all of them (and as a counter-example to the guy who said it wasn't possible) you could write:

/(,)(?=(?:[^"]|"[^"]*")*$)/

which will match all of them. Thus

'test, a "comma,", bob, ",sam,",here'.gsub(/(,)(?=(?:[^"]|"[^"]*")*$)/,';')

replaces all the commas not inside quotes with semicolons, and produces:

'test; a "comma,"; bob; ",sam,";here'

If you need it to work across line breaks just add the m (multiline) flag.
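One caveat if you are running this in Ruby, as the gsub example does: there $ already matches at the end of every line, and the m flag only changes what . matches, so a quoted string that itself contains a line break can still fool the lookahead. A minimal sketch, assuming you want the whole input treated as one string, swaps $ for \z (end of string). The constant names and sample text are made up for illustration.

```ruby
# Original lookahead: in Ruby, $ can succeed at the end of any line.
LINE_ANCHORED = /,(?=(?:[^"]|"[^"]*")*$)/
# Variant (an assumption, not part of the answer): \z matches only at end of input.
STRING_ANCHORED = /,(?=(?:[^"]|"[^"]*")*\z)/

# Hypothetical input: the quoted field "b,<newline>c" spans two lines.
text = %(a,"b,\nc",d)

per_line = text.gsub(LINE_ANCHORED, ';')    # also rewrites the comma inside the quotes
whole    = text.gsub(STRING_ANCHORED, ';')  # leaves the quoted comma alone

p per_line
p whole
```

In flavors where $ matches only at the very end of the input by default, the original pattern may already behave like the \z variant.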

MarkusQ
This looks like it works properly with double quotes. I believe (,)(?=(?:[^"']|["|'][^"']*")*$) works with single OR double quotes. Thanks!
SocialCensus
I wanted to point out that this does not work across line breaks.
SocialCensus
@SocialCensus Then use the m flag. Also, your example in the comment above has several bugs. For example, it takes double quotes, single quotes, and vertical bars as opening quotes but only takes double quotes as closing quotes.
MarkusQ
MarkusQ - You are quite correct, and I surrender my regex license. Yours works perfectly. Mine, not so much.
SocialCensus
@SocialCensus Don't surrender, fight harder!
MarkusQ
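Picking up the comment thread above, one way to cover both quote styles is to give each its own alternative, so that an opening quote can only be closed by the same character. This is a sketch rather than a drop-in fix: the name COMMA_OUTSIDE_QUOTES is made up here, and escaped quotes are not handled.

```ruby
# Each quote style gets its own alternative, so "..." and '...' must close
# with the character that opened them.
COMMA_OUTSIDE_QUOTES = /,(?=(?:[^"']|"[^"]*"|'[^']*')*$)/

# The two lines from the question (single-quoted heredoc: no escape processing).
input = <<~'EOS'
  'foo' => 'bar',
  'foofoo' => 'bar,bar'
EOS

result = input.gsub(COMMA_OUTSIDE_QUOTES, ';')
puts result
```

Only the comma at the end of the first line is replaced; the one inside 'bar,bar' is left alone.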
+1  A: 

Try this regular expression:

(?:"(?:[^\\"]+|\\(?:\\\\)*[\\"])*"|'(?:[^\\']+|\\(?:\\\\)*[\\'])*')\s*=>\s*(?:"(?:[^\\"]+|\\(?:\\\\)*[\\"])*"|'(?:[^\\']+|\\(?:\\\\)*[\\'])*')\s*,

This also allows strings like “'foo\'bar' => 'bar\\',”.
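For reference, a small Ruby sketch of what this pattern matches: the entire quoted key => value pair together with its trailing comma, not the comma on its own. The pattern is split into named pieces purely for readability (the names are illustrative assumptions), and the heredoc is single-quoted so the backslashes reach the regex untouched.

```ruby
# The same expression as above, assembled from smaller parts.
DQ  = /"(?:[^\\"]+|\\(?:\\\\)*[\\"])*"/   # double-quoted string
SQ  = /'(?:[^\\']+|\\(?:\\\\)*[\\'])*'/   # single-quoted string
STR = /(?:#{DQ}|#{SQ})/
PAIR_THEN_COMMA = /#{STR}\s*=>\s*#{STR}\s*,/

# Two lines from the question plus the escaped example from this answer.
lines = <<~'EOS'.lines.map(&:chomp)
  'foo' => 'bar',
  'foofoo' => 'bar,bar'
  'foo\'bar' => 'bar\\',
EOS

# String#[] with a regex returns the matched text, or nil when nothing matches.
matched = lines.map { |line| line[PAIR_THEN_COMMA] }
p matched
```

The second line yields nil because it has no trailing comma after the value.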

Gumbo
This one doesn't seem to work for me...
SocialCensus