I'm using Clojure, so this is in the context of Java regexes.

Here is an example string:

{:a "ab,cd, efg", :b "ab,def, egf,", :c "Conjecture"}

The important bits are the commas after each string. I'd like to replace them with newline characters using Java's replaceAll method. A regex that matches any comma that is not inside a quoted string will do.

If I'm not coming across well, please ask and I'll happily clarify anything.

edit: sorry for the confusion in the title. I haven't been awake very long.

String: {:a "ab, cd efg",} <-- In this example, the comma at the end would be matched, but the ones inside the quote would not.

String: {:a 3, :b 3,} <-- Every single comma matches.

String: {:a "abcd,efg" :b "abcedg,e"} <-- None of the commas match.

+3  A: 

The regex:

,\s*(?=([^"]*"[^"]*")*[^"]*$)

Matches:

{:a "ab,cd, efg", :b "ab,def, egf,", :c "Conjecture"}
                ^                  ^
                ^                  ^

and:

{:a "ab, cd efg",}
                ^
                ^

and does not match a comma in:

{:a "abcd,efg" :b "abcedg,e"}
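One way to check the caret positions above is to list the offset of every match. This is only a minimal Java sketch, not part of the original answer; the class name `CommaOffsets` and the method `matchOffsets` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommaOffsets {
    // The regex from the answer, written as a Java string literal
    // (backslashes and double quotes escaped).
    static final Pattern UNQUOTED_COMMA =
            Pattern.compile(",\\s*(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    // Hypothetical helper: returns the 0-based index of each matched comma.
    static List<Integer> matchOffsets(String input) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = UNQUOTED_COMMA.matcher(input);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(matchOffsets(
                "{:a \"ab,cd, efg\", :b \"ab,def, egf,\", :c \"Conjecture\"}"));
        System.out.println(matchOffsets("{:a \"ab, cd efg\",}"));
        System.out.println(matchOffsets("{:a \"abcd,efg\" :b \"abcedg,e\"}"));
    }
}
```

For the three example strings this should report offsets 16 and 35, then 16, then no matches, which lines up with the carets shown.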

But when escaped quotes can appear, like so:

{:a "ab,\" cd efg",} // only the last comma should match

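then this regex won't work: it counts the escaped quote as a delimiter, so the first comma (the one inside the string) is matched as well.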
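For input with backslash-escaped quotes, one alternative is a small hand-written scanner that tracks whether it is inside a string. The sketch below is an assumption about how that could look, not something from the original answer; `EscapeAwareCommas` and `commasToNewlines` are hypothetical names, and it assumes a backslash only acts as an escape inside a quoted string.

```java
class EscapeAwareCommas {
    // Hypothetical helper: replaces every comma outside a quoted string
    // with a newline, treating \" inside a string as an escaped quote.
    static String commasToNewlines(String input) {
        StringBuilder out = new StringBuilder();
        boolean inString = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inString && c == '\\' && i + 1 < input.length()) {
                // Copy the backslash and the escaped character unchanged.
                out.append(c).append(input.charAt(++i));
            } else if (c == '"') {
                // An unescaped quote opens or closes a string.
                inString = !inString;
                out.append(c);
            } else if (c == ',' && !inString) {
                out.append('\n');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(commasToNewlines("{:a \"ab,\\\" cd efg\",}"));
    }
}
```

Unlike the regex, this sketch does not swallow whitespace after a replaced comma; that would need an extra step if it matters.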

A brief explanation of the regex:

,            # match the character ','
\s*          # match a whitespace character: [ \t\n\x0B\f\r] and repeat it zero or more times
(?=          # start positive look ahead
  (          #   start capture group 1
    [^"]*    #     match any character other than '"' and repeat it zero or more times
    "        #     match the character '"'
    [^"]*    #     match any character other than '"' and repeat it zero or more times
    "        #     match the character '"'
  )*         #   end capture group 1 and repeat it zero or more times
  [^"]*      #   match any character other than '"' and repeat it zero or more times
  $          #   match the end of the input
)            # end positive look ahead

In other words: match any comma that has zero, or an even number of quotes ahead of it (until the end of the string).
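Put together with replaceAll, as the question asks, the regex could be used roughly like this. It is a minimal Java sketch rather than a definitive implementation; the class name `UnquotedCommas` and the method `commasToNewlines` are invented for the example. Note that inside a Java string literal every backslash and double quote of the regex has to be escaped.

```java
import java.util.regex.Pattern;

class UnquotedCommas {
    // A comma plus any trailing whitespace, followed by an even number
    // of double quotes (possibly zero) up to the end of the input.
    static final Pattern UNQUOTED_COMMA =
            Pattern.compile(",\\s*(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    // Hypothetical helper: replaces each unquoted comma (and the whitespace
    // after it) with a single newline character.
    static String commasToNewlines(String input) {
        return UNQUOTED_COMMA.matcher(input).replaceAll("\n");
    }

    public static void main(String[] args) {
        System.out.println(commasToNewlines(
                "{:a \"ab,cd, efg\", :b \"ab,def, egf,\", :c \"Conjecture\"}"));
    }
}
```

Because `\s*` is part of the match, the space after each replaced comma disappears along with the comma. From Clojure, the same pattern should be usable through Java interop or `clojure.string/replace`, though that is not shown here.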

Bart Kiers
Looks like you did the opposite of what I wanted. :p I want to match the commas that /aren't/ in the string. :)
Rayne
Ah, since you did not escape the quotes inside your string, I assumed that the first and last quote were also a part of your literal. My regex is still correct, btw. See my edit.
Bart Kiers