Hi,

I need to remove commas within a String only when enclosed by quotes.

example:

String a = "123, \"Anders, Jr.\", John, [email protected],A";

after replacement should be

String a = "123, Anders Jr., John, [email protected],A";

Can you please give me sample Java code to do this?

Thanks much,

Lina

+1  A: 

Should work:

s/(?<="[^"]*),(?=[^"]*")//g
s/"//g
strager
Java doesn't support unbounded lookbehinds. You would have to replace the asterisk with something like {0,<n>} where <n> is either an arbitrarily large number, or (my preference) the length of the original string.
Alan Moore
+2  A: 

It also seems you need to remove the quotes, judging by your example.

You can't do that in a single regexp. You would need to match over each instance of

"[^"]*"

then strip the surrounding quotes and remove the commas. Are there any other characters which are troublesome? Can quote characters be escaped inside quotes, e.g. as ‘""’?
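
A minimal Java sketch of the per-match approach described above. The class and method names are made up for illustration, and it assumes quotes are never escaped or nested inside a quoted field. It walks over each "[^"]*" match, drops the commas from the captured contents, and writes the result back without the surrounding quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedCommaStripper {
    // A quoted run: opening quote, any non-quote characters, closing quote.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Hypothetical helper: removes commas inside quoted runs and drops the quotes.
    static String stripQuotedCommas(String input) {
        Matcher m = QUOTED.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // group(1) is the text between the quotes
            String inner = m.group(1).replace(",", "");
            // quoteReplacement keeps any '$' or '\' in the data literal
            m.appendReplacement(out, Matcher.quoteReplacement(inner));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String a = "123, \"Anders, Jr.\", John, A";
        System.out.println(stripQuotedCommas(a)); // 123, Anders Jr., John, A
    }
}
```

Unquoted text is copied through untouched by appendReplacement and appendTail, so commas that separate fields survive.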

It looks like you are trying to parse CSV. If so, regex is insufficient for the task and you should look at one of the many free Java CSV parsers.

bobince
I have this working in Perl: $row =~ m#(.*)"(.*),(.*)"(.*)#; $row = $1.$2.$3.$4; But I'm not familiar with the syntax to do something like this in Java... sample code would be much appreciated! Yes, I am dealing with a CSV - I will look at using a Java CSV parser as well.
That expression — and Lieven's answer — only matches a single occurrence of a comma in "s. It won't remove multiple commas, and won't remove quotes where there is no comma.
bobince
My answer removes one comma with every loop. You are right that it doesn't remove quotes where there is no comma, but that wasn't requested.
Lieven
+1  A: 

This looks like a line from a CSV file. Parsing it with any reasonable CSV library would deal with this issue for you automatically, at least by reading the quoted value into a single 'field'.

Lazarus
A: 

Probably grossly inefficient, but it seems to work.

import java.util.regex.*;

StringBuffer ResultString = new StringBuffer();

try {
    Pattern regex = Pattern.compile("(.*)\"(.*),(.*)\"(.*)", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    Matcher regexMatcher = regex.matcher(a);
    while (regexMatcher.find()) {
     try {
      // You can vary the replacement text for each match on-the-fly
      regexMatcher.appendReplacement(ResultString, "$1$2$3$4");
     } catch (IllegalStateException ex) {
      // appendReplacement() called without a prior successful call to find()
     } catch (IllegalArgumentException ex) {
      // Syntax error in the replacement text (unescaped $ signs?)
     } catch (IndexOutOfBoundsException ex) {
      // Non-existent backreference used in the replacement text
     } 
    }
    regexMatcher.appendTail(ResultString);
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}
Lieven
Thanks much! This worked perfectly! (I had to remove the extra backslashes, though, so that in the pattern there is just one backslash before each quote.)
Yeah, " is not a special character in regex, it only needs escaping once for Java.
bobince
Thanks, I didn't know that. I'll adjust the answer.
Lieven
You don't need the CASE_INSENSITIVE or UNICODE_CASE modifiers, either; there are no letters in the regex.
Alan Moore
+2  A: 
Yorch
+2  A: 

There are two major problems with the accepted answer. First, the regex "(.*)\"(.*),(.*)\"(.*)" will match the whole string if it matches anything, so it will remove at most one comma and two quotation marks.

Second, there's nothing to ensure that the comma and quotes will all be part of the same field; given the input ("foo", "bar") it will return ("foo "bar). It also doesn't account for newlines or escaped quotation marks, both of which are permitted in quoted fields.
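
Both problems can be reproduced with a small Java sketch that runs the same pattern and the same find/appendReplacement loop as the accepted answer. The class and method names are made up for illustration, and the two inputs are just test strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyRegexDemo {
    // Same pattern and replacement as the accepted answer, without the unneeded flags.
    static String applyAcceptedRegex(String input) {
        Matcher m = Pattern.compile("(.*)\"(.*),(.*)\"(.*)").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "$1$2$3$4");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // The first match consumes the whole string, so only one comma goes.
        System.out.println(applyAcceptedRegex("\"a, b, c\""));      // a, b c
        // The comma between two quoted fields is treated as if it were inside one.
        System.out.println(applyAcceptedRegex("\"foo\", \"bar\"")); // "foo "bar
    }
}
```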

You can use regexes to parse CSV data, but it's much trickier than most people expect. But why bother fighting with it when, as bobince pointed out, there are several free CSV libraries out there for the downloading?

Alan Moore
A: 

This works fine. '<' instead of '>'

boolean deleteCommas = false;
for(int i=0; i < text.length(); i++){
    if(text.charAt(i)=='\''){
        text = text.substring(0, i) + text.substring(i+1, text.length());
        deleteCommas = !deleteCommas;
    }
    if(text.charAt(i)==','&&deleteCommas){
        text = text.substring(0, i) + text.substring(i+1, text.length());
    }
}
Try that with the string "'test,test'" (since you also changed the double-quote to a single-quote). If the last character in the string is a quote, you remove it in the first "if" block, then try to test it again in the second "if" block. Rule of thumb: never try to alter a string while you're iterating through it; use a StringBuilder to construct a new string instead.
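
A minimal sketch of the StringBuilder approach suggested above. The class and method names are made up for illustration. It assumes double quotes as in the original question and no escaped quotes inside a quoted field. The input is read once and never modified, so the index can't run off the end.

```java
public class QuoteAwareCommaRemover {
    // Hypothetical helper: copies the input, skipping quotes and any comma found between quotes.
    static String removeQuotedCommas(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean insideQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                insideQuotes = !insideQuotes; // toggle state and drop the quote itself
            } else if (c == ',' && insideQuotes) {
                // comma inside a quoted field: skip it
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeQuotedCommas("\"test,test\""));                // testtest
        System.out.println(removeQuotedCommas("123, \"Anders, Jr.\", John, A")); // 123, Anders Jr., John, A
    }
}
```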
Alan Moore
A: 

A simpler approach would be replacing the matches of this regular expression:

("[^",]+),([^"]+")

By this:

$1$2
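
In Java this could be applied with String.replaceAll, roughly as sketched below (the class and method names are made up for illustration). Note that each pass removes only the first comma in a quoted field and leaves the quotes in place, so a field with several commas would need repeated passes.

```java
public class SingleCommaReplace {
    // Hypothetical helper: one pass of the replacement described above.
    static String removeFirstQuotedComma(String input) {
        return input.replaceAll("(\"[^\",]+),([^\"]+\")", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(removeFirstQuotedComma("123, \"Anders, Jr.\", John")); // 123, "Anders Jr.", John
        System.out.println(removeFirstQuotedComma("\"a, b, c\""));                // "a b, c"
    }
}
```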
Gumbo
A: 

The following Perl works for most cases:

open(DATA,'in/my.csv');
while(<DATA>){
  if(/(,\s*|^)"[^"]*,[^"]*"(\s*,|$)/){
    print "Before: $_";
    while(/(,\s*|^)"[^"]*,[^"]*"(\s*,|$)/){
      s/((?:^|,\s*)"[^"]*),([^"]*"(?:\s*,|$))/$1 $2/
    }
    print "After: $_";
  }
}

It's looking for:

  • (comma plus optional spaces) or start of line
  • a quote
  • 0 or more non-quotes
  • a comma
  • 0 or more non-quotes
  • (optional spaces plus comma) or end of line

If found, it will then keep replacing the comma with a space until it can find no more examples.

It works because of an assumption that the opening quote will be preceded by a comma plus optional spaces (or will be at the start of the line), and the closing quote will be followed by optional spaces plus a comma, or will be the end of the line.
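
Since the question asked for Java, here is a rough port of the substitution loop as a sketch. The class and method names are made up for illustration, it handles a single line at a time, and it carries the same field-boundary assumption as the Perl. Like the Perl, it replaces each comma with a space and leaves the quotes in place.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldBoundaryCommaReplacer {
    // Same expression as the Perl s/// above, with Java string escaping.
    private static final Pattern QUOTED_COMMA =
        Pattern.compile("((?:^|,\\s*)\"[^\"]*),([^\"]*\"(?:\\s*,|$))");

    // Hypothetical helper: replaces one comma per iteration until none match.
    static String replaceQuotedCommas(String line) {
        Matcher m = QUOTED_COMMA.matcher(line);
        while (m.find()) {
            line = m.replaceFirst("$1 $2");   // swap one comma for a space
            m = QUOTED_COMMA.matcher(line);   // rescan the updated line
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(replaceQuotedCommas("\"a,b,c\",x")); // "a b c",x
    }
}
```

Each iteration removes exactly one comma, so the loop always terminates.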

I'm sure there are cases where it will fail - if anyone can post 'em, I'd be keen to see them...