I have this string:

1001,"Fitzsimmons, Des Marteau, Beale and Nunn",109,"George","COD","Standard",,109,8/14/1998 8:50:02

What regular expression would I use to replace the commas inside "Fitzsimmons, Des Marteau, Beale and Nunn" with a pipe | so that it becomes:

"Fitzsimmons| Des Marteau| Beale and Nunn"

I should have clarified: I am splitting this string on commas, so I want "Fitzsimmons, Des Marteau, Beale and Nunn" to stay a single string. I plan to replace the | with a comma after I have split it.

+4  A: 

While it would be possible to do this with regular expressions, it would be much clearer to first split the line into fields, then do the replacement. There is a good (free) Java library for parsing CSV files called opencsv.
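To illustrate the "split first" idea without pulling in a library, here is a minimal sketch of a quote-aware splitter. The class and method names (`QuoteAwareSplit`, `splitCsvLine`) are hypothetical and this is not opencsv's API. It assumes quotes are balanced and never escaped, which a real CSV parser would handle for you.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {
    // Hypothetical helper: splits one line on commas that are outside double quotes.
    // Assumption: quotes are balanced and there are no escaped quotes ("") in a field.
    public static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;            // toggle state; the quote itself is dropped
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());  // field boundary
                current.setLength(0);
            } else {
                current.append(c);               // ordinary character, or a comma inside quotes
            }
        }
        fields.add(current.toString());          // last field
        return fields;
    }

    public static void main(String[] args) {
        String line = "1001,\"Fitzsimmons, Des Marteau, Beale and Nunn\",109,\"George\","
                + "\"COD\",\"Standard\",,109,8/14/1998 8:50:02";
        for (String field : splitCsvLine(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Because the fields come back already separated, the pipe placeholder is not needed at all with this approach.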

jheddings
+1  A: 

I believe this is going to be very difficult to do with a regular expression. The trouble is that the regular expression would have to count quotes to determine if it's inside two quotes or not.

Actually, the .NET regex engine could do it with its balanced matching feature. But I don't think Java has that feature and I can't think of a reliable way to do it without it.

You may have to write some procedural code to accomplish this.
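That said, one workaround sometimes used in Java is a lookahead that checks whether an odd number of quotes follows the comma, which approximates "inside quotes" without real counting. A minimal sketch, assuming balanced quotes and no escaped quotes; the class, constant, and method names are hypothetical:

```java
public class PipeInsideQuotes {
    // Matches a comma only if an odd number of double quotes follows it,
    // i.e. one quote plus any number of quote pairs up to the end of the line.
    public static final String COMMA_IN_QUOTES =
            ",(?=[^\"]*\"(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    public static String pipeCommasInsideQuotes(String line) {
        return line.replaceAll(COMMA_IN_QUOTES, "|");
    }

    public static void main(String[] args) {
        String line = "1001,\"Fitzsimmons, Des Marteau, Beale and Nunn\",109,\"George\","
                + "\"COD\",\"Standard\",,109,8/14/1998 8:50:02";
        System.out.println(pipeCommasInsideQuotes(line));
    }
}
```

The lookahead rescans the rest of the line for every comma, so this is best kept to short lines; an unbalanced quote makes the result unreliable.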

Steve Wortham
+3  A: 

I tried to use StringTokenizer, but it didn't work well, so here is some code that seems to do what you want:

public class JTest
{
    public static void main(String[] args)
    {
        String str = "1001,\"Fitzsimmons, Des Marteau, Beale and Nunn\",109,\"George\",\"COD\",\"Standard\",,109,8/14/1998 8:50:02";
        // StringBuilder avoids creating a new String on every append
        StringBuilder copy = new StringBuilder();

        // true while the current position is between a pair of double quotes
        boolean inQuotes = false;

        for (int i = 0; i < str.length(); ++i)
        {
            char c = str.charAt(i);
            if (c == '"')
                inQuotes = !inQuotes;
            // replace only the commas that fall inside quotes
            if (c == ',' && inQuotes)
                copy.append('|');
            else
                copy.append(c);
        }

        System.out.println(str);
        System.out.println(copy);
    }
}
Aif
+1  A: 

Well, this is a CSV file, so I'd use Ruby's built-in CSV library. Then you don't have to figure out how to deal with escaped quotation marks, for example.

require 'csv'
string =<<CSV
1001,"Fitzsimmons, Des Marteau, Beale and Nunn",109,"George","COD","Standard",,109,8/14/1998 8:50:02
CSV
csv=CSV.parse string
csv.each{|row| row.each {|cell| cell.gsub!(",","|") if cell.is_a?(String)}}
outstring = ""
CSV::Writer.generate(outstring){|out| csv.each {|row| out<<row}}
Ken Bloom
The OP tagged this question with Java, so I'm assuming that's the language of choice.
jheddings
This is not a Ruby question.
TokenMacGuy
So perhaps try http://sourceforge.net/projects/javacsv/ in Java.
Ken Bloom
+1  A: 

Here's a bit of Python that seems to do the trick:

>>> import re
>>> p = re.compile('["][^"]*["]|[^,]*')
>>> x = """1001,"Fitzsimmons, Des Marteau, Beale and Nunn",109,"George","COD","Standard",,109,8/14/1998 8:50:02"""
>>> y = p.findall(x)
>>> ','.join(z.replace(',','|') for z in y if z)
'1001,"Fitzsimmons| Des Marteau| Beale and Nunn",109,"George","COD","Standard",109,8/14/1998 8:50:02'

Seems like this is turning into a code golf question :-)

Oops...missed the Java tag.
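For the Java tag, a rough equivalent of the same idea is to find each quoted segment and replace the commas only inside it. This is a sketch rather than a direct port: the class and method names are hypothetical, and it assumes quotes are balanced and never escaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedSegments {
    // A double quote, any run of non-quote characters, then the closing quote.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    public static String pipeCommasInQuotedFields(String line) {
        Matcher m = QUOTED.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Replace commas only within the matched quoted segment.
            String replaced = m.group().replace(',', '|');
            // quoteReplacement keeps any '$' or '\' in the data literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replaced));
        }
        m.appendTail(out); // copy whatever follows the last quoted segment
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "1001,\"Fitzsimmons, Des Marteau, Beale and Nunn\",109,\"George\","
                + "\"COD\",\"Standard\",,109,8/14/1998 8:50:02";
        System.out.println(pipeCommasInQuotedFields(line));
    }
}
```

Text outside the quotes is copied through untouched, so the empty field between "Standard" and 109 is preserved, unlike in the joined output above.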

Joel
+1  A: 

Hey Brandon, you can do this with a regular expression by using lookbehind and lookahead. See the code below:

String cvsString = "1001,\"Fitzsimmons, Des Marteau, Beale and Nunn\",109,\"George\",\"COD\",\"Standard\",,109,8/14/1998 8:50:02";
// a comma with non-quote text on both sides, bounded by quotes via lookbehind/lookahead
String rePattern = "(?<=\")([^\"]+?),([^\"]+?)(?=\")";
// first replace
String oldString = cvsString;
String resultString = cvsString.replaceAll(rePattern, "$1|$2");
// additional replaces until there are no more changes
while (!resultString.equals(oldString)) {
    oldString = resultString;
    resultString = resultString.replaceAll(rePattern, "$1|$2");
}

The result string will be 1001,"Fitzsimmons| Des Marteau| Beale and Nunn",109,"George","COD","Standard",,109,8/14/1998 8:50:02

NingZhang.info

Ning120
That's an interesting approach. I don't see why you need the lookahead and lookbehind, though. Wouldn't this regex work just as well? `("[^",]++),([^"]++")`
Alan Moore
Well, the lookahead/lookbehind is required so that any token can contain a comma ',' as long as it is double-quoted. Otherwise the tokens will not be parsed correctly.
Ning120