I need to write an extended version of the StringUtils.commaDelimitedListToStringArray function that takes an additional parameter: the escape character.

So calling my version:

commaDelimitedListToStringArray("test,test\\,test\\,test,test", "\\")

should return:

["test", "test,test,test", "test"]



My current attempt is to use String.split() to split the String using regular expressions:

String[] array = str.split("[^\\\\],");

But the returned array is:

["tes", "test\,test\,tes", "test"]

Any ideas?

+11  A: 

Try:

String array[] = str.split("(?<!\\\\),");

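Basically this says: split on a comma, except where that comma is preceded by a backslash (one literal backslash is written as two in the regex, and therefore as four in the Java string literal). This is called a negative lookbehind, which is a zero-width assertion.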

cletus
That works quite well, thank you. The result is: ["test", "test\,test\,test", "test"]
arturh
Actually, it matches a comma preceded by ONE backslash. In a regex written as a Java String literal, it takes FOUR backslashes to match ONE in the target text.
Alan Moore
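To make the escaping levels in the comment above concrete, here is a minimal sketch. The class name BackslashLevels and the sample input are made up for illustration; only String.split and String.length are used.

```java
class BackslashLevels {
    // Four backslashes in the Java source become two characters in the
    // String, which the regex engine reads as ONE literal backslash.
    static final String REGEX = "(?<!\\\\),";

    // Splits on every comma that is not directly preceded by a backslash.
    static String[] split(String input) {
        return input.split(REGEX);
    }

    public static void main(String[] args) {
        // The regex engine sees the eight characters (?<!\\),
        System.out.println(REGEX + " has length " + REGEX.length());

        // The target text a,b\,c contains exactly one backslash.
        for (String part : split("a,b\\,c")) {
            System.out.println(part);
        }
    }
}
```

The second comma is left alone because the single backslash in front of it satisfies the lookbehind.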
+5  A: 

Don't reinvent the wheel: use an existing CSV parser such as opencsv.

soulmerge
A quick read-through of the features list doesn't suggest that opencsv can do escape characters, which is a shame, because it's a much cleaner way to do DSV.
Adam Jaskiewicz
(By escape characters, I mean a character before one of the separators to say "ignore this separator". Most CSV implementations seem to use the clunky Excel method of putting everything in quotes, then using even MORE quotes to deal with quotes inside of values.)
Adam Jaskiewicz
But what if you need to escape the escape characters? Then there could be two, three, or more backslashes before any comma, and a simple split() based approach becomes impossible. The existing CSV implementations only appear clunky until you think the problem all the way through.
Alan Moore
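If escaped escape characters have to be supported, one option is to drop split() and scan the string character by character. The following is only a sketch under assumptions: the class name EscapedSplitter is hypothetical, the escape is a single char, and a lone escape character at the very end of the input is kept literally.

```java
import java.util.ArrayList;
import java.util.List;

class EscapedSplitter {
    // Splits on unescaped commas. The escape character makes the next
    // character literal, so an escaped escape character is kept as data.
    static String[] split(String str, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (escaped) {
                current.append(c);      // take the character literally
                escaped = false;
            } else if (c == escape) {
                escaped = true;         // drop the escape, remember it
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);   // start the next field
            } else {
                current.append(c);
            }
        }
        // Assumption: a trailing lone escape is kept as a literal character.
        if (escaped) {
            current.append(escape);
        }
        fields.add(current.toString());
        return fields.toArray(new String[0]);
    }
}
```

With this approach the text a\\,b (two backslashes before the comma) splits into a\ and b, which a single lookbehind cannot express.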
+9  A: 

The regular expression

[^\\],

means "match any character that is not a backslash, followed by a comma". That is why sequences such as t, match: t is a character that is not a backslash, and it is consumed as part of the delimiter.

I think you need to use some sort of negative lookbehind, to capture a , which is not preceded by a \ without capturing the preceding character, something like

(?<!\\),

(BTW, note that I have purposefully not doubly-escaped the backslashes to make this more readable)
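The difference between the two patterns can be seen by listing what each one actually matches in the question's input. This is a small sketch; the class name DelimiterMatches and its helper method are made up for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterMatches {
    // Returns every substring of input that the regex would treat as a delimiter.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "test,test\\,test\\,test,test";
        // Character class: each match is two characters long, "t" plus the comma.
        System.out.println(matches("[^\\\\],", input));
        // Lookbehind: each match is the comma alone.
        System.out.println(matches("(?<!\\\\),", input));
    }
}
```

Because split() discards whatever the pattern matched, the first pattern removes the trailing t of each field while the second removes only the comma.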

matt b
+2  A: 

For future reference, here is the complete method I ended up with:

public static String[] commaDelimitedListToStringArray(String str, String escapeChar) {
    // these characters need to be escaped in a regular expression
    // (^ and $ are included because they would otherwise act as anchors)
    String regularExpressionSpecialChars = "/.*+?|()[]{}\\^$";

    String escapedEscapeChar = escapeChar;

    // if the escape char for our comma separated list needs to be escaped 
    // for the regular expression, escape it using the \ char
    if(regularExpressionSpecialChars.indexOf(escapeChar) != -1) 
        escapedEscapeChar = "\\" + escapeChar;

    // see http://stackoverflow.com/questions/820172/how-to-split-a-comma-separated-string-while-ignoring-escaped-commas
    String[] temp = str.split("(?<!" + escapedEscapeChar + "),", -1);

    // remove the escapeChar for the end result
    String[] result = new String[temp.length];
    for(int i=0; i<temp.length; i++) {
        result[i] = temp[i].replaceAll(escapedEscapeChar + ",", ",");
    }

    return result;
}
arturh
Escaping doesn't need to be that difficult: String[] temp = str.split("(?<!\\Q" + escapeChar + "\\E),", -1);
Alan Moore
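Along the lines of that last comment, the whole method can be sketched without a hand-maintained list of special characters. This version is an illustration, not the poster's code: the class name QuotedSplit is hypothetical, Pattern.quote stands in for the \\Q...\\E wrapping, and, like the lookbehind approach in general, it does not handle escaped escape characters.

```java
import java.util.regex.Pattern;

class QuotedSplit {
    static String[] split(String str, String escapeChar) {
        // Pattern.quote turns the escape string into a literal regex fragment.
        String quoted = Pattern.quote(escapeChar);

        // Split on commas that are not directly preceded by the escape string;
        // the -1 limit keeps trailing empty fields.
        String[] parts = str.split("(?<!" + quoted + "),", -1);

        // String.replace works on literal text, so no regex escaping is needed
        // to strip the escape string from the remaining commas.
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].replace(escapeChar + ",", ",");
        }
        return parts;
    }
}
```

Calling QuotedSplit.split("test,test\\,test\\,test,test", "\\") gives the three fields asked for in the question, and an escape such as "^" or "$" works without special handling.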