Hello,

How can I parse a String str = "abc, \"def,ghi\"";

such that I get the output as

String[] strs = {"abc", "\"def,ghi\""};

i.e. an array of length 2.

Should I use a regular expression, or is there a method in the Java API or any other open-source project that lets me do this?

Edited

To give some context: I am reading a text file that contains a list of records, one per line. Each record is a list of fields separated by a delimiter (comma or semicolon). I now have a requirement to support a text qualifier, similar to what Excel or OpenOffice supports. Suppose I have the record

abc, "def,ghi"

Here `,` is my delimiter and `"` is my text qualifier, so when I parse this string I should get two fields, abc and def,ghi, not three fields {abc,def,ghi}.

I hope this clarifies my requirement.

Thanks

Shekhar

+2  A: 

This question seems appropriate: http://stackoverflow.com/questions/6209/split-a-string-ignoring-quoted-sections

Along that line, http://opencsv.sourceforge.net/ seems appropriate.
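
For reference, one common regex approach to splitting while ignoring quoted sections is to split only on commas that are followed by an even number of quotes. The sketch below is illustrative only: the class and method names are hypothetical, it assumes quotes are balanced and never escaped, and it is not taken from the linked question or from opencsv.

```java
import java.util.Arrays;

public class QuotedSplitDemo {
    // Splits on commas that are followed by an even number of double quotes,
    // i.e. commas that sit outside any quoted section (assumes balanced quotes).
    static String[] splitOutsideQuotes(String line) {
        String[] parts = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim(); // drop whitespace around each field
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints [abc, "def,ghi", jkl]
        System.out.println(Arrays.toString(splitOutsideQuotes("abc, \"def,ghi\",jkl")));
    }
}
```

The lookahead rescans the rest of the line for every comma, so this is probably best kept to short records; a CSV library is likely the safer choice for real files.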

Graphain
i think the fact that the second string has no space in it is only incidental, and not really central to the question
David Hedlund
Will work with this example but fail on `"abc, \"def, ghi\""` (just my guess, that this is a possible valid input too)
Andreas_D
@David is correct, sorry, the space was there by mistake, so I can't rely on the space.
Shekhar
right mistook question. will revise
Graphain
better! now none of our comments apply, because this is a different answer entirely. i would rather have seen the old answer as it were, deleted, and this posted as a new one. but that's just details. +1 for the answer
David Hedlund
@David Hedlund - yeah you're probably right but no matter now.
Graphain
+3  A: 

The basic algorithm is not too complicated:

import java.util.ArrayList;
import java.util.List;

public static List<String> customSplit(String input) {
    List<String> elements = new ArrayList<String>();
    StringBuilder elementBuilder = new StringBuilder();

    boolean isQuoted = false;
    for (char c : input.toCharArray()) {
        if (c == '\"') {
            // Toggle the quoted state; the quote itself is kept in the output.
            isQuoted = !isQuoted;
            // continue;        // changed according to the OP comment - \" shall not be skipped
        }
        if (c == ',' && !isQuoted) {
            // An unquoted comma ends the current element.
            elements.add(elementBuilder.toString().trim());
            elementBuilder = new StringBuilder();
            continue;
        }
        elementBuilder.append(c);
    }
    // Add the last element (an empty string if the input ends with a comma).
    elements.add(elementBuilder.toString().trim());
    return elements;
}
Andreas_D
Would that handle nested escaped quotes?
Graphain
that's really neat! i probably would have come up with something way more complicated for this :D
David Hedlund
Not yet, but (1) I haven't seen such a requirement and (2) - it's a basic algorithm. You can easily add a 'nested quote' detection and change the 'isQuoted' test.
Andreas_D
@Graphain: there is no start-quote and end-quote, so you really can never tell whether four quotes are two quoted strings after one another, or one quoted string nested in another. the *world* doesn't support nested escaped quotes the way it does for, say `(`, `)`, where there are different signs for start and stop... unless i misunderstood your question...?
David Hedlund
@David - I mean does it support this: String a = "\"bbb\\\"\"\"ccc"; I have no idea what the OP would want displayed there but I was just pointing out that this kind of problem is almost always better solved by using an existing API. Having said that I'm *very* impressed by the succinctness of this approach.
Graphain
@David - one could introduce a grammar like `"one, \"two, \\\"three\\\"\""` to allow nested quotes, but this was not a requirement (yet)
Andreas_D
@Andreas_D: yeah, i guess that's true. something entirely different caught my eye, tho: wouldn't you need to do a second `elements.add` before you return it, to add what's currently in the builder, assuming the string doesn't end with a comma?
David Hedlund
@David - thanks!! Sure, the last add was missing - fixed the code. (It will append an empty string if the input ends with ", " or so.)
Andreas_D
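
As a rough illustration of the escaped-quote detection discussed in the comments above, here is a hedged sketch of how the basic algorithm might be extended. It assumes a grammar in which a backslash makes the next character literal; that grammar, and the names `EscapeAwareSplit` and `split`, are assumptions for the example, not something the OP asked for.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeAwareSplit {
    // Hypothetical variant of customSplit: a backslash makes the next character
    // literal, so \" inside a field does not toggle the quoted state.
    static List<String> split(String input) {
        List<String> elements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        boolean escaped = false;
        for (char c : input.toCharArray()) {
            if (escaped) {
                current.append(c);          // escaped character is taken literally
                escaped = false;
            } else if (c == '\\') {
                current.append(c);          // keep the backslash in the output
                escaped = true;
            } else if (c == ',' && !quoted) {
                elements.add(current.toString().trim());
                current.setLength(0);       // start the next element
            } else {
                if (c == '"') {
                    quoted = !quoted;       // unescaped quote toggles the state
                }
                current.append(c);
            }
        }
        elements.add(current.toString().trim());
        return elements;
    }

    public static void main(String[] args) {
        // The input characters are: one, "two, \"three\""
        for (String s : split("one, \"two, \\\"three\\\"\"")) {
            System.out.println(s);
        }
    }
}
```

With that input the sketch yields two elements, `one` and `"two, \"three\""`, because the escaped quotes never close the outer quoted section.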
A: 

Try this -

String str = "abc, \"def,ghi\"";
String regex = "([,]) | (^[\"\\w*,\\w*\"])";
for (String s : str.split(regex)) {
    System.out.println(s);
}
It will not work for String str = "abc, \"def,ghi\",jkl"; the expected output is {abc,"def,ghi",jkl}.
Shekhar
A: 

Try:

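List<String> res = new LinkedList<String>();

// Limit -1 keeps trailing empty chunks, so balanced quotes always give an odd count.
String[] chunks = str.split("\\\"", -1);
if (chunks.length % 2 == 0) {
    // Mismatched escaped quotes!
}
for (int i = 0; i < chunks.length; i++) {
    if (i % 2 == 0) {
        // Even chunks lie outside the quotes: split them on commas.
        res.addAll(Arrays.asList(chunks[i].split(",")));
    } else {
        // Odd chunks lie between quotes: keep them whole.
        res.add(chunks[i]);
    }
}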

This will only split up the portions that are not between escaped quotes.

Call trim() if you want to get rid of the whitespace.
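
For the requirement in the question's edit (a configurable delimiter and text qualifier, with the qualifier removed from the resulting fields), a single pass over the line may be simpler than splitting twice. The following is only a sketch under those assumptions; `RecordSplitter` and `splitRecord` are hypothetical names, and it does not handle escaped or doubled qualifiers.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordSplitter {
    // Splits one record on `delimiter`, ignoring delimiters that appear between
    // a pair of `qualifier` characters. The qualifiers themselves are dropped.
    static List<String> splitRecord(String line, char delimiter, char qualifier) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQualifier = false;
        for (char c : line.toCharArray()) {
            if (c == qualifier) {
                inQualifier = !inQualifier;      // toggle, do not copy the qualifier
            } else if (c == delimiter && !inQualifier) {
                fields.add(field.toString().trim());
                field.setLength(0);              // start the next field
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString().trim());     // last field of the record
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitRecord("abc, \"def,ghi\"", ',', '"'));       // [abc, def,ghi]
        System.out.println(splitRecord("abc; \"def;ghi\"; jkl", ';', '"'));  // [abc, def;ghi, jkl]
    }
}
```

Note that `trim()` here also removes leading and trailing whitespace inside a qualified field, which may or may not be what you want.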

Borealid