String.split(",") isn't likely to work.
It will split fields that have embedded commas ("Foo, Inc.") even though they are a single field in the CSV line.
What if the company name is:
Company, Inc.
or worse:
Joe's "Good, Fast, and Cheap" Food
According to Wikipedia (http://en.wikipedia.org/wiki/Comma-separated_values):
Fields with embedded commas must be enclosed within double-quote characters.
1997,Ford,E350,"Super, luxurious truck"
Fields with embedded double-quote characters must be enclosed within double-quote characters, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.
1997,Ford,E350,"Super ""luxurious"" truck"
Even worse, quoted fields may have embedded line breaks (newlines; "\n"):
Fields with embedded line breaks must be enclosed within double-quote characters.
1997,Ford,E350,"Go get one now
they are going fast"
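Read in the writing direction, the three rules quoted above amount to a small escaping step. The following is only a minimal sketch of that step. `CsvQuote` and `quote` are hypothetical names made up for illustration, not part of any library, and the sketch assumes a field needs quoting only when it contains a comma, a double quote, or a line break.

```java
// Hypothetical helper for illustration only; not a library API.
class CsvQuote {

    // Wrap a field in double quotes when it contains a comma, a double
    // quote, or a line break, and double any embedded double quotes.
    static String quote(String field) {
        boolean needsQuotes = field.contains(",")
                || field.contains("\"")
                || field.contains("\n")
                || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quote("E350"));
        // prints: E350
        System.out.println(quote("Company, Inc."));
        // prints: "Company, Inc."
        System.out.println(quote("Joe's \"Good, Fast, and Cheap\" Food"));
        // prints: "Joe's ""Good, Fast, and Cheap"" Food"
    }
}
```

Any reader of the file has to undo exactly this transformation, which is what a plain split on commas cannot do.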
The following test demonstrates the problem with using String.split(",") on a line whose quoted fields contain commas:
The CSV line is:
a,b,c,"Company, Inc.", d, e,"Joe's ""Good, Fast, and Cheap"" Food", f, 10/11/2010,1/1/2011, g, h, i
// Test String.split(",") against CSV with
// embedded commas and embedded double-quotes in
// quoted text strings:
//
// Company names are:
// Company, Inc.
// Joe's "Good, Fast, and Cheap" Food
//
// Which should be formatted in a CSV file as:
// "Company, Inc."
// "Joe's ""Good, Fast, and Cheap"" Food"
//
//
public class TestSplit {

    // Split s on the given delimiter (interpreted as a regex by
    // String.split) and print each resulting segment on its own line.
    public static void testSplit(String s, String delimiter) {
        String[] segments = s.split(delimiter);
        for (String seg : segments) {
            System.out.println(seg);
        }
    }

    public static void main(String[] args) {
        // Same CSV line as shown above, including the " g" field.
        String csvLine = "a,b,c,\"Company, Inc.\", d,"
                + " e,\"Joe's \"\"Good, Fast,"
                + " and Cheap\"\" Food\", f,"
                + " 10/11/2010,1/1/2011, g, h, i";
        System.out.println("CSV line is:\n" + csvLine + "\n\n");
        testSplit(csvLine, ",");
    }
}
Compiling and running it produces the following:
D:\projects\TestSplit>javac TestSplit.java
D:\projects\TestSplit>java TestSplit
CSV line is:
a,b,c,"Company, Inc.", d, e,"Joe's ""Good, Fast, and Cheap"" Food", f, 10/11/2010,1/1/2011, g, h, i
a
b
c
"Company
Inc."
d
e
"Joe's ""Good
Fast
and Cheap"" Food"
f
10/11/2010
1/1/2011
g
h
i
D:\projects\TestSplit>
That CSV line should instead be parsed as:
a
b
c
"Company, Inc."
d
e
"Joe's ""Good, Fast, and Cheap"" Food"
f
10/11/2010
1/1/2011
g
h
i
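One way to get that field grouping is to walk the line character by character and track whether the current position is inside a quoted field. The following is only a minimal sketch under stated assumptions, not a complete CSV reader. `CsvLineParser` and `parse` are hypothetical names, and the input is assumed to be one complete record, so a field with an embedded line break must already have been joined into a single string. Unlike the listing above, which shows the fields still in their quoted CSV form, this sketch returns the decoded values: the surrounding quotes are removed and each doubled quote becomes a single one. Leading spaces after a comma are kept as part of the field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a library API.
class CsvLineParser {

    // Split one CSV record into fields, honoring double-quoted fields.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote is a literal quote
                        i++;               // skip the second quote of the pair
                    } else {
                        inQuotes = false;  // closing quote of the field
                    }
                } else {
                    field.append(c);       // commas and newlines are data here
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote of a quoted field
            } else if (c == ',') {
                fields.add(field.toString()); // unquoted comma ends the field
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String csvLine = "a,b,c,\"Company, Inc.\", d,"
                + " e,\"Joe's \"\"Good, Fast,"
                + " and Cheap\"\" Food\", f,"
                + " 10/11/2010,1/1/2011, g, h, i";
        // Brackets make the preserved leading spaces visible.
        for (String f : parse(csvLine)) {
            System.out.println("[" + f + "]");
        }
    }
}
```

For the test line this yields 13 fields, with `Company, Inc.` and `Joe's "Good, Fast, and Cheap" Food` each kept as a single field. The sketch does not reject malformed input such as an unterminated quote. It simply treats the rest of the line as part of the quoted field.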