I have a CSV file in the format below. The program has a problem when it reads either of the following rows:

"D",abc"def,"","0429"292"0","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""


"D","abc"def","","04292920","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""

I use the split command below to ignore commas inside double quotes. I got it from an earlier post; the URL is below.

String[] items = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", 15);
System.out.println("items.length" + items.length);

http://stackoverflow.com/questions/2241758/regarding-java-split-command-parsing-csv-file

items.length is printed as 14 instead of 15. abc"def is not recognized as a separate field; instead, items[0] incorrectly contains "D",abc"def. I want the fields stored like this:

items[0] should be "D" and items[1] should be abc"def

The same issue happens with the value "abc"def" in the second row. I want it stored like this:

items[0] should be "D" and items[1] should be "abc"def"

The split command does work correctly when the quotes inside a quoted field are doubled (for example, the record D,"abc""def",1).

How can I resolve this issue?
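The behaviour described above can be reproduced with a small demo. The lookahead lets a comma match only when an even number of double quotes follows it up to the end of the line, so a single unpaired quote such as the one in abc"def flips the parity for every comma before it. This sketch assumes the pattern with its * quantifiers, ,(?=([^\"]*\"[^\"]*\")*[^\"]*$), and the class name LookaheadSplitDemo is hypothetical.

```java
public class LookaheadSplitDemo {

    // The split pattern from the question, with its '*' quantifiers:
    // a comma matches only if an even number of double quotes follows it
    // up to the end of the line, i.e. only if it lies outside a quoted section.
    static final String PATTERN = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    public static String[] split(String line) {
        return line.split(PATTERN, 15);
    }

    public static void main(String[] args) {
        // Doubled quotes keep the quote count even, so both commas split.
        String doubled = "D,\"abc\"\"def\",1";
        System.out.println(split(doubled).length); // 3

        // First row from the question: 31 quotes in total (an odd number).
        String stray = "\"D\",abc\"def,\"\",\"0429\"292\"0\",\"11\",\"IJ80\",\"Feb10_1.txt-2\","
                + "\"FILE RECORD\",\"05/02/2010\",\"04/03/2010\",\"\",\"1\",\"-91\",\"\",\"\"";
        String[] items = split(stray);
        System.out.println(items.length); // 14
        System.out.println(items[0]);     // "D",abc"def
    }
}
```

On the first row the comma after "D" is followed by 29 quotes, an odd number, so it is not treated as a separator; that is why "D",abc"def lands in items[0] and only 14 items come back. No tweak to this single regex can tell which quote in abc"def was meant to be literal, which is why the answers below suggest changing the format or using a parser.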

A: 

If possible, changing your CSV format would make the solution very simple.

See the following for an overview of Delimiter Separated Values, a common format on Unix-based systems:

http://www.faqs.org/docs/artu/ch05s02.html#id2901882
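As I understand the DSV convention, fields are not quoted at all; a delimiter that occurs inside a field is escaped with a backslash instead, so a stray double quote is just an ordinary character. The following is a minimal sketch of a splitter for such records, not code from the linked page. It assumes a comma delimiter to match the question's data (the delimiter is a parameter), handles only backslash escapes, and the class name DsvSplitter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DsvSplitter {

    // Splits one delimiter-separated record. A backslash escapes the next
    // character, so "\," is a literal comma and "\\" a literal backslash.
    // Double quotes have no special meaning, so abc"def needs no escaping.
    public static List<String> split(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // take the escaped char literally
            } else if (c == delimiter) {
                fields.add(current.toString());   // end of field
                current.setLength(0);
            } else {
                current.append(c);                 // ordinary char, including quotes
            }
        }
        fields.add(current.toString());            // last field
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical record: some of the question's values rewritten in DSV style.
        String record = "D,abc\"def,,0429\"292\"0,Fi\"el\\,d";
        System.out.println(split(record, ',')); // [D, abc"def, , 0429"292"0, Fi"el,d]
    }
}
```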

Ray Muirhead
Thanks a lot. I am planning to change the file format so that every field must be enclosed in double quotes: "A","Field1","Field2","Field3","Fi"el,d","Fi""eld". I want the separator to be the two characters ", (a double quote followed by a comma) taken together. How do I change the split command below to use that combined separator? line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)",15);
Arav
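If every field is always quoted, one possible approach (a sketch, not taken from any answer here) is to drop the outer quotes of the record and split on the three-character sequence "," rather than just ",; using the full sequence also removes the opening quote of the next field. It assumes no field value itself contains ","; such a value would be split in the wrong place. Doubled quotes are returned unchanged. The class name QuotedFieldSplitter is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class QuotedFieldSplitter {

    // Splits a record in which every field is wrapped in double quotes,
    // using the sequence "," as the separator.
    // Assumption: no field value contains "," itself.
    public static List<String> split(String line) {
        if (line.length() < 2 || !line.startsWith("\"") || !line.endsWith("\"")) {
            throw new IllegalArgumentException("record must start and end with a double quote");
        }
        // Drop the outer quotes, then split on "," matched literally.
        String inner = line.substring(1, line.length() - 1);
        // A limit of -1 keeps trailing empty fields such as a final "".
        return Arrays.asList(inner.split(Pattern.quote("\",\""), -1));
    }

    public static void main(String[] args) {
        String record = "\"A\",\"Field1\",\"Field2\",\"Field3\",\"Fi\"el,d\",\"Fi\"\"eld\"";
        System.out.println(split(record)); // [A, Field1, Field2, Field3, Fi"el,d, Fi""eld]
    }
}
```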
A:

I think you would be much better off writing a parser for the CSV files rather than trying to use a regular expression. Once you have to deal with CSV files that contain line breaks inside fields, the regex will probably fall apart. It doesn't take much code to write a simple while loop that walks through the characters and splits up the data, and a parser makes it a lot easier to handle "non-standard"* CSV files such as yours than a regex does.

*I say non-standard because there isn't really an official standard for CSV (RFC 4180 describes a common format, but it is an informational RFC, not a standard), and when you deal with CSV files from many different systems, you see lots of odd things, like the abc"def field above.
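The character loop described in this answer can be sketched as follows. This is a minimal, lenient parser written for illustration (the class name LenientCsvParser is hypothetical, not from any library). It removes the quotes around quoted fields, treats a doubled quote as an escaped quote, and keeps any other stray quote as a literal character, which is one possible policy for malformed values like abc"def. It parses one record passed in as a String; records that span several physical lines would need the caller to join those lines first.

```java
import java.util.ArrayList;
import java.util.List;

public class LenientCsvParser {

    // Parses one CSV record into its fields, removing the surrounding quotes.
    // Assumed lenient rules:
    //  - a field that starts with " is quoted and may contain commas;
    //  - inside a quoted field, "" is an escaped quote;
    //  - inside a quoted field, a " followed by a comma or end of input closes it;
    //  - any other " is kept as a literal character.
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean atFieldStart = true;
        int n = line.length();
        int i = 0;
        while (i < n) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < n && line.charAt(i + 1) == '"') {
                    current.append('"');  // escaped quote ("")
                    i += 2;
                    continue;
                }
                if (c == '"' && (i + 1 == n || line.charAt(i + 1) == ',')) {
                    inQuotes = false;     // closing quote
                } else {
                    current.append(c);    // ordinary char or stray quote
                }
            } else if (c == ',') {
                fields.add(current.toString()); // end of field
                current.setLength(0);
                atFieldStart = true;
                i++;
                continue;
            } else if (c == '"' && atFieldStart) {
                inQuotes = true;          // opening quote
            } else {
                current.append(c);        // includes a stray quote in an unquoted field
            }
            atFieldStart = false;
            i++;
        }
        fields.add(current.toString());   // last field
        return fields;
    }

    public static void main(String[] args) {
        String row = "\"D\",abc\"def,\"\",\"0429\"292\"0\",\"11\"";
        List<String> fields = parseLine(row);
        System.out.println(fields.size()); // 5
        System.out.println(fields.get(1)); // abc"def
        System.out.println(fields.get(3)); // 0429"292"0
    }
}
```

With this policy both rows in the question yield 15 fields, with abc"def as the second field. Note that it returns unquoted field contents (D rather than "D"); keeping the raw quotes instead would be a small change to the opening and closing branches.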

Kibbee
A: 

opencsv is a great, simple, lightweight CSV parser for Java. It should handle your data easily.

Jeremy Raymond