I have a CSV file in the following format:

11000,Christopher,Nolan,MR.,Inception,25993,France,"Lefoullon,Paris",920,Director,*461-7755,33-461-7755,12175,"O'Horner, James",12300,"Glebova, Nathalie",,[email protected],Capital,NEW

http://stackoverflow.com/questions/2241758/regarding-java-split-command-parsing-csv-file

In the question linked above, @Mark Byers and @R. Bemrose suggested String[] tokens = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1); But if you look carefully at the CSV above, you will find the name "O'Horner, James", which is causing problems: it throws an ORA-00917: missing comma error. Is there a way to avoid this, or does the regex have to be corrected?

Kinda confused :-o

+1  A: 

Caveat: all of the following is idle speculation and guesswork, as you haven't supplied any code for verification, and my palantir is in the workshop for preventative maintenance.

Train of thought: You don't get a problem with the earlier "Lefoullon,Paris" but you do get a problem with "O'Horner, James" ... this suggests that the apostrophe is probably the (innocent) cause of the problem.

Hypothesis: The field is successfully extracted from the CSV as O'Horner, James ... note that the apostrophe is NOT special to CSV (and doesn't occur in that magnificent [see note] regex).
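One way to check this hypothesis is to run the regex from the question on a shortened version of the sample line. The sketch below is only an illustration: the class name CsvSplitDemo, the helper splitLine and the shortened input are made up for the example, and the quote-stripping step is an assumption about what the calling code does after the split (the split itself leaves the surrounding double quotes in place).

```java
import java.util.regex.Pattern;

class CsvSplitDemo {
    // Regex from the question: split on a comma only when it is followed by
    // an even number of double quotes, i.e. when the comma is outside a quoted field.
    private static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    // Hypothetical helper: split one line, then strip the enclosing double quotes.
    static String[] splitLine(String line) {
        String[] tokens = COMMA_OUTSIDE_QUOTES.split(line, -1); // -1 keeps trailing empty fields
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
                tokens[i] = t.substring(1, t.length() - 1);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Shortened, made-up variant of the sample line from the question.
        String line = "12175,\"O'Horner, James\",12300,\"Glebova, Nathalie\",,NEW";
        for (String token : splitLine(line)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

If the hypothesis holds, the second field comes out as O'Horner, James with the apostrophe untouched, which would put the failure after the CSV step rather than in the regex.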

However the apostrophe is significant to SQL; apostrophes quote string literals in SQL, and apostrophes in the data must be doubled.

Like this: INSERT INTO ..... VALUES(...,'O''Horner, James', ...);
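The doubling rule can be sketched as a tiny helper. The names SqlLiteral and quote are invented for this example, and it covers only the apostrophe rule described here, nothing else a real SQL interface might need.

```java
class SqlLiteral {
    // Hypothetical helper: turn a raw string into a SQL string literal by
    // doubling every apostrophe and wrapping the result in apostrophes.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        System.out.println(quote("O'Horner, James")); // prints 'O''Horner, James'
    }
}
```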

If you are using parameter substitution in your SQL interface (as you should be), converting your data fields into valid SQL constants will be done for you. Otherwise:

  • write code to fix each string field (replace every occurrence of ' by '' then wrap the result in ' front and back)

  • google("SQL injection"), read, repent, and rewrite your code using parameter substitution
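For the parameter-substitution route, a minimal JDBC sketch might look like the following. Everything specific in it is an assumption, since the question shows no database code: the table name people_import, the class CsvRowInserter, and the choice to bind every field as a string (numeric or date columns may need setInt, setDate and so on, or rely on the driver's implicit conversion).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class CsvRowInserter {
    // Hypothetical table name; must be a trusted constant, never user input.
    static final String TABLE = "people_import";

    // Builds "INSERT INTO people_import VALUES (?, ?, ...)" with one placeholder per column.
    static String buildInsertSql(int columnCount) {
        StringBuilder sql = new StringBuilder("INSERT INTO " + TABLE + " VALUES (");
        for (int i = 0; i < columnCount; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }

    // Binds each CSV field to a placeholder; the driver handles quoting,
    // so an apostrophe in the data never reaches the SQL text.
    static void insertRow(Connection conn, String[] fields) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(buildInsertSql(fields.length))) {
            for (int i = 0; i < fields.length; i++) {
                ps.setString(i + 1, fields[i]); // JDBC parameter indexes are 1-based
            }
            ps.executeUpdate();
        }
    }
}
```

With this shape the value O'Horner, James is passed as data rather than spliced into the statement, so no manual doubling of apostrophes is needed.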


Note: "magnificent" as in "C'est magnifique, mais ce n'est pas la guerre". Use a CSV parser, for sanity's sake.
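To illustrate what a parser does that the regex does not, here is a minimal hand-rolled sketch. The class MiniCsvParser is made up for the example; it handles only a single line, commas inside quoted fields, and doubled double quotes, and it is not a substitute for an established CSV library.

```java
import java.util.ArrayList;
import java.util.List;

class MiniCsvParser {
    // Hypothetical single-line parser: walks the characters once and tracks
    // whether the current position is inside a double-quoted field.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote stands for one literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // commas and apostrophes are plain data here
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a field
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }
}
```

Unlike the split, this returns the field values with the enclosing quotes already removed, and it does not rescan the rest of the line for every comma.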

John Machin
Amazing!!!! Thanks John
Sandeep