I'm looking for a Java implementation of a CSV (comma-separated values) parser that handles Unicode data properly, e.g. UTF-8 CSV files with Chinese text. I suppose such a parser should internally use code-point-related methods while iterating, comparing, etc. An Apache 2 license or similar would work best.
It's pretty easy to write yourself. Open the file with a FileInputStream and wrap it in an InputStreamReader that uses UTF-8. Wrap that in a BufferedReader so you can iterate through it using readLine(), which gives you each line as a String. Use regular expressions to split each line into fields.
The only tricky part is constructing the regexes so they don't treat commas that are enclosed within quotes as field delimiters.
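Here is a minimal sketch of that approach, in case it helps. The class and method names (RegexCsvSketch, splitLine, readAll) are made up for illustration. The regex splits on a comma only when an even number of double quotes follows it, which means the comma is outside any quoted field. I'm assuming quotes inside a field are escaped by doubling them, and that no quoted field contains a line break, since readLine() would cut such a field in two.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, not taken from any library.
class RegexCsvSketch {
    // Match a comma only if the rest of the line contains an even number
    // of double quotes, i.e. the comma is not inside a quoted field.
    private static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Split one line into fields, stripping the surrounding quotes and
    // turning doubled quotes ("") back into a single quote.
    public static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        // Limit -1 keeps trailing empty fields such as the one in "a,b,".
        for (String raw : COMMA_OUTSIDE_QUOTES.split(line, -1)) {
            String field = raw;
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1).replace("\"\"", "\"");
            }
            fields.add(field);
        }
        return fields;
    }

    // Read a whole file as UTF-8, one row per line.
    public static List<List<String>> readAll(String path) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(splitLine(line));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // \u4e2d\u6587 is a two-character Chinese sample string.
        System.out.println(splitLine("id,\"Smith, John\",\u4e2d\u6587"));
    }
}
```

Passing the charset explicitly to InputStreamReader matters here: without it, older JVMs fall back to the platform default encoding, which may not be UTF-8.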
The approach above is a bit inefficient, but fast enough for most apps. If you have real performance requirements, you'll need something that iterates through characters. I wrote one a few years ago that uses a state machine, and it worked OK.
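For the character-by-character variant, a state machine could look roughly like the sketch below. This is not the parser I wrote back then, just an illustration; the class, enum and method names are hypothetical. It assumes doubled quotes as the escape, ignores carriage returns outside quoted fields, and is lenient about stray quotes. Unlike the readLine() approach, it keeps line breaks that occur inside quoted fields.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a state-machine CSV parser.
class StateMachineCsvSketch {
    private enum State { FIELD_START, UNQUOTED, QUOTED, AFTER_QUOTE }

    public static List<List<String>> parse(Reader in) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        State state = State.FIELD_START;
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (state == State.QUOTED) {
                // Inside quotes everything is data except a quote character.
                if (ch == '"') state = State.AFTER_QUOTE;
                else field.append(ch);
            } else if (ch == '"' && state == State.AFTER_QUOTE) {
                // A doubled quote inside a quoted field is a literal quote.
                field.append('"');
                state = State.QUOTED;
            } else if (ch == '"' && state == State.FIELD_START) {
                state = State.QUOTED; // opening quote
            } else if (ch == ',') {
                row.add(field.toString());
                field.setLength(0);
                state = State.FIELD_START;
            } else if (ch == '\n') {
                // End of row; an empty line yields a row with one empty field.
                row.add(field.toString());
                field.setLength(0);
                rows.add(row);
                row = new ArrayList<>();
                state = State.FIELD_START;
            } else if (ch != '\r') {
                field.append(ch); // ordinary data (stray quotes included)
                state = State.UNQUOTED;
            }
        }
        // Flush the last row when the input does not end with a newline.
        if (state != State.FIELD_START || !row.isEmpty()) {
            row.add(field.toString());
            rows.add(row);
        }
        return rows;
    }

    // Convenience wrapper for in-memory input.
    public static List<List<String>> parseString(String csv) {
        try {
            return parse(new StringReader(csv));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseString("a,\"b,c\"\n\"x \"\"y\"\"\",z\n"));
    }
}
```

On the code point question: this sketch iterates over UTF-16 chars rather than code points, and that should be safe here. The only characters it compares against (comma, quote, CR, LF) are ASCII, and surrogate code units lie in the range U+D800 to U+DFFF, so half of a supplementary character can never be mistaken for a delimiter. Both halves simply get appended to the field in order. For real files, pass in a Reader that decodes UTF-8, preferably wrapped in a BufferedReader so read() is not slow.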
I don't believe in reinventing the wheel. So I do not want to write my own parser and go through the same headaches someone else did.
I personally like the CSV Parser from Ostermiller. There is also a Maven repository if you're interested.
You can also check out OpenCSV. There is already a Stack Overflow question about parsing Unicode with it.