I developed my Java EE application on a Windows machine, and everything worked perfectly there. But after deploying my WAR to JBoss on a Linux machine, I have encoding issues with MySQL when I import CSV files. The CSV files I import are encoded as ISO-8859-1. MySQL doesn't seem to receive the strings encoded as UTF-8, which is the encoding used in the database.
I'm afraid there isn't much information for us to go on, but as a starter for ten I'd recommend reading Joel's article on Unicode and charsets:
Read it at least twice :)
For your particular issue, the chances are that the Reader opening the CSV file is assuming the platform encoding (which is likely to be UTF-8 on the Linux machine). That means the ISO-8859-1 bytes are decoded incorrectly when they are converted to a Java String, and it all goes wrong from there.
An important point about character encoding in any Java application is that every String uses Java's internal string encoding, which is UTF-16, so there is no such thing as a UTF-8 or ISO-8859-1 String in Java. This means you have to look at the boundaries of the system: where a String is read in from a series of bytes, and where it is written out again. Since you are using the MySQL JDBC driver, I can't imagine that it doesn't handle character encoding correctly for the target database, but if all else fails it may be worth checking the driver documentation.
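To make the boundary point concrete, here is a minimal sketch that decodes the same byte with two different charsets. The class and method names are made up for illustration; only the standard java.nio.charset API is used.

```java
import java.nio.charset.StandardCharsets;

public class DecodeBoundaryDemo {

    // Decode raw bytes as ISO-8859-1, where every byte maps to one character.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Decode the same bytes as UTF-8; malformed input is replaced, not rejected.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xE4 is the letter a-umlaut in ISO-8859-1, but not valid UTF-8 on its own.
        byte[] raw = {(byte) 0xE4};

        String right = decodeLatin1(raw);
        String wrong = decodeUtf8(raw);

        System.out.println("ISO-8859-1 decode gives a-umlaut: " + right.equals("\u00E4"));
        System.out.println("UTF-8 decode gives a-umlaut:      " + wrong.equals("\u00E4"));
    }
}
```

Once the wrong decoder has been applied, the original character is gone from the String, so no setting on the database side can bring it back.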
If you just want to find out the character set used by the database, check this page: http://dev.mysql.com/doc/refman/5.0/en/charset-database.html
If you want to change the encoding used by the mysql command line client, use the --default-character-set option.
If the problems happen when you read the files (as opposed to when you insert the data into the DB), I'd guess it's the file.encoding system property that's off. If you create a reader without specifying an encoding, it uses file.encoding as the default. So if your Linux box has, say, UTF-8 as its system encoding, non-ASCII characters in an ISO-8859-1 file will cause trouble.
You can alter the system's default encoding globally by setting the LC_ALL environment variable to some appropriate value (I think you can use something like en_US.ISO-8859-1, but check the manual), or you can just change it locally for the JVM instance by specifying it on the command line:
java -Dfile.encoding=ISO-8859-1 -jar yourapp.jar
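To check which default the JVM actually picked up on each machine, something like the following sketch can be run on both the Windows and the Linux box. The class name is made up for illustration.

```java
import java.nio.charset.Charset;

public class ShowDefaultEncoding {

    // Build a one-line summary of the encoding settings the JVM is using.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the two machines print different values, any reader created without an explicit encoding will behave differently on them.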
If you can change the code which reads the CSVs, I would assume you read (directly or indirectly) from an InputStreamReader - just provide the constructor with the correct Charset/CharsetDecoder and you're done. (You might also want to make this stuff configurable - but I assume you can work that out by yourself.)
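A rough sketch of what that could look like, assuming the CSV is read line by line. The class name, method name and command line arguments are hypothetical; your import code will differ.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class CsvLineReader {

    // Read all lines, decoding bytes with the given charset instead of the platform default.
    static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (InputStream in = Files.newInputStream(file);
             BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // First argument: path to the CSV. Optional second argument: charset name,
        // which keeps the encoding configurable. ISO-8859-1 is assumed otherwise.
        Charset charset = Charset.forName(args.length > 1 ? args[1] : "ISO-8859-1");
        for (String line : readLines(Paths.get(args[0]), charset)) {
            System.out.println(line);
        }
    }
}
```

With the charset passed explicitly, the result no longer depends on file.encoding, so the same WAR should behave the same on Windows and Linux.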
You can use GNU iconv to change your CSV file's encoding before importing it, for example: iconv -f ISO-8859-1 -t UTF-8 input.csv > output.csv
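If iconv isn't available, roughly the same conversion can be done in plain Java. This is only a sketch with made-up class and method names, and it assumes the file is small enough to fit in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Latin1ToUtf8 {

    // Decode the source bytes as ISO-8859-1, then write them back out as UTF-8.
    static void transcode(Path source, Path target) throws IOException {
        String text = new String(Files.readAllBytes(source), StandardCharsets.ISO_8859_1);
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Arguments: input file (ISO-8859-1) and output file (UTF-8).
        transcode(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```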
Hope this will help you.