The obvious answer is to use Charset.defaultCharset()
but we recently found out that this might not be the right answer. I was told that the result is different from real default charset used by java.io classes in several occasions. Looks like Java keeps 2 sets of default charset. Does anyone have any insights on this issue?
We were able to reproduce one fail case. It's kind of user error but it may still expose the root cause of all other problems. Here is the code,
public class CharSetTest {
public static void main(String[] args) {
System.out.println("Default Charset=" + Charset.defaultCharset());
System.setProperty("file.encoding", "Latin-1");
System.out.println("file.encoding=" + System.getProperty("file.encoding"));
System.out.println("Default Charset=" + Charset.defaultCharset());
System.out.println("Default Charset in Use=" + getDefaultCharSet());
}
private static String getDefaultCharSet() {
OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream());
String enc = writer.getEncoding();
return enc;
}
}
Our server requires default charset in Latin-1 to deal with some mixed encoding (ANSI/Latin-1/UTF-8) in a legacy protocol. So all our servers run with this JVM parameter,
-Dfile.encoding=ISO-8859-1
Here is the result on Java 5,
Default Charset=ISO-8859-1
file.encoding=Latin-1
Default Charset=UTF-8
Default Charset in Use=ISO8859_1
Someone tries to change the encoding runtime by setting the file.encoding in the code. We all know that doesn't work. However, this apparently throws off defaultCharset() but it doesn't affect the real default charset used by OutputStreamWriter.
Is this a bug or feature?
EDIT: The accepted answer shows the root cause of the issue. Basically, you can't trust defaultCharset() in Java 5, which is not the default encoding used by I/O classes. Looks like Java 6 corrects this issue.