tags:

views:

210

answers:

4

Hi, I recently discovered that relying on the default encoding of the JVM causes bugs. I should explicitly use a specific encoding, e.g. UTF-8, when working with String, InputStreams, etc. I have a huge codebase to scan to ensure this. Could somebody suggest a simpler way to check this than searching the whole codebase?

Thanks Nayn

+4  A: 
System.getProperty("file.encoding")

returns the JVM's default encoding for I/O operations.

You can set it by passing -Dfile.encoding=utf-8 on the command line.
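For illustration, a minimal sketch of how to inspect the default (the class name here is just for the example; `Charset.defaultCharset()` is the supported API for this since Java 5):

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {
    public static void main(String[] args) {
        // The file.encoding system property reflects the platform default
        System.out.println(System.getProperty("file.encoding"));
        // Charset.defaultCharset() is the supported way to query it
        System.out.println(Charset.defaultCharset().name());
    }
}
```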

Bozho
Please see the thread that I mentioned in the comment. The above property is an internal implementation detail of a specific JVM implementation; its use varies between Java 1.5 and 1.6.
Nayn
it isn't. Read the accepted answer fully :) this is a standard setting that determines the default charset.
Bozho
Setting a property like this to correct code is an outrageous hack.
Tom Hawtin - tackline
@Tom I don't share your opinion on that. While it is preferable not to rely on this (and I never do), it is legitimate to use VM parameters.
Bozho
I have to admit that I couldn't solve this problem without setting the system property -Dfile.encoding=utf-8. I tried every possible approach to specify the encoding explicitly wherever possible.
Nayn
+3  A: 

Not a direct answer, but to ease the job it's good to know that in any decent IDE you can just search for occurrences of InputStreamReader, OutputStreamWriter, String#getBytes(), String(byte[]), Properties#load(), URLEncoder#encode(), URLDecoder#decode() and the like, wherein you can pass the charset, and then update accordingly. You'd also want to search for FileReader and FileWriter and replace them with the first two classes mentioned. True, it's a tedious task, but worth it, and I'd prefer it over relying on environmental specifics.

In Eclipse for example, select the project(s) of interest, hit Ctrl+H, switch to tab Java Search, enter for example InputStreamReader, tick the Search For option Constructor, choose Sources as the only Search In option, and execute the search.
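As a sketch of what such a replacement looks like (the class name and file are illustrative; the charset name is passed as a string, as was usual pre-Java 7):

```java
import java.io.*;

public class ExplicitCharsetDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("charset-demo", ".txt");
        f.deleteOnExit();

        // Instead of: new FileWriter(f) -- which silently uses the platform default
        Writer out = new OutputStreamWriter(new FileOutputStream(f), "UTF-8");
        try {
            out.write("h\u00e9llo");
        } finally {
            out.close();
        }

        // Instead of: new FileReader(f)
        Reader in = new InputStreamReader(new FileInputStream(f), "UTF-8");
        try {
            char[] buf = new char[16];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n)); // prints: héllo
        } finally {
            in.close();
        }
    }
}
```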

BalusC
+1 good to mention the `InputStreamReader` and the likes.
Bozho
`FileReader` is the baddy. I don't know of a comprehensive list of these dangerous API methods/constructors.
Tom Hawtin - tackline
A: 

relying on default encoding of JVM causes bugs

Indeed, one should always specify the charset when encoding/decoding.

If you are satisfied with a single global default charset for all your encoding/decoding (not always enough), you can live with Bozho's answer: specify a known, fixed default in your JVM arguments or in some static initializer.

But it's good practice to search for all implicit charset uses in your code and replace them with an explicit charset. Some typical methods/classes to look at: FileWriter, FileReader, InputStreamReader, OutputStreamWriter, String#getBytes(), String(byte[]).
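For example, the String methods behave quite differently with and without an explicit charset (the literal below is just for illustration):

```java
import java.io.UnsupportedEncodingException;

public class StringCharsetDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "caf\u00e9";
        // Implicit and platform-dependent: s.getBytes()
        // Explicit:
        byte[] utf8 = s.getBytes("UTF-8");        // 5 bytes: é takes two bytes in UTF-8
        byte[] latin1 = s.getBytes("ISO-8859-1"); // 4 bytes: é is a single byte
        System.out.println(utf8.length + " " + latin1.length); // prints: 5 4
        // A round trip must decode with the same charset it encoded with
        System.out.println(new String(utf8, "UTF-8"));
    }
}
```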

leonbloy
Noted should be that `FileWriter` and `FileReader` can't be changed to take a specified encoding. They should be replaced with `OutputStreamWriter` and `InputStreamReader` respectively.
BalusC
A: 

If the file is manipulated by native tools on the server, you may want to set the encoding to System.getProperty("file.encoding"). I have run into bugs both ways.

Best practice is to know which character set is used, and to set it explicitly. Also, if the file is used to interface with another application, you should agree on the character set used. This may be a Windows code page or one of the UTF formats.
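When the charset is agreed on by name (say, a Windows code page coming from a config file), `java.nio.charset.Charset` can look it up and validate it; this is a small sketch, with the charset names chosen only for the example:

```java
import java.nio.charset.Charset;

public class CodePageDemo {
    public static void main(String[] args) {
        // windows-1252 is a common code page for files produced by Windows tools
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println(cp1252.name());
        // Validate a name before use when it comes from external configuration
        System.out.println(Charset.isSupported("UTF-16LE")); // prints: true
    }
}
```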

BillThor