



Today one of my testers came to me and said my program had failed her test.

All she did was open all my properties files and re-save them in Unicode format.


  1. Is there an industry practice of checking the encoding of every properties file?
  2. How do you deal with this problem?

I've never seen a Java project run an encoding check on its properties files before. But I see her point, because a customer might save a properties file in a different encoding.

+2  A: 

Are the properties files considered part of the application, or are they user-editable files? In the first case, I don't think it's wrong to make assumptions about how parts of your application are encoded or stored.

If the properties files are targeted at the user, as user-editable files, then the principle applies: you should validate and clean any and all input coming in from outside your application.

The official java.util.Properties documentation states that the encoding is ISO-8859-1:

When saving properties to a stream or loading them from a stream, the ISO 8859-1 character encoding is used. For characters that cannot be directly represented in this encoding, Unicode escapes are used; however, only a single 'u' character is allowed in an escape sequence. The native2ascii tool can be used to convert property files to and from other character encodings.

This can be found here.
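To make the quoted rule concrete, here is a minimal sketch of what the stream-based load does with an escaped value versus raw UTF-8 bytes. The class name, keys, and values are made up for illustration; only `Properties.load(InputStream)` comes from the documentation quoted above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, not part of any library.
class Latin1PropertiesDemo {

    // Loads properties through the InputStream overload,
    // which decodes the bytes as ISO-8859-1.
    static Properties loadFromBytes(byte[] bytes) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(bytes));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // The euro sign is written as a Unicode escape, so the file is pure ASCII.
        byte[] escaped = "price=\\u20AC 10\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(loadFromBytes(escaped).getProperty("price"));

        // The same kind of content saved as raw UTF-8 is misread,
        // because each UTF-8 byte is decoded as one Latin-1 character.
        byte[] utf8 = "name=caf\u00E9\n".getBytes(StandardCharsets.UTF_8);
        String name = loadFromBytes(utf8).getProperty("name");
        System.out.println("caf\u00E9".equals(name)); // false
    }
}
```

The second load does not fail. It silently returns a garbled value, which is why a re-saved file can slip through unnoticed.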

My problem is that a user could tamper with the properties file and save it in another encoding. Do we need to run a check for this?
I think you have to assume that a user who is going to edit a properties file has the sense to save it in the right encoding. There's only so much you can do to protect users from the results of messing with things that they don't understand ...
Stephen C
I've added a bit to the answer above about user-editable files, and the intent of the properties files.

Even though the spec allows Latin-1 in properties files, the common practice is ASCII.

Any other charset needs to be converted to ASCII using native2ascii to be safe.

We ran into the same issue when we started to use native encodings: some files were in Latin-1 and others in UTF-8, and the two are not compatible. So stay with ASCII.
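For illustration, the kind of conversion native2ascii performs can be approximated in a few lines of Java. This is only a sketch of the idea, not the tool's actual implementation, and the class and method names are hypothetical.

```java
// Hypothetical helper: rewrites every non-ASCII character as a \\uXXXX escape
// so that the result can be stored safely in an ASCII-only properties file.
class AsciiEscaper {

    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c); // plain ASCII is copied unchanged
            } else {
                // Each UTF-16 code unit becomes one four-digit hex escape.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("name=caf\u00E9")); // name=caf\u00e9
    }
}
```

Because the output contains only ASCII bytes, it reads the same whether an editor treats the file as ASCII, Latin-1, or UTF-8 without a byte order mark.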

ZZ Coder

As others have said, the encoding for properties files read using streams is fixed at ISO-8859-1. You can't really validate this terribly easily - although checking whether the file starts with the UTF-8 byte order mark wouldn't be a bad idea.
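A byte order mark check along those lines might look like the sketch below. The class and method names are hypothetical, and the UTF-16 cases are an extra assumption on my part, added because editors that offer a "Unicode" save option often write UTF-16 with a BOM.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: inspects the first bytes of a file for a byte order mark.
class BomCheck {

    // Returns the name of the encoding the BOM indicates, or null if there is none.
    static String detectBom(byte[] data) {
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (data.length >= 2) {
            int b0 = data[0] & 0xFF;
            int b1 = data[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
            if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] plain = "key=value\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] withBom = "\uFEFFkey=value\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(detectBom(plain));   // null
        System.out.println(detectBom(withBom)); // UTF-8
    }
}
```

In practice the bytes could come from `Files.readAllBytes(path)`, and the application could refuse to start or log a warning when the result is not null. A file saved as UTF-8 without a BOM would still pass this check, so it catches only the most obvious cases.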

As of Java 6, however, you can provide a Reader to Properties.load instead of an InputStream. If it's still an option, you might want to start using that and mandate UTF-8, which is going to be a heck of a lot easier for many people to use than ISO-8859-1 and the \uxxxx escaping.
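A minimal sketch of the Reader-based approach, assuming the application decides that its properties files must be UTF-8. The loader class and the temporary file in `main` are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical loader that mandates UTF-8 for properties files.
class Utf8PropertiesLoader {

    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        // The Reader does the decoding, so Properties never sees raw bytes.
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".properties");
        try {
            Files.write(tmp, "name=caf\u00E9\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(load(tmp).getProperty("name"));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

A side benefit of `Files.newBufferedReader` is that reading fails with an IOException when the bytes are not valid UTF-8, which gives a basic encoding check for free. A leading byte order mark is not stripped, though, so it may be worth rejecting or skipping one before loading.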

Jon Skeet

Use the native2ascii Java utility to get your properties files into the proper encoding.
