I'm experimenting with internationalization by making a Hello World program that uses properties files + ResourceBundle to get different strings.

Specifically, I have a file "messages_en_US.properties" that stores "hello.world=Hello World!", which works fine of course.

I then have a file "messages_ja_JP.properties" that I've tried all sorts of things with, but it always appears as some kind of garbled string when printed to the console or shown in Swing. The problem is clearly in reading the content into a Java string, since a Japanese string literal typed directly into the source prints fine.

Things I've tried:

  • Saving the .properties file as UTF-8 with the Japanese string as-is for the value. Something I read suggested that Java expects a properties file to be in the system's native encoding...? It didn't work either way.
  • Saving the file in the default encoding (ISO-8859-1) with the value stored as escaped Unicode generated by the native2ascii tool included with Java. I tried this with source files in various Japanese encodings: SHIFT-JIS, EUC-JP, ISO-2022-JP.
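As a rough illustration of why the first attempt comes out garbled, the sketch below simulates a UTF-8 encoded messages_ja_JP.properties in memory and loads it through Properties.load(InputStream), which always decodes bytes as ISO-8859-1 (this is also what ResourceBundle does with properties files before Java 9). The class name MojibakeDemo is made up for this example, and the Japanese text is written as Unicode escapes in the Java source so the example does not itself depend on the source file's encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class MojibakeDemo {
    // Simulates a properties file saved as UTF-8 and read the pre-Java-6 way.
    static String loadViaInputStream(String fileContents) {
        byte[] utf8Bytes = fileContents.getBytes(StandardCharsets.UTF_8);
        Properties props = new Properties();
        try {
            // load(InputStream) decodes every byte as ISO-8859-1,
            // regardless of the platform default encoding
            props.load(new ByteArrayInputStream(utf8Bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("hello.world");
    }

    public static void main(String[] args) {
        // "konnichiwa" in hiragana, written as escapes in the source
        String value = loadViaInputStream("hello.world=\u3053\u3093\u306b\u3061\u306f\n");
        System.out.println(value);          // garbled Latin-1 characters
        System.out.println(value.length()); // one char per UTF-8 byte
    }
}
```

Each three-byte UTF-8 kana character becomes three Latin-1 characters, so the five-character greeting comes back as fifteen characters of noise.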

Edit:

I actually worked this out while typing the question, but I decided to post it anyway and answer it in case it helps anyone.

A:

I realized that native2ascii was assuming (surprise) that it was converting from my operating system's default encoding each time, and therefore was not producing the correct escaped Unicode strings.

Running native2ascii with the "-encoding encoding_name" option, where encoding_name is the name of the source file's encoding (SHIFT-JIS in this case), produced the correct result, and now everything works fine.
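To make the fix concrete, here is a rough Java sketch of what native2ascii -encoding does: decode the file's bytes with the encoding they were really saved in, then replace every non-ASCII character with a \uXXXX escape that Properties.load(InputStream) understands. Native2AsciiSketch and its methods are hypothetical names, not part of any JDK API; Shift_JIS is Java's canonical name for the SHIFT-JIS encoding. The native2ascii tool itself was removed in JDK 9.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class Native2AsciiSketch {
    // Decode raw file bytes with the encoding they were saved in, then escape
    // every non-ASCII char as backslash, 'u', and four hex digits.
    static String toEscapedAscii(byte[] raw, Charset sourceEncoding) {
        String text = new String(raw, sourceEncoding);
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (c < 0x80) {
                out.append(c); // plain ASCII passes through unchanged
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    // The escaped file is pure ASCII, so the ISO-8859-1 decoding in load() is harmless.
    static String readValue(String escapedFile, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(escapedFile.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        Charset shiftJis = Charset.forName("Shift_JIS");
        // Simulate a messages_ja_JP.properties saved in Shift_JIS
        byte[] raw = "hello.world=\u3053\u3093\u306b\u3061\u306f\n".getBytes(shiftJis);
        String escaped = toEscapedAscii(raw, shiftJis);
        System.out.print(escaped);
        System.out.println(readValue(escaped, "hello.world"));
    }
}
```

Passing the wrong Charset to toEscapedAscii reproduces the original bug: the bytes are misread before escaping, so the escapes themselves encode the wrong characters.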

Ant also has a native2ascii task that runs native2ascii on a set of input files and writes the output files wherever you want. I added an Eclipse builder that uses it, so my source folder keeps the strings in their original encoding for easy editing, and each build automatically puts converted files of the same name in the output folder.

ColinD
A: 

An alternative way to handle properties files is UniPad: http://www.unipad.org/main/

It is an editor that can read and write files in the \u Unicode escape format, which is the format native2ascii produces.

I don't know how well it works with Japanese; I've used it for Hungarian.

laszlot
A: 

As of JDK 1.6, Properties has a load() method that accepts a Reader. That means you can save all the property files as UTF-8 and read them directly by passing load() an InputStreamReader created with the UTF-8 charset. I think that's the most elegant solution, but it requires your app to run on a Java 6 runtime.
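A minimal sketch of the Reader-based approach, assuming the file bytes come from some InputStream (in a real app, perhaps getResourceAsStream on a hypothetical "/messages_ja_JP.properties"). The key detail is passing the charset to InputStreamReader explicitly; without it, the reader uses the platform default encoding. The sketch uses try-with-resources and StandardCharsets from Java 7; on Java 6 itself you would pass the string "UTF-8" and close the reader in a finally block.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class Utf8PropertiesLoad {
    // Decode the stream as UTF-8 ourselves and hand load() characters, not bytes.
    static Properties loadUtf8(InputStream in) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a UTF-8 encoded messages_ja_JP.properties
        byte[] fileBytes = "hello.world=\u3053\u3093\u306b\u3061\u306f\n".getBytes(StandardCharsets.UTF_8);
        Properties props = loadUtf8(new ByteArrayInputStream(fileBytes));
        System.out.println(props.getProperty("hello.world"));
    }
}
```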

Historically, load() only accepted an InputStream, and the stream was decoded as ISO-8859-1. Not the system default encoding, always ISO-8859-1. That's important, because it makes a certain hack possible. Say your property file is stored as UTF-8. After you retrieve a property, you can re-encode it as ISO-8859-1 and decode it again as UTF-8, like this:

String realProp = new String(prop.getBytes("ISO-8859-1"), "UTF-8");

It's ugly and fragile, but it does work. That said, I think the best solution, at least for the next few years, is the one you found: bulk-convert the files with native2ascii using a build tool like Ant.
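The re-decoding hack can be sketched as follows, again with an in-memory UTF-8 file standing in for a real one (the class and method names are hypothetical). It works because ISO-8859-1 maps each byte value 0x00-0xFF to exactly one character, so encoding the garbled string back to ISO-8859-1 recovers the original bytes. It also relies on UTF-8 multi-byte sequences never containing bytes below 0x80, so none of them can be mistaken for '=', ':', a backslash or a line break while the file is parsed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class Latin1RedecodeHack {
    // Pre-Java 6 style: load(InputStream) decodes every byte as ISO-8859-1.
    static String loadRaw(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    // Encoding back to ISO-8859-1 restores the original UTF-8 bytes one-to-one,
    // which are then decoded with the encoding the file was really saved in.
    static String redecode(String latin1Decoded) {
        return new String(latin1Decoded.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] fileBytes = "hello.world=\u3053\u3093\u306b\u3061\u306f\n".getBytes(StandardCharsets.UTF_8);
        String garbled = loadRaw(fileBytes, "hello.world");
        System.out.println(redecode(garbled));
    }
}
```

The fragility shows up as soon as a file mixes the two styles: a value written with a \uXXXX escape for a character above U+00FF cannot be encoded as ISO-8859-1, so getBytes replaces it with '?' and the re-decoded string is corrupted.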

Alan Moore
Hmm, the only problem there is that it would require me to reimplement what ResourceBundle's factory methods already do: work out the exact file name from the locale (rather than me just supplying the base name and letting it figure out the rest), open the input stream, and so on.
ColinD
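The concern in the comment (having to reimplement ResourceBundle's locale-to-file-name lookup) can be avoided on Java 6 and later with ResourceBundle.Control: getBundle still performs the candidate-locale search and fallback, and only the step that turns a found resource into a bundle is overridden to read UTF-8 through a Reader. This is a sketch; Utf8Control is a hypothetical class name, and the "messages" base name matches the question's files. On Java 9 and later, PropertyResourceBundle already reads properties files as UTF-8 by default, so this is mainly useful on older runtimes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class Utf8Control extends ResourceBundle.Control {
    @Override
    public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                    ClassLoader loader, boolean reload)
            throws IllegalAccessException, InstantiationException, IOException {
        if (!"java.properties".equals(format)) {
            // Class-based bundles keep the default behaviour
            return super.newBundle(baseName, locale, format, loader, reload);
        }
        // e.g. "messages" + Locale.JAPAN -> "messages_ja_JP.properties"
        String resourceName = toResourceName(toBundleName(baseName, locale), "properties");
        InputStream stream;
        if (reload) {
            // Bypass URL caching so an edited file is actually re-read
            URL url = loader.getResource(resourceName);
            if (url == null) {
                return null;
            }
            URLConnection conn = url.openConnection();
            conn.setUseCaches(false);
            stream = conn.getInputStream();
        } else {
            stream = loader.getResourceAsStream(resourceName);
        }
        if (stream == null) {
            return null; // let getBundle try the next candidate locale
        }
        try (Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8)) {
            return new PropertyResourceBundle(reader);
        }
    }
}
```

Usage: ResourceBundle.getBundle("messages", Locale.JAPAN, new Utf8Control()).getString("hello.world") then works with the base name alone, just like the built-in lookup.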
A: 

Here's the solution I've used: http://www.thoughtsabout.net/blog/archives/000044.html It's bizarre that ResourceBundle doesn't support UTF-8.

alex