I'm writing an Ant task in Java.

In my build.xml I specify parameters that are read by my Java class. Problems occur when I use special characters, such as German umlauts (Ö, Ä, Ü), in these parameters. In my Java task they appear as ? characters (using System.out.print from within Eclipse).

All my files are encoded as UTF-8, and my build.xml has the corresponding declaration:

<?xml version="1.0" encoding="UTF-8" ?>

For the details of writing the task: I am following http://ant.apache.org/manual/develop.html (especially point 5, nested elements). My task has nested elements like:

<parameter name="test"   value="ÖÄÜtest"/>

and a Java method:

public void addConfiguredParameter(Parameter prop) {
    System.out.println(prop.getValue());
    //prints ???test
}

to read the parameter values.

A: 

Have you tried starting Java with the following parameter?

-Dfile.encoding=UTF-8
Rodrigo
Actually, I don't have a run configuration in which I could specify the encoding. I just execute my Ant script from within Eclipse, which calls my Java task.
räph
Java does not support configuration of default transcoding operations from the command line. From Sun's bug database: _The "file.encoding" property is not required by the J2SE platform specification; it's an internal detail of Sun's implementations and should not be examined or modified by user code. It's also intended to be read-only; it's technically impossible to support the setting of this property to arbitrary values on the command line or at any other time during program execution._ http://bugs.sun.com/view_bug.do?bug_id=4163515
McDowell
A: 

There are several transcoding operations going on here:

  1. Saving the XML as UTF-8 by your editor
    • Check that the characters are encoded correctly using a hex editor
  2. The parsing of the XML by Ant from UTF-8 to UTF-16 strings
    • A fault here is very unlikely
  3. Transcoding by the System.out PrintStream from UTF-16 strings to the platform encoding
    • Check that the encoding used supports the characters
  4. Decoding of the bytes received by the Eclipse console into UTF-16 strings for display
    • Check that the encoding used by the console matches that of the PrintStream
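
Steps 3 and 4 can be checked in isolation. The following is only a sketch: the class and method names are made up for illustration, and it assumes that the ? characters come from a PrintStream whose encoding cannot represent the umlauts (US-ASCII is used here as a stand-in for such a platform encoding).

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Ant or Eclipse.
class ConsoleEncodingDemo {

    // Returns the bytes a PrintStream with the named encoding writes for the text.
    static byte[] encodeVia(String text, String charsetName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(buffer, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The encoding System.out normally falls back to.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Unicode escapes keep this source file independent of its own encoding.
        String text = "\u00D6\u00C4\u00DCtest";

        // An encoding without umlauts makes the PrintStream substitute '?'.
        byte[] ascii = encodeVia(text, "US-ASCII");
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // prints ???test

        // Wrapping System.out with an explicit encoding sidesteps the default.
        // The console still has to decode the bytes as UTF-8 (step 4).
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(text);
    }
}
```

If the last line still shows garbage, the mismatch is more likely on the console side (step 4) than in the task itself.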

Encoded as UTF-8, you would expect the following encoded values in your XML file:

Grapheme  UTF-8 encoded bytes
Ö         c3 96
Ä         c3 84
Ü         c3 9c
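
If no hex editor is at hand, the byte values in the table can be reproduced from Java itself. This is a small sketch with a hypothetical class name; it only shows what the UTF-8 encoder produces for these three characters and does not read your build.xml.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class Utf8HexCheck {

    // Returns the UTF-8 bytes of the text as space-separated lowercase hex pairs.
    static String utf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes for Ö, Ä and Ü, so the source encoding does not matter.
        String[] samples = {"\u00D6", "\u00C4", "\u00DC"};
        for (String s : samples) {
            System.out.println(utf8Hex(s)); // prints c3 96, then c3 84, then c3 9c
        }
    }
}
```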
McDowell
A: 

The problem somehow vanished and was probably already fixed by switching everything to UTF-8; Eclipse may just have been slow to pick up the change. In any case, I can no longer reproduce the error.

One problem remained: when my build.xml referred to a build.properties file containing these characters, my Java task still did not receive them correctly. I could work around this by writing each character as \u followed by its hexadecimal code point, although that is not really convenient.
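
For anyone hitting the same thing, here is a rough sketch of why the \u workaround helps. It assumes the file ends up being read through java.util.Properties.load(InputStream), which decodes bytes as ISO 8859-1; the class name and the key "greeting" are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; "greeting" is an invented key.
class PropertiesEscapeDemo {

    // Loads the given file bytes via Properties.load(InputStream) and returns one value.
    static String loadValue(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String expected = "\u00D6\u00C4\u00DCtest";

        // The doubled backslashes put the literal text \u00D6\u00C4\u00DC into the "file".
        byte[] escaped = "greeting=\\u00D6\\u00C4\\u00DCtest\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(expected.equals(loadValue(escaped, "greeting"))); // prints true

        // The same value saved as raw UTF-8 is decoded byte by byte as ISO 8859-1.
        byte[] rawUtf8 = ("greeting=" + expected + "\n").getBytes(StandardCharsets.UTF_8);
        System.out.println(expected.equals(loadValue(rawUtf8, "greeting"))); // prints false
    }
}
```

In the second case each umlaut turns into two unrelated characters, which would match what the task saw when the properties file was saved as UTF-8.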

räph
Properties files (so long as they aren't XML) are restricted to `ISO 8859-1`, so you must use Unicode escape sequences for characters not in this range.
McDowell