For the past week I've been trying to figure out why the stream decoding in my newly adopted application was causing major encoding problems. I finally worked out that the JARs/WAR built with Ant and deployed to the server were being compiled by the javac task with UTF-8 as the source encoding instead of the system default of CP1252.

The mismatch seems to matter mainly because the code contains many hard-coded string and char literals with special (non-ASCII) characters.
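To illustrate the kind of damage such a mismatch does, here is a minimal sketch (not the application's actual code). It assumes a literal such as "café" was saved in a CP1252 source file and then read as UTF-8, which is roughly what javac does when given the wrong source encoding. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

public class EncodingMismatchDemo {

    /** Encodes text with one charset, then decodes the same bytes with another. */
    public static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        Charset utf8 = Charset.forName("UTF-8");

        // The literal is written with an escape so this file itself stays pure ASCII.
        String literal = "caf\u00e9";

        // Written and read with the same charset: the text round-trips intact.
        String sameCharset = reinterpret(literal, cp1252, cp1252);
        System.out.println("round trip intact: " + sameCharset.equals(literal));

        // Written as CP1252 but read as UTF-8: the lone 0xE9 byte is not valid
        // UTF-8, so the decoder substitutes the replacement character.
        String misread = reinterpret(literal, cp1252, utf8);
        System.out.println("misread differs:   " + !misread.equals(literal));
        System.out.println("has U+FFFD:        " + (misread.indexOf('\uFFFD') >= 0));
    }
}
```

Pure ASCII literals are unaffected by this, which is why only the special characters break.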

The problem was easily resolved by any of the following steps:

  • changing the encoding for the Eclipse project to UTF-8 to match the byte code on the server
  • setting the encoding for the javac task to CP1252 so the WAR file is built to match the client byte code
  • and, strangely enough, just running Ant from the command prompt without designating any encoding.

So why is Ant in Eclipse changing to UTF-8? Is this configurable? Where do I configure it?

System

  • Windows XP
  • Eclipse 3.5
  • Ant 1.7.1
  • Java 1.6.0_11
A:

Ant, run from Eclipse, using all the same versions (except I have Java 1.6.0_15) treats my Java source files as Windows-1252. My workspace and projects are using the default settings.

"UTF-8 to match the byte code on the server"

I'm not sure what you mean by this; surely you mean the encoding of the source files. The bytecode is a structured set of instructions, and string literals built into the class files are always stored as (modified) UTF-8, whatever encoding the source was written in.


I would use Unicode escape sequences to make my files more encoding-agnostic. You can use tools like native2ascii or the java.nio.charset API to help with this.
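As a rough sketch of what such a conversion does, the snippet below replaces every character above 0x7F with its Unicode escape. It only illustrates the idea and is not a replacement for native2ascii; the class and method names are made up for this example, and it works on a string that has already been decoded correctly.

```java
public class UnicodeEscaper {

    /** Returns the text with each char above 0x7F replaced by a backslash-u escape. */
    public static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                // Four lowercase hex digits, zero padded.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The argument is itself written with escapes so this file stays pure ASCII.
        System.out.println(escapeNonAscii("caf\u00e9 \u20ac5"));
    }
}
```

Running it prints `caf\u00e9 \u20ac5`. Because the result contains only ASCII characters, it reads the same whether a tool treats the file as CP1252, UTF-8 or ISO-8859-1.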

McDowell
Just add the command <native2ascii src="path" dest="path" includes="pattern"/> to your Ant task and it will convert your source files' charset properly to UTF-8.
Vanger
By that I meant that if I switched the project encoding to UTF-8, it would be in line with the UTF-8-compiled code that was running on the server, since Ant in Eclipse is somehow using UTF-8 for javac.
codeLes
I'd rather everything be set to a more cross-platform-friendly encoding such as utf-8, maybe that will be a task for me to accomplish when I get better ramped up on the project
codeLes
@Vanger you need to ensure the input encoding is set, so the tool knows what it is encoding from; the resultant files are not UTF-8 explicitly (but can be treated as valid Cp1252, UTF-8, ISO-8859-15, ISO-8859-1, etc.)
McDowell
@codeLes - since source files don't carry any encoding metadata, you're going to have to manage that information in every tool and script you use the source code with. I prefer to use the \uXXXX escape sequences and minimize their usage. Either way, you have to be disciplined or you'll end up with encoding bugs.
McDowell
@McDowell - this has been my concern, as for now I've got past this issue, but in my mind it isn't really resolved. honestly we should be using UTF-8 as far as I can tell from research, but alas I'm 1 week new to this project.
codeLes
Honestly, I'd just like to know WHAT IN THE WORLD makes Ant in Eclipse use a different encoding for javac when even Eclipse compiles as CP1252.
codeLes
A: 

When using Ant on the command line, it automatically uses the system default encoding, which seems to be windows-1252 on your system.
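One way to check which default a given JVM actually picks up is to print it. The sketch below uses a hypothetical class name. Running it once from the command prompt and once from inside Eclipse should show whether the two environments disagree, assuming both launch it the same way the Ant build is launched.

```java
import java.nio.charset.Charset;

public class DefaultEncodingCheck {

    /** Describes the default encoding as seen by the running JVM. */
    public static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The output depends on the machine and on how the JVM was started, so none is shown here.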

When using Ant from Eclipse, it reads the local encoding property of the source files/folders to determine which one must be used. This property is in the Resource page of the Properties dialog, available when right-clicking on a source folder.

When nothing is specified, the workspace-wide default encoding is used. It is configurable from within the Window>Preferences dialog.

Hope this helps.

Michel Nolard