An application I am working on reads information from files to populate a database. Some of the characters in the files are non-English, for example accented French characters.

The application works fine on Windows, but on our Solaris machine it fails to recognise the special characters and throws an exception. For example, when it encounters the accented e in "Gérer" it says:

      Encountered: "\u0161" (353), after : "\'G\u00c3\u00a9rer les mod\u00c3"

(an exception which is thrown from our application)

I suspect that in order to stop this from happening I need to change the file.encoding property of the JVM. I tried to do this via System.setProperty() but it has not stopped the error from occurring.

Are there any suggestions for what I could do? I was thinking about setting the basic locale of the Solaris platform in /etc/default/init to UTF-8. Does anyone think this might help?

Any thoughts are much appreciated.

+3  A: 

Try to use

java -Dfile.encoding=UTF-8 ...

when starting the application on both systems.

Another way to solve the problem is to change the default encoding of both systems to UTF-8, but I prefer the first option (it is less intrusive on the system).
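
To confirm which default the JVM has actually picked up, a quick check (a two-line sketch, to be dropped somewhere in your startup code) is:

System.out.println(System.getProperty("file.encoding"));
System.out.println(java.nio.charset.Charset.defaultCharset());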

EDIT:

Check this answer on Stack Overflow; it might help too:

http://stackoverflow.com/questions/81323/changing-the-default-encoding-for-stringbyte
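
If the application is started through Ant rather than with a plain java command, the same flag can usually be passed via the ANT_OPTS environment variable, which Ant's standard launcher scripts forward to the JVM (a suggestion assuming a stock Ant install; a forked <java> task would instead need a nested <jvmarg>):

export ANT_OPTS=-Dfile.encoding=UTF-8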

sakana
Yeah, I have seen that before. My only problem is that I can't find where the java command is actually run, because the program uses Ant to run the code. Thanks for your answer though, I will try to put it to use.
Scottm
A: 

You can also set the encoding at the command line, like so: java -Dfile.encoding=UTF-8.

sblundy
+4  A: 

That looks like a file that was converted by native2ascii using the wrong parameters. To demonstrate, create a file with the contents

Gérer les modÚ

and save it as "a.txt" with the encoding UTF-8. Then run this command:

native2ascii -encoding windows-1252 a.txt b.txt

Open the new file and you should see this:

G\u00c3\u00a9rer les mod\u00c3\u0161

Now reverse the process, but specify ISO-8859-1 this time:

native2ascii -reverse -encoding ISO-8859-1 b.txt c.txt

Read the new file as UTF-8 and you should see this:

Gérer les modÀ\u0161

It recovers the "é" okay, but chokes on the "Ú", like your app did.

I don't know everything that is going wrong in your app, but I'm pretty sure incorrect use of native2ascii is part of it, and that was probably the result of letting the app use the system default encoding. You should always specify the encoding when you save text, whether it's to a file, a database, or anywhere else; never let it default. And if you don't have a good reason to choose something else, use UTF-8.
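
The same double-encoding effect can be reproduced in a few lines of Java (a minimal sketch of the mechanism, not your app's actual code):

import java.io.UnsupportedEncodingException;

public class MojibakeDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String original = "Gérer";
        // Encode correctly as UTF-8: 'é' (U+00E9) becomes the two bytes 0xC3 0xA9
        byte[] utf8 = original.getBytes("UTF-8");
        // Misinterpret those bytes as windows-1252: 0xC3 -> 'Ã', 0xA9 -> '©'
        String mangled = new String(utf8, "windows-1252");
        System.out.println(mangled); // prints "GÃ©rer", as in the exception message
    }
}

Incidentally, the \u0161 in your error message is the windows-1252 interpretation of the byte 0x9A, which is the second UTF-8 byte of 'Ú' (U+00DA).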

Alan Moore
good answer - I will look into your suggestion. Thanks
Scottm
+1  A: 

Instead of setting the system-wide character encoding, it might be easier and more robust to specify the character encoding when reading and writing the specific text data. How is your application reading the files? All the readers and writers in the Java I/O package accept a character-encoding name to use when converting text to or from bytes. If you don't specify one, they use the platform default encoding, which is likely what you are experiencing.

Some databases are surprisingly limited in the text encodings they can accept. If your Java application reads the files as text in the proper encoding, it can then write the text to the database in whatever encoding the database needs. If your database doesn't support any encoding whose character repertoire includes your non-ASCII characters, then you may need to encode the non-English text first, for example into UTF-8 bytes, and then Base64-encode those bytes as ASCII text, as in the sketch below.
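
A sketch of that encode-then-Base64 step (using java.util.Base64, which exists from Java 8 onward; older JVMs would need a library such as commons-codec):

import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Turn non-ASCII text into pure-ASCII text that a limited database can store
String original = "Gérer";
byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);    // explicit encoding, never the default
String ascii = Base64.getEncoder().encodeToString(utf8);    // safe to store as ASCII

// ...and decode it again when reading the value back out
String restored = new String(Base64.getDecoder().decode(ascii), StandardCharsets.UTF_8);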

PS: Never use String.getBytes() with no character encoding argument for exactly the reasons you are seeing.

Dov Wasserman
A: 

Hi Scott, I think we'll need more information to be able to help you with your problem:

  1. What exception are you getting exactly, and which method are you calling when it occurs?
  2. What is the encoding of the input file? UTF-8? UTF-16? ISO-8859-1?

It'll also be helpful if you could provide us with relevant code snippets.

Also, a few things I want to point out:

  1. The problem isn't occurring at the 'é' but later on.
  2. It sounds like the character encoding may be hard coded in your application somewhere.
Jack Leow
The exception is one that is defined in our software; it is thrown when the parser has tried everything but still does not recognise the character. The encoding it is using is the system default, which was set to en_GB.ISO8859-15. I'm looking for a way to force the application to read UTF-8.
Scottm
A: 

Also, you may want to verify that the operating system packages that support UTF-8 (SUNWeulux, SUNWeuluf, etc.) are installed.
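
On Solaris you can list the installed packages and filter for them, for example:

pkginfo | grep SUNWeu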

Jack Leow
+1  A: 

Hi all. I managed to get past this error by running the command

export LC_ALL='en_GB.UTF-8'

This command set the locale for the shell I was in. Because LC_ALL overrides all of the LC_* locale categories, the JVM then picked up UTF-8 as its default file encoding.
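
You can confirm the setting with the locale command, which should then report en_GB.UTF-8 for every LC_* category:

locale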

Many thanks for all of your suggestions.

Scottm
A: 

Java uses the operating system's default encoding when reading and writing files, and one should never rely on that; it is always good practice to specify the encoding explicitly.

In Java you can use the following for reading and writing:

Reading:

BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(inputPath),"UTF-8"));

Writing:

PrintWriter pw = new PrintWriter(new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputPath), "UTF-8")));
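
On Java 7 and later the same pattern can be written with try-with-resources so that the streams are always closed, even on errors (a sketch assuming java.io.* imports and that inputPath and outputPath are file-path strings, as above):

try (BufferedReader br = new BufferedReader(
         new InputStreamReader(new FileInputStream(inputPath), "UTF-8"));
     PrintWriter pw = new PrintWriter(new BufferedWriter(
         new OutputStreamWriter(new FileOutputStream(outputPath), "UTF-8")))) {
    String line;
    while ((line = br.readLine()) != null) {
        pw.println(line); // each line is decoded and re-encoded explicitly as UTF-8
    }
}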
mohitsoni