Hi!

I have a problem with a Tapestry form. My XML database is very sensitive to encoding and needs UTF-8.

When I enter the character 'à' in my form, Tapestry receives 'Ó' and my core throws an error: Invalid byte 2 of 3-byte UTF-8 sequence.

I don't have the problem in Eclipse with the local default configuration for Tomcat.

But whatever the Tomcat configuration, I think my application must do the conversion itself.

So I tried:

The charset of every page is always UTF-8.

So, what could I do before resorting to the Java Charset encoder?

Thank you for helping me. :)

A: 

I wouldn't think there's anything wrong with your application. Tapestry does everything in UTF-8 by default; that wiki page is fairly out of date (referring to the 5.0.5 beta, where apparently forms with file uploads still didn't use UTF-8 properly).

You're saying you don't have the problem locally. Have you tried running on a different server? If you do not have the problem there, there's probably something wrong with the codepage settings of the operating system on the server.

Purely anecdotal evidence below

I have once had a similar character set problem in a Tapestry 5 app on the production server (running SUSE Linux) that I could not reproduce on any other server. All seemed fine with the application, the Tomcat server, and the codepage settings of the system, but POST data would end up decoded as ISO 8859-1 instead of UTF-8 in the application. The app had run on that server for a year before the problem manifested - maybe through an update in the operating system.

After a day of not getting anywhere, we ended up just re-installing the whole server OS, and everything was fine again.

Henning
Here is a little test: `String stringUTF8 = new String(client.getName().getBytes(), "UTF-8");` `logger.info("charset utf8 : " + stringUTF8);` `String stringISO = new String(client.getName().getBytes(), "ISO-8859-1");` `logger.info("charset ISO-8859-1 : " + stringISO);` And the result for 'à': `charset utf8 : ?` and `charset ISO-8859-1 : Ó`. The Windows command shell where I launch the server has a charset problem too: é => Ú. Could the charset be set to CP850?
alex
Wow, no, that's totally confused. Calling `String.getBytes()` transforms the string (which is in Unicode in Java) into a byte array *in the platform's default encoding*. Could be CP850, could be anything. Calling `new String(bytes, charset)` creates a new String from that byte array, using `charset` as the character set when reading the byte array into a Unicode representation. Unless your default charset is the same as the `charset` parameter, this cannot ever work. When you print `client.getName()` to the console or inspect it in the debugger, do you see the proper characters?
Henning
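To illustrate the point about `getBytes()` and `new String(bytes, charset)`, here is a minimal sketch. The class name `CharsetMismatchDemo` and the helper `recode` are made up for this example, and explicit charsets stand in for the platform default so that the result does not depend on the machine it runs on.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original application.
class CharsetMismatchDemo {

    // Encode text to bytes with one charset, then decode those bytes with another.
    static String recode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // "a with grave accent" (U+00E0), written as an escape so that the
        // encoding of this source file does not matter.
        String original = "\u00e0";

        // Same charset in both directions: the text survives. Prints true.
        System.out.println(original.equals(
                recode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));

        // UTF-8 bytes read as ISO-8859-1: two wrong characters. Prints false.
        System.out.println(original.equals(
                recode(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)));

        // ISO-8859-1 byte read as UTF-8: malformed input. Prints false.
        System.out.println(original.equals(
                recode(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8)));
    }
}
```

In the last case the single ISO-8859-1 byte 0xE0 is the lead byte of a three-byte sequence in UTF-8, which is consistent with the "Invalid byte 2 of 3-byte UTF-8 sequence" error in the question.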
In the Eclipse console: 'à' => 'à' and it is OK in the database. In Tomcat + Windows shell: 'à' => 'Ó' and an error. Could the Windows shell change the default charset of Tomcat?
alex
Tomcat doesn't really have a default charset; the default depends on the underlying operating system. I still think your problem has nothing to do with that.
Henning
A: 

The problem was the default charset of the JVM launched from the Windows shell. It caused trouble with FileWriter and then showed bad characters in the console :)

alex
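A quick way to see which default charset a given JVM has picked up is to print it at startup. This is only a sketch; the class name `DefaultCharsetReport` is made up for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of the original application.
class DefaultCharsetReport {

    // Describe the charset the JVM uses when none is given explicitly.
    static String describe() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this once from Eclipse and once from the Windows shell should show whether the two environments really differ. A common workaround is to start the JVM with `-Dfile.encoding=UTF-8`, and from JDK 18 onwards the default is UTF-8 anyway, but code that names its charset explicitly does not depend on either.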
Ah, you were writing stuff to files. That could have been useful information. Anyway, yes, you should never use Readers and Writers without specifying the character set explicitly. The platform default is hardly ever what you want.
Henning
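As a sketch of what specifying the character set explicitly can look like, the example below wraps the file streams in an `OutputStreamWriter` and an `InputStreamReader` with UTF-8 instead of relying on `FileWriter`'s default. The class and method names (`ExplicitCharsetIo`, `writeUtf8`, `readUtf8`, `roundTrip`) are made up for this example, and the temp file is only there to make it self-contained.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of the original application.
class ExplicitCharsetIo {

    // Write text as UTF-8, whatever the platform default charset is.
    static void writeUtf8(Path file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                Files.newOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    // Read the whole file back, decoding it as UTF-8.
    static String readUtf8(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Write the text to a temporary file and read it back again.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("charset-demo", ".txt");
            try {
                writeUtf8(tmp, text);
                return readUtf8(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u00e0"; // a with grave accent (U+00E0)
        System.out.println(original.equals(roundTrip(original))); // prints true
    }
}
```

Because both sides name UTF-8, the result should be the same whether the JVM is started from Eclipse or from a Windows shell.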
Yes, I was writing stuff to a file. But this part of the code isn't mine. Always verify the code a teammate gives you :( Thanks for helping me anyway ;)
alex