Hi there,

I'm facing a strange problem in one of my JSF pages (a Facelet). I'm using RichFaces, and on one page I have a normal form

<h:form></h:form>

My problem is that when I submit the form, all non-ASCII characters - like German umlauts (äöü) - arrive garbled. If I switch the page to ISO-8859-1 in my browser, it works.

If I expand the form with attribute

<h:form id="register-form" acceptcharset="ISO-8859-1">

it works too (but only for German umlauts) - other non-ASCII characters still come out as something unreadable.

Could anyone give me a hand with this?

A: 

You need to set the POST request encoding by HttpServletRequest#setCharacterEncoding(). Best place for this is a Filter which is mapped on the desired url-pattern. To get world domination you of course want to use UTF-8 all the time. The doFilter() method would basically look like:

if (request.getCharacterEncoding() == null) {
    request.setCharacterEncoding("UTF-8");
}
chain.doFilter(request, response);
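As a sketch of the wiring, the filter mapping in `web.xml` could look like this (the filter and class names here are illustrative, not from the answer; the `doFilter()` body would contain the snippet above):

```xml
<!-- Maps a character-encoding filter to every request. -->
<filter>
    <filter-name>characterEncodingFilter</filter-name>
    <filter-class>com.example.CharacterEncodingFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>characterEncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
```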

This is, however, not the only thing you need to take into account with regard to character encoding. For more background information and other (detailed) solutions for a Java EE web application, you may find this article useful as well: Unicode - How to get the characters right?

Update: as per the comments:

I've already implemented a filter - after a bit of googling. But it has no impact on my problem.

Then the problem lies more in the tool you use to store/display the characters. How did you find out that the characters were garbled? In the logging statements? If so, do they use UTF-8? Or in the log file viewer/console? If so, does it use UTF-8? Or in the database table? If so, does it use UTF-8? Or in the database admin tool? If so, does it use UTF-8? Or in the result page? If so, does it use UTF-8? Etcetera. Go through the solutions section of the aforementioned link to get them all right.
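Each link in that chain can corrupt the text on its own. As a stdlib-only illustration (not from the thread), here is how a log sink configured with the wrong charset garbles a string that was decoded correctly inside the JVM:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class LogCharsetDemo {
    public static void main(String[] args) throws Exception {
        String correct = "ä"; // properly decoded inside the JVM (UTF-16 internally)

        // A log sink (file, console) configured for ISO-8859-1...
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream log = new PrintStream(sink, true, "ISO-8859-1");
        log.print(correct);

        // ...writes the single byte 0xE4. A viewer that reads that output as
        // UTF-8 hits an invalid byte sequence and shows a replacement character.
        String viewedAsUtf8 = new String(sink.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(viewedAsUtf8.equals(correct)); // false: the viewer garbles it
    }
}
```

So even with a correct request encoding, a mismatched charset anywhere downstream will make the text look broken.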

BalusC
I think a filter is not required in this case. I'm building a Cyrillic (UTF-8) app and have UTF-8 declared in only one place - on top of each JSP. That makes it work like a charm.
Bozho
-1: setting the character encoding of the request is almost never the right thing to do, as it only masks errors made somewhere else.
Michael Borgwardt
Only for response processing, yes, but not for request processing.
BalusC
Um... no? The other way round. The webapp is free to choose what it uses as response encoding and it will usually work because it's using UTF-16 internally and automatically converts the output with that encoding. But the request should already have the correct encoding declared, or you did something wrong elsewhere (i.e. when writing the page that generated the request), in which case you should fix that rather than glossing over the error by setting the charset manually.
Michael Borgwardt
That was a comment to Bozho, not to you. Yours also makes no sense, since I don't override the encoding, but only set it when it is null. You can be certain that it is UTF-8 when you set the content type of the initial request to the same encoding as well. Your -1 was just silly nitpicking, as is mine on your message, by the way (sigh).
BalusC
I still think that it's your statements that make no sense. The *request* encoding should always be set by the browser. It should never be null. The browser will take cues about which encoding to use based on the page on which the form is displayed - which the server can influence by setting the *response* encoding while serving that page.
Michael Borgwardt
About request encoding: you overestimate web browsers. About response encoding: it's rather the `charset` attribute inside the content-type header which instructs the browser which encoding to use for display. The response encoding mainly instructs Java's `Writer` which encoding to use when writing characters to the OutputStream (and it "automagically" also sets the charset in the content-type header).
BalusC
I've already implemented a filter - after a bit of googling. But it has no impact on my problem.
asrijaal
Yes, the response encoding is what the writer is using. If what the writer uses conflicts with the HTTP header (or its meta http-equiv), the text on the page gets scrambled. On the other hand, I think the request encoding should be as Michael says - determined by the encoding of the page.
Bozho
@BalusC - your update: after I submit the form, a confirmation page shows up; that is where my characters are already broken.
asrijaal
The basic point is that the encoding that is declared must always match the one that was actually used to encode the request or response. The server ensures this with the response encoding (I don't consider this "automagic", just proper logical behaviour), and the browser has to do the same with the request. Any browser that does not send an encoding in the request headers, or (worse) sends one that is different from the one actually used to encode the request body, is unusable garbage, and I am pretty sure all current browsers do this correctly.
Michael Borgwardt
@asrijaal: ensure that you set the **response** encoding and content type of the confirmation page *as well* (thus not only this, but *also* the **request** encoding). See the answer of Bozho for the Facelets way of setting the response encoding.
BalusC
A: 

This is the correct behavior. UTF-8 means that non-ASCII characters (i.e. anything at or above code point 128) are encoded with two or more bytes.

But your JSF framework should decode the data into Unicode strings before your code sees it. So my guess is that you don't specify the encoding of the page or the form, and therefore your framework can only guess what it is getting. Always set acceptcharset to UTF-8 and the encoding of the whole HTML page to the same (using the meta tag).

Then it should work.
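To see concretely why a guessed-wrong encoding garbles umlauts, here is a stdlib-only Java sketch (not part of the original answer) that decodes UTF-8 bytes as ISO-8859-1, which is exactly the mismatch described in the question:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        String original = "äöü";

        // The browser encodes the form data as UTF-8 bytes (two per umlaut)...
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // ...but the server decodes them as ISO-8859-1, one character per byte.
        String garbled = new String(utf8Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(garbled); // each umlaut becomes a two-character sequence

        // Re-encoding with ISO-8859-1 and decoding as UTF-8 restores the text,
        // because ISO-8859-1 maps bytes 0-255 one-to-one to characters.
        String restored = new String(
                garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
        System.out.println(restored.equals(original)); // true
    }
}
```

This is why switching the browser to ISO-8859-1, or setting acceptcharset to ISO-8859-1, appeared to "fix" the form: it made the encoding used match the encoding assumed.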

Links: Tips for JSF character encoding

Aaron Digulla
A: 

Put

<?xml version="1.0" encoding="UTF-8" ?>

on top of your pages, and it should work fine.

Also:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

in your template (or again, in every page, if not using a template)

I'm currently working on a UTF-8 project, and haven't set UTF-8 anywhere except on top of each JSP/XHTML.

I can't recall exactly what happens behind the scenes, but I think this line (`<?xml`) instructs Facelets which encoding should be used. This line is not sent to the browser.

P.S. The above is tested under MyFaces only (shouldn't matter, but still..)
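Putting the two snippets together, a minimal page skeleton might look like this (a sketch only; the namespaces and structure are illustrative, not from the answer):

```xhtml
<?xml version="1.0" encoding="UTF-8" ?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:h="http://java.sun.com/jsf/html">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Register</title>
</head>
<body>
    <h:form id="register-form">
        <!-- form fields -->
    </h:form>
</body>
</html>
```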

Bozho
You really don't want web browsers to go into quirks mode.
BalusC
this isn't sent to the browser at all - I think it's instructing facelets how to handle the encoding.
Bozho
Could be. Still, then his problem is also not response processing, but request processing.
BalusC
yes, I'm also having no problems with request processing. Let me check my config files again for something I'm missing.
Bozho
nope, nothing missing. This should work (it works perfectly here). The bad thing is I can't tell _why_ exactly it works - if the question had been asked half a year ago, when I was setting it up, I would've been able to :)
Bozho
A: 

How about

<h:form id="register-form" acceptcharset="UTF-8">

Not really meant as a fix, but if that makes all characters work, it suggests that your real problem is that the page containing the form is declared as US-ASCII. Browsers usually send form submits in the encoding of the page unless acceptcharset says otherwise.

But it's hard to diagnose encoding problems in webapps because there are so many potential failure points where encodings are involved. It's especially hard when your understanding of encodings is as spotty as your terminology suggests ("UTF-8 characters"). I suggest you first read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Once you've read that, take a look at the HTML source of the form page and the HTTP headers of that page and the form request to see what encodings are being used. You should then be able to figure out where things are going wrong.

Michael Borgwardt
-1: `acceptcharset` is troublesome in MSIE when it does not match the response encoding. It is also superfluous if content type encoding is set.
BalusC
I don't think acceptCharset is needed either. Things are a lot simpler (perhaps thanks to facelets), and there are very few things needed.
Bozho
It's not superfluous when you want to allow characters in the form that the page's content-type encoding can't handle. Yes, it's better then to change that content-type encoding, if only to work around the MSIE bug. But I meant this more as a way to narrow down the error than as a fix.
Michael Borgwardt