ansaurus

Question

How to force browser to set charset in content-type http header

Answer 1


Does someone have a clue how to force the current browsers to append the charset to the Content-Type header?

No, no browser has ever supplied a charset parameter with the application/x-www-form-urlencoded media type. What's more, the HTML spec, which defines that type, does not propose a charset parameter, so the server can't reasonably expect to get one.

(HTML4 does expect a charset for the subparts of a multipart/form-data submission, but even in that case no browser actually complies.)
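To make the missing label concrete, here is a small sketch that uses Java's `URLEncoder` (which implements the application/x-www-form-urlencoded scheme) as a stand-in for the browser. The class and method names are hypothetical. It shows that the same field value produces different request bodies depending on the page's encoding, and nothing in either body says which one was used.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormEncodingDemo {
    // Encode one name=value pair as application/x-www-form-urlencoded,
    // using the given charset to turn characters into bytes first.
    public static String encodePair(String name, String value, Charset cs) {
        return URLEncoder.encode(name, cs) + "=" + URLEncoder.encode(value, cs);
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "café", escaped to keep the source ASCII
        // Same field submitted from a UTF-8 page and from an ISO-8859-1 page:
        System.out.println(encodePair("q", value, StandardCharsets.UTF_8));      // q=caf%C3%A9
        System.out.println(encodePair("q", value, StandardCharsets.ISO_8859_1)); // q=caf%E9
        // Neither body carries a charset label; the server has to know.
    }
}
```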

accept-charset="utf-8"

accept-charset is broken in IE, and shouldn't be used. It won't make a difference either way for forms in pages served as UTF-8, but in other cases it can end up with inconsistent results.

No, with forms you just have to serve the page they're in as UTF-8, and the results should come back as UTF-8, with no identifying marks to tell you so (except potentially for the _charset_ hack, which Tomcat doesn't support).
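The _charset_ hack refers to a hidden form field named `_charset_` that browsers implementing the hack (it was later standardised in HTML5) fill in with the encoding they used for the submission. Since Tomcat won't act on it, an application could look for it itself. The sketch below is a hypothetical parser, not part of any Servlet API: it reads the raw form body, uses `_charset_` if present, and otherwise falls back to a caller-supplied charset.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class CharsetHackParser {
    // Hypothetical helper: parse an application/x-www-form-urlencoded body,
    // decoding values with the charset named in a _charset_ field if one is
    // present, otherwise with the fallback. Charset.forName throws for
    // unknown names; a real implementation should catch that.
    public static Map<String, String> parse(String body, Charset fallback) {
        String[] pairs = body.isEmpty() ? new String[0] : body.split("&");
        Charset cs = fallback;
        // First pass: the charset name is plain ASCII, so decoding it with
        // ISO-8859-1 is safe whatever the rest of the body uses.
        for (String pair : pairs) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].equals("_charset_")) {
                cs = Charset.forName(URLDecoder.decode(kv[1], StandardCharsets.ISO_8859_1));
            }
        }
        // Second pass: decode every field with the charset we settled on.
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : pairs) {
            String[] kv = pair.split("=", 2);
            String value = kv.length == 2 ? kv[1] : "";
            fields.put(URLDecoder.decode(kv[0], cs), URLDecoder.decode(value, cs));
        }
        return fields;
    }
}
```

For example, `parse("_charset_=UTF-8&q=caf%C3%A9", StandardCharsets.ISO_8859_1)` yields `q` = "café", while the same body without the `_charset_` field falls back to ISO-8859-1 and yields "cafÃ©".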

So you have to tell the Servlet container what encoding to use for parameters if you don't want it to fall back to its default (which is usually wrong). In a limited set of circumstances you may be able to call ServletRequest.setCharacterEncoding() to do this, but this tends to be brittle, and doesn't work at all for parameters taken from the query string. There's not a standardised Servlet-level fix for this, sadly. For Tomcat you usually have to muck about with the server.xml instead of being able to fix it in the app.
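A minimal illustration of why the container's fallback matters, assuming (as is common) that the fallback is ISO-8859-1: decoding a value submitted from a UTF-8 page with the wrong charset turns each UTF-8 byte into its own character. The class name is hypothetical; only `URLDecoder` from the JDK is used.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetMojibake {
    // Decode one percent-encoded form value with the given charset.
    public static String decodeWith(String encoded, Charset cs) {
        return URLDecoder.decode(encoded, cs);
    }

    public static void main(String[] args) {
        String submitted = "caf%C3%A9"; // "café" as sent from a UTF-8 page
        // Assumed container fallback: ISO-8859-1 maps each byte to one char.
        String wrong = decodeWith(submitted, StandardCharsets.ISO_8859_1);
        // Container told to use the page's actual encoding:
        String right = decodeWith(submitted, StandardCharsets.UTF_8);
        System.out.println(wrong.equals("caf\u00c3\u00a9")); // true: "cafÃ©"
        System.out.println(right.equals("caf\u00e9"));       // true: "café"
    }
}
```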

bobince 2010-03-10 17:37:46

Good answer, except for the Tomcat part. The `ServletRequest#setCharacterEncoding()` actually sets the charset to be used to parse the request **body** (in other words: POST parameters) and the `URIEncoding` in `server.xml` actually sets the charset to be used to parse the request **URI** (in other words: GET parameters). As he is using POST in his example, just using `ServletRequest#setCharacterEncoding()` is sufficient. More details in this article: http://balusc.blogspot.com/2009/05/unicode-how-to-get-characters-right.html

BalusC 2010-03-10 19:20:51

It's sufficient; it can just be fragile. If any request parameter is read, it will cause the request body to be read and decoded, after which any call to `setCharacterEncoding` will be ineffective. It's easy for some sneaky middleware component to mess things up by jumping in and reading a parameter...

bobince 2010-03-10 20:13:38
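The ordering problem can be modelled in a few lines. `LazyFormRequest` below is a hypothetical toy, not the Servlet API: it mimics a container that parses the body on the first parameter read and, like the behaviour described for `ServletRequest#setCharacterEncoding()`, ignores a charset set after that point. The ISO-8859-1 default is an assumption.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of lazy body decoding, not the real Servlet API.
public class LazyFormRequest {
    private final String rawBody;
    private Charset bodyCharset = StandardCharsets.ISO_8859_1; // assumed default
    private Map<String, String> params; // null until the body has been parsed

    public LazyFormRequest(String rawBody) { this.rawBody = rawBody; }

    public void setCharacterEncoding(Charset cs) {
        // Once parameters have been parsed, changing the charset has no effect.
        if (params == null) bodyCharset = cs;
    }

    public String getParameter(String name) {
        if (params == null) { // first read: decode the whole body now
            params = new HashMap<>();
            for (String pair : rawBody.split("&")) {
                String[] kv = pair.split("=", 2);
                params.put(URLDecoder.decode(kv[0], bodyCharset),
                        kv.length == 2 ? URLDecoder.decode(kv[1], bodyCharset) : "");
            }
        }
        return params.get(name);
    }

    public static void main(String[] args) {
        LazyFormRequest ok = new LazyFormRequest("q=caf%C3%A9");
        ok.setCharacterEncoding(StandardCharsets.UTF_8); // set before any read
        System.out.println(ok.getParameter("q").equals("caf\u00e9")); // true

        LazyFormRequest late = new LazyFormRequest("q=caf%C3%A9");
        late.getParameter("q"); // e.g. a middleware component peeks first
        late.setCharacterEncoding(StandardCharsets.UTF_8); // too late, ignored
        System.out.println(late.getParameter("q").equals("caf\u00c3\u00a9")); // true
    }
}
```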

@bobince: you mean "http spec" not "html spec", don't you? Actually the http spec says "Data in character sets other than 'ISO-8859-1' or its subsets MUST be labeled with an appropriate charset value." in the "3.7 Media Types" section: http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7

Eduard Wirch 2010-03-11 09:19:12

No, I mean HTML spec: the HTTP spec has nothing to say about how form data is encoded into a request body. The reference to ISO-8859-1 in RFC2616 only applies to those “some media types” which define a `charset` parameter; neither `application/x-www-form-urlencoded` nor `multipart/form-data` define one, so the rule doesn't affect form submissions. `form-urlencoded` doesn't even include any direct high bytes, only `%`-encoded versions of same, so it wouldn't be affected by that even if there was a `charset` parameter for it.

bobince 2010-03-11 21:47:52
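The point that a form-urlencoded body contains no raw high bytes is easy to check. In this sketch (hypothetical class name, JDK `URLEncoder` standing in for the browser), a value full of non-ASCII characters encodes to a body made only of ASCII letters, `+`, `=` and `%XX` escapes; the charset question is only about which bytes those escapes stand for.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlEncodedIsAscii {
    // True if every char in s is in the 7-bit ASCII range.
    public static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 0x80);
    }

    public static void main(String[] args) {
        // "Zoë Ωmega 日本" written with escapes to keep the source file ASCII.
        String value = "Zo\u00eb \u03a9mega \u65e5\u672c";
        String body = "name=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
        System.out.println(body);          // only letters, '+', '=' and %XX escapes
        System.out.println(isAscii(body)); // true
    }
}
```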

Meanwhile, the headers of `multipart` subparts are governed not by RFC2616 but by normal MIME header rules; it should be possible as per RFC2388 (which defines `multipart/form-data`) to specify an encoding for subparts, but no browsers do so and very few servers will even bother to look for it. (And the odd one will break if you try, which is why no browser added support for it.)

bobince 2010-03-11 21:49:57
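For reference, a subpart labelled the way RFC2388 allows would look roughly like the body built below. This is a sketch of the wire format only, not something browsers send; the boundary `AaB03x`, the field name `q` and the class name are hypothetical. The outer request would carry `Content-Type: multipart/form-data; boundary=AaB03x`.

```java
import java.nio.charset.StandardCharsets;

public class MultipartWithCharset {
    // Build a one-field multipart/form-data body whose subpart declares
    // its own charset. MIME lines end in CRLF.
    public static String buildBody(String boundary, String fieldName, String value) {
        String crlf = "\r\n";
        return "--" + boundary + crlf
                + "Content-Disposition: form-data; name=\"" + fieldName + "\"" + crlf
                + "Content-Type: text/plain; charset=UTF-8" + crlf
                + crlf                             // blank line ends subpart headers
                + value + crlf
                + "--" + boundary + "--" + crlf;   // closing delimiter
    }

    public static void main(String[] args) {
        String body = buildBody("AaB03x", "q", "caf\u00e9");
        // The value travels as raw UTF-8 bytes, matching the declared charset.
        byte[] wire = body.getBytes(StandardCharsets.UTF_8);
        System.out.print(body);
        System.out.println(wire.length + " bytes");
    }
}
```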
