For pages whose Content-Type already specifies a UTF-8 charset (either via the HTTP header or via a meta tag), is there any benefit to adding accept-charset="UTF-8" to HTML forms?

(I understand the accept-charset attribute is broken in IE for ISO-8859-1, but I haven't heard of a problem with IE and UTF-8. I'm just asking if there's a benefit to adding it with UTF-8, to help prevent invalid byte sequences from being entered.)

A: 

I did not encounter any problems using UTF-8 with IE (6+) or any other major browser out there. You need to make sure that a UTF-8 meta tag is set (IE needs this), that all your files are UTF-8 encoded, and that the web server sends UTF-8 Content-Type headers. If so, there should not be any problem if you omit accept-charset.
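A minimal sketch of that setup, using a Python WSGI app purely for illustration (the thread doesn't say which server stack is in use): the response declares UTF-8 both in the Content-Type header and in a meta tag, and the form omits accept-charset. The page markup, the `/submit` action and the name `app` are hypothetical.

```python
# Hypothetical page: UTF-8 is declared in a meta tag (which IE wants), and the
# form deliberately has no accept-charset attribute.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Comment form</title>
</head>
<body>
<form method="post" action="/submit">
<input name="comment">
<button type="submit">Send</button>
</form>
</body>
</html>
"""


def app(environ, start_response):
    """Minimal WSGI app that serves PAGE as UTF-8 with a matching header."""
    body = PAGE.encode("utf-8")  # the bytes on the wire really are UTF-8...
    start_response("200 OK", [
        # ...and the header says so, agreeing with the meta tag.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve it. The point is that the header, the meta tag and the actual file bytes all agree.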

elusive
I'm doing all of those things, minus the form attribute, and I'm still getting some cases of invalid UTF-8 input (http://stackoverflow.com/questions/3715264/how-to-handle-user-input-of-invalid-utf-8-characters), so I'm trying to find out conclusively whether adding the attribute to all my forms would help or is unnecessary.
philfreo
@philfreo: I have never used it and have had no problems at all. Can you give us a link to your page?
elusive
If your page is really being properly served as UTF-8, you shouldn't get non-UTF-8 submissions from that form. Of course, if you've got other sites embedding a form that points to your site, or automated agents submitting content in general, all bets are off.
bobince
Our server is serving all pages as UTF-8, and we aren't (intentionally) receiving data from other sources. We aren't getting a lot of invalid UTF-8, but we do get some every once in a while. As my other question indicates, I'm looking for an overall approach to solving that. With this question, I was hoping to hear conclusively whether the `accept-charset` attribute is necessary (makes any difference) given a UTF-8 HTTP header.
philfreo
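Whatever the source (a user-overridden page encoding, another site posting to yours, or an automated agent), the server still has to cope with the occasional non-UTF-8 submission. One approach, sketched here in Python and assuming an `application/x-www-form-urlencoded` body, is to decode strictly and reject the request rather than store mangled text. `parse_utf8_form` is a hypothetical helper, not part of any framework.

```python
from typing import Dict, List, Optional
from urllib.parse import parse_qs


def parse_utf8_form(body: bytes) -> Optional[Dict[str, List[str]]]:
    """Parse a urlencoded form body; return None if it is not valid UTF-8."""
    try:
        # A well-formed urlencoded body is pure ASCII, so raw high bytes
        # already mean something is off.
        text = body.decode("ascii")
        # errors="strict" makes percent-escapes that don't form valid UTF-8
        # raise, instead of being silently replaced with U+FFFD
        # (parse_qs defaults to errors="replace").
        return parse_qs(text, keep_blank_values=True,
                        encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return None
```

For example, `parse_utf8_form(b"comment=caf%C3%A9")` yields `{'comment': ['café']}`, while `parse_utf8_form(b"comment=caf%E9")` (the Latin-1 encoding of the same text) yields `None`. Whether to reject, log, or fall back to another decoding is an application decision; this sketch simply refuses.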
A: 

If the page is already interpreted by the browser as being UTF-8, setting accept-charset="utf-8" does nothing.

If you set the encoding of the page to UTF-8 in a <meta> and/or HTTP header, it will be interpreted as UTF-8, unless the user deliberately goes to the View->Encoding menu and selects a different encoding, overriding the one you specified.

In that case, accept-charset would have the effect of setting the submission encoding back to UTF-8 despite the user messing about with the page encoding. However, this still won't work in IE, due to the previously discussed problems with accept-charset in that browser.
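To make the override scenario concrete: if the user switches the page to a Western encoding (ISO-8859-1/windows-1252) and the form has no accept-charset, the field is submitted in that encoding instead of UTF-8. The snippet below uses Python's `quote_plus` to mimic the two possible submissions of the same text; it models the bytes on the wire, not any particular browser.

```python
from urllib.parse import quote_plus, unquote_to_bytes

value = "café"

# What the field looks like on the wire when the page is treated as UTF-8
# versus when the user has overridden it to a Latin-1 style encoding.
sent_as_utf8 = quote_plus(value, encoding="utf-8")
sent_as_latin1 = quote_plus(value, encoding="iso-8859-1")

print(sent_as_utf8)    # caf%C3%A9
print(sent_as_latin1)  # caf%E9

# A server that expects UTF-8 can decode the first submission...
print(unquote_to_bytes(sent_as_utf8).decode("utf-8"))  # café

# ...but the lone 0xE9 byte in the second is not valid UTF-8.
try:
    unquote_to_bytes(sent_as_latin1).decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")
```

This is one plausible way invalid UTF-8 can reach a server even when every page is correctly served as UTF-8.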

So it's IMO doubtful whether it's worth including accept-charset to fix the case where a non-IE user has deliberately sabotaged the page encoding (possibly messing up more on your page than just the form). Personally, I don't bother.

bobince
Are you sure? That makes sense, but the HTML 4.01 spec says user agents `may interpret` the value that way, and that the default is `UNKNOWN`.
philfreo
On all browsers (now and historically), `UNKNOWN`/unset always means the current page encoding, whether that was the server's page encoding set in a header/meta, or the encoding explicitly set by the user as an override. One exception that probably doesn't affect you: most browsers will not send form submissions in a non-ASCII-superset encoding like UTF-16, even if the page was served in that encoding. It doesn't really make sense to do so.
bobince
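The behaviour described in this thread can be summarized as a small decision function. This is a simplified Python model, loosely following the "pick an encoding for a form" steps in the current WHATWG HTML standard rather than any specific 2010-era browser (IE behaved differently, as noted above). `pick_form_encoding` and the label table are hypothetical; real browsers consult the full Encoding Standard label list.

```python
from typing import Optional

# Hypothetical, heavily trimmed label -> encoding table. Real browsers use the
# full WHATWG Encoding Standard list (where e.g. "iso-8859-1" means windows-1252).
LABELS = {
    "utf-8": "UTF-8",
    "utf8": "UTF-8",
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "windows-1252": "windows-1252",
    "shift_jis": "Shift_JIS",
    "utf-16": "UTF-16LE",
    "utf-16le": "UTF-16LE",
    "utf-16be": "UTF-16BE",
}

# Encodings that are not ASCII supersets; submissions fall back to UTF-8.
NON_ASCII_SUPERSETS = {"UTF-16LE", "UTF-16BE"}


def pick_form_encoding(document_encoding: str,
                       accept_charset: Optional[str] = None) -> str:
    """Return the encoding a form would be submitted in (simplified model).

    document_encoding is the page's effective encoding: whatever the
    header/meta declared, or the user's manual override if there was one.
    """
    encoding = document_encoding
    if accept_charset is not None:
        # The first recognised label wins; unrecognised labels are skipped,
        # and if none are recognised the result is UTF-8.
        candidates = [LABELS[t.lower()] for t in accept_charset.split()
                      if t.lower() in LABELS]
        encoding = candidates[0] if candidates else "UTF-8"
    if encoding in NON_ASCII_SUPERSETS:
        return "UTF-8"
    return encoding
```

In this model, `pick_form_encoding("windows-1252")` returns `"windows-1252"` (the user's override leaks into the submission), while `pick_form_encoding("windows-1252", "UTF-8")` returns `"UTF-8"`. For a page already served as UTF-8 and left alone by the user, the attribute changes nothing, which matches the accepted answer.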