Today I looked at the HTML source of facebook.com and found something like this:

<input type="hidden" value="&euro;,&acute;,€,´,水,Д,Є" name="charset_test"/>

It appears twice inside the <form>...</form>.

Any idea what this code might be useful for - some kind of server-side client charset detection? As far as I know, the browser charset is transmitted in the HTTP request anyway (in an "Accept-Charset" header).

A: 

It's probably the prohibited characters, though I'm not sure :D

PirosB3
+2  A: 

I guess they check this value in the receiving script to make sure the client sent the request properly encoded as UTF-8 and, because they know which characters to expect, maybe even to detect the actual encoding on the fly.

If I remember correctly - I had to deal with it once - there were problems with form encoding in IE6 in some situations.
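
A minimal sketch of that idea in Python, assuming the receiving script can get at the raw bytes submitted for the charset_test field. The names EXPECTED, CANDIDATES and detect_charset are made up for illustration and the candidate list is only an assumption; this is not Facebook's actual code.

```python
# Hypothetical sketch: guess the request encoding from a known probe value.
# EXPECTED is the field value as the browser sees it, with entities decoded.
EXPECTED = "€,´,€,´,水,Д,Є"

# Candidate encodings to try, most likely first (an assumption, not a
# definitive list). Only encodings that can represent every probe
# character can produce an exact match.
CANDIDATES = ["utf-8", "gb18030", "utf-16-le"]


def detect_charset(raw: bytes):
    """Return the first candidate under which raw decodes to EXPECTED, else None."""
    for encoding in CANDIDATES:
        try:
            if raw.decode(encoding) == EXPECTED:
                return encoding
        except UnicodeDecodeError:
            # The bytes are not valid in this encoding; try the next one.
            continue
    return None
```

A result of None would mean the submission matched none of the candidates, so the server could reject it or fall back to a default.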

Pekka
Thank you, I'm going to google this IE6-related form problem.
Void
I may be wrong, but I *think* it was something about ambiguous encodings (i.e. when the `content-type` header says something different from the `content-type` META tag). Anyway, I think Facebook are doing this because they are being accessed by all kinds of clients, and they need to make sure their encoding is generally right.
Pekka
A: 
&euro;,&acute;,€,´,水,Д,Є

I guess some browsers send &euro; the same as € and &acute; the same as ´.

So they can check something like charset_test[0] == charset_test[2] and charset_test[1] == charset_test[3].

For the other characters, I have no clue. 水 probably tests for CJK.
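
The comparison above could look roughly like this in Python, assuming the field has already been decoded to a string and is split on commas. The function name entities_decoded is hypothetical.

```python
def entities_decoded(charset_test: str) -> bool:
    """True if items 0 and 1 (written as entities in the HTML) arrived
    identical to items 2 and 3 (written as literal characters)."""
    parts = charset_test.split(",")
    if len(parts) < 4:
        # Too few items to compare, so treat the value as malformed.
        return False
    return parts[0] == parts[2] and parts[1] == parts[3]
```

A browser that decoded the entities before submitting would pass this check; one that sent them back literally as &euro; and &acute; would fail it.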

S.Mark
A: 

As Pekka says, this is to be able to detect the request charset. In practice, HTTP gives the server no reliable way to learn the charset of a form submission, because browsers usually don't state it in the request. Because of this, one has to rely on conventions outside of the protocol. Generally browsers are predictable, but this trick is the only way to be 100% sure.

See also: http://www.phpwact.org/php/i18n/charsets

troelskn
+2  A: 

Any idea what this code might be useful for - some kind of server-side client charset detection?

Apparently so.

The Euro sign is useful for charset detection because there are so many ways of encoding it:

  • E2 82 AC in UTF-8
  • 88 in windows-1251
  • 80 in the other windows-125x encodings
  • A4 in ISO-8859-7, -15, and -16
  • A2 E3 in GB18030
  • 85 40 in Shift_JIS-2004 (plain Shift-JIS has no Euro sign)
  • etc.
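
The byte values in the list above can be checked with Python's built-in codecs. This is only a sketch: the helper euro_bytes is made up for illustration, and the codec name shift_jis_2004 is my choice for the Shift-JIS entry, on the assumption that the JIS X 0213 variant is the one meant.

```python
EURO = "€"

# Python codec names chosen to correspond to the list above (an assumption).
ENCODINGS = [
    "utf-8", "cp1251", "cp1252",
    "iso8859-7", "iso8859-15", "iso8859-16",
    "gb18030", "shift_jis_2004",
]


def euro_bytes(encoding: str):
    """Return the Euro sign's bytes in the given encoding as spaced hex,
    or None if the codec is unknown or cannot encode the character."""
    try:
        data = EURO.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return None
    return " ".join(f"{byte:02X}" for byte in data)


for name in ENCODINGS:
    print(f"{name:15} {euro_bytes(name) or 'not encodable'}")
```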

As far as I know, the browser charset is transmitted in the HTTP request anyway (in an "Accept-Charset" header).

It's supposed to be transmitted in the HTTP Content-Type header, but that doesn't mean that user agents actually get it right.
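
As a rough illustration, a server could read the declared charset from a Content-Type header value like this, using Python's standard email.message module. The function name declared_charset is hypothetical.

```python
from email.message import Message


def declared_charset(content_type: str):
    """Return the lowercased charset parameter of a Content-Type header
    value, or None if the header does not declare one."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()
```

When this returns None, the header carried no charset parameter, and the server has to fall back on something like the hidden-field probe.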

dan04