For example, Chinese text (GB2312) is pasted into a text box (or text area) of an HTML page and the form is posted. On the server side, is there any way to detect this character set?

How would this detection behave if text belonging to different character sets is pasted into the same text box?

A: 

The web browser should send a Content-Type header that includes the encoding when it posts the data.

I find it helpful to think of text as "just text" (without any particular encoding) until an encoding is required. So the browser shouldn't care what encoding (if any) was used to originally produce the text (e.g. if it was copied and pasted from a file, the file's encoding is irrelevant). It decides what encoding to use when posting it to the server, obviously making sure that it's an encoding which covers all the characters it needs to send.

Jon Skeet
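To make the "just text" idea concrete, here is a small Python sketch (Python rather than any particular server stack, purely for illustration). It shows the same characters producing different bytes under GB2312 and UTF-8, and why a sender has to pick an encoding that covers every character. The `can_encode` helper and the sample strings are made up for this example.

```python
text = "中文"  # just characters; no encoding is attached yet

# The same text becomes different bytes depending on the encoding chosen.
gb_bytes = text.encode("gb2312")
utf8_bytes = text.encode("utf-8")

# Decoding each byte sequence with its own encoding gives back the same text.
same_text = gb_bytes.decode("gb2312") == utf8_bytes.decode("utf-8") == text


def can_encode(s, encoding):
    """Return True if every character of s is representable in encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Text mixing Chinese and Korean: UTF-8 covers it, GB2312 does not.
mixed = "中文 한글"
print(can_encode(mixed, "utf-8"))
print(can_encode(mixed, "gb2312"))
```

This is also why pasting text from several character sets into one box is not a problem in itself: the original encodings are gone by the time the text is in the box, and only the encoding chosen for the submission matters.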
(If the browser sends the Content-Type) Which parameter should I inspect to get hold of the encoding? Detection is important on the server side so that all the text (or rather, the characters) arriving in different encodings can be converted into one specific encoding (say UTF-8).
Krishna
Use the Content-Type header - that should specify the character encoding used (for text data). But you're not converting the text *into* a specific encoding - you're converting it from the encoded form into characters.
Jon Skeet
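A minimal Python sketch of reading the charset parameter from a Content-Type header value and then decoding the body into characters. It assumes the header is present and actually carries a `charset` parameter, which (as the following comment notes) is not guaranteed. The function name `charset_from_content_type` and the sample header value are illustrative only.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Extract the charset parameter from a Content-Type header value.

    Returns the charset name in lower case, or `default` if the header
    carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(failobj=default)


# Hypothetical header value and raw body bytes, for illustration.
header = "text/plain; charset=GB2312"
raw_body = b"\xd6\xd0\xce\xc4"

charset = charset_from_content_type(header)
# Decoding turns the encoded bytes into characters (a str).
text = raw_body.decode(charset)
```

Once the body is a `str`, producing UTF-8 for storage is a separate, explicit step: `text.encode("utf-8")`.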
I did not find an encoding type in the headers. Say the browser posts raw data to the server: is there any foolproof way of detecting the encoding, or is it more a matter of intelligent guesswork?
Krishna
A: 

If you use PHP on the server, you can use mb_detect_encoding.

pixeline
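For servers not running PHP, the same kind of guesswork can be sketched in Python as trial decoding against an ordered list of candidates. This is not how mb_detect_encoding is implemented, just one simple way to guess; the function name `guess_encoding` and the candidate list are assumptions for the example.

```python
def guess_encoding(raw, candidates=("ascii", "utf-8", "gb2312")):
    """Return the first candidate encoding that decodes raw without error.

    This is a guess, not a detection: several encodings can accept the
    same bytes, so the order of the candidates decides the answer.
    Returns None if no candidate fits.
    """
    for name in candidates:
        try:
            raw.decode(name)  # strict decoding; raises on invalid bytes
        except UnicodeDecodeError:
            continue
        return name
    return None


print(guess_encoding("中文".encode("gb2312")))
print(guess_encoding("中文".encode("utf-8")))
print(guess_encoding(b"plain text"))
```

Note that a single-byte encoding such as Latin-1 accepts every byte sequence, so putting one early in the candidate list would make it win every time. That is why this can never be foolproof.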
+1  A: 

You need to tell the browser what encoding to use by adding an accept-charset="UTF-8" (or similar) attribute to the form. Apparently this defaults to the character set of the page, but I wouldn't count on that. The browser won't tell you what encoding it used when it submits the form, so you need to assume it used the one you told it to.

Andrew Duffy
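On the receiving end, "assume it used the one you told it to" might look like the following Python sketch. It assumes the form declared accept-charset="UTF-8" and was submitted as application/x-www-form-urlencoded; the field name `comment` and the function name `parse_form_body` are made up for the example.

```python
from urllib.parse import parse_qs


def parse_form_body(body, encoding="utf-8"):
    """Parse a URL-encoded form body, assuming the encoding the form asked for.

    errors="strict" makes a mismatch raise UnicodeDecodeError instead of
    silently substituting replacement characters.
    """
    return parse_qs(body, encoding=encoding, errors="strict")


# "中文" percent-encoded as UTF-8 bytes.
fields = parse_form_body("comment=%E4%B8%AD%E6%96%87")
print(fields["comment"][0])

# The same text percent-encoded as GB2312 bytes is not valid UTF-8,
# so the wrong assumption is caught rather than producing garbage.
try:
    parse_form_body("comment=%D6%D0%CE%C4")
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True
```

Failing loudly on a mismatch is a design choice here; a real application might instead fall back to a second candidate encoding or reject the request.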