ansaurus

Question

Answer 1

+1 A:

I don't know the charset, but if you are using HTML to show the results, you should set the charset of the HTML page:

     <META http-equiv="Content-Type" content="text/html; charset=EUC-JP">

You can also use iconv (a PHP function) to convert the text from one charset to another: http://php.net/manual/en/book.iconv.php
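
As a rough sketch of the iconv route, assuming the incoming text is EUC-JP (the charset from the meta tag above) and the page should be UTF-8. The sample string and variable names are only for illustration, and the source file itself is assumed to be saved as UTF-8.

```php
<?php
// Sample text; this literal is UTF-8 because the file is assumed to be saved as UTF-8.
$utf8 = '日本語';

// Simulate text arriving in EUC-JP (two bytes per character for these kanji).
$eucJp = iconv('UTF-8', 'EUC-JP', $utf8);

// Convert it to the charset the page is served in.
$converted = iconv('EUC-JP', 'UTF-8', $eucJp);

// iconv() returns false when the input is not valid in the source charset.
if ($converted === false) {
    $converted = '';
}

echo strlen($eucJp), ' bytes in EUC-JP, ', strlen($converted), " bytes in UTF-8\n";
```

The text is the same in both variables; only the byte representation differs, which is why the declared charset has to match the bytes you actually send.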

And last but not least, check your database encoding for the tables.

But I guess that in your case you will only have to change the meta tag.

aviv 2010-07-29 09:30:26

Actually, the meta tag can do nothing. It must be an **HTTP** header, not the http-equiv surrogate.

Col. Shrapnel 2010-07-29 09:34:11

@Col: ? You very much *can* change the charset the browser uses from a `<meta http-equiv>`. That's the whole point. Sending an accurate `Content-Type` header *as well* is a good idea though.

bobince 2010-07-29 09:41:57

`<meta http-equiv>` is only used if the real HTTP header is *missing*.

David Dorward 2010-07-29 09:47:23

Answer 2

+1 A:

Basically all charset problems stem from charsets being mixed and/or misinterpreted.

A string (text) is a sequence of bytes in a specific order. The string is encoded using some specific charset, which in itself is neither right nor wrong. The problem arises when you try to read the string, the sequence of bytes, assuming the wrong charset. Bytes encoded using, for example, KS X 1001 just don't make sense when you read them as UTF-8. That's where the question marks come from.
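
To make that concrete, here is a small sketch. It assumes the iconv and mbstring extensions are available, that the file is saved as UTF-8, and it uses EUC-KR as the byte encoding of KS X 1001. The sample word is just an illustration.

```php
<?php
// '한글' as a UTF-8 literal (source file assumed to be UTF-8).
$utf8 = '한글';

// The same text as EUC-KR bytes, the usual byte encoding of KS X 1001.
$eucKr = iconv('UTF-8', 'EUC-KR', $utf8);

echo bin2hex($utf8), "\n";   // the UTF-8 bytes
echo bin2hex($eucKr), "\n";  // the EUC-KR bytes: different length, different values

// Reading the EUC-KR bytes as if they were UTF-8 fails validation.
$looksLikeUtf8 = mb_check_encoding($eucKr, 'UTF-8');
var_dump($looksLikeUtf8);
```

Neither byte sequence is wrong; the EUC-KR one only turns into garbage when something decodes it as UTF-8.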

The site you're getting the text from sends it to you in some specific character set, let's assume KS X 1001. Let's assume your own site uses UTF-8. Embedding a stream of bytes representing KS X 1001 encoded text in the middle of UTF-8 encoded text and telling the browser to interpret the whole site as UTF-8 leads to the KS X 1001 encoded text not making sense to the UTF-8 parser.

UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
KSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKS
UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU

will be rendered as

Hey, this is UTF-8 encoded text, awesome!
???????I?have?no?idea?what?this?is???????
Hey, this is UTF-8 encoded text, awesome!

To solve this problem, convert the fetched text into UTF-8 (or whatever encoding you're using on your site). Look at the Content-Type header of that other site; it should tell you what encoding the site is in. If it doesn't, take a guess.
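
A minimal sketch of that step, assuming you already have the other site's Content-Type header value and the fetched bytes in hand. Both helper names (`charsetFromContentType`, `toUtf8`) and the EUC-KR fallback guess are made up for this example.

```php
<?php
// Hypothetical helper: pull the charset parameter out of a Content-Type header value.
function charsetFromContentType(string $contentType): ?string
{
    if (preg_match('/charset\s*=\s*"?([^\s;"]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null; // the header did not name a charset
}

// Hypothetical helper: convert fetched bytes to UTF-8.
// $fallback is the guess to use when no charset is known.
function toUtf8(string $body, ?string $charset, string $fallback = 'EUC-KR'): string
{
    $from = $charset ?? $fallback;
    if ($from === 'UTF-8') {
        return $body; // already in the target encoding
    }
    $converted = iconv($from, 'UTF-8', $body);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $from to UTF-8");
    }
    return $converted;
}

// Stand-ins for a real response: a header value and an EUC-KR encoded body.
$header  = 'text/html; charset=euc-kr';
$fetched = iconv('UTF-8', 'EUC-KR', '한글');

echo toUtf8($fetched, charsetFromContentType($header)), "\n";
```

If the guess is wrong, iconv() will usually either fail or produce visibly wrong text, so it is worth checking the result against a string you know.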

deceze 2010-07-29 09:45:43

Answer 3

+1 A:

You need to:

- tell the browser what encoding you wish to receive in the form submission, by setting Content-Type by header or `<meta>` as in aviv's answer.
- tell the database what encoding you're sending it bytes in, using `mysql_set_charset()`.
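
Roughly, the two steps could look like the sketch below. It assumes EUC-KR everywhere, and it uses mysqli rather than `mysql_set_charset()` because the old mysql_* functions were removed in PHP 7. The function name `useConnectionCharset` is made up, and `$db` would be a connection you have already opened.

```php
<?php
// Assumption: EUC-KR everywhere, matching the database described in the question.
$httpCharset  = 'EUC-KR'; // name used in the Content-Type header
$mysqlCharset = 'euckr';  // MySQL's name for the same encoding

// 1. Tell the browser, before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=' . $httpCharset);
}

// 2. Tell the database connection which encoding the bytes are in.
//    Hypothetical wrapper around mysqli::set_charset().
function useConnectionCharset(mysqli $db, string $charset): bool
{
    return $db->set_charset($charset);
}

// Usage, once a connection exists:
// useConnectionCharset($db, $mysqlCharset);
```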

Currently you are using EUC-KR in the database, so presumably you want to use that encoding in both of the above points. In this century I would suggest instead using UTF-8 throughout for all web apps/databases, as the East Asian multibyte encodings are an anachronistic unpleasantness. (They also have potential security implications: if mysql_real_escape_string doesn't know the correct encoding, a multibyte sequence containing ' or \ can let an SQL injection sneak through.)

However, if enpang.com are using EUC-KR for the encoding of the Name URL parameter you would need either to stick with EUC-KR, or to transcode the name value from UTF-8 to EUC-KR for that purpose using iconv(). (It's not clear to me what encoding enpang.com are using for URL parameters to their name check service; I always get the same results anyway.)
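
A sketch of the transcoding option, assuming the service does expect EUC-KR. The endpoint URL below is a placeholder rather than the real name-check address, and the parameters are run through `urlencode()` so the raw bytes become `%nn` sequences.

```php
<?php
// Hypothetical endpoint; the real name-check URL is not given here.
$endpoint = 'http://example.com/check';

// The name as your UTF-8 site holds it (source file assumed to be UTF-8).
$nameUtf8 = '한글';

// Transcode to the encoding the remote service is assumed to expect.
$nameEucKr = iconv('UTF-8', 'EUC-KR', $nameUtf8);

// urlencode() works on bytes, so the %nn sequences depend on the encoding.
$url = $endpoint . '?Name=' . urlencode($nameEucKr);

echo $url, "\n";
```

Sending the same name without transcoding would produce different `%nn` sequences, which the service would read as a different (or invalid) name if it decodes them as EUC-KR.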

bobince 2010-07-29 09:53:51

Well, that's the problem. I don't know which encoding they are using either..

lesderid 2010-07-29 09:58:07

Is the web service documented anywhere?

bobince 2010-07-29 09:58:48

I don't think so. However, it's of course used on their register page: http://join.enpang.com/member/joinStep1.asp I just checked and that page is using euc-kr.

lesderid 2010-07-29 10:02:45

Ah well, you can only try with a known-used/unused username, I guess. (I can't read Hangul other than merely phonetically, so I can't immediately see how to use the site.) Note that when you are creating the URL query string you should use `urlencode` on the parameters to turn them into `%nn` sequences.

bobince 2010-07-29 12:35:50


How to make PHP use the right charset?
