Previous issue: I was not able to store non-English characters:

http://stackoverflow.com/questions/3008918/how-to-store-non-english-characters

That was fixed by using UTF-8. But I realized today that symbols like ♥☆ are not stored correctly. They get converted to characters like â™¥â˜†.

How can this be fixed?

+12  A: 

It looks to me like they're being stored correctly, but you're not interpreting them correctly when you read them out. ♥ and ☆ are going to end up as multibyte characters in UTF-8 encoding. I'll bet if you look up those multibyte encodings, you'll see they are the same bytes as the single-byte encodings of â™¥ and â˜† respectively.

Edit: adding details.

As you can see in the following tables, interpreting the UTF-8 bytes as if they were encoded as Windows Latin-1 (Windows-1252) gives the results you're seeing.

UTF-8 character      Hex
♥                    e2 99 a5
☆                    e2 98 86

Windows Latin-1      Hex
â                    e2
™                    99
¥                    a5
˜                    98
†                    86
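
The tables above can be reproduced in a few lines. This is a minimal sketch, written in Python rather than PHP only because the byte-level behavior is the same in any language; the variable names are illustrative, and `cp1252` is Python's codec name for Windows-1252.

```python
# Encode the symbols as UTF-8, then misread the same bytes as Windows-1252.
text = "♥☆"

utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex(" "))  # hex dump of the six UTF-8 bytes

# Decoding with the wrong charset turns each byte into its own character.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# The bytes themselves were never damaged, so reversing the steps recovers
# the original text.
restored = garbled.encode("cp1252").decode("utf-8")
print(restored)
```

Because the round trip is lossless here, the stored data is intact and only the charset used when reading or displaying it needs to change.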
Carl Norum
Ok. So how to display it correctly with PHP?
Yeti
It's not to do with PHP. In your HTML, you have to give a charset of UTF-8.
Coronatus
An old, very broken browser is also possible. It'd help if we had an actual HTML page generated by this application to look at.
Nicholas Knight
@Nicholas: Even IE6 can do UTF-8. If you're using a browser which can't do UTF-8, it barely qualifies as a browser these days.
Michael Madsen
@Michael: I'm glad you've not had the recent misfortune of dealing with browsers in real-world settings even older and more broken than IE6. Not all of us have been so lucky.
Nicholas Knight
@Nicholas: I'm guessing it must be corporate policies that caused this misfortune.
Yeti
@Nicholas... but those people expect the web to be broken. They're used to it. Just like people with 800x600 browsers expect to need to scroll. Of course, if you're getting paid by those people it's a different matter. (☆_☆)
Atømix
+1 for the nice encoding analysis
Lauri Lehtinen
+2  A: 

Is UTF8 used consistently across the whole spectrum (MySQL, PHP, Apache, <meta>s, headers..)?

For me this worked out of the box:

$query = "update tbl set col = '♥☆' where id = 1";
mysql_query($query) or die(mysql_error());
$query = "select col from tbl where id = 1";
$res = mysql_query($query) or die(mysql_error());
print_r(mysql_fetch_row($res));

Debug output:

Content-type: text/html; charset=utf-8
Array
(
    [0] => ♥☆
)
Lauri Lehtinen
It worked after adding: <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
Yeti
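
For reference, the fix discussed in this thread amounts to declaring UTF-8 wherever the page's charset is stated and sending the body as UTF-8 bytes. Below is a rough sketch of that idea, assuming a hypothetical helper named `build_response`; it is in Python for illustration only and is not the asker's actual PHP code.

```python
def build_response(body_text):
    """Return (headers, payload) for a page that declares UTF-8 consistently.

    Hypothetical helper: the charset is stated in the HTTP header and in the
    <meta> tag, and the payload is encoded with that same charset.
    """
    html = (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">\n'
        "</head><body>" + body_text + "</body></html>\n"
    )
    headers = {"Content-Type": "text/html; charset=utf-8"}
    return headers, html.encode("utf-8")


headers, payload = build_response("♥☆")
print(headers["Content-Type"])
print(len(payload), "bytes")
```

If the header, the `<meta>` tag, and the actual byte encoding disagree, the browser may pick the wrong one, which is how the garbled characters appear.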