ansaurus

Question

Answer 1

+2 A:

Try encoding editProfile.php and saveProfile.php as UTF-8 with BOM.

Darin Dimitrov 2010-09-05 10:00:55

Did it now, still the same.

Karem 2010-09-05 10:07:29

+1: Does indeed look like encoding confusion. In particular, it looks like saveProfile.php is echoing UTF-8 encoded data yet declaring it is something else, such as ISO 8859-1. Note that the **default** encoding is ISO 8859-1. Lots of people still seem to get this wrong, and the bodges by browser makers around this don't help either. Ugh.

Donal Fellows 2010-09-05 10:11:28

Sorry, please check my updated question. My fault, I didn't mention the foreach loop; it was not the encoding.

Karem 2010-09-05 10:14:28

Answer 2

+1 A:

This is a character encoding issue.

I guess your data is actually encoded with UTF-8 so the character Ö (U+00D6) is encoded with 0xC396. Now when htmlentities is called without specifying the charset parameter, it implicitly uses ISO 8859-1:

[…] optional third argument charset which defines character set used in conversion. Presently, the ISO-8859-1 character set is used as the default.

And when the byte sequence 0xC396 is interpreted as ISO 8859-1, it represents the two ISO 8859-1 characters 0xC3 and 0x96. Since there is the entity Atilde for the ISO 8859-1 character 0xC3, htmlentities replaces this character with the reference &Atilde;. But there isn’t any entity representing the second character 0x96, so it is left unchanged. That means:

htmlentities("\xC3\x96") === "&Atilde;\x96"

Now when this is interpreted by the user agent, the character reference gets displayed correctly but the remaining byte 0x96 is not a valid byte sequence for a character in UTF-8. That’s why the replacement character � is displayed instead.
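The byte-level effect described above can be reproduced with a short script. This is only a minimal sketch: it passes "ISO-8859-1" explicitly to mimic the default quoted above (newer PHP versions default to UTF-8, so the bare call no longer behaves this way), and isValidUtf8 is a hypothetical helper name introduced for illustration.

```php
<?php
// "Ö" (U+00D6) as UTF-8 bytes.
$utf8 = "\xC3\x96";

// Explicit ISO-8859-1 mimics the old default: each byte is
// treated as a separate single-byte character.
$escaped = htmlentities($utf8, ENT_COMPAT, "ISO-8859-1");

// Hypothetical helper: with the /u modifier, PCRE rejects subjects
// that are not valid UTF-8, so preg_match() returns false for them.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump($escaped === "&Atilde;\x96"); // bool(true)
var_dump(isValidUtf8($utf8));          // bool(true)
var_dump(isValidUtf8($escaped));       // bool(false): stray 0x96 byte
```

The last check is the reason the browser falls back to the replacement character: the leftover 0x96 byte cannot be decoded as UTF-8 on its own.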

So the problem is that you didn’t specify the correct character encoding for htmlentities:

htmlentities("\xC3\x96", ENT_COMPAT, "UTF-8") === "&Ouml;"

But as you’re already using UTF-8 for your output, you don’t need to replace such characters at all; htmlspecialchars is sufficient, since it replaces only the HTML special characters.
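As a rough sketch of that suggestion, assuming the page really is served as UTF-8: escapeHtml is a hypothetical wrapper name, and the header() call is an assumption about how the charset would be declared, not something taken from the question.

```php
<?php
// Hypothetical helper: escape only the HTML special characters,
// leaving UTF-8 characters such as "Ö" untouched.
function escapeHtml(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, "UTF-8");
}

// Declare the same encoding to the browser that the output actually uses.
if (!headers_sent()) {
    header("Content-Type: text/html; charset=UTF-8");
}

// "\xC3\x96" is "Ö" in UTF-8; it is written with escapes so the example
// does not depend on the encoding of this source file.
echo escapeHtml("\xC3\x96 & <b>"), "\n"; // Ö &amp; &lt;b&gt;
```

Passing the charset explicitly keeps the behaviour the same regardless of which default the installed PHP version uses.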

But besides that, you shouldn’t use such a universal filter function, as every language and context has its own special characters that need to be taken care of.
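To illustrate what escaping per context could look like, here is a small sketch using standard PHP functions. The three contexts and the variable names are chosen for illustration only; they are not taken from the question's code.

```php
<?php
$value = "\xC3\x96 & co"; // "Ö & co" as UTF-8 bytes

// HTML text or attribute value: escape the HTML special characters.
$forHtml = htmlspecialchars($value, ENT_QUOTES, "UTF-8");

// URL query component: percent-encode every reserved byte.
$forUrl = rawurlencode($value);

// Inline JavaScript: JSON-encode, with the JSON_HEX_* flags so that
// < > & ' " are emitted as \uXXXX escapes and cannot break out of
// the surrounding script or attribute.
$forJs = json_encode(
    $value,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
);

echo $forHtml, "\n"; // Ö &amp; co
echo $forUrl, "\n";  // %C3%96%20%26%20co
echo $forJs, "\n";
```

The same input ends up with three different encodings, which is why a single catch-all filter applied before the context is known tends to be either insufficient or destructive.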

Gumbo 2010-09-05 10:46:42


Problems with characters like ÖÄÅ
