Hi.

My form

<form action="saveProfile.php" method="post" name="ProfileUpdate" id="ProfileUpdate" >
<input name="Smeknamn" id="Smeknamn" type="text" value="<?php echo $v["user_name"]; ?>" maxlength="16" id="ctl00_ctl00_cphContent_cphContent_cphContentLeft_tbUsername" onkeydown="return ((event.keyCode != 16) || (event.keyCode == 16 &amp;&amp; this.value.length >= 1));" style="width: 130px;" />
</form>

When I try to echo $_POST["Smeknamn"]; on saveProfile.php, I get Ã�Ã�Ã� instead of the characters Ö Ä Å.

Why is this happening? Both saveProfile and editProfile are encoded in UTF-8 without BOM, the UTF-8 meta tag is set, and so on.

UPDATE

$smeknamn = $data["Smeknamn"];

Sorry, I forgot to mention that I have this foreach loop, and it is $smeknamn that I am echoing and getting Ã�Ã�Ã� for. I just tried $_POST["Smeknamn"] directly and it echoes ÖÄÅ just fine. So the problem is in the foreach(), which turns the öäå characters into Ã�Ã�Ã�. How can I fix this?

foreach($_POST as $key => $value) {
    $data[$key] = filter($value);
}
function filter($data) {
    $data = trim(htmlentities(strip_tags($data)));

    if (get_magic_quotes_gpc())
        $data = stripslashes($data);

    $data = mysql_real_escape_string($data);

    return $data;
}
+2  A: 

Try encoding editProfile.php and saveProfile.php as UTF-8 with BOM.

Darin Dimitrov
Did it now, still the same..
Karem
+1: Does indeed look like encoding confusion. In particular, it looks like saveProfile.php is echoing UTF-8 encoded data yet declaring it is something else, such as ISO 8859-1. Note that the **default** encoding is ISO 8859-1. Lots of people still seem to get this wrong, and the bodges by browser makers around this don't help either. Ugh.
Donal Fellows
Sorry, please check my updated question. My fault, I didn't mention the foreach loop; it was not the file encoding.
Karem
+1  A: 

This is a character encoding issue.

I guess your data is actually encoded in UTF-8, so the character Ö (U+00D6) is encoded as the two bytes 0xC3 0x96. Now when htmlentities is called without specifying the charset parameter, it implicitly uses ISO 8859-1 (this was the default before PHP 5.4):

[…] optional third argument charset which defines character set used in conversion. Presently, the ISO-8859-1 character set is used as the default.

When the byte sequence 0xC396 is interpreted as ISO 8859-1, it represents two separate characters, 0xC3 and 0x96. Since there is an entity, Atilde, for the ISO 8859-1 character 0xC3 (Ã), htmlentities replaces that character with the reference &Atilde;. There is no entity for the second character, 0x96, so it is left unchanged. That means:

htmlentities("\xC3\x96") === "&Atilde;\x96"

Now when the user agent interprets this output as UTF-8, the character reference is displayed correctly (as Ã), but the remaining byte 0x96 is not a valid UTF-8 byte sequence on its own. That is why the replacement character (�) is displayed instead.

So the problem is that you didn’t specify the correct character encoding for htmlentities:

htmlentities("\xC3\x96", ENT_COMPAT, "UTF-8") === "&Ouml;"
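The two results above can be checked side by side with a short script. This is only a sketch: it passes the charset explicitly in both calls so that the outcome does not depend on the PHP version's default, and the variable names are made up for illustration.

```php
<?php
// "Ö" (U+00D6) written as its raw UTF-8 bytes.
$utf8 = "\xC3\x96";

// Bytes treated as ISO 8859-1: 0xC3 has an entity, 0x96 does not.
$wrong = htmlentities($utf8, ENT_COMPAT, 'ISO-8859-1');

// Bytes treated as UTF-8: the pair is decoded as the single character Ö.
$right = htmlentities($utf8, ENT_COMPAT, 'UTF-8');

// bin2hex makes the stray trailing byte visible instead of printing it raw.
echo bin2hex($wrong), "\n";
echo $right, "\n";
```

The hex dump of the first result should end in 96, the leftover byte that the browser later shows as a replacement character.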

But as you’re already using UTF-8 for your output, you don’t need to replace such characters and using htmlspecialchars instead will suffice to replace the HTML special characters.
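As a rough illustration of that point, htmlspecialchars only touches the HTML special characters and passes the UTF-8 bytes of Ö and Ä through unchanged. The sample string and variable names here are invented for the example.

```php
<?php
// Sample input containing markup plus "Ö" and "Ä" as UTF-8 bytes.
$input = "<b>\xC3\x96l & \xC3\x84pple</b>";

// Only <, >, & and quotes are replaced; multi-byte characters are kept.
$safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
```

Served on a page declared as UTF-8, the output shows the umlauts as typed, with the tags rendered as literal text.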

Besides that, you shouldn’t use such a universal filter function, because every language and context has its own special characters that need to be taken care of.
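One way to read that advice is to split the single filter into one small function per context. The sketch below is not the code from the question: the function names, the users table and its id column are assumptions, and it swaps mysql_real_escape_string for a PDO prepared statement as the SQL-side handling.

```php
<?php
// Input stage: keep the raw value, only trim surrounding whitespace.
function cleanInput(string $value): string
{
    return trim($value);
}

// HTML context: escape at the moment of output, with the charset stated.
function forHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// SQL context (hypothetical helper, assumed table and column names):
// the prepared statement binds the raw value, so no manual escaping.
function saveNickname(PDO $pdo, int $userId, string $nickname): void
{
    $stmt = $pdo->prepare('UPDATE users SET user_name = :name WHERE id = :id');
    $stmt->execute([':name' => $nickname, ':id' => $userId]);
}

// "ÖÄÅ" as UTF-8 bytes, followed by something that must be escaped in HTML.
$nickname = cleanInput("  \xC3\x96\xC3\x84\xC3\x85 <script>  ");
echo forHtml($nickname), "\n";
```

With this split, the value stored in the database stays free of HTML entities, and each output path applies only the escaping its own context needs.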

Gumbo