I've tried to wrap the affected data in encodeURIComponent()
Nah, if you're passing in a {} object, jQuery will take care of UTF-8 and URL-encoding it for you.
When I use that AJAX with htmlentities() in my php, my umlauts look like this in plain text: UE �, AE �, OE �, ue ü, ae ä, oe o
If you must use htmlentities(), you have to tell it your encoding is UTF-8 in the optional $charset argument, else (in PHP versions before 5.4) it will stupidly default to treating all your bytes as ISO-8859-1, and encode them to inappropriate entity references, one for each byte.
Better is to use htmlspecialchars() instead, as it does not attempt to apply unnecessary encoding to characters other than the few ASCII characters that really need it.
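To make the difference concrete, here is a rough sketch. The sample string is made up for illustration, and the "Ü" is written as its two raw UTF-8 bytes so the result doesn't depend on how the script file itself is saved.

```php
<?php
// "Über <b>" with the Ü spelled out as its two UTF-8 bytes (0xC3 0x9C).
$text = "\xC3\x9Cber <b>";

// Wrong charset: each UTF-8 byte is read as a separate ISO-8859-1 character,
// so the first byte (0xC3) turns into the entity for "Ã".
$wrong = htmlentities($text, ENT_QUOTES, 'ISO-8859-1');

// Right charset: the two bytes are read as one character, "Ü".
$right = htmlentities($text, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only touches &, <, >, and quotes; the UTF-8 bytes pass through.
$plain = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $wrong, "\n"; // begins with &Atilde;
echo $right, "\n"; // &Uuml;ber &lt;b&gt;
echo $plain, "\n"; // Über &lt;b&gt; (Ü still as raw UTF-8 bytes)
```

The third form is the one to prefer as long as the page itself is served as UTF-8.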
And like this in the database: UE Ü , AE Ä, OE Ö, ue ü, ae ä, oe o
How are you determining that? Does the tool you are using to grab data out of the database know about Unicode? (If it's a dodgy PHP web admin interface, maybe not. PHP isn't great at Unicode.)
It is possible that you're storing proper UTF-8 bytes in the database, but in tables marked as having a Latin-1 collation. This will work, inasmuch as you'll get the same bytes out as you put in, but if MySQL doesn't know they're UTF-8 bytes then case-insensitive string comparisons outside the ASCII range won't work right, so looking for Ä won't match ä. That may or may not matter to you.
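The same effect can be imitated in PHP. This is only an analogy, not MySQL's actual collation code, and it assumes the mbstring extension is loaded: the same bytes case-fold differently depending on which encoding you claim they are in.

```php
<?php
// "Ä" and "ä" as raw UTF-8 bytes.
$upper = "\xC3\x84"; // Ä
$lower = "\xC3\xA4"; // ä

// Declared as UTF-8: each pair of bytes is one character, and the two fold to the same thing.
$matchUtf8 = mb_strtolower($upper, 'UTF-8') === mb_strtolower($lower, 'UTF-8');

// Declared as Latin-1: every byte is folded as its own character,
// and the second bytes (0x84 vs 0xA4) are not case variants of each other.
$matchLatin1 = mb_strtolower($upper, 'ISO-8859-1') === mb_strtolower($lower, 'ISO-8859-1');

var_dump($matchUtf8);   // bool(true)
var_dump($matchLatin1); // bool(false)
```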
If I don't use htmlentities() but mysql_real_escape_string() instead
Whoah, careful. HTML-escaping is for the output stage to the page. SQL-string-literal-escaping occurs when creating an SQL query. You need them both, but don't mix them up or attempt to do them at the same stage, or you'll have all sorts of weird escapes-gone-wrong and potential vulnerabilities.
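A minimal sketch of keeping the two stages apart. Note the swaps: instead of mysql_real_escape_string() it uses a PDO parameterised query, and instead of MySQL it uses an in-memory SQLite database (assuming the pdo_sqlite extension) so that it runs standalone. The table and column names are hypothetical.

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (body TEXT)');

// Raw user input: contains a quote, markup, and a UTF-8 "Ü".
$raw = "O'Reilly <b>\xC3\x9Cber</b>";

// Query stage: SQL escaping only (here delegated to a bound parameter).
// No HTML-escaping yet, so the database holds the text exactly as entered.
$stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (?)');
$stmt->execute([$raw]);

$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();

// Output stage: HTML-escaping only, at the moment the text goes into the page.
$html = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

echo $html, "\n"; // O&#039;Reilly &lt;b&gt;Über&lt;/b&gt;
```

The stored value comes back byte-for-byte identical to the input; the entities exist only in the generated HTML.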