A:

> I've tried to wrap the affected data in encodeURIComponent()

Nah, if you're passing in a {} object, jQuery will take care of UTF-8 and URL-encoding it for you.
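Here is what "take care of URL-encoding" means on the wire, as a minimal PHP sketch that assumes the script file is saved as UTF-8. `http_build_query()` builds an `application/x-www-form-urlencoded` body, which is the format jQuery produces when it serializes a plain-object `data` option. `parse_str()` decodes it the way PHP fills `$_POST`. The second half pre-encodes the value with `rawurlencode()`, used here as a rough PHP counterpart of calling `encodeURIComponent()` by hand. The `%` signs then get encoded a second time, so the server receives the escape sequence instead of the umlaut. This illustrates the encoding; it is not jQuery's own code.

```php
<?php
// Assumption: this file is saved as UTF-8, so the literal below is UTF-8 bytes.
$name = 'Jürgen Müller';

// Form-encode once, as happens when a plain object is passed as AJAX data.
$body = http_build_query(['name' => $name]);
echo $body, "\n"; // name=J%C3%BCrgen+M%C3%BCller

// Decode it the way PHP populates $_POST.
parse_str($body, $decoded);
echo $decoded['name'], "\n"; // Jürgen Müller

// Pre-encoding the value (like calling encodeURIComponent() manually)
// makes the serializer escape the '%' characters a second time.
$doubleBody = http_build_query(['name' => rawurlencode('ü')]);
echo $doubleBody, "\n"; // name=%25C3%25BC

parse_str($doubleBody, $doubleDecoded);
echo $doubleDecoded['name'], "\n"; // %C3%BC, not ü
```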

> When I use that AJAX with htmlentities() in my PHP, my umlauts look like this in plain text: UE �, AE �, OE �, ue ü, ae ä, oe o

If you must use htmlentities(), you have to tell it your encoding is UTF-8 in the optional $charset argument. Otherwise (in PHP versions before 5.4) it will, stupidly, default to treating all your bytes as ISO-8859-1 and encode them to inappropriate entity references, one for each byte.
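A minimal PHP sketch of that failure mode, assuming the source file is saved as UTF-8. Since PHP 5.4 the default is already UTF-8, so the old behaviour is reproduced here by passing `'ISO-8859-1'` explicitly: each byte of the two-byte UTF-8 sequence for ü is mapped to its own Latin-1 entity.

```php
<?php
$text = 'ü'; // UTF-8 bytes 0xC3 0xBC (assuming this file is saved as UTF-8)

// Treating the bytes as ISO-8859-1, the pre-5.4 default:
// 0xC3 is 'Ã' and 0xBC is '¼' in Latin-1, so each byte gets its own entity.
$wrong = htmlentities($text, ENT_QUOTES, 'ISO-8859-1');
echo $wrong, "\n"; // &Atilde;&frac14;  (the browser shows "Ã¼")

// Telling htmlentities() the real encoding yields one entity per character.
$right = htmlentities($text, ENT_QUOTES, 'UTF-8');
echo $right, "\n"; // &uuml;
```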

Better still, use htmlspecialchars() instead, as it does not attempt to encode any characters other than the few ASCII characters that really need it.
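A short PHP sketch of the difference, assuming UTF-8 input: `htmlspecialchars()` escapes only the markup-significant characters and leaves the umlauts as ordinary UTF-8 text.

```php
<?php
$input = '<b>Müller & Söhne</b>'; // assumed UTF-8 input

// Only &, <, > (and quotes, with ENT_QUOTES) are escaped; umlauts pass through.
$safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $safe, "\n"; // &lt;b&gt;Müller &amp; Söhne&lt;/b&gt;
```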

> And like this in the database: UE Ü, AE Ä, OE Ö, ue ü, ae ä, oe o

How are you determining that? Does the tool you are using to grab data out of the database know about Unicode? (If it's a dodgy PHP web admin interface, maybe not. PHP isn't great at Unicode.)

It is possible that you're storing proper UTF-8 bytes in the database, but in tables marked as having a Latin-1 collation. This will work, inasmuch as you'll get the same bytes out as you put in, but if MySQL doesn't know they're UTF-8 bytes, then case-insensitive string comparisons outside the ASCII range won't work right, so looking for Ä won't match ä. That may or may not matter to you.
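To make the comparison problem concrete, here is a small PHP sketch. `latin1_lower()` is a hypothetical helper that lowercases byte by byte using Latin-1 rules (A to Z, and À to Þ except ×). It stands in for what a Latin-1 case-insensitive collation does; it is not MySQL's actual collation code. On genuine Latin-1 text, Ä folds to ä. On UTF-8 bytes that the database believes are Latin-1, the second byte of each sequence has no case mapping, so the two forms never match.

```php
<?php
// Hypothetical helper: lowercase a string byte by byte using Latin-1 rules.
// A-Z (0x41-0x5A) and À-Þ (0xC0-0xDE, except × at 0xD7) map to byte + 0x20.
function latin1_lower(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        $isUpper = ($b >= 0x41 && $b <= 0x5A)
            || ($b >= 0xC0 && $b <= 0xDE && $b !== 0xD7);
        $out .= chr($isUpper ? $b + 0x20 : $b);
    }
    return $out;
}

// Real Latin-1 bytes: Ä is 0xC4, ä is 0xE4, so case folding makes them equal.
var_dump(latin1_lower("\xC4") === latin1_lower("\xE4")); // bool(true)

// UTF-8 bytes in a Latin-1 column: Ä is 0xC3 0x84, ä is 0xC3 0xA4.
// Only the leading 0xC3 ('Ã') gets folded; 0x84 and 0xA4 stay different.
$upper = "\xC3\x84";
$lower = "\xC3\xA4";
var_dump(latin1_lower($upper) === latin1_lower($lower)); // bool(false)
```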

> If I don't use htmlentities() but mysql_real_escape_string() instead

Whoa, careful. HTML-escaping is for the output stage to the page. SQL-string-literal escaping happens when creating an SQL query. You need them both, but don't mix them up or attempt to do them at the same stage, or you'll have all sorts of weird escapes-gone-wrong and potential vulnerabilities.
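One concrete way escapes go wrong when the stages are mixed: if text is HTML-escaped before it is stored, and then escaped again (correctly) at output time, the reader sees the entity instead of the character. A minimal PHP sketch, where `$stored` is a stand-in for whatever would come back from the database:

```php
<?php
$userInput = 'Tom & Jerry';

// Wrong: HTML-escaping at input time, before the value goes into the database.
$stored = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8'); // 'Tom &amp; Jerry'

// The output stage escapes again, as it should for any value it prints.
$html = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
echo $html, "\n"; // Tom &amp;amp; Jerry  (the browser shows "Tom &amp; Jerry")

// Right: store the raw value and escape exactly once, at output.
$htmlOnce = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');
echo $htmlOnce, "\n"; // Tom &amp; Jerry  (the browser shows "Tom & Jerry")
```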

bobince
When I use htmlspecialchars(), the characters look good on the site but like this in the database: ü (regardless of whether the db is UTF-8 or latin1). I use SQLyog to access the database; I don't have a web interface like phpMyAdmin. They also look messy when I use my custom-built admin interface to edit them.
rayne
OK, SQLyog *claims* to support Unicode so hopefully it should be getting it right. If it's important to you to have the data look right in the admin interface, you need to use `CREATE TABLE ... CHARACTER SET utf8` to create your tables and call `mysql_set_charset('utf8')` from PHP before using the database connection.
bobince
A: 

It sounds like the problem is occurring when inserting the data into the database. Are you using MySQL? After connecting to your database server, issue the query:

SET NAMES utf8;

This will tell the database server that the client connection wishes to send data in UTF-8 and to interpret it as such.
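What goes wrong without it can be reproduced in plain PHP. If the connection is treated as Latin-1 but the script actually sends UTF-8, each byte is taken as a separate Latin-1 character and, when stored into a utf8 column, converted to UTF-8 again. `latin1_bytes_to_utf8()` below is a hypothetical helper that performs that byte-level conversion by hand, so no mbstring or iconv is needed; the result is the familiar "Ã¼" in place of "ü". MySQL's latin1 is really Windows-1252, which differs only in bytes 0x80 to 0x9F, so this sketch is exact for ü, ä, ö but not for Ü, Ä, Ö.

```php
<?php
// Hypothetical helper: interpret each byte as an ISO-8859-1 character and
// re-encode it as UTF-8 (what a latin1 connection feeding a utf8 column does).
function latin1_bytes_to_utf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                    // ASCII is unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))       // two-byte UTF-8 sequence
                  . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$sent = 'ü';                            // UTF-8: 0xC3 0xBC
$stored = latin1_bytes_to_utf8($sent);  // double-encoded
echo $stored, "\n";                     // Ã¼
echo bin2hex($stored), "\n";            // c383c2bc
```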

Also, when sending this data to the browser, make sure to set the Content-Type header:

header('Content-type: text/html; charset=utf-8');

This will tell the browser to interpret the data as UTF-8.

Stephen Curran
A: 

Try using this function instead of htmlentities():

htmlspecialchars()

Codler
A: 

I have finally found a solution that works for me: I removed the contentType: "application/x-www-form-urlencoded;charset=UTF-8" option from my jQuery ajax call, I now use only htmlentities($value, ENT_NOQUOTES, 'UTF-8') when processing the data for SQL, and my database is set to utf8 Unicode.
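One more reason `htmlentities()` cannot stand in for SQL escaping: with `ENT_NOQUOTES`, as used above, quote characters pass through untouched, so a value like O'Brien still reaches the SQL string literal unescaped. A short PHP check, assuming UTF-8 input:

```php
<?php
$value = "O'Brien & Müller";

// ENT_NOQUOTES converts &, <, > and non-ASCII characters, but leaves ' and " alone.
$encoded = htmlentities($value, ENT_NOQUOTES, 'UTF-8');
echo $encoded, "\n"; // O'Brien &amp; M&uuml;ller

// The single quote survives, so this is not safe to splice into an SQL literal.
var_dump(strpos($encoded, "'") !== false); // bool(true)
```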

The characters are displayed correctly and are stored in the database as &auml; for ä, and so on.

rayne
Please don't store HTML-encoded data in the database! HTML-escaping is an output concern that should happen always and only at the page-output stage. It doesn't belong in the data access layer. If you put HTML-encoded data in the database you won't be able to do searches like `LIKE '%uml%'` (it won't be able to tell the difference between an encoded umlaut and the text “uml”), every `SUBSTRING` operation (including implicit trimming due to field length limits) risks breaking an entity reference and producing broken HTML, and it'll mess up any non-HTML use of the table data like sending mail.
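Both problems are easy to reproduce on an HTML-encoded value. This PHP sketch uses `strpos()` as a stand-in for `LIKE '%uml%'` and `substr()` as a stand-in for `SUBSTRING` or field-length truncation; those substitutions are the assumption here, not MySQL's behaviour verbatim.

```php
<?php
$raw = 'Müller';
$stored = htmlentities($raw, ENT_QUOTES, 'UTF-8'); // 'M&uuml;ller'

// Searching for the plain text "uml" matches inside the entity reference.
var_dump(strpos($stored, 'uml') !== false); // bool(true)
var_dump(strpos($raw, 'uml') !== false);    // bool(false)

// Truncating to 4 characters cuts the entity in half: 'M&uu' is broken HTML.
echo substr($stored, 0, 4), "\n"; // M&uu
```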
bobince
Oh really? I didn't know that, but I'm a bad programmer in general ;) When I remove the htmlentities() from my script, my special characters look like this in the database again: ü. Oddly, when I send the data only through PHP (with JavaScript deactivated), they look fine in the database (ä). So the problem is most likely caused by the jQuery AJAX call.
rayne