I'm using PHP against a UTF-8 compliant database. Here's how input goes in:
- user types input into textarea
- textarea contents encoded with JavaScript escape()
- passed via HTTP POST
- decoded with PHP rawurldecode()
- passed through HTMLPurifier with default settings
- escaped for MySQL and stored in database
It comes out in the usual way, and I run unescape() on page load. This is to allow people to, say, copy and paste directly from a Word document and have the smart quotes show up.
But HTMLPurifier seems to be clobbering non-UTF-8 special characters, the ones that escape() turns into a simple %XX expression. Ö, for example, escapes to %D6, whereas smart quotes escape to something like %u201C and go into the database that way. HTMLPurifier takes out both the special character and the one immediately following it.
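To make the difference concrete, here is a minimal sketch of what the encoding step produces for the two kinds of character. The helper name compareEncodings is hypothetical, and encodeURIComponent() is not part of my current process; it is included only as a comparison, since it emits percent-encoded UTF-8 bytes instead of escape()'s %XX and %uXXXX forms.

```javascript
// Compare how escape() and encodeURIComponent() encode the same characters.
// escape() is the function used in the process above; encodeURIComponent()
// is shown only for comparison and is an assumption, not part of that process.
function compareEncodings(text) {
  return {
    escaped: escape(text),                // %XX for code points below 256, %uXXXX above
    uriEncoded: encodeURIComponent(text), // percent-encoded UTF-8 bytes
  };
}

const umlaut = compareEncodings("\u00D6");     // Ö
const smartQuote = compareEncodings("\u201C"); // left double smart quote

console.log(umlaut.escaped);        // %D6
console.log(umlaut.uriEncoded);     // %C3%96
console.log(smartQuote.escaped);    // %u201C
console.log(smartQuote.uriEncoded); // %E2%80%9C
```

Since rawurldecode() turns %D6 into the single byte 0xD6, which is not a valid UTF-8 sequence on its own, this may be why HTMLPurifier strips it, while the %u201C form passes through as plain ASCII text.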
I need to change something in this process. Perhaps I need to change multiple things.
What can I do to keep special characters from getting clobbered?