Let's assume the following string is entered into a form and submitted to a PHP script.

"€ should encode as &euro;"

I would like to know how to actually get € to encode as &euro;. htmlentities() doesn't do it. What voodoo is needed to get that character (and others like it) to encode properly?

+4  A: 

It works with htmlentities(), but you need to specify the proper character set. The default character set of htmlentities(), ISO 8859-1, does not contain that character, whereas ISO 8859-15, for example, does:

var_dump(htmlentities("\xA4", ENT_COMPAT, 'ISO-8859-15') === '&euro;');  // bool(true)

Here "\xA4" produces the byte 0xA4, which is the code of € in ISO 8859-15.

So just make sure to use a character set that contains that character.
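
To illustrate why the charset argument matters, here is a minimal sketch (the variable name `$byte` is only for illustration) that passes the same byte to htmlentities() under two different character sets. The byte 0xA4 is the euro sign in ISO 8859-15 but the generic currency sign in ISO 8859-1, so the resulting entity differs:

```php
<?php
// The same byte stands for different characters in different charsets.
$byte = "\xA4"; // '€' in ISO 8859-15, '¤' (currency sign) in ISO 8859-1

// Interpreted as ISO 8859-15, the byte is the euro sign.
echo htmlentities($byte, ENT_COMPAT, 'ISO-8859-15'), "\n"; // &euro;

// Interpreted as ISO 8859-1, the very same byte is the currency sign.
echo htmlentities($byte, ENT_COMPAT, 'ISO-8859-1'), "\n";  // &curren;
```

If the declared charset does not match how the string is actually encoded, htmlentities() has no way to know which character was meant.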

Gumbo
I've seen this advice in various places around the web, but it isn't working. I've tried htmlentities($descr_text, ENT_COMPAT, 'ISO-8859-7', false) and htmlentities($descr_text, ENT_COMPAT, 'ISO-8859-15', false), and neither works. This is very aggravating. Do you have an explanation for why this would not be working?
Fred
@Fred: The *charset* value needs to reflect the actual character set. So what is `$descr_text` encoded with?
Gumbo
I have no clue, and I don't know how to find out. The following: mb_detect_encoding($descr_text,'auto'); returns false. How do I go about finding out?
Fred
@Fred: Well, where does the value of `$descr_text` come from? If you typed that value with an editor, check its character encoding in its settings.
Gumbo
It's coming from an input form in Firefox. Also, if you know of any articles on handling Unicode in PHP, I'd appreciate a link. Obviously I'm extremely ignorant when it comes to handling Unicode in PHP.
Fred
@Fred: In that case specify the accepted character set with the [`accept-charset` attribute](http://www.w3.org/TR/html4/interact/forms.html#adef-accept-charset).
Gumbo
bam, that's what I was missing. Thank you Gumbo.
Fred
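
A rough sketch of how the pieces discussed in these comments could fit together, assuming the mbstring extension is available and the form should submit UTF-8. The helper `encode_submitted_text()` and the field name `descr_text` are hypothetical, chosen to mirror the variable in the thread. The form declares `accept-charset="UTF-8"`, and the script checks that the submitted bytes really are valid UTF-8 before passing that same charset to htmlentities():

```php
<?php
// Hypothetical helper: returns the entity-encoded text,
// or null when the input is not valid UTF-8.
function encode_submitted_text($text)
{
    if (!mb_check_encoding($text, 'UTF-8')) {
        return null; // refuse to guess the encoding
    }
    return htmlentities($text, ENT_COMPAT, 'UTF-8');
}

// Serve the page itself as UTF-8 so the browser and the form agree.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

$encoded = isset($_POST['descr_text'])
    ? encode_submitted_text($_POST['descr_text'])
    : null;
?>
<form method="post" accept-charset="UTF-8">
  <input type="text" name="descr_text">
  <input type="submit">
</form>
<?php if ($encoded !== null): ?>
<p><?php echo $encoded; ?></p>
<?php endif; ?>
```

Checking the input with mb_check_encoding() against one known charset tends to be more dependable than asking mb_detect_encoding() to guess.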
+2  A: 
echo "€ should encode as " . htmlentities("€", ENT_COMPAT, 'UTF-8');
Galen
Note that this only works if the file is encoded with UTF-8.
Gumbo
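
One way to sidestep the dependence on the file's encoding, sketched here under the assumption that the mbstring extension is available for the conversion step, is to write the euro sign as an explicit UTF-8 byte sequence instead of a literal character. The variable names are illustrative only:

```php
<?php
// "\xE2\x82\xAC" is the UTF-8 byte sequence for the euro sign (U+20AC),
// so the result does not depend on how this source file is saved.
$euro = "\xE2\x82\xAC";
echo $euro . " should encode as " . htmlentities($euro, ENT_COMPAT, 'UTF-8') . "\n";

// Text known to be in another charset can be converted to UTF-8 first.
// Here the ISO 8859-15 euro byte 0xA4 becomes the same three UTF-8 bytes.
$fromLatin9 = mb_convert_encoding("\xA4", 'UTF-8', 'ISO-8859-15');
var_dump($fromLatin9 === $euro); // bool(true)
```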