I want to convert this hello@domain.com to

&#104;&#101;&#108;&#108;&#111;&#64;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;

I have tried:

urlencode($string)

This returns the string I entered, with only the @ symbol converted to %40.

I also tried:

htmlentities($string)

This returns the same string unchanged.

I am using the UTF-8 charset; I'm not sure whether that makes a difference.

+8  A: 

Here it goes (assumes UTF-8, but it's trivial to change):

function encode($str) {
    // UTF-32BE stores every code point in exactly 4 bytes, big endian
    $str = mb_convert_encoding($str, 'UTF-32BE', 'UTF-8');
    $split = str_split($str, 4);

    $res = "";
    foreach ($split as $c) {
        // Rebuild the code point from its 4 bytes, most significant first
        $cur = 0;
        for ($i = 0; $i < 4; $i++) {
            $cur |= ord($c[$i]) << (8*(3 - $i));
        }
        $res .= "&#" . $cur . ";";
    }
    return $res;
}

EDIT: Recommended alternative using unpack:

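function encode2($str) {
    // UTF-32BE stores every code point in exactly 4 bytes, big endian
    $str = mb_convert_encoding($str, 'UTF-32BE', 'UTF-8');
    // "N*" reads the string as a sequence of unsigned 32-bit big-endian integers
    $t = unpack("N*", $str);
    $t = array_map(function($n) { return "&#$n;"; }, $t);
    return implode("", $t);
}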
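
One way to sanity-check the output is to decode it again with html_entity_decode and compare it with the input. The sketch below repeats encode2 so that it runs on its own; it assumes the mbstring extension is loaded, and the sample address is only an illustration.

```php
<?php
// Same approach as encode2 in the answer, repeated so this snippet is standalone.
function encode2($str) {
    // UTF-32BE stores every code point in exactly 4 bytes, big endian
    $str = mb_convert_encoding($str, 'UTF-32BE', 'UTF-8');
    $t = unpack("N*", $str);
    $t = array_map(function($n) { return "&#$n;"; }, $t);
    return implode("", $t);
}

$encoded = encode2("hello@domain.com");
echo $encoded, "\n";

// Decoding the numeric entities should give back the original string
echo html_entity_decode($encoded, ENT_QUOTES, 'UTF-8'), "\n";
```

If the second line printed differs from the input, the source string was probably not valid UTF-8 to begin with.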
Artefacto
Nice.
Pekka
It's not necessary to print $cur as unsigned when converting to a string in `$res .= "&#" . $cur . ";";` because the range of Unicode characters doesn't go that far. However, if you have an invalid UTF-8 sequence, this could give negative values (I don't know if mb_convert_encoding validates the range).
Artefacto
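
The concern about invalid input can be sidestepped by validating first. The following is a minimal sketch, not part of the original answer: encodeChecked is a hypothetical name, returning false on bad input is an arbitrary choice, and formatting with %u is only a precaution for 32-bit builds.

```php
<?php
// Hypothetical wrapper: refuses invalid UTF-8 instead of encoding it.
function encodeChecked($str) {
    if (!mb_check_encoding($str, 'UTF-8')) {
        return false; // assumption: the caller decides how to handle bad input
    }
    $codes = unpack("N*", mb_convert_encoding($str, 'UTF-32BE', 'UTF-8'));
    $res = "";
    foreach ($codes as $n) {
        // %u formats the integer as unsigned, so it can never print a minus sign
        $res .= sprintf("&#%u;", $n);
    }
    return $res;
}

var_dump(encodeChecked("a@b"));   // valid input is encoded
var_dump(encodeChecked("\xff"));  // a lone 0xFF byte is not valid UTF-8
```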
This is a brilliant answer for 3 reasons: 1. I couldn't have thought of it myself. 2. It is elegant and works well. 3. I have learnt a lot of good stuff from it. Thanks.
Ashley Ward
+2  A: 

A much easier way to do this:

function convertToNumericEntities($string) {
    // Map format: [start, end, offset, mask]; code points from 0x80 to 0x10ffff are converted
    $convmap = array(0x80, 0x10ffff, 0, 0xffffff);
    return mb_encode_numericentity($string, $convmap, "UTF-8");
}

Change the encoding argument if your string is not UTF-8.

  • Fixed map range. Thanks to Artefacto.
SileNT
Nice. I haven't tested it, but I suppose you also have to change the map to cover all Unicode characters.
Artefacto
Probably something like `$convmap = array(0x000000, 0x10ffff, 0, 0xffffff);` (untested).
Artefacto
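
The map suggested in the comment above can be sketched as follows. The function name is hypothetical and the only change from the answer is the start of the range: with 0x0 instead of 0x80, ASCII characters such as those in the question's address should be converted as well. It assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper; the range start of 0x0 is the assumption being illustrated.
function encodeAllToNumericEntities($string) {
    // Map format: [start, end, offset, mask]; every code point in start..end is converted
    $convmap = array(0x0, 0x10ffff, 0, 0xffffff);
    return mb_encode_numericentity($string, $convmap, "UTF-8");
}

echo encodeAllToNumericEntities("hello@domain.com"), "\n";
```

Keeping 0x80 as the start is still a reasonable choice when only non-ASCII characters need escaping.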