This doesn't work (just echoes "U4e9c"):

echo mb_convert_encoding("U4e9c","UTF-8","auto");

I guess some sort of cast or conversion of "U4e9c" is needed, but I can't figure out how...

+2  A: 

Hello, this comment provides two functions, because unicode_decode() does not seem to exist in PHP 5. Here are my tests; it seems to work:

greg@liche :) ~ > php -a
Interactive shell

php > function unicode_decode($str){                                           
php {     return preg_replace(
php (         '#\\\u([0-9a-f]{4})#e',
php (         "unicode_value('\\1')",
php (         $str);
php { }
php > 
php > function unicode_value($code) {
php {     $value=hexdec($code);
php {     if($value<0x0080)
php {         return chr($value);
php {     elseif($value<0x0800)
php {         return chr((($value&0x07c0)>>6)|0xc0)
php {             .chr(($value&0x3f)|0x80);
php {     else
php {         return chr((($value&0xf000)>>12)|0xe0)
php {         .chr((($value&0x0fc0)>>6)|0x80)
php {         .chr(($value&0x3f)|0x80);
php { } 
php > echo unicode_decode('\u4e9c');
亜
php > echo mb_convert_encoding(unicode_decode('\u4e9c'),  "UTF-8", "auto");
亜
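
Note that the `e` modifier used in the pattern above was deprecated in PHP 5.5 and removed in PHP 7. A minimal sketch of the same idea with preg_replace_callback() follows; the function name unicode_decode_cb is made up for this example, and it assumes the mbstring extension is loaded and that the input only contains 4-digit `\uXXXX` escapes (no surrogate pairs):

```php
<?php
// Sketch only: unicode_decode_cb is a hypothetical name, not a built-in.
// Replaces each literal \uXXXX escape with its UTF-8 encoded character.
function unicode_decode_cb($str) {
    return preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',   // a literal backslash, "u", 4 hex digits
        function ($m) {
            // Pack the code point as one big-endian 16-bit unit,
            // then convert that unit from UCS-2BE to UTF-8.
            return mb_convert_encoding(pack('n', hexdec($m[1])), 'UTF-8', 'UCS-2BE');
        },
        $str
    );
}

echo unicode_decode_cb('\u4e9c'), "\n";
```

Letting mbstring do the conversion avoids the hand-written bit shifting, at the cost of requiring that extension.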
greg0ire
+1, this sounds more like it. I don't think `mb_convert_encoding()` is designed to deal with `\uXXXX` escape sequences.
Pekka
Thanks for providing a working solution. But could this be condensed a bit? My Unicode strings are always 4 hex digits, and this seems like a lot of code for something seemingly straightforward.
ajo
These are functions, so you can put them in a separate file or in a class of your own, and delete that file when you upgrade to PHP 6. You don't have to copy the full code each time you want to display a Unicode string. Use include() or autoloading to make sure the function or method you define is available.
greg0ire
This seems to work fine: `echo "&#" . hexdec(str_replace("U","","U4e9c")) . ";";`
ajo
+1  A: 

This seems to work fine:

echo "&#" . hexdec(str_replace("U","","U4e9c")) . ";";

Update

Here is where the mb_convert_encoding comes in:

$k = "&#" . hexdec(preg_replace("/[Uu]/","","U4e9c")) . ";";
$k=mb_convert_encoding($k ,"UTF-8","HTML-ENTITIES");

This allows me to UPDATE my MySQL database with $k (whereas without mb_convert_encoding it only works for displaying inside an HTML page).
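
On newer PHP versions the "HTML-ENTITIES" pseudo-encoding of mb_convert_encoding() is deprecated (as of PHP 8.2, as far as I know). A rough sketch of the same entity trick using html_entity_decode() instead; the helper name u_notation_to_utf8 is hypothetical, and it assumes the input always looks like "U4e9c" or "u4e9c":

```php
<?php
// Sketch only: u_notation_to_utf8 is a made-up helper name.
// Turns "U4e9c" into the UTF-8 bytes of the character U+4E9C.
function u_notation_to_utf8($s) {
    // Strip the leading "U"/"u" and read the rest as a hex number.
    $codepoint = hexdec(ltrim($s, 'Uu'));
    // Build a numeric entity such as "&#20124;" and decode it to UTF-8.
    return html_entity_decode('&#' . $codepoint . ';', ENT_QUOTES, 'UTF-8');
}

echo u_notation_to_utf8('U4e9c'), "\n";
```

The result is a plain UTF-8 string, so it can go into a database column as well as into an HTML page.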

ajo
Indeed! Good job!
greg0ire
And I copied and pasted it :)
ajo
A: 
function utf8chr($i) {
    return iconv('UCS-4LE', 'UTF-8', pack('V', $i));
}

echo utf8chr(hexdec(substr('U4e9c', 1)));  // echo utf8chr(0x4E9C)
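
If PHP 7.2 or later with the mbstring extension is available, the built-in mb_chr() should do the same job as the utf8chr() helper above. A short sketch under that assumption (the variable names are just for illustration):

```php
<?php
// Sketch only: requires PHP 7.2+ and the mbstring extension.
$input = 'U4e9c';
// Drop the leading "U", parse the hex digits, and encode the code point as UTF-8.
$char = mb_chr(hexdec(substr($input, 1)), 'UTF-8');

echo $char, "\n";
```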
bobince