I'm trying to encode an array of Cyrillic UTF-8 strings to a JSON string using PHP's json_encode function. The sample code looks like this:

<?php
$arr = array(
    'едно' => 'първи',
    'две'  => 'втори'
);
$str = json_encode($arr);
echo $str;
?>

It works, but the script outputs:

{"\u0435\u0434\u043d\u043e":"\u043f\u044a\u0440\u0432\u0438","\u0434\u0432\u0435":"\u0432\u0442\u043e\u0440\u0438"}

which takes 6 characters for each Cyrillic character. Is there a way to get the original characters in the keys and values instead of the escaped ones?

+2  A: 

It looks like PHP's built-in json_encode only works with UTF-8 input and has no bells and whistles for tweaking how it escapes non-ASCII characters in the output.

I found "A completely fair and balanced comparison of php json libraries" on Google, which might help you. If possible, try another library based on the comparison tables there. There are additional PHP libraries listed at json.org that you can experiment with.
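
For readers on PHP 5.4 or later, the built-in function did gain a flag for this: JSON_UNESCAPED_UNICODE. A minimal sketch, assuming PHP 5.4+ and reusing the array from the question:

```php
<?php
$arr = array(
    'едно' => 'първи',
    'две'  => 'втори',
);

// JSON_UNESCAPED_UNICODE (PHP >= 5.4) leaves multibyte characters as-is
// instead of writing them as \uXXXX escapes.
$str = json_encode($arr, JSON_UNESCAPED_UNICODE);
echo $str, "\n"; // prints {"едно":"първи","две":"втори"}
```

Both forms are valid JSON and decode to the same array, so this only changes the size and readability of the output.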

Beau Simensen
+2  A: 

It worked with http://pear.php.net/pepr/pepr-proposal-show.php?id=198

With a nasty bypass in JSON.php, rows 298..:

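// Two-byte UTF-8 sequence: append the raw bytes instead of a \uXXXX escape.
// ($var[...] is used here because PHP 8 no longer accepts the $var{...} form.)
$char = pack('C*', $ord_var_c, ord($var[$c + 1]));
$c += 1;
//$utf16 = $this->utf82utf16($char);
//$ascii .= sprintf('\u%04s', bin2hex($utf16));
$ascii .= $char;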

Thanks!

AquilaX
You are welcome. :) How about a +1? :P
Beau Simensen
Thanks so much for that! Seems to be working fine. Have you experienced any problems with it so far?
Emanuil
None. Working like a charm so far.
AquilaX
+1  A: 

I found this in the code of Zend Framework:

http://framework.zend.com/svn/framework/standard/trunk/library/Zend/Json/Decoder.php

Take a look at the function decodeUnicodeString (line 474):

    /**
     * Decode Unicode Characters from \u0000 ASCII syntax.
     *
     * This algorithm was originally developed for the
     * Solar Framework by Paul M. Jones
     *
     * @link   http://solarphp.com/
     * @link   http://svn.solarphp.com/core/trunk/Solar/Json.php
     * @param  string $value
     * @return string
     */
    public static function decodeUnicodeString($chrs)

It's static, and you can easily extract it - just replace the line:

490:           $utf8 .= self::_utf162utf8($utf16);

with:

490:           $utf8 .= mb_convert_encoding($utf16, 'UTF-8', 'UTF-16');

Not an ideal solution, but it did the job for me :o)
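
The same idea, converting \uXXXX escapes back to UTF-8 with mb_convert_encoding, can also be sketched without extracting anything from Zend. This is only a rough illustration: unescapeJsonUnicode is a hypothetical helper name, not a Zend or PHP function, and it assumes the mbstring extension is loaded. It post-processes the output of json_encode and deliberately leaves ASCII escapes and surrogate halves untouched so the result stays valid JSON.

```php
<?php
// Hypothetical helper (not part of Zend or PHP): turns \uXXXX escapes in an
// already-encoded JSON string back into raw UTF-8 characters.
function unescapeJsonUnicode($json)
{
    return preg_replace_callback(
        // Match every backslash escape, so that an escaped backslash ("\\")
        // followed by "u0435" is never mistaken for a Unicode escape.
        '/\\\\(?:u([0-9a-fA-F]{4})|.)/s',
        function ($m) {
            if (empty($m[1])) {
                return $m[0]; // ordinary escape such as \n or \"
            }
            $code = hexdec($m[1]);
            // Keep ASCII/control characters and surrogate halves escaped.
            if ($code < 0x80 || ($code >= 0xD800 && $code <= 0xDFFF)) {
                return $m[0];
            }
            // The four hex digits are one big-endian UTF-16 code unit.
            return mb_convert_encoding(pack('H*', $m[1]), 'UTF-8', 'UTF-16BE');
        },
        $json
    );
}

$json = json_encode(array('едно' => 'първи'));
echo unescapeJsonUnicode($json), "\n"; // prints {"едно":"първи"}
```

Characters outside the Basic Multilingual Plane (encoded as surrogate pairs) simply stay escaped in this sketch, which is harmless for Cyrillic text.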

Boris Chervenkov