views:

74

answers:

2
```php
function getChnRandChar($length) {
    mt_srand((double)microtime() * 1000000);
    $hanzi = '';
    for ($i = 0; $i < $length; $i++) {
        $number = mt_rand(16, 56) * 100 + mt_rand(1, 19);
        $tmpHanzi = chr(mb_substr($number, 0, 2) + 160);
        $tmpHanzi .= chr(mb_substr($number, 2, 2) + 160);
        $hanzi .= mb_convert_encoding($tmpHanzi, 'utf8', 'gb2312');
    }
    return $hanzi;
}
```
+2  A: 

Because the character encoding is GB2312.

GB2312 is the registered internet name for a key official character set of the People's Republic of China, used for simplified Chinese characters.

Petah
No, the output encoding is UTF-8. It's `mb_convert_encoding($str, $to_encoding, $from_encoding)`.
mercator
Edited.
Petah
+3  A: 

It first generates a random GB2312 character and then converts it to UTF-8.

The character it generates lies in rows 16 to 56 and columns 1 to 19 of the 94x94 grid, so the function only covers a small subset of Chinese characters (41 rows × 19 columns = 779 of them) and excludes all non-Chinese characters in the GB2312 character set.

It first generates a random number in one of the ranges 1601-1619, 1701-1719, ... 5601-5619, all of which are valid GB2312 code points. The second and third lines of the for loop then encode the code point as a two-byte EUC-CN sequence:

To map a code point to bytes, add 160 (0xA0) to its first two digits (the thousands and hundreds, i.e. the row) to form the high byte, and add 160 (0xA0) to its last two digits (the tens and ones, i.e. the column) to form the low byte.
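
As a worked example of that mapping, here is a minimal sketch. `kutenToEucCn` is a hypothetical helper name, not part of the original function, and it uses integer arithmetic (PHP 7+ `intdiv`) where the original uses `mb_substr` on the number.

```php
<?php
// Hypothetical helper: turn a 4-digit GB2312 row/column code (e.g. 1601)
// into its two-byte EUC-CN representation.
function kutenToEucCn(int $code): string
{
    $row = intdiv($code, 100); // thousands and hundreds digits
    $col = $code % 100;        // tens and ones digits

    // Adding 0xA0 (160) to each half yields the high and low bytes.
    return chr($row + 0xA0) . chr($col + 0xA0);
}

echo bin2hex(kutenToEucCn(1601)), "\n"; // b0a1
echo bin2hex(kutenToEucCn(5619)), "\n"; // d8b3
```

So the lowest code the function can pick, 1601, becomes the bytes B0 A1, and the highest, 5619, becomes D8 B3.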

The last line then converts the 2-byte EUC-CN encoded character to a UTF-8 character.
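
The conversion step can also be checked in isolation. This is a minimal sketch, assuming the mbstring extension is loaded; the variable names are illustrative. The byte pair B0 A1 is row 16, column 1 of GB2312, which is the character 啊 (U+554A).

```php
<?php
// EUC-CN bytes for row 16, column 1 of GB2312.
$gbBytes = "\xB0\xA1";

// Argument order: string, target encoding, source encoding.
$utf8 = mb_convert_encoding($gbBytes, 'UTF-8', 'GB2312');

echo bin2hex($utf8), "\n"; // e5958a
```

The two GB2312 bytes become three UTF-8 bytes, which is why the returned string is longer in bytes than `2 * $length`.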

mercator
Why is it guaranteed to be `gb2312`?
yoyo
@yoyo, I've clarified my answer a bit.
mercator