There are many ways to represent the more than 1 million Unicode characters. Take the Latin capital letter "A" with macron (Ā). This is Unicode code point U+0100, which UTF-8 encodes as the hex bytes 0xc4 0x80, decimal 196 128, or binary 11000100 10000000.
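To make those representations concrete, here is a small sketch that derives each of them from the character itself. It assumes PHP 7.0 or later (for the "\u{...}" escape); the variable names are only illustrative.

```php
<?php
// "\u{0100}" is a PHP 7+ escape that yields the UTF-8 bytes of U+0100.
$char = "\u{0100}";

// Decimal byte values; unpack() returns a 1-indexed array.
$decimal = unpack('C*', $char);

// Hex form of the same bytes.
$hex = bin2hex($char);

// Binary form, one zero-padded 8-bit group per byte.
$binary = implode(' ', array_map(function ($byte) {
    return sprintf('%08b', $byte);
}, $decimal));

echo implode(' ', $decimal), "\n"; // 196 128
echo $hex, "\n";                   // c480
echo $binary, "\n";                // 11000100 10000000
```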
I would like to create a collection of the first 65,535 Unicode characters, encoded as UTF-8, for use in testing applications. These are all the Unicode characters up to code point U+FFFF (at most 3 bytes each in UTF-8).
Is it possible to use something like a for($x=0) loop and convert each resulting decimal value to another base (such as hex), so that the matching Unicode character can be created from it?
I can create the value Ā using something like this:
$char = "\xc4\x80";
// or
$char = chr(196).chr(128);
However, I am not sure how to turn this into an automated process.
// fail! This produces the literal text \xc4\x80, not the character
$char = "\x" . dechex($a) . "\x" . dechex($b);
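For reference, one way the automated version could look is sketched below. It loops over the code points directly and builds the UTF-8 bytes with chr() and bit operations instead of "\x" escapes. The function name codePointToUtf8 and the $chars array are hypothetical, and the sketch assumes that skipping the surrogate range (U+D800 to U+DFFF) is acceptable, since those code points are not valid in UTF-8.

```php
<?php
// Hypothetical helper: encode one code point in the range 0..0xFFFF as UTF-8.
function codePointToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return chr(0xE0 | ($cp >> 12))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

$chars = [];
for ($cp = 0x0001; $cp <= 0xFFFF; $cp++) {
    if ($cp >= 0xD800 && $cp <= 0xDFFF) {
        continue; // surrogates cannot be encoded as valid UTF-8
    }
    $chars[$cp] = codePointToUtf8($cp);
}

echo bin2hex($chars[0x0100]), "\n"; // c480
```

If the mbstring extension is available (PHP 7.2 or later), mb_chr($cp, 'UTF-8') should be able to replace the hand-written helper.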