Say I wanted to print a ÿ (latin small y with diaeresis) from its Unicode/UTF-8 number of U+00FF or hex of c3 bf. How can I do that in PHP?

The reason is that I need to be able to create certain UTF-8 characters for testing my regex and string functions. However, since I have fewer than 200 keys on my keyboard I can't type them, and since I am often stuck in an ASCII-only world, I need to be able to create them based solely on their ASCII-safe UTF-8 character code.

Note: In order for it to show correctly in a browser I know that the first step is

header('Content-Type: text/html; charset=utf-8');
+1  A: 

PHP sucks at Unicode. utf8_encode() only converts from ISO-8859-1 to UTF-8. Because your character ÿ = "U+00FF" happens to be one of the first 256 codepoints in Unicode (U+0000 to U+00FF), and because Unicode decided that this range was to coincide with the ISO-8859-1 encoding, you can (in this case!) write a literal ISO-8859-1 string using that hexadecimal number and convert it to UTF-8.

<?php
// "\xff" is the single ISO-8859-1 byte for ÿ; utf8_encode() turns it into the UTF-8 bytes c3 bf
$x = utf8_encode("\xff");
print $x;
?>

This works. But, besides sucking badly, this does not apply for Unicode chars not included in ISO-8859-1.

leonbloy
Thanks! Any way to allow me to print any symbol *even those outside U+00FF*?
Xeoncross
mbstring and iconv are supposed to be the answer...
leonbloy
That is the answer to displaying and converting charsets - but I don't know how that could be the answer to just printing a random UTF-8 character if all you know is the symbol number.
Xeoncross
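One way mbstring could answer the "symbol number" question, offered as a sketch rather than the definitive approach: pack the code point as four big-endian bytes (which is UTF-32BE) and let mb_convert_encoding() re-encode it. The function name codepoint_to_utf8_mb is made up for this example, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper (name is an assumption); requires the mbstring extension.
function codepoint_to_utf8_mb($codepoint)
{
    // pack('N', ...) yields a 32-bit big-endian integer, i.e. one UTF-32BE code unit.
    $utf32 = pack('N', $codepoint);
    // Re-encode that single UTF-32BE character as UTF-8.
    return mb_convert_encoding($utf32, 'UTF-8', 'UTF-32BE');
}

echo bin2hex(codepoint_to_utf8_mb(0x00FF)), "\n"; // c3bf   (ÿ)
echo bin2hex(codepoint_to_utf8_mb(0x20AC)), "\n"; // e282ac (€, outside ISO-8859-1)
```

iconv should be usable the same way with 'UCS-4BE' as the source charset, assuming the local iconv build supports that name.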
Neither do I, actually... It seems PHP sucks even more badly than I thought :-) Look for a unicode2utf8 function, eg http://www.phpfreaks.com/forums/index.php?topic=263568.0
leonbloy
Man that function is ugly, *but it works great!* Maybe someone else will have a more elegant way of handling this...
Xeoncross
I found this http://hsivonen.iki.fi/php-utf8/
Xeoncross
Whether it sucks or not, entering two hex values into a string is not a big deal.
Col. Shrapnel
The big deal is that, given a Unicode codepoint (number), the task of doing the UTF-8 encoding is left to you, you must write it by hand.
leonbloy
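For reference, writing the encoding by hand comes down to a few shifts and masks. The following is a minimal sketch, not the code from either linked page; the function name unicode_to_utf8 and the choice to return false for invalid input are assumptions made for this example.

```php
<?php
// Hypothetical helper: encode one Unicode code point (an integer) as UTF-8 bytes.
function unicode_to_utf8($cp)
{
    // Reject values outside Unicode and the UTF-16 surrogate range.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        return false;
    }
    if ($cp < 0x80) {            // 1 byte:  0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(unicode_to_utf8(0x00FF)), "\n";  // c3bf
echo bin2hex(unicode_to_utf8(0x1F600)), "\n"; // f09f9880
```

U+00FF falls in the two-byte branch: 0xFF >> 6 is 3, giving 0xC3, and 0xFF & 0x3F is 0x3F, giving 0xBF, which matches the c3 bf from the question.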
+1  A: 

Well, you have everything you need.
Hex escapes are recognized in double-quoted strings as well:

echo "\xc3\xbf";
Col. Shrapnel
There's half the problem solved. I was not aware of the "\x..." trick. But what about the `U+00FF` number - how can you represent that in PHP - *or can you?*
Xeoncross
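On the `U+00FF` form: PHP 7.0 and later accept a `\u{...}` code point escape in double-quoted strings, which produces the UTF-8 bytes directly. This short sketch assumes PHP 7.0 or newer; on older versions the escape is not interpreted and the text stays literal.

```php
<?php
// Assumes PHP >= 7.0, where "\u{...}" is a Unicode code point escape.
$y = "\u{00FF}";            // U+00FF, stored as UTF-8

echo bin2hex($y), "\n";     // c3bf
var_dump($y === "\xc3\xbf"); // bool(true): same bytes as the hex-escape form
```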
I wonder if you can compose the hex value from a decimal value like `print "\x". 191;`...
Xeoncross
@Xeon base conversion is a very simple task that any beginner programmer can do manually. There is also a function for it in PHP, I believe, as in any other language. Recoding U+00FF is also possible, and you already have the function. Or this one: http://stackoverflow.com/questions/1140660/how-to-get-uxxxx-to-display-correctly-using-php5 Anyway, asking only half of your problem isn't good practice.
Col. Shrapnel
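To make the base-conversion point concrete: concatenating `"\x"` with a number does not build an escape, because escapes are resolved when the string literal is parsed. The built-in functions chr(), dechex(), hexdec() and pack() cover the conversions instead. A small sketch (the variable names are just for illustration):

```php
<?php
// "\x" with no hex digits after it stays a literal backslash and "x",
// so this is the five characters \x191, not a byte.
$literal = "\x" . 191;
echo strlen($literal), "\n";              // 5

// chr() builds a one-byte string from a decimal (or hex) byte value.
$fromDecimal = chr(195) . chr(191);       // 195 = 0xC3, 191 = 0xBF
echo bin2hex($fromDecimal), "\n";         // c3bf

// pack('H*', ...) turns a hex string into raw bytes.
$fromHex = pack('H*', 'c3bf');
echo bin2hex($fromHex), "\n";             // c3bf

// Plain base conversion between decimal and hex.
echo dechex(191), "\n";                   // bf
echo hexdec('ff'), "\n";                  // 255
```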
I didn't ask half - I asked for both parts. However, I'm not sure there is a valid answer for the first type of Unicode number conversion, in which case your answer is 100% correct.
Xeoncross
@Xeon, yeah, my bad, you asked both. sorry
Col. Shrapnel