I've got a bunch of Unicode characters from U+1F000 upwards, and I'm wondering how to represent them in Java. A Java Unicode escape has the form "\uXXXX", and the Java Language Specification says that "Representing supplementary characters requires two consecutive Unicode escapes". How does that apply to U+1F000?

String mahjongTile = "\u0001\uf000";

This does not seem to work (I only get two blank squares), but I presume that may be a font glitch.

A: 

You'd need to work out the appropriate surrogate pair if you want it in a string literal. (In C# you could write "\U0001f000" - \u is used for the BMP, and \U for full Unicode.)

In Java you could do:

String foo = new String(new int[]{0x1f000}, 0, 1);

if you wanted to still see the "1f000"-ness of it. I confess I can't remember the high/low surrogate ranges off the top of my head :(
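As a rough sketch of working out the surrogate pair: Character.toChars can do the arithmetic, and the resulting chars can be printed as escapes to paste into a literal. The class and method names below are made up for illustration.

```java
public class SurrogatePairDemo {
    // Hypothetical helper: formats a code point as the Java escape(s) needed in a string literal.
    static String toJavaEscapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        // Character.toChars returns one char for a BMP code point
        // and two chars (high surrogate, then low surrogate) for a supplementary one.
        for (char c : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the two escapes for U+1F000: high surrogate D83C, low surrogate DC00.
        System.out.println(toJavaEscapes(0x1F000));
    }
}
```

For U+1F000 this gives the pair D83C, DC00, so the literal would be "\uD83C\uDC00".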

Jon Skeet
Hm. I still get two blank squares. I suppose I'll need a special font to get the Mahjong tiles to render correctly.
JesperE
What font are you trying to display them in, and what kind of UI control?
Jon Skeet
The surrogate range is D800-DFFF (high surrogates D800-DBFF, low surrogates DC00-DFFF).
MSalters
A: 

Jon's answer should work, but you can also use the appendCodePoint method in StringBuilder or StringBuffer.

StringBuilder sb = new StringBuilder();
sb.appendCodePoint(0x1f000);

Both techniques do the conversion to surrogate pairs for you.
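A small check, as a sketch, that the two techniques produce the same string and that it is one code point stored as two chars. The class and method names are only for illustration.

```java
public class CodePointDemo {
    // Builds the string with the String(int[], int, int) constructor.
    static String viaConstructor(int codePoint) {
        return new String(new int[]{codePoint}, 0, 1);
    }

    // Builds the same string with StringBuilder.appendCodePoint.
    static String viaBuilder(int codePoint) {
        return new StringBuilder().appendCodePoint(codePoint).toString();
    }

    public static void main(String[] args) {
        String a = viaConstructor(0x1f000);
        String b = viaBuilder(0x1f000);

        System.out.println(a.equals(b));                            // true
        System.out.println(a.length());                             // 2 (two UTF-16 chars)
        System.out.println(a.codePointCount(0, a.length()));        // 1 (one code point)
        System.out.println(Integer.toHexString(a.codePointAt(0)));  // 1f000
    }
}
```

Note that length() counts chars, not code points, which is why it reports 2 here.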

It sounds like your problem now is getting the characters to display properly. If you're trying to display them on the console, forget it; the console on most machines is way too limited. I suggest you either write your output to a file and use a good text editor to read it, or display the output in a Swing component like a JTextPane.
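If you go the file route, here is a minimal sketch, assuming you want UTF-8 and are happy with a temp file (the class name, method name and file prefix are placeholders). Stating the encoding explicitly avoids depending on the platform default.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteTileDemo {
    // Encodes the text as UTF-8; a supplementary character becomes four bytes.
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String tile = new StringBuilder().appendCodePoint(0x1f000).toString();

        // Write to a temp file; open it in an editor with a font that covers the Mahjong block.
        Path out = Files.createTempFile("mahjong", ".txt");
        Files.write(out, utf8Bytes(tile));
        System.out.println("Wrote " + out);
    }
}
```

Whether the tile actually renders still depends on the editor having a font with that glyph.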

Alan Moore
I'm currently outputting it to a SWT list-view, I think (I'm using a code skeleton created by Eclipse, and I don't have the code right here to check.)
JesperE