ansaurus

Question

How can I get a Unicode character's code?

Answer 1

+8 A:

Just convert it to int:

char registered = '®';
int code = (int) registered;

In fact there's an implicit conversion from char to int so you don't have to specify it explicitly as I've done above, but I would do so in this case to make it obvious what you're trying to do.
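As a minimal sketch of both forms (the class and method names here are illustrative, not from the answer), the explicit cast and the implicit widening conversion give the same value:

```java
class CharCodeDemo {
    // Returns the numeric value of a char, i.e. its UTF-16 code unit.
    static int codeOf(char c) {
        return (int) c; // explicit cast, kept only for readability
    }

    public static void main(String[] args) {
        char registered = '\u00AE'; // the registered sign, escaped to avoid source-encoding issues
        int explicit = codeOf(registered);
        int implicit = registered; // implicit widening from char to int
        System.out.println(explicit);             // 174
        System.out.println(implicit == explicit); // true
    }
}
```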

Jon Skeet 2010-01-05 14:20:58

will this work for every char?

Geo 2010-01-05 14:22:55

Yep!

Carl Smotricz 2010-01-05 14:23:25

@Geo: Anything in the Basic Multilingual Plane, yes. You can't represent characters above U+FFFF in a single char in Java. But a char is effectively defined as a UTF-16 code unit.

Jon Skeet 2010-01-05 14:26:49

It works for every `char` that represents a Unicode character at or below `U+FFFF`, but not for every Unicode character, since `char` cannot represent all of Unicode. Depending on the source of your `char`, you may need to do something more complex (and really should prepare for it too).

jk 2010-01-05 14:36:56

And to convert it to hex, use `Integer#toHexString()`.
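A short sketch of the hex conversion, assuming the usual "U+XXXX" notation is what is wanted; the helper names are hypothetical:

```java
class HexCodeDemo {
    // Lowercase hex without padding, e.g. "ae".
    static String hex(char c) {
        return Integer.toHexString(c);
    }

    // "U+XXXX" notation, zero-padded to at least four hex digits.
    static String uPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        char registered = '\u00AE';
        System.out.println(hex(registered));   // ae
        System.out.println(uPlus(registered)); // U+00AE
    }
}
```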

BalusC 2010-01-06 13:41:46

Answer 2

+6 A:

A more complete, albeit more verbose, way of doing this is to use the Character.codePointAt method. It handles surrogate pairs: characters above U+FFFF cannot be represented by a single char, so they are stored as a high surrogate followed by a low surrogate.

In the example you've given this is not strictly necessary - if the (Unicode) character can fit inside a single (Java) char (such as the registered local variable) then it must fall within the \u0000 to \uffff range, and you won't need to worry about surrogate pairs. But if you're looking at potentially higher code points, from within a String/char array, then calling this method is wise in order to cover the edge cases.

For example, instead of

String input = ...;
char fifthChar = input.charAt(4);
int codePoint = (int)fifthChar;

use

String input = ...;
int codePoint = Character.codePointAt(input, 4);

Not only is this slightly less code in this instance, but it will handle detection of surrogate pairs for you.
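To make the difference concrete, here is a small sketch using a string that contains one character outside the Basic Multilingual Plane (U+1F600, chosen only as an example). The class and helper names are hypothetical, and `String.codePoints()` assumes Java 8 or later:

```java
class CodePointDemo {
    // All code points of a string, with each surrogate pair combined into one value.
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String input = "a\uD83D\uDE00b"; // 'a', U+1F600 as a surrogate pair, 'b'

        System.out.println(input.length());                      // 4 chars
        System.out.println(input.codePointCount(0, input.length())); // 3 code points

        // charAt sees only the high surrogate; codePointAt combines the pair.
        System.out.println(Integer.toHexString(input.charAt(1)));                  // d83d
        System.out.println(Integer.toHexString(Character.codePointAt(input, 1)));  // 1f600

        for (int cp : codePointsOf(input)) {
            System.out.println(String.format("U+%04X", cp));
        }
    }
}
```

Note that the index passed to `codePointAt` still counts chars, not code points. If the string may contain surrogate pairs, the fifth character is not necessarily at index 4; `String.offsetByCodePoints` can translate a code point position into a char index.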

Andrzej Doyle 2010-01-05 14:25:23

Answer 3

+1 A:

Dear friend, Jon Skeet showed how to get a character's decimal code, but Unicode code points are conventionally written in hex, so you should represent character codes in hex rather than in decimal.

There is an open source tool at http://unicode.codeplex.com that provides complete information about a character or a sentence.

So it is better to write a method that takes a char as a parameter and returns its hex code as a string:

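// C#: returns the character's code as four uppercase hex digits, e.g. "00AE".
// GetDecimal is not shown in this answer; a plain (int) cast is assumed to be equivalent.
public static string GetHexCode(char character)
{
    return string.Format("{0:X4}", (int)character);
}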

Hope it helps.

Nasser Hadjloo 2010-01-06 13:39:59
