Let's say I have this:

char registered = '®';

or an umlaut, or any other Unicode character. How can I get its code?

+8  A: 

Just convert it to int:

char registered = '®';
int code = (int) registered;

In fact there's an implicit conversion from char to int so you don't have to specify it explicitly as I've done above, but I would do so in this case to make it obvious what you're trying to do.
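As a minimal sketch of the two forms (the class name `CharCode` and the helper `codeOf` are made up for illustration), both the explicit cast and the implicit widening conversion give the same number:

```java
public class CharCode {
    // Hypothetical helper: returns the numeric value of the char
    // (its UTF-16 code unit) via an explicit cast.
    static int codeOf(char c) {
        return (int) c;
    }

    public static void main(String[] args) {
        // '®' written as an escape so the example does not depend on source-file encoding.
        char registered = '\u00AE';

        int explicit = (int) registered; // explicit cast, states the intent
        int implicit = registered;       // implicit widening conversion, no cast needed

        System.out.println(explicit);             // 174
        System.out.println(explicit == implicit); // true
    }
}
```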

Jon Skeet
will this work for every char?
Geo
Yep!
Carl Smotricz
@Geo: Anything in the Basic Multilingual Plane, yes. You can't represent characters above U+FFFF in a single char in Java. A char is effectively defined as a UTF-16 code unit, not a code point.
Jon Skeet
It works for every `char` that represents a Unicode character up to `U+FFFF`, but not for every Unicode character, since a single `char` cannot represent all of Unicode. Depending on the source of your `char`, you may need to do something more complex (and really should prepare for it too).
jk
And to convert it to hex, use `Integer#toHexString()`.
BalusC
+6  A: 

A more complete, albeit more verbose, way of doing this is to use the Character.codePointAt method. This handles supplementary characters (those above U+FFFF), which are stored as a surrogate pair of two chars and so cannot be represented by a single char.

In the example you've given this is not strictly necessary - if the (Unicode) character can fit inside a single (Java) char (such as the registered local variable) then it must fall within the \u0000 to \uffff range, and you won't need to worry about surrogate pairs. But if you're looking at potentially higher code points, from within a String/char array, then calling this method is wise in order to cover the edge cases.

For example, instead of

String input = ...;
char fifthChar = input.charAt(4);
int codePoint = (int)fifthChar;

use

String input = ...;
int codePoint = Character.codePointAt(input, 4);

Not only is this slightly less code in this instance, but it will handle detection of surrogate pairs for you.
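To make the difference concrete, here is a small sketch using U+1D11E (the musical G clef) as a sample supplementary character. The class name `CodePointDemo` and its wrapper method are only illustrative; `Character.codePointAt` and `Character.toChars` are the standard library calls.

```java
public class CodePointDemo {
    // Thin illustrative wrapper: the code point that starts at the given char index.
    static int codePointAt(String s, int index) {
        return Character.codePointAt(s, index);
    }

    public static void main(String[] args) {
        // U+1D11E lies outside the BMP, so it is stored as two chars (a surrogate pair).
        String input = "a" + new String(Character.toChars(0x1D11E)) + "b";

        System.out.println(input.length());                          // 4 (chars)
        System.out.println(input.codePointCount(0, input.length())); // 3 (code points)

        int viaCharAt = input.charAt(1);          // only the high surrogate
        int viaCodePointAt = codePointAt(input, 1); // the full code point

        System.out.println(Integer.toHexString(viaCharAt));      // d834
        System.out.println(Integer.toHexString(viaCodePointAt)); // 1d11e
    }
}
```

Note that the index passed to codePointAt is still a char index, not a code point index, so a string containing surrogate pairs needs something like `offsetByCodePoints` if you want "the fifth character" in the Unicode sense.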

Andrzej Doyle
+1  A: 

Dear friend, Jon Skeet showed how to get the character's decimal code, but Unicode code points are conventionally written in hexadecimal, so it is better to present character codes in hex rather than in decimal.

There is an open source tool at http://unicode.codeplex.com that provides complete information about a character or a sentence.

So it is better to write a helper that takes a char as a parameter and returns its hex code as a string:

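// C#: formats the char's numeric value as four uppercase hex digits.
// The original called an undefined GetDecimal helper; a plain int cast
// is assumed to be what it did.
public static string GetHexCode(char character)
{
    return string.Format("{0:X4}", (int)character);
}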
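For the Java `char` in the question, a rough equivalent might look like the sketch below. The class name `HexCode` and the method `toUPlus` are invented for this example; it takes an int so that it also works for code points above U+FFFF.

```java
public class HexCode {
    // Hypothetical helper: formats a code point in the conventional
    // U+XXXX notation, padded to at least four uppercase hex digits.
    static String toUPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        System.out.println(toUPlus('\u00AE'));             // U+00AE
        System.out.println(toUPlus(0x1D11E));              // U+1D11E
        System.out.println(Integer.toHexString('\u00AE')); // ae (no padding, lowercase)
    }
}
```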

Hope it helps.

Nasser Hadjloo