views:

948

answers:

4

Is there a standard method to convert a string like "\uFFFF" into a character, given that the six-character string contains a representation of a single Unicode character?

+1  A: 
char c = "\uFFFF".toCharArray()[0];

The compiler interprets the escape directly, so the whole sequence becomes a single character in the resulting string.

Another way, if you are going to hard-code the value:

char c = '\uFFFF';

Note that \uFFFF doesn't seem to be a proper Unicode character, so try it with \u041f, for example.
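To make the difference between the two readings of the question concrete, here is a minimal sketch. The class and field names are only illustrative. It contrasts an escape that the compiler resolves with a string that really holds six characters at run time.

```java
// Illustrative only: the class and field names are assumptions for this sketch.
final class EscapeLengths {
    // The compiler replaces this escape, so the string holds one char.
    static final String COMPILED = "\u041f";
    // The backslash is escaped, so the string holds six chars at run time.
    static final String RAW = "\\u041f";

    public static void main(String[] args) {
        System.out.println(COMPILED.length()); // prints 1
        System.out.println(RAW.length());      // prints 6
    }
}
```

The first form needs no conversion at all. Only the second form, which is what you get when reading such text from a file, has to be parsed.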

Read about unicode escapes here

Bozho
I think he meant for the string literal that has 6 characters, with two backslashes in the source code, like "\\uFFFF".
Yoni
yes, _after_ formatting the question properly it turns out to be so..
Bozho
what is wrong with, say, `char c = '\uFFFF';` ?
rsp
nothing. I don't quite grasp the context behind the question actually.
Bozho
@Bozho that makes two of us :)
Yoni
I don't know... maybe it's just because I've run across this a few times before but it seemed obvious to me. :) If you are reading certain sorts of text, or more commonly RDF files (like n-quads), it is quite common to read a literal \uFFFF and need to convert it to a real char code.
PSpeed
Still, some short sample code would have clarified, I suppose.
PSpeed
@Bozho, only that `char c = Character.valueOf('\uFFFF');` seemed overly complex to me :-)
rsp
rsp, right, that's why I updated the answer.
Bozho
so you did grasp the context :)
rsp
The value \uFFFF "is guaranteed not to be a Unicode character at all" : http://www.unicode.org/charts/PDF/UFFF0.pdf
trashgod
+5  A: 

The backslash is escaped here (so you see two of them, but the String s is really only 6 characters long). If you're sure that you have exactly "\u" at the beginning of your string, simply skip those two characters and convert the hexadecimal value:

String s = "\\u20ac";

char c = (char) Integer.parseInt( s.substring(2), 16 );

After that c shall contain the euro symbol as expected.
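If the escape can appear anywhere in a longer piece of text rather than being the whole string, the same parseInt idea can be applied to each match of a regular expression. The following is only a sketch under that assumption. The class UnicodeEscapes and its decode method are hypothetical names, not part of any library, and it recognizes only a backslash, a single 'u', and exactly four hex digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for this sketch; not part of the JDK or any library.
final class UnicodeEscapes {
    // Matches a backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String decode(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text before the escape unchanged.
            out.append(input, last, m.start());
            // Same conversion as above: hex digits to a char value.
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        // Copy whatever follows the final escape.
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Price: 5 \\u20ac"));
    }
}
```

Because each escape is appended as one char, two consecutive escapes that form a surrogate pair end up as a valid pair in the resulting String. Malformed sequences, such as a backslash-u followed by fewer than four hex digits, are left in the output untouched.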

Webinator
This is what I do when I need this.
PSpeed
char c = (char) Integer.parseInt( s.substring(2), 16 ); - looks very much like what I meant. \uFFFF is the format in which Unicode is presented in the source I read it from (say, an ASCII file), not a literal. I imagined that there could be a more direct method, but this one should also be fine. Thanks to everybody.
Dima
+1  A: 
String charInUnicode = "\\u0041"; // ascii code 65, the letter 'A'
Integer code = Integer.parseInt(charInUnicode.substring(2), 16); // the integer 65 in base 10
char ch = Character.toChars(code)[0]; // the letter 'A'
Yoni
Why do you use toChars() when you hard-code `[0]` anyway? Your code goes half-way to supporting high unicode codepoints but misses the other half. What's the point?
Joachim Sauer
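One way to address the point about high code points is to keep the whole result of Character.toChars() and return a String instead of a single char. This is a minimal sketch, assuming the input is just the hex digits of a code point (for example "1F600"). The class CodePoints and the method fromHex are made-up names for illustration.

```java
// Hypothetical helper for this sketch; the names are assumptions.
final class CodePoints {
    // Turns hex digits such as "0041" or "1F600" into the matching String.
    static String fromHex(String hex) {
        int codePoint = Integer.parseInt(hex, 16);
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a Unicode code point: " + hex);
        }
        // toChars returns one char for the BMP and a surrogate pair above it.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(fromHex("0041").length());  // prints 1
        System.out.println(fromHex("1F600").length()); // prints 2
    }
}
```

A single char cannot hold a code point above U+FFFF, which is why the sketch returns a String.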
+2  A: 

If you are parsing input with Java style escaped characters you might want to have a look at StringEscapeUtils.unescapeJava. It handles Unicode escapes as well as newlines, tabs etc.

String s = StringEscapeUtils.unescapeJava("\\u20ac\\n"); // s contains the euro symbol followed by newline
stoivane