Is there a standard method to convert a string like "\uFFFF" into a character, where the six-character string contains the representation of a single Unicode character?
+1
A:
char c = "\uFFFF".toCharArray()[0];
The escape is interpreted directly by the compiler, so the whole six-character sequence in the source becomes a single character in the string.
Another way, if you are going to hard-code the value:
char c = '\uFFFF';
Note that \uFFFF doesn't seem to be a proper Unicode character, but try with \u041f, for example.
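The comments below turn on the difference between an escape the compiler translates and the same six characters kept as plain text. A small sketch makes that concrete by comparing the two lengths; the class and field names are hypothetical, chosen only for illustration.

```java
public class EscapeLengthDemo {
    // The compiler translates this escape, so the literal holds ONE char (U+041F, Cyrillic Pe).
    static final String COMPILED = "\u041f";

    // The doubled backslash keeps the escape as plain text: backslash, 'u', '0', '4', '1', 'f'.
    static final String TEXTUAL = "\\u041f";

    public static void main(String[] args) {
        System.out.println(COMPILED.length()); // 1
        System.out.println(TEXTUAL.length());  // 6
        System.out.println(COMPILED.charAt(0) == '\u041f'); // true
    }
}
```

Only the second form is what you get when the text is read from a file at runtime, which is why it needs explicit parsing.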
Bozho
2010-01-24 08:12:12
I think he meant a string literal that has 6 characters, written with two backslashes in the source code, like "\\uFFFF".
Yoni
2010-01-24 08:31:28
Yes, _after_ formatting the question properly, it turns out to be so.
Bozho
2010-01-24 08:47:53
What is wrong with, say, `char c = '\uFFFF';`?
rsp
2010-01-24 09:38:16
Nothing. I don't quite grasp the context behind the question, actually.
Bozho
2010-01-24 10:15:52
@Bozho that makes two of us :)
Yoni
2010-01-24 10:44:16
I don't know... maybe it's just because I've run across this a few times before, but it seemed obvious to me. :) If you are reading certain sorts of text, or more commonly RDF files (like N-Quads), it is quite common to read a literal \uFFFF and need to convert it to a real char code.
PSpeed
2010-01-24 10:50:52
Still, some short sample code would have clarified, I suppose.
PSpeed
2010-01-24 10:52:23
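For the RDF/N-Quads case PSpeed describes, where the escapes arrive as plain text inside a longer line, a minimal sketch could scan the input and replace every backslash-u sequence followed by exactly four hex digits with the char it encodes. Assumptions: `UnicodeUnescaper.decode` is a hypothetical helper name, malformed sequences are copied through unchanged, and a doubled backslash is not treated specially. Because each escape becomes one UTF-16 char, a surrogate pair written as two consecutive escapes comes out as a valid pair.

```java
public final class UnicodeUnescaper {
    private UnicodeUnescaper() {}

    /**
     * Replaces every backslash + 'u' + four hex digits with the char it encodes.
     * Anything that does not match that exact shape is copied through unchanged.
     */
    public static String decode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char ch = input.charAt(i);
            boolean looksLikeEscape = ch == '\\'
                    && i + 6 <= input.length()
                    && input.charAt(i + 1) == 'u'
                    && isHexDigits(input, i + 2, i + 6);
            if (looksLikeEscape) {
                // Four hex digits always fit in a char (max 0xFFFF).
                out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(ch);
                i++;
            }
        }
        return out.toString();
    }

    /** True if the characters in [from, to) are all ASCII hex digits. */
    private static boolean isHexDigits(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            char c = s.charAt(k);
            boolean hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Text as it might be read from a file: the escapes are plain ASCII characters.
        String line = "\\u0041\\u0042C";
        System.out.println(decode(line)); // ABC
    }
}
```

Checking the hex digits by hand, rather than relying on `Integer.parseInt` to fail, keeps inputs such as a sign character from being accepted as part of an escape.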
@Bozho, only that `char c = Character.valueOf('\uFFFF');` seemed overly complex to me :-)
rsp
2010-01-24 10:53:02
rsp, right, that's why I updated the answer.
Bozho
2010-01-24 10:56:01
so you did grasp the context :)
rsp
2010-01-24 12:27:34
The value \uFFFF "is guaranteed not to be a Unicode character at all": http://www.unicode.org/charts/PDF/UFFF0.pdf
trashgod
2010-01-24 15:56:40
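Java lets a `char` hold U+FFFF, since a `char` is just a 16-bit UTF-16 code unit, but the `Character` class can report that it is unassigned. A minimal sketch contrasting it with the \u041f example from the answer; the class and field names are hypothetical.

```java
public class NoncharacterCheck {
    // U+FFFF is a Unicode noncharacter; Java still lets a char hold it.
    static final char NON_CHAR = '\uFFFF';
    // U+041F CYRILLIC CAPITAL LETTER PE, an assigned character.
    static final char PE = '\u041f';

    public static void main(String[] args) {
        System.out.println(Character.isDefined(NON_CHAR)); // false
        System.out.println(Character.isDefined(PE));       // true
        System.out.println(Character.getType(NON_CHAR) == Character.UNASSIGNED); // true
    }
}
```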
+5
A:
The backslash is escaped here (so you see two of them, but the String s is really only 6 characters long). If you're sure that you have exactly "\u" at the beginning of your string, simply skip those two characters and convert the hexadecimal value:
String s = "\\u20ac";
char c = (char) Integer.parseInt( s.substring(2), 16 );
After that, c contains the euro sign (U+20AC) as expected.
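The one-liner above relies on the input really having that shape. A slightly more defensive sketch checks the prefix, the length and the digits before parsing; `SingleEscape.parse` is a hypothetical name. The digit check matters because `Integer.parseInt` on its own also accepts a leading sign, so the six characters `\u-001` would otherwise parse to -1 and silently become U+FFFF after the cast.

```java
public final class SingleEscape {
    private SingleEscape() {}

    /** Parses exactly six characters (backslash, 'u', four hex digits) into one char. */
    public static char parse(String s) {
        if (s.length() != 6 || s.charAt(0) != '\\' || s.charAt(1) != 'u') {
            throw new IllegalArgumentException("expected backslash, 'u' and four hex digits: " + s);
        }
        String hex = s.substring(2);
        // Integer.parseInt alone would also accept a leading '+' or '-', so check the digits first.
        for (int k = 0; k < hex.length(); k++) {
            char c = hex.charAt(k);
            boolean isHex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
            if (!isHex) {
                throw new IllegalArgumentException("not a hex digit: " + c);
            }
        }
        return (char) Integer.parseInt(hex, 16);
    }

    public static void main(String[] args) {
        char euro = parse("\\u20ac");
        System.out.println((int) euro); // 8364, i.e. 0x20AC
    }
}
```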
Webinator
2010-01-24 08:17:31
`char c = (char) Integer.parseInt( s.substring(2), 16 );` looks very much like what I meant. \uFFFF is the format in which Unicode characters are represented in the source I read from (say, an ASCII file), not a literal. I imagined there could be a more direct method, but this one should also be fine. Thanks to everybody.
Dima
2010-01-24 17:35:44
+1
A:
String charInUnicode = "\\u0041"; // ASCII code 65, the letter 'A'
int code = Integer.parseInt(charInUnicode.substring(2), 16); // the integer 65 in base 10
char ch = Character.toChars(code)[0]; // the letter 'A'
Yoni
2010-01-24 08:32:46
Why do you use toChars() when you hard-code `[0]` anyway? Your code goes half-way to supporting high unicode codepoints but misses the other half. What's the point?
Joachim Sauer
2010-01-24 10:54:38
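On Joachim Sauer's point: four hex digits can never exceed U+FFFF, so for this input `Character.toChars` always returns a one-element array. When the input instead names a full code point above U+FFFF, keeping the whole array avoids dropping the low surrogate. A minimal sketch; `fromCodePoint` is a hypothetical helper name.

```java
public class CodePointDemo {
    /** Converts a code point to a String, keeping both halves of a surrogate pair if needed. */
    static String fromCodePoint(int codePoint) {
        // toChars returns 1 char for U+0000..U+FFFF and 2 chars (a surrogate pair) above that.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(fromCodePoint(0x41).length());    // 1 ("A")
        System.out.println(fromCodePoint(0x1F600).length()); // 2 (surrogate pair)
    }
}
```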
+2
A:
If you are parsing input with Java-style escaped characters, you might want to have a look at StringEscapeUtils.unescapeJava from Apache Commons Lang. It handles Unicode escapes as well as newlines, tabs, etc.
String s = StringEscapeUtils.unescapeJava("\\u20ac\\n"); // s contains the euro symbol followed by newline
stoivane
2010-02-06 21:53:25