 hibyte  lobyte  makeunicode
 250     65      57345

I got this table. The hibyte and lobyte columns hold a Chinese character that may be encoded in Big5 or GBK; hibyte is the high byte and lobyte is the low byte.

I think the makeunicode column might be the Unicode code point that corresponds to the Big5/GBK character given by hibyte and lobyte.

But when I try to display them, they show different characters, so there must be some problem. Can someone help me?

A: 

5 seconds of Googling turns up http://www.chinesecomputing.com/encodings/index.html. Converting big5 or GBK to unicode is just the identity mapping. I'm not sure what you're doing with your bytes, however, as 250*256+65 = 64065, not 57345.
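To make the arithmetic above concrete, here is a minimal sketch that combines the two table values into one 16-bit number, high byte first. The class and method names (`CombineBytes`, `combine`) are made up for illustration; it only shows the byte arithmetic and says nothing about which encoding the bytes belong to.

```java
final class CombineBytes {
    /** Combines a high and a low byte (each 0..255) into one 16-bit value, high byte first. */
    static int combine(int hiByte, int loByte) {
        return ((hiByte & 0xFF) << 8) | (loByte & 0xFF);
    }

    public static void main(String[] args) {
        int value = combine(250, 65);
        System.out.println(value);                      // 64065
        System.out.println(Integer.toHexString(value)); // fa41
        // The makeunicode value from the table, for comparison
        System.out.println(Integer.toHexString(57345)); // e001
    }
}
```

So the table's 57345 (0xE001) is not simply the two bytes glued together.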

Keith Randall
Yeah, they made up the encoding themselves...
MemoryLeak
A: 

57345 is 0xE001 in hex, which has no Unicode character defined (see full list here: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt )

But if you do 250*256+65, you'll get 0xFA41, which is

FA41;CJK COMPATIBILITY IDEOGRAPH-FA41;Lo;0;L;654F;;;;N;;;;;

That is, some Asian glyph. Maybe that's the way?
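If you'd rather check this from code than read through UnicodeData.txt, a rough sketch using the standard `java.lang.Character` methods could look like the following. The class name `CodePointInfo` and the helper `isPrivateUse` are just illustrative. U+E001 has no assigned character because it lies in the private use area, which is what the check reports.

```java
final class CodePointInfo {
    /** True if the code point's general category is "private use" (Co). */
    static boolean isPrivateUse(int codePoint) {
        return Character.getType(codePoint) == Character.PRIVATE_USE;
    }

    /** True if the code point's general category is "other letter" (Lo), as in the FA41 entry. */
    static boolean isOtherLetter(int codePoint) {
        return Character.getType(codePoint) == Character.OTHER_LETTER;
    }

    public static void main(String[] args) {
        System.out.println(isPrivateUse(0xE001));  // true
        System.out.println(isOtherLetter(0xFA41)); // true
        // Prints the character name known to the running JVM's Unicode data
        System.out.println(Character.getName(0xFA41));
    }
}
```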

Vladimir Dyuzhev
I believe U+E001 is in the private use area.
McDowell
Yeah, that's what I meant.
Vladimir Dyuzhev
A: 

I don't really understand what you want, but from your high byte and low byte, I got it to print a Chinese character:

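byte[] bytes = {(byte) 250, (byte) 65};  // high byte 0xFA, low byte 0x41
// Decode the pair as GBK; this constructor can throw UnsupportedEncodingException
String str = new String(bytes, "GBK");
System.out.println(str); // prints: 鶤
System.out.println((int) str.charAt(0)); // prints: 40356 (U+9DA4)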

I don't know where your "57345" comes from.

newacct
Why not byte[] bytes = {(byte)65, (byte)250}; ? ;) Byte order makes all the difference!
Vladimir Dyuzhev
A: 

Similar to newacct's answer, but just to show that it gives this character for other Chinese encodings too.

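 // needs java.io.FileOutputStream, java.io.OutputStreamWriter and java.io.Writer
 byte[] b = {(byte) 250, (byte) 65};
 String s = new String(b, "GB18030");
 // Write the decoded character back out as GB18030; try-with-resources closes the file
 try (Writer out = new OutputStreamWriter(new FileOutputStream("c:\\a.html"), "GB18030")) {
     out.write(s);
 }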

Opening the file shows 鶤

Ryan Fernandes
A: 

The first byte (hibyte) of Big5 is in the range 0xA1 ~ 0xF9, while for GBK it is 0x81 ~ 0xFE.

The hibyte here is 250 (0xFA), so obviously it's not encoded with Big5. It may be GBK/GB18030.

Note that GB18030 is backward compatible with GBK.
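As a quick sketch of that range check, using only the lead-byte ranges quoted above (the class and method names are made up for illustration, and this is not a full validity check for either encoding):

```java
final class LeadByteCheck {
    /** Lead-byte range for Big5 as quoted above: 0xA1 to 0xF9. */
    static boolean inBig5LeadRange(int b) {
        return b >= 0xA1 && b <= 0xF9;
    }

    /** Lead-byte range for GBK as quoted above: 0x81 to 0xFE. */
    static boolean inGbkLeadRange(int b) {
        return b >= 0x81 && b <= 0xFE;
    }

    public static void main(String[] args) {
        int hiByte = 250; // 0xFA, the hibyte from the question's table
        System.out.println(inBig5LeadRange(hiByte)); // false
        System.out.println(inGbkLeadRange(hiByte));  // true
    }
}
```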

Yi Ling