The problem with unsigned char. I am reading a PPM image file which has data in ASCII/Extended ASCII.

For a character, e.g. '†', in Java, after reading it as a char and casting it to int, its value is 8224. In C/C++, after reading it as an unsigned char and casting it to int, its value is 160.

How would I read it in Java so as to get the value 160?

The following C++:

unsigned char ch1 ='†';  
char ch2 = '†';  

cout << (int) ch1 << "\n"; // prints 160  
cout << (int) ch2 << "\n"; // prints -96  

In Java,

char ch1 = '^';  
char ch2 = '†';  
System.out.println (" value : " +  (int) ch1); // prints 94  
System.out.println (" value :" +  (byte) ch1); // prints 94  

System.out.println (" value : " +  (int) ch2); // prints 8224  
System.out.println (" value :" +  (byte) ch2); // prints 32 

Following are some exceptions: 8224 †, 8226 •, 8800 ≠, 8482 ™, 8710 ∆, 8211 –, 8221 ”, 8216 ‘, 9674 ◊, 8260 ⁄, 8249 ‹, 8734 ∞, 8747 ∫, 8364 €, 8730 √, 8804 ≤

Following are some good ones: 94 ^, 102 f, 112 p, 119 w, 126 ~, 196 Ä, 122 z, 197 Å

Any help is appreciated

A: 

IIRC Java uses a 16-bit representation for chars (UNICODE?) and C++ normally doesn't unless you use wchars.

I think you'd be better off trying to get C++ to use the UNICODE characters that Java uses rather than the other way around.
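
To make the 16-bit point concrete, here is a minimal Java sketch. The class name `CharWidthDemo` and the helper `lowByte` are made up for illustration. It shows that a Java `char` is a 16-bit UTF-16 code unit, and that casting to `byte` simply keeps the low 8 bits, which would explain the 32 seen in the question.

```java
public class CharWidthDemo {
    // Hypothetical helper: a (byte) cast keeps only the low 8 bits of the char.
    static byte lowByte(char c) {
        return (byte) c;
    }

    public static void main(String[] args) {
        // U+2020 DAGGER, written as an escape so the source file encoding does not matter.
        char dagger = '\u2020';

        System.out.println(Character.SIZE);              // 16 (bits per char)
        System.out.println((int) dagger);                // 8224
        System.out.println(Integer.toHexString(dagger)); // 2020
        System.out.println(lowByte(dagger));             // 32, i.e. 0x20, the low byte of 0x2020
    }
}
```

So the cast does not convert between encodings; it only truncates the Unicode value.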

Timo Geusch
Hi Timo, thank you for the prompt reply. I'm trying to write my app in Java, so I need a way to get 160 out of the char †. :(
metalhawk
"UNICODE?" UTF-16 to be more precise.
R. Bemrose
+3  A: 

In C++ you are using "narrow" characters in some specific encoding that happens to define character '†' as 160. In other encodings 160 may mean something else, and character '†' may be missing altogether.

In Java, you are always dealing with Unicode. 8224 = 0x2020 = U+2020 "DAGGER".

To get "160", you need to convert your string to the same encoding you are using with C++. See String.getBytes(charset).
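
A rough sketch of that approach, assuming the C++ side used a Mac Roman style encoding (an assumption here; the right charset depends on how the file was produced). The class `DaggerByteDemo` and the method `firstByteUnsigned` are hypothetical names. `Charset.forName` and `String.getBytes(Charset)` are standard JDK calls, but whether a charset named `MacRoman` is installed depends on the JDK, so the example checks first.

```java
import java.nio.charset.Charset;

public class DaggerByteDemo {
    // Hypothetical helper: encode one character in the named charset and
    // return its first byte as an unsigned value in the range 0..255.
    static int firstByteUnsigned(char c, String charsetName) {
        byte[] encoded = String.valueOf(c).getBytes(Charset.forName(charsetName));
        // Java bytes are signed, so mask with 0xFF to turn e.g. -96 into 160.
        return encoded[0] & 0xFF;
    }

    public static void main(String[] args) {
        String name = "MacRoman"; // assumed charset; may not exist on every JDK
        if (Charset.isSupported(name)) {
            // U+2020 DAGGER written as an escape to avoid source-encoding issues.
            System.out.println(firstByteUnsigned('\u2020', name)); // 160 in Mac Roman
        } else {
            System.out.println(name + " is not available on this JVM");
        }
    }
}
```

The `& 0xFF` mask is what gives the same number as `unsigned char` in C++; without it the byte prints as a negative value.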

atzz
Thanks atzz, that is a great explanation. I'm now trying to find out what charset is being used in C++. Thank you! :)
metalhawk
@ravikumar1: Try US-ASCII. If that doesn't work, try ISO-8859-1.
R. Bemrose
Thank you Bemrose. I wrote a small fn to get the charset. I found a hit for -96 (256-96=160). Thank you all for the support. :) Below is my test fn:
metalhawk
Here it is:

public void findCharsets() {
    Map charSets = Charset.availableCharsets();
    Iterator it = charSets.keySet().iterator();
    String str = Character.toString('†');
    while (it.hasNext()) {
        try {
            String csName = (String) it.next();
            byte b[] = str.getBytes(Charset.forName(csName));
            if (b[0] == -96) {
                System.out.println("Found: " + csName);
            }
        } catch (Exception e) {
            // do nothing; go to next Charset
        }
    }
}
metalhawk
This is the output of the program:

Found: MacRoman
Found: x-MacCentralEurope
Found: x-MacCroatian
Found: x-MacCyrillic
Found: x-MacGreek
Found: x-MacRomania
Found: x-MacTurkish
Found: x-MacUkraine
metalhawk
A: 

If you write out the unsigned char 160 in C++ as a single byte, and use InputStream.read() you will get 160. Which character this means depends on the assumed encoding but the value 160 is unchanged.
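
As a small illustration of this, here is a sketch that reads raw bytes through `InputStream.read()`. It uses an in-memory `ByteArrayInputStream` in place of the real file (for an actual PPM file one would presumably open a `FileInputStream` instead), and the class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class RawByteReadDemo {
    // Hypothetical helper: returns the first byte of the data as read by
    // InputStream.read(), which yields 0..255, or -1 if there is no data.
    static int firstByte(byte[] data) {
        return new ByteArrayInputStream(data).read();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for file contents: the byte 160 followed by '^' (94).
        byte[] data = { (byte) 160, 94 };

        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println(in.read()); // 160, no charset decoding involved
            System.out.println(in.read()); // 94
            System.out.println(in.read()); // -1 signals end of stream
        }
    }
}
```

Reading bytes this way avoids decoding to `char` at all, so no charset has to be guessed.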

Peter Lawrey
Thanks Peter, I'm trying to write this in Java only. I don't have a C++ program that runs first. I'm simply decoding in Java, for which I need 160 for the char †.
metalhawk