ansaurus

Question

Converting UTF-8 to ISO-8859-1 in Java - how to keep it as single byte

Answer 1

A:

Look into the classes in java.nio.charset.

Adam Jaskiewicz 2009-03-17 20:38:40

Answer 2

+5 A:

byte[] iso88591Data = theString.getBytes("ISO-8859-1");

Will do the trick. From your description it seems as if you're trying to "store an ISO-8859-1 String". String objects in Java are always implicitly encoded in UTF-16. There's no way to change that encoding.

What you can do, though, is get the bytes that constitute some other encoding of it (using the .getBytes() method as shown above).
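
As a small illustration of that point, the sketch below encodes the same one-character string in two charsets and compares the byte counts. The class and method names (Latin1Bytes, encode) are made up for this example, and it assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Latin1Bytes {
    // Hypothetical helper: returns the bytes of s in the given charset.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        // 'â' written as an escape so the source file encoding does not matter
        String s = "\u00e2";
        System.out.println(encode(s, StandardCharsets.UTF_8).length);      // 2 (bytes c3 a2)
        System.out.println(encode(s, StandardCharsets.ISO_8859_1).length); // 1 (byte e2)
    }
}
```

The String itself never changes; only the byte array you ask for depends on the charset.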

Joachim Sauer 2009-03-17 20:39:17

Answer 3

+8 A:

If you're dealing with character encodings other than UTF-16, you shouldn't be using java.lang.String or the char primitive -- you should only be using byte[] arrays or ByteBuffer objects. Then, you can use java.nio.charset.Charset to convert between encodings:

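import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

Charset utf8charset = Charset.forName("UTF-8");
Charset iso88591charset = Charset.forName("ISO-8859-1");

ByteBuffer inputBuffer = ByteBuffer.wrap(new byte[]{(byte)0xC3, (byte)0xA2});

// decode UTF-8
CharBuffer data = utf8charset.decode(inputBuffer);

// encode ISO-8859-1
ByteBuffer outputBuffer = iso88591charset.encode(data);
// copy only the encoded bytes; the backing array may be larger than the content
byte[] outputData = new byte[outputBuffer.remaining()];
outputBuffer.get(outputData);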

Adam Rosenfield 2009-03-17 20:43:21

Thanks a lot.. Really helpful - Luckylak

2009-03-18 20:10:57

Answer 4

+1 A:

erickson 2009-03-17 20:48:32

Or it could simply contain characters not representable in latin1.

Adam Jaskiewicz 2009-03-17 20:57:10

'â' is representable in Latin-1.

erickson 2009-03-17 21:48:13
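
One way to settle whether a given string fits in Latin-1 is to ask an encoder directly. This is only a sketch; the names Latin1Check and fitsLatin1 are invented for the example, and it assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

class Latin1Check {
    // Hypothetical helper: true if every character of s has an ISO-8859-1 byte.
    static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(fitsLatin1("\u00e2abcd")); // true: 'â' is U+00E2, inside Latin-1
        System.out.println(fitsLatin1("\u20ac"));     // false: the euro sign is not in ISO-8859-1
    }
}
```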

Answer 5

A:

Starting with a set of bytes which encode a string using UTF-8, create a string from that data, then get the bytes encoding the string in a different encoding:

    byte[] utf8bytes = { (byte)0xc3, (byte)0xa2, 0x61, 0x62, 0x63, 0x64 };
    Charset utf8charset = Charset.forName("UTF-8");
    Charset iso88591charset = Charset.forName("ISO-8859-1");

    String string = new String ( utf8bytes, utf8charset );

    System.out.println(string);

    // "When I do a getbytes(encoding) and "
    byte[] iso88591bytes = string.getBytes(iso88591charset);

    for ( byte b : iso88591bytes )
        System.out.printf("%02x ", b);

    System.out.println();

    // "then create a new string with the bytes in ISO-8859-1 encoding"
    String string2 = new String ( iso88591bytes, iso88591charset );

    // "I get a two different chars"
    System.out.println(string2);

This outputs the strings and the ISO-8859-1 bytes correctly:

âabcd 
e2 61 62 63 64 
âabcd

So your byte array wasn't paired with the correct encoding:

    String failString = new String ( utf8bytes, iso88591charset );

    System.out.println(failString);

Outputs

Ã¢abcd

(Either that, or you wrote the UTF-8 bytes to a file and read them elsewhere as ISO-8859-1.)
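
If the damage has already happened, it can usually be undone by reversing the wrong step, because ISO-8859-1 maps every byte value to exactly one character. A minimal sketch, assuming the garbled text really came from UTF-8 bytes decoded as ISO-8859-1 (and not, say, windows-1252); MojibakeRepair and repair are hypothetical names:

```java
import java.nio.charset.StandardCharsets;

class MojibakeRepair {
    // Hypothetical helper: undoes "UTF-8 bytes decoded as ISO-8859-1".
    static String repair(String garbled) {
        // Re-encoding as ISO-8859-1 recovers the original UTF-8 bytes unchanged.
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The contents of failString: U+00C3, U+00A2, then "abcd"
        String garbled = "\u00c3\u00a2abcd";
        System.out.println(repair(garbled).equals("\u00e2abcd")); // true
    }
}
```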

Pete Kirkham 2009-03-17 22:25:03

Answer 6

A:

Can anyone help me out please,

A friend needs this Å¬·¡½º ¼±ÅÃ converted to english.

He says that it is written in ISO-8859 code, but I don't know how to convert it for him.

can anyone here convert it please

2009-04-05 17:54:30

ask a separate question for this

Michael Donohue 2009-10-11 03:41:35

you are not in the proper stackoverflow, look for buffoonoverflow.com

netadictos 2010-09-10 11:41:08

Answer 7

A:

NEVER MIND,

it means Class Selection in english.

Word is wonderful :)

2009-04-06 03:43:58

Answer 8

A:

Evict non-ISO-8859-1 characters; they will be replaced by '?' (for example, before sending to an ISO-8859-1 DB):

utf8String = new String ( utf8String.getBytes("ISO-8859-1"), "ISO-8859-1" );
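
The sketch below shows the replacement behaviour next to a strict alternative that fails instead of silently substituting. It relies on the documented behaviour of String.getBytes(Charset), which replaces unmappable characters with the charset's default replacement (a '?' byte for ISO-8859-1). The class and method names (Latin1Lossy, replaceUnmappable, encodeStrict) are invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Latin1Lossy {
    // Lossy: characters outside ISO-8859-1 come back as '?'.
    static String replaceUnmappable(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    // Strict: throws CharacterCodingException instead of substituting.
    static byte[] encodeStrict(String s) throws CharacterCodingException {
        ByteBuffer buf = StandardCharsets.ISO_8859_1.newEncoder()
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .onMalformedInput(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        // The euro sign (U+20AC) has no ISO-8859-1 byte, so it becomes '?'.
        System.out.println(replaceUnmappable("\u20ac5").equals("?5")); // true
        try {
            encodeStrict("\u20ac5");
        } catch (CharacterCodingException e) {
            System.out.println("cannot encode: " + e);
        }
    }
}
```

Which variant fits depends on whether losing characters silently is acceptable for the data being stored.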

bcros 2010-03-30 09:00:23

Answer 9

A:

÷êíeî…¬$•)às9Ó×2;Ï¦^nv¬X˜œÔÎÅ©êÔèí5×•¾lyÿÊMO:PžIÂß]®C‘ÞX™cïU¡†Îô¯&–<‰‚”P\šæxFœÿç,Æ”šNÕ£—P,ªý.§|#„¾âäÐþÞå15êsW¿iïÆŸSV¡À¡Ê~÷d£Jk¥hï"¿Ë T¡)ÞÜÝœÜç æ ÑGì0·=(yW}¦~@ß‚a˜§’×«MÞe‘¥õô5$¹e û¸"™Ù:SÅ5ÿrdÃ…ÙÖå¼:u9®6/õî :š-+Ë@Àÿ§¶!Êê\Q¦ÌŠgj3·Ö·>³¤V:"65%Q#1œª ¦CcR!ãÛ¥¹wéYGÔ1Zq±ƒ"jtÔ*j2G¦ÝìÊ£ÜU¸yÕƒi4†'„ŽMÁÎ2,’#bzTyÕÿÛx§Åí~B+Þˆüt™6zÑ zñÄÌóžJMÄöj«yö£

How do i change that to something i can read

Adam 2010-08-03 05:49:33

Get a mirror. And don't dig out threads from early '09.

atamanroman 2010-08-03 06:59:25
