This is all mixed up in my head, and I can't make sense of it.

I have an Excel file that I have to parse in Java and translate to XML. With the jExcel library the parsing works: the application does the right work and puts the right strings in the right places. So the parsing part is covered.

The problem comes in when I try to transcode the file to UTF-8.

I assumed that the encoding of the Excel file was ISO-8859-1, but I'm not sure that it is. I call this function on each string before adding it to the XML file.

private static String isoToUtf(String thingie){
    byte[] bytedata = thingie.getBytes() ; // Comes in ISO form, as the character set in the DB is set to ISO

    Charset iso = Charset.forName("ISO-8859-1");
    CharsetDecoder isodecoder = iso.newDecoder();
    ByteBuffer bbuf = ByteBuffer.wrap(bytedata);
    CharBuffer cbuf = isodecoder.decode(bbuf);  // Decode from ISO to UTF-16

    Charset utf8 = Charset.forName("UTF-8");
    CharsetEncoder utf8encoder = utf8.newEncoder();
    ByteBuffer outbuffer = utf8encoder.encode(cbuf);  // Encode from UTF-16 to UTF-8
    return new String(outbuffer.array(), "UTF-8");
}

Somehow, though, it doesn't work. I still lose some characters to corruption.

Also: I absolutely have to do it this way, it has to be displayed on the intertubes eventually.

The Excel file is opened using the java.io.File class.

+1  A: 

For anyone in the same situation as me: the jExcel library lets you specify options for the Workbook you create.

The following link is where I found my answers.

http://jexcelapi.sourceforge.net/resources/javadocs/2%5F6%5F10/docs/index.html
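
For what it's worth, here is a rough sketch of the idea as I understand it, using only the standard library. A Java String is already Unicode in memory, so the charset only matters where bytes come in (reading the workbook) and where bytes go out (writing the XML). The class and method names below are made up for illustration and are not part of jExcel. I believe the input-side setting in jExcel is the encoding option on WorkbookSettings, but treat that as an assumption and check the javadoc above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only (not part of jExcel).
class EncodingBoundaries {

    // Input boundary: turn raw bytes into a String using the charset
    // the bytes were actually written in.
    static String decode(byte[] raw, Charset sourceCharset) {
        return new String(raw, sourceCharset);
    }

    // Output boundary: turn a String into UTF-8 bytes, for example
    // right before writing them to the XML file.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] isoBytes = { (byte) 0xE9 };               // "é" in ISO-8859-1
        String text = decode(isoBytes, StandardCharsets.ISO_8859_1);
        byte[] utf8Bytes = toUtf8(text);                 // 0xC3 0xA9
        System.out.println(utf8Bytes.length);            // prints 2
    }
}
```

Once the strings are decoded correctly on the way in, there is no need to convert String to String in between. It should be enough to write the XML through a writer or stream that is set to UTF-8.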

MrZombie