ansaurus

Question

Convert from Codepage 1252 (Windows) to Java, in Java

Answer 1

A:

"windows-1252"/"Cp1252" is not required to be supported by JREs, but is by Sun's (and presumably most others). See the "Supported Encodings" in your JDK documentation. Then it's just a matter of using String, InputStreamReader or similar to decode the bytes into chars.

Tom Hawtin - tackline 2009-02-23 14:55:35
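
The decoding step described in the answer above can be sketched as follows. This is a minimal illustration, not the answerer's own code: the class name Cp1252Decode and the method decode are hypothetical, and the sample bytes are made up for the demonstration. It first checks that the running JRE actually provides the charset, since support is optional.

```java
import java.nio.charset.Charset;

class Cp1252Decode {
    // Hypothetical helper: turns raw windows-1252 bytes into a Java String.
    static String decode(byte[] raw) {
        // The charset is optional in the Java spec, so check before using it.
        if (!Charset.isSupported("windows-1252")) {
            throw new IllegalStateException("windows-1252 is not available in this JRE");
        }
        return new String(raw, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // Made-up sample: the bytes G, r, 0xF6, 0xDF, e in windows-1252.
        byte[] raw = { 0x47, 0x72, (byte) 0xF6, (byte) 0xDF, 0x65 };
        String text = decode(raw);
        // Print code points rather than glyphs so the console encoding does not matter.
        for (char c : text.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

For streams rather than byte arrays, an InputStreamReader constructed with the same charset does the equivalent job.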

ISO-8859-1 is quite passable as Windows codepage 1252

Thorbjørn Ravn Andersen 2009-11-19 20:44:16
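
How far the comment above holds can be checked with a small sketch. The two charsets agree on most bytes, but they differ in the range 0x80 to 0x9F, where windows-1252 places printable characters such as the euro sign and ISO-8859-1 has control characters. The class and method names below are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Latin1VsCp1252 {
    static final Charset CP1252 = Charset.forName("windows-1252");

    static String asCp1252(byte[] raw) {
        return new String(raw, CP1252);
    }

    static String asLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] umlaut = { (byte) 0xF6 }; // same character in both charsets
        byte[] euro = { (byte) 0x80 };   // euro sign in windows-1252 only

        System.out.println(asCp1252(umlaut).equals(asLatin1(umlaut))); // prints "true"
        System.out.println(asCp1252(euro).equals(asLatin1(euro)));     // prints "false"
        System.out.printf("U+%04X vs U+%04X%n",
                (int) asCp1252(euro).charAt(0),
                (int) asLatin1(euro).charAt(0)); // prints "U+20AC vs U+0080"
    }
}
```

So the substitution is only safe for text that avoids that byte range, for example text without curly quotes, dashes or the euro sign.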

Answer 2

+1 A:

When Java parses a file it uses some encoding to read the bytes on the disk and create characters in memory. The default encoding varies from platform to platform. Java's internal String representation is Unicode already, so if it parses the file with the right encoding then you are already done; just write out the data in any encoding you want.

If your strings appear corrupted when you look at them in Java, it is probably because you are using the wrong encoding to read the data. Excel is probably using UTF-16 (Little-Endian I think) but I'd expect a library like JXL should be able to detect it appropriately. I've looked at the Javadocs for JXL and it doesn't do anything with character encodings. I imagine it auto-detects any encodings as it needs to.

Do you just need to write the already loaded strings to a text file? If so, then something like the following will work:

import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

String text = getCP1252Text(); // the original encoding no longer matters; a Java String is always Unicode

try (FileOutputStream fos = new FileOutputStream("test.txt");        // open file
     OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-16"); // specify character encoding
     PrintWriter pw = new PrintWriter(osw)) {
    pw.print(text); // repeat as needed
} // try-with-resources closes all three, even if an exception is thrown

If your problem is something else please edit your question and provide more details.

Mr. Shiny and New 2009-02-23 15:04:33

Answer 3

A:

FileInputStream fis = new FileInputStream(yourFile);
BufferedReader reader = new BufferedReader(new InputStreamReader(fis, "Cp1252"));

Then do with reader whatever you would have done with the file directly.

vartec 2009-02-23 15:14:17
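
As one possible use of such a reader, the sketch below copies a file to a new file in UTF-8, assuming the source really is in code page 1252 as in the question. The class name TranscodeFile and the method cp1252ToUtf8 are hypothetical, and the file paths are supplied by the caller.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TranscodeFile {
    // Hypothetical helper: reads source as windows-1252, writes target as UTF-8.
    static void cp1252ToUtf8(Path source, Path target) throws IOException {
        Charset cp1252 = Charset.forName("windows-1252");
        try (BufferedReader in = new BufferedReader(
                     new InputStreamReader(Files.newInputStream(source), cp1252));
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int n;
            // Copy decoded characters; the writer encodes them as UTF-8.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TranscodeFile <cp1252-input> <utf8-output>
        cp1252ToUtf8(Path.of(args[0]), Path.of(args[1]));
    }
}
```

Copying through a char buffer instead of line by line leaves the original line endings untouched.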

Answer 4

+1 A:

You need to specify the correct encoding when the file is parsed - once you have a Java String based on the wrong encoding, it's too late.

JXL allows you to specify the encoding by passing a WorkbookSettings object to the factory method.

Michael Borgwardt 2009-02-24 10:58:24
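
Why it is too late after a wrong decode can be illustrated with plain JDK classes (this sketch does not use JXL). It assumes one particular mistake, reading windows-1252 bytes as UTF-8; the class and method names are hypothetical. The invalid byte is replaced by U+FFFD during decoding, so the original value is gone before any repair can be attempted.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class WrongDecodeIsLossy {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decodes with the wrong charset (UTF-8), then tries to get the bytes back.
    static byte[] reencodeAfterWrongDecode(byte[] cp1252Bytes) {
        String misread = new String(cp1252Bytes, StandardCharsets.UTF_8);
        return misread.getBytes(CP1252);
    }

    public static void main(String[] args) {
        byte[] original = { (byte) 0xF6 }; // o-umlaut in windows-1252, not valid UTF-8 on its own
        byte[] recovered = reencodeAfterWrongDecode(original);

        String misread = new String(original, StandardCharsets.UTF_8);
        System.out.printf("U+%04X%n", (int) misread.charAt(0));  // prints "U+FFFD"
        System.out.println(Arrays.equals(original, recovered));  // prints "false"
    }
}
```

Plain ASCII bytes survive the same round trip, which is why this kind of damage often shows up only on accented characters.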

Thanks! I will try that and hopefully get back to this topic to let everybody see how it worked.

Jakob Eriksson 2009-02-24 12:12:29

Answer 5

+2 A:

WorkbookSettings ws = new WorkbookSettings();

ws.setEncoding("CP1250");

Worked for me.

2009-04-21 15:02:09

Answer 6

A:

Your description indicates that the encoding is UTF-8 and indeed C3 B6 is the UTF-8 encoding for 'ö'.

Seth 2010-01-07 16:14:29
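
The byte values mentioned in the answer above can be verified with a short sketch; the class name Utf8Umlaut and the helper hex are hypothetical. It also shows what those two bytes turn into when they are mistakenly decoded as windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf8Umlaut {
    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00F6".getBytes(StandardCharsets.UTF_8); // o-umlaut
        System.out.println(hex(utf8)); // prints "C3 B6"

        // Reading the same two bytes as windows-1252 yields two characters.
        String misread = new String(utf8, Charset.forName("windows-1252"));
        System.out.printf("U+%04X U+%04X%n",
                (int) misread.charAt(0), (int) misread.charAt(1)); // prints "U+00C3 U+00B6"
    }
}
```

Seeing a capital A with tilde followed by a pilcrow where an o-umlaut should be is therefore a typical sign of UTF-8 data being read as code page 1252.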

Answer 7

A:

If none of the answers above solves the problem, this might do the trick:

String myOutput = new String(myInput, "UTF-8");

This decodes the incoming bytes as UTF-8, so it assumes myInput is a byte[] that actually holds UTF-8 data; it does not detect other encodings.

lxndr 2010-08-23 15:09:35
