ansaurus

Question

Character corruption going from BufferedReader to BufferedWriter in Java

Answer 1

+5 A:

The file being read is probably not in the same encoding (probably UTF-8) as the file being written (probably ISO-8859-1).
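
To illustrate how such a mismatch corrupts text, here is a minimal sketch in Java. It assumes, as guessed above, that the bytes are UTF-8 but get decoded as ISO-8859-1. The class and method names are made up for this example and are not from the question.

```java
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Encodes the text as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1.
    static String decodeUtf8BytesAsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u201D"; // right double quotation mark
        String corrupted = decodeUtf8BytesAsLatin1(original);
        // One character in, three unrelated characters out.
        System.out.println(original.length() + " char became " + corrupted.length() + " chars");
    }
}
```

Plain ASCII text survives this round trip unchanged, which would explain why only the "special" characters look corrupted.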

Try the following to generate a file with UTF-8 encoding:

BufferedWriter output = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile),"UTF8"));

Unfortunately, determining the encoding of a file is very difficult. See http://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream

Thierry-Dimitri Roy 2010-08-24 17:54:02

And as far as I know there isn't really an automated way to obtain the encoding of a text file.

extraneon 2010-08-24 17:57:19

`Reader` end of things too.

Tom Hawtin - tackline 2010-08-24 18:53:57

"UTF8" and UTF-16 don't seem to work even though <meta http-equiv="content-type" content="text/html;charset=utf-8" /> is explicitly stated in the HTML... Does anybody know how to look up the encoding by going from a known character in a file to an encoding?

Misha 2010-08-24 19:46:48

I tried US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, and UTF-16, and they don't work...

Misha 2010-08-24 21:14:53

The character's decimal value is 8221, so it should be Unicode, right?

Misha 2010-08-24 21:21:56
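
For reference, decimal 8221 is the Unicode code point U+201D (right double quotation mark). The sketch below prints how that one character is encoded in a few charsets. The class and helper names are hypothetical. ISO-8859-1 and US-ASCII have no mapping for it, so `getBytes` substitutes `?`.

```java
import java.nio.charset.StandardCharsets;

class CodePoint8221 {
    // Builds a one-character string from a decimal code point.
    static String fromDecimal(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = fromDecimal(8221);
        System.out.println(String.format("U+%04X", s.codePointAt(0)));    // U+201D
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // E2 80 9D
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE)));   // 20 1D
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // 3F, i.e. '?'
        System.out.println(hex(s.getBytes(StandardCharsets.US_ASCII)));   // 3F, i.e. '?'
    }
}
```

So the character can only survive if both the reader and the writer use a charset that can represent it, such as UTF-8 or UTF-16.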

The problem was solved by changing to UTF-8 AND parsing the entire file and replacing all special characters above 126 with the "xx" format.

Misha 2010-08-25 02:19:09

Answer 2

A:

In addition to what Thierry-Dimitri Roy wrote: if you know the encoding, you have to create your reader with a bit of extra work instead of using FileReader directly. From the FileReader docs:

Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.

extraneon 2010-08-24 18:00:20

Answer 3

A:

The Javadoc for FileReader says:

The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.

In your case the default character encoding is probably not appropriate. Find what encoding the input file uses, and specify it. For example:

FileInputStream fis = new FileInputStream(myFile);
InputStreamReader isr = new InputStreamReader(fis, "charset name goes here");
BufferedReader input = new BufferedReader(isr);
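
Putting the reader and the writer together, a minimal sketch of a copy that names the charset on both ends might look like this. It assumes the input really is UTF-8, which is only a guess here; substitute the file's actual charset. The class, the method names, and the temporary files are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class CharsetCopy {
    // Copies characters from source to target, decoding and encoding with explicit charsets.
    static void copy(File source, Charset sourceCharset, File target, Charset targetCharset)
            throws IOException {
        try (BufferedReader input = new BufferedReader(
                     new InputStreamReader(new FileInputStream(source), sourceCharset));
             BufferedWriter output = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream(target), targetCharset))) {
            char[] buffer = new char[8192];
            int count;
            while ((count = input.read(buffer)) != -1) {
                output.write(buffer, 0, count);
            }
        }
    }

    // Writes text to a temp file, copies it with the given charset, and reads the copy back.
    static String roundTrip(String text, Charset charset) {
        try {
            File source = File.createTempFile("charset-in", ".txt");
            File target = File.createTempFile("charset-out", ".txt");
            try {
                Files.write(source.toPath(), text.getBytes(charset));
                copy(source, charset, target, charset);
                return new String(Files.readAllBytes(target.toPath()), charset);
            } finally {
                source.delete();
                target.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "say \u201Chello\u201D\n";
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

Passing a `Charset` constant rather than a name string avoids typos in the charset name and the checked `UnsupportedEncodingException`.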

Richard Fearn 2010-08-24 18:00:48
