views:

2293

answers:

3

Really simple question: I need to read a Unicode text file in a Java program.

I am used to reading plain ASCII text with a BufferedReader/FileReader combo, which is obviously not working here :(

I know that I can read a String in the 'traditional' way using a BufferedReader and then convert it using something like:

temp = new String(temp.getBytes(), "UTF-16");

But is there a way to wrap the Reader in a 'Converter'?

EDIT: the file starts with FF FE (the UTF-16 little-endian byte order mark)

+2  A: 

Check http://java.sun.com/j2se/1.4.2/docs/api/java/io/InputStreamReader.html

I would read the source file with something like:

Reader in = new InputStreamReader(new FileInputStream("file"), "UTF-8");

Macarse
+4  A: 

You wouldn't wrap the Reader; instead, you would wrap the stream in an InputStreamReader. You could then wrap that in the BufferedReader you currently use:

BufferedReader in = new BufferedReader(new InputStreamReader(stream, encoding));
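
For completeness, a minimal sketch of the whole chain, assuming a file named "input.txt" and, given the FF FE byte order mark mentioned in the question, the "UTF-16" charset (adjust both to your situation):

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadUnicodeFile {
    public static void main(String[] args) throws IOException {
        // "UTF-16" honours the byte order mark, so an FF FE (little-endian) file decodes correctly
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream("input.txt"), "UTF-16"));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            in.close();
        }
    }
}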
objects
+3  A: 

Some notes:

  • the "UTF-16" encoding can read either little- or big-endian encoded files marked with a BOM; see here for a list of Java 6 encodings; it is not explicitly stated what endianness will be used when writing using "UTF-16" - it appears to be big-endian - so you might want to use "UnicodeLittle" when saving the data
  • be careful when using String class encode/decode methods, especially with a marked variable-width encoding like UTF-16 - use them only on whole data
  • as others have said, it is often best to read character data by wrapping your InputStream with an InputStreamReader; you can concatenate your input into a single String using a StringBuilder or similar buffer.
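
A small sketch pulling those points together; the class and method names are just illustrative, the file comes from the caller, "UTF-16" decodes the BOM-marked input, and "UnicodeLittle" writes little-endian output with a byte order mark:

import java.io.*;

public class UnicodeRoundTrip {
    // Read the whole file into one String; decoding happens inside the Reader,
    // so variable-width sequences are never split across separate conversions.
    static String readAll(File f) throws IOException {
        Reader in = new InputStreamReader(new FileInputStream(f), "UTF-16");
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } finally {
            in.close();
        }
    }

    // Write the text back as little-endian UTF-16 with a BOM ("UnicodeLittle").
    static void writeAll(File f, String text) throws IOException {
        Writer out = new OutputStreamWriter(new FileOutputStream(f), "UnicodeLittle");
        try {
            out.write(text);
        } finally {
            out.close();
        }
    }
}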
McDowell
Thanks for the link to the encoding types. I found the right one for me.
Roger Wernersson