I've written this little test class to connect to an FTP server and print the directory listing.

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class FTPTest {

    public static void main(String[] args) {
        URL url = null;

        try {
            url = new URL("ftp://anonymous:[email protected]");
        } catch (MalformedURLException e) {
            e.printStackTrace();
        }

        URLConnection conn = null;

        try {
            conn = url.openConnection();
        } catch (IOException e) {
            e.printStackTrace();
        }

        InputStream in = null;

        try {
            in = conn.getInputStream();
        } catch (IOException e) {
            e.printStackTrace();
        }

        BufferedInputStream bin = new BufferedInputStream(in);
        int b;

        try {
            while ((b = bin.read()) != -1) {
                char c = (char) b;
                System.out.print("" + (char) b);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Here's the output:

-rw-r--r-- 1 ftp ftp           4700 Apr 30  2007 premier.java
-rw-r--r-- 1 ftp ftp          88576 Oct 23  2007 Serie1_1.doc
-rw-r--r-- 1 ftp ftp           1401 Nov 21  2006 tp20061121.txt
drwxr-xr-x 1 ftp ftp              0 Apr 23 20:04 rÃ©pertoire

Notice the name of the directory at the end of the list. There should be an "é" (e with acute accent) instead of the double character "Ã©".

This reminds me of an issue I ran into previously with JSF, where there was a mix-up between encoding standards. I have little experience with character encoding, though, so I'm not sure what's happening. I suppose the server output is in ASCII, so how do I adapt the output so that it appears correctly in the console?

+2  A: 

You're brute-force converting bytes from the input stream into chars using

char c = (char) b;

This is definitely not the Good Housekeeping approved form.

Streams deliver bytes, and you want chars. Readers deliver chars and will do character set translation for you in an automatic and controlled way.
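To make the effect concrete, here is a minimal sketch of what the cast does, assuming the server sends the name as UTF-8 (the two-character result suggests this, but it is an assumption). CastDemo and castEachByte are hypothetical names, and the byte array stands in for what the FTP stream would deliver. The non-ASCII characters are written as Unicode escapes so the example does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

class CastDemo {
    // Mimics the loop in the question: every byte becomes one char.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte raw : bytes) {
            int b = raw & 0xFF;      // InputStream.read() returns values 0..255
            sb.append((char) b);     // no charset decoding happens here
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "e acute" occupies two bytes in UTF-8: 0xC3 0xA9
        byte[] utf8 = "r\u00e9pertoire".getBytes(StandardCharsets.UTF_8);

        String viaCast = castEachByte(utf8);
        String viaDecode = new String(utf8, StandardCharsets.UTF_8);

        System.out.println("bytes:   " + utf8.length);        // 11 bytes
        System.out.println("cast:    " + viaCast.length());   // 11 chars, one per byte
        System.out.println("decoded: " + viaDecode.length()); // 10 chars
    }
}
```

The cast turns the two bytes 0xC3 and 0xA9 into the two chars U+00C3 and U+00A9, which is exactly the "Ã©" pair in the listing.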

You should wrap an InputStreamReader around the InputStream. The InputStreamReader constructor lets you specify a Charset (or a charset name), which controls the translation.

Reading from the InputStreamReader will of course yield "real" chars. Another benefit is that you can wrap a BufferedReader around the InputStreamReader and then read entire lines at a time (into a String) using readLine.


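EDIT: To illustrate what I mean by "wrap around," here's some (untested!) code:

// The two-character rendering of the accented letter suggests the server sends UTF-8,
// so decode as UTF-8; "US-ASCII" would turn those bytes into replacement characters.
BufferedReader br = new BufferedReader(new InputStreamReader(bin, "UTF-8"));
...
String line = br.readLine();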
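For anyone who wants to try the wrapping idea without an FTP server, here is a self-contained sketch. It assumes the listing is UTF-8 and uses a ByteArrayInputStream in place of the connection's stream; ReaderDemo and readLines are made-up names for illustration only.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ReaderDemo {
    // Decodes the stream with the given charset and returns its lines.
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the wrapped stream) when done
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = br.readLine()) != null) { // readLine() strips the line terminator
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for conn.getInputStream(): one listing line encoded as UTF-8
        byte[] listing = "drwxr-xr-x 1 ftp ftp 0 Apr 23 20:04 r\u00e9pertoire\r\n"
                .getBytes(StandardCharsets.UTF_8);

        for (String line : readLines(new ByteArrayInputStream(listing), StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```

With the real connection you would pass conn.getInputStream() instead of the byte array. Whether the console then shows the accent correctly also depends on the console's own encoding, which this sketch does not address.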
Carl Smotricz
So I am. So this would mean that chars in Java are in Unicode. Thanks for the tip on combining different input classes. That actually answers another question I had.
James P.
Yes, Java chars are indeed Unicode (UTF-16 code units, to be precise). When you cast bytes to chars, you're essentially treating each byte as an ISO-8859-1 (Latin-1) character, which matches ASCII only for values below 128. I sort of apologize for the grave clunkiness of Java's IO. It's nice that there are different classes available for so many purposes, but some people wonder whether things need to be as complex as they are.
Carl Smotricz
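As a rough way to see how much the choice of charset matters, the sketch below decodes the same two bytes three ways. The bytes are the UTF-8 encoding of the accented "e", which is an assumption about what the server sent; CharsetCompare and decode are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetCompare {
    // Decodes the bytes with the given charset; malformed input becomes U+FFFD.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Formats each char as a U+XXXX code so the result does not depend on the console.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] bytes = { (byte) 0xC3, (byte) 0xA9 }; // UTF-8 encoding of "e acute"

        System.out.println("UTF-8:      " + codePoints(decode(bytes, StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1: " + codePoints(decode(bytes, StandardCharsets.ISO_8859_1)));
        System.out.println("US-ASCII:   " + codePoints(decode(bytes, StandardCharsets.US_ASCII)));
    }
}
```

UTF-8 gives the single char U+00E9. ISO-8859-1 gives U+00C3 U+00A9, the same pair the plain cast produces. US-ASCII cannot represent either byte and substitutes the replacement character U+FFFD for each.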