views:

5230

answers:

2
sock = new Socket("www.google.com", 80);
       out  = new BufferedOutputStream(sock.getOutputStream());
       in   = new BufferedInputStream(sock.getInputStream());

When i try to do printing out of content inside "in" like below

 BufferedInputStream bin = new BufferedInputStream(in);
 int b;
 while ( ( b = bin.read() ) != -1 )
 {

     char c = (char)b;         

     System.err.print(""+(char)b); //This prints out content that is unreadable.
                                   //Isn't it supposed to print out html tag?
 }
+3  A: 

If you want to print the content of a web page, you need to work with the HTTP protocol. You do not have to implement it yourself, the best way is to use existing implementations such as the java API HttpURLConnection or Apache's HttpClient

Here is an example of how to do it with HttpURLConnection:

URL url = new URL("http","www.google.com");
HttpURLConnection urlc = (HttpURLConnection)url.openConnection();
urlc.setAllowUserInteraction( false );
urlc.setDoInput( true );
urlc.setDoOutput( false );
urlc.setUseCaches( true );
urlc.setRequestMethod("GET");
urlc.connect();
// check you have received an status code 200 to indicate ok
// get the encoding from the Content-TYpe header
BufferedReader in = new BufferedReader(new InputStreamReader(urlc.getInputStream()));
String line = null;
while((line = in.readLine) != null) {
  System.out.println(line);
}

// close sockets, handle errors, etc.

As written above, you can save traffic by adding the Accept-Encoding header and check the Content-Encoding header of the response.

Here is an HttpClient Example, taken from here:

   // Create an instance of HttpClient.
    HttpClient client = new HttpClient();

    // Create a method instance.
    GetMethod method = new GetMethod(url);

    // Provide custom retry handler is necessary
    method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, 
         new DefaultHttpMethodRetryHandler(3, false));

    try {
      // Execute the method.
      int statusCode = client.executeMethod(method);

      if (statusCode != HttpStatus.SC_OK) {
        System.err.println("Method failed: " + method.getStatusLine());
      }

      // Read the response body.
      byte[] responseBody = method.getResponseBody();

      // Deal with the response.
      // Use caution: ensure correct character encoding and is not binary data
      System.out.println(new String(responseBody));

    } catch (HttpException e) {
      System.err.println("Fatal protocol violation: " + e.getMessage());
      e.printStackTrace();
    } catch (IOException e) {
      System.err.println("Fatal transport error: " + e.getMessage());
      e.printStackTrace();
    } finally {
      // Release the connection.
      method.releaseConnection();
    }
David Rabinowitz
+1 for the HttpClient in particular. As soon as you want to do anything beyond a simple GET, it's invaluable
Brian Agnew
HttpURLConnection doesn't handle gzipped content. I learned that the hard way.
Jeremybub
+1  A: 

If you what to fetch the content of a webpage, you should take a look at apache httpclient instead of coding this yourself, expect for learning purposes or any other really good reason.

Tim Büthe