I am using sockets to send a POST request to a server, and I read the response through an InputStream decoded as "UTF-8". Most of the response makes sense and the HTML displays correctly. However, seemingly at random, codes such as "1ffa", "6e8", "1972", "90", and "0" appear on lines of their own as I read the response. Here's how I create the request and read the response:

    String hostname = "server";
    SocketFactory socketFactory = SSLSocketFactory.getDefault();
    Socket socket = new Socket(hostname, 8080);
    // Create streams to securely send and receive data to the server
    InputStream in = socket.getInputStream();
    OutputStream out = socket.getOutputStream();
    PrintWriter writer = new PrintWriter(out);
    writer.println("POST /handlerServlet http/1.1");
    writer.println("Host: " + hostname);
    String parameters="params=" + URLEncoder.encode("paramsToEncode", "UTF-8"); 
    writer.println("Content-Length: " + parameters.length());
    writer.println("Content-Type: application/x-www-form-urlencoded");
    writer.println("Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7");
    writer.println("Keep-Alive: 115");
    writer.println("Connection: keep-alive");
    writer.println("\r\n" + parameters + "\r\n");
    writer.flush();
    // Read from in and write to out...
    String input = "";
    BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
    StringBuffer result = new StringBuffer();
    boolean startWriting = false;
    FileOutputStream outStream1 = new FileOutputStream(new File("/file1.txt"));
    Writer outWriter = new OutputStreamWriter(outStream1, "UTF-8");

    while ((input = reader.readLine()) != null) {
        result.append(input);
        outWriter.write(input + "\n");
        result.append('\n');
    }
    System.out.println(result.toString());
    outWriter.close();
    // Close the socket
    in.close();

Does anyone have any idea why I would see characters like this?

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<base href="http://server:8080/HW/YX+JpCEnNDe5B87CCyFj5KR7z9rqlwRK77aMm/44221331.htm">

1ffa

<meta http-equiv="Content-Type"  content="text/html; charset=ISO-8859-1">
<title></title>
</head>
<body bgcolor="#ffffff">
<!-- Created by Oracle Reports 21:14 Tue Jun 29 09:14:32 PM, 2010 -->
....
<tr valign=top>
  <td height=10></td>
  <td width=80 colspan=3 align=center><font size=2 face="helvetica">V002A050001</font></td>
  <
1ffa
td></td>

As you can see, having these characters appear at random locations breaks the HTML and causes erratic behavior.

Thanks.

+5  A: 

Do you get a header in your response that says something like this?

Transfer-Encoding: chunked

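If so, it's most likely due to HTTP chunked transfer encoding: each of those hexadecimal numbers is the size of the chunk that follows it, and the final "0" marks the end of the body. It's normal.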
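
To make the framing concrete, here is a minimal sketch of how a chunked body could be decoded by hand. The class name `ChunkedDecoder` and its methods are hypothetical, not part of any library, and it assumes the stream is positioned just after the blank line that ends the response headers. It works on raw bytes because chunk sizes count bytes, not characters, so a `BufferedReader` in front of the stream would get in the way.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
public class ChunkedDecoder {

    /**
     * Decodes a body sent with "Transfer-Encoding: chunked".
     * Assumes `in` is positioned at the first chunk-size line.
     */
    public static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            if (sizeLine == null) {
                throw new EOFException("stream ended before the last chunk");
            }
            // A chunk-size line may carry extensions after ';', which are ignored here.
            int semi = sizeLine.indexOf(';');
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int size = Integer.parseInt(sizeLine.trim(), 16); // e.g. "1ffa" -> 8186
            if (size == 0) {
                break; // the "0" line marks the last chunk
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n < 0) {
                    throw new EOFException("stream ended inside a chunk");
                }
                off += n;
            }
            body.write(chunk, 0, size);
            readLine(in); // consume the CRLF that terminates the chunk data
        }
        // Skip optional trailer headers up to the closing blank line.
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            // ignored in this sketch
        }
        return body.toByteArray();
    }

    /** Reads one line ending in LF, dropping CR; returns null at end of stream with no data. */
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                line.write(b);
            }
        }
        if (b == -1 && line.size() == 0) {
            return null;
        }
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

For example, decoding the bytes of `4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n` yields `Wikipedia`. Convert the returned bytes to a `String` only afterwards, using the charset the response declares.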

Bruno
...and you should process it differently based on the response header. Since the other side is apparently a `Servlet`, you can also set the `Content-Length` header beforehand so that it does not send the body in chunks. You can use `response.setContentLength()` for that.
BalusC
Indeed, setting the content length is a good workaround. I'd also suggest using an existing HTTP client library (unless there are constraints against that). There are plenty around and they tend to handle this well.
Bruno
Yes, I already commented that on the question :)
BalusC
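
As a rough illustration of the "use an existing HTTP client" suggestion from the comments, here is a sketch using `java.net.HttpURLConnection` from the JDK, which decodes chunked responses transparently. The class name `FormPoster` and the method `post` are hypothetical. Decoding the response as ISO-8859-1 is an assumption based on the `meta` tag in the sample output; in practice the charset should come from the response's `Content-Type` header.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
public class FormPoster {

    /** POSTs a single form field named "params" and returns the response body as text. */
    public static String post(String url, String paramValue) throws IOException {
        byte[] body = ("params=" + URLEncoder.encode(paramValue, "UTF-8"))
                .getBytes(StandardCharsets.UTF_8);

        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        // Send the request with an exact Content-Length (byte count, not char count).
        conn.setFixedLengthStreamingMode(body.length);

        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }

        // The connection strips the chunk-size lines before the data reaches this stream.
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] block = new byte[8192];
            int n;
            while ((n = in.read(block)) != -1) {
                buf.write(block, 0, n);
            }
            return buf.toString("ISO-8859-1"); // assumed charset, see note above
        } finally {
            conn.disconnect();
        }
    }
}
```

With the values from the question, a call might look like `FormPoster.post("http://server:8080/handlerServlet", "paramsToEncode")`.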