I am using sockets to send a POST request to a server. I read the response through an InputStreamReader with "UTF-8" encoding, and most of it comes back fine: the HTML is readable. However, at seemingly random places I see values such as "1ffa", "6e8", "1972", "90", and "0" appear on lines of their own as I read the response. Here's how I create the request and read the response.
String hostname = "server";
SocketFactory socketFactory = SSLSocketFactory.getDefault();
Socket socket = new Socket(hostname, 8080);
// Create streams to securely send and receive data to the server
InputStream in = socket.getInputStream();
OutputStream out = socket.getOutputStream();
PrintWriter writer = new PrintWriter(out);
writer.println("POST /handlerServlet HTTP/1.1");
writer.println("Host: " + hostname);
String parameters="params=" + URLEncoder.encode("paramsToEncode", "UTF-8");
writer.println("Content-Length: " + parameters.length());
writer.println("Content-Type: application/x-www-form-urlencoded");
writer.println("Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7");
writer.println("Keep-Alive: 115");
writer.println("Connection: keep-alive");
writer.println("\r\n" + parameters + "\r\n");
writer.flush();
// Read from in and write to out...
String input = "";
BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
StringBuffer result = new StringBuffer();
boolean startWriting = false;
FileOutputStream outStream1 = new FileOutputStream(new File("/file1.txt"));
Writer outWriter = new OutputStreamWriter(outStream1, "UTF-8");
while ((input = reader.readLine()) != null) {
    result.append(input);
    outWriter.write(input + "\n");
    result.append('\n');
}
System.out.println(result.toString());
outWriter.close();
// Close the socket
in.close();
Does anyone have any idea why I would see characters like this? Here is a sample of the output:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<base href="http://server:8080/HW/YX+JpCEnNDe5B87CCyFj5KR7z9rqlwRK77aMm/44221331.htm">
1ffa
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title></title>
</head>
<body bgcolor="#ffffff">
<!-- Created by Oracle Reports 21:14 Tue Jun 29 09:14:32 PM, 2010 -->
....
<tr valign=top>
<td height=10></td>
<td width=80 colspan=3 align=center><font size=2 face="helvetica">V002A050001</font></td>
<
1ffa
td></td>
As you can see, having these characters appear at random locations breaks the HTML and causes erratic behavior when it is rendered.
Thanks.
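One thing worth checking: the stray values are all valid hexadecimal numbers, and the last one is "0", which matches the framing used by HTTP/1.1 chunked transfer encoding (each chunk is preceded by its size in hex, and a zero-size chunk ends the body). This is only a guess, because the response headers are not shown above. If the response does contain `Transfer-Encoding: chunked`, a minimal sketch of removing the chunk framing could look like the following. The class and method names (`ChunkedBodyDecoder`, `decode`, `readLine`) are hypothetical, and the sketch assumes the status line and headers have already been read from the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: strips chunked transfer framing from a response body.
class ChunkedBodyDecoder {

    // Reads bytes up to the next '\n' and returns them as text, without CR/LF.
    // Returns an empty string at end of stream.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                line.write(b);
            }
        }
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Assumes 'in' is positioned at the first chunk-size line (after the headers).
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            // Ignore optional chunk extensions that follow a ';'.
            int semicolon = sizeLine.indexOf(';');
            if (semicolon >= 0) {
                sizeLine = sizeLine.substring(0, semicolon);
            }
            int size = Integer.parseInt(sizeLine.trim(), 16); // e.g. "1ffa" is 8186
            if (size == 0) {
                break; // zero-size chunk marks the end of the body
            }
            byte[] chunk = new byte[size];
            int offset = 0;
            while (offset < size) {
                int n = in.read(chunk, offset, size - offset);
                if (n == -1) {
                    throw new EOFException("stream ended inside a chunk");
                }
                offset += n;
            }
            body.write(chunk, 0, size);
            readLine(in); // consume the CRLF that follows the chunk data
        }
        // Skip any trailer headers up to the terminating blank line.
        while (!readLine(in).isEmpty()) {
            // nothing to do
        }
        return body.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] decoded = decode(new ByteArrayInputStream(raw));
        System.out.println(new String(decoded, StandardCharsets.ISO_8859_1));
    }
}
```

Chunk sizes count bytes, not characters, so the framing has to be removed from the raw InputStream before wrapping the result in a Reader. Reading line by line through a BufferedReader, as in the code above, would pass the size lines through as if they were part of the HTML.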