First, it's really difficult to know what exactly is going wrong here - "Sometimes it will make a good request, sometimes not." doesn't really describe what's happening when the problem occurs!!
That said, I was still able to figure out what's going wrong for you.
As you've said already, you're looking for the most basic solution that'll work consistently, so I'll avoid anything unnecessary or getting into the efficiency or otherwise of your code. Also, I'll give you the answer first and then describe what's causing the problem (it's long, but worth reading :)
Solution
The simple answer to your problem is that you need to do some HTTP protocol parsing to figure out if all of the data has been sent by the client and not rely on what available()
or read()
return. How much of a PITA this is depends on how completely you wish to support the HTTP protocol. To support GET requests, it's pretty easy. It's a little harder to support POSTs that specify a content length. It's much harder to support "other" encoding types (e.g. chunked or multipart/byteranges see http://tools.ietf.org/html/rfc2616#section-4.4).
Anyway, I assume you're just trying to get GETs working, so to do that, you have to know that HTTP headers and bodys are separated by an "empty line", that HTTP's line delimeter is \r\n and that GETs do not have a body. Therefore a client has finished sending a GET request when it transmits \r\n\r\n.
Some code like this should handle GETs consistently for you (code is untested but it should get you to at least 90%):
def readClientData(Socket clientSocket) {
def actualBuffer = new StringBuilder()
def eof = false;
def emptyLine = ['\r', '\n', '\r', '\n']
def lastEmptyLineChar = 0
InputStream inStream = clientSocket.inputStream
while(!eof) {
def available = inStream.available()
println "available data $available"
// try to read all available bytes
def buffer = new byte[available]
def bytesRead = inStream.read(buffer,0,available)
// check for empty line:
// * iterate through the buffer until the first element of emptyLine is found
// * continue iterating through buffer checking subsequent elements of buffer with emptyLine while consecutive elements match
// * if any element in buffer and emptyLine do not match, start looking for the first element of emptyLine again as the iteration through buffer continues
// * if the end of emptyLine is reached and matches with buffer, then the emptyLine has been found
for( int i=0; i < bytesRead && !eof; i++ ) {
if( buffer[i] == emptyLine[lastEmptyLineChar] ){
lastEmptyLineChar++
eof = lastEmptyLineChar >= emptyLine.length()
}
else {
lastEmptyLineChar = 0
}
}
// changed this so that you avoid any encoding issues
actualBuffer << new String(buffer, 0, bytesRead, Charset.forName("US-ASCII"))
}
return actualBuffer.toString()
}
For POSTs, you need to add to this by also looking for the String "Content-length: " and parsing the value after this. This value is the size of the HTTP body (i.e. the bit that comes after the /r/n/r/n end of header mark) in octals. So when you encounter the end of header, you just need to count that number of octals of bytes and you know that the POST request has completed transmission.
You'll also need to determine the type of request (GET, POST etc.) - you can do this by inspecting the characters transmitted before the first space.
Problem
Your problem is that your readClientData
function doesn't always read all of the data sent by the client. As a result, you're sometimes sending a partial request to the server and the returns some kind of error. You should see incomplete requests printed to standard out if you replace
println(new String(buffer))
with
println(avaliable)
in the readClientData
function.
Why is this happening? It's because available() only tells you what's currently available to be read from the InputStream and not whether or not the client has sent all the data it's going to send. An InputStream, by it's very nature, can never actually tell whether or not there will be more data (the exception to this is if there is no more underlying data to read - e.g. a socket is closed, the end of the array or file has been reached, etc. - this is the only time read() will return -1 (i.e. EOF)). Instead it's up to higher level code to decide whether it should read more data from the stream and it makes this decision based on application-specific rules that apply to the application-specific data being read by the InputStream.
In this case, the application is HTTP, so you need to understand the basics of the HTTP protocol before you'll get this working (cmeerw, you were on the right track).
When a HTTP request is made by a client, the client opens a socket to the server and sends a request. The client only closes the socket as a result of a timeout, or the underlying network connection being disconnected, or in response to user action that requires that the socket is closed (application is closed, page is refreshed, stop button pushed etc.). Otherwise, after sending the request, it just waits for the server to send a response. Once the server has sent the response, the server closes the connection [1].
Where your code succeeds, data is being provided by the client quickly and consistently enough so that the InputStream receives additional data between your invocation of read()
and your subsequent invocation of available()
on the next iteration of the loop (remember that InputStream
is being provided with data "in parallel" to your code that's invoking its read()
method). Now in the other case, where your code fails, no data has yet been provided to InputStream
, so when your code invokes available()
, InputStream
correctly returns 0 since no further data has been provided to it since you invoked read()
and therefore it has 0 bytes available for you to read()
. This is the race condition that Johnathan's talking about.
Your code assumes that when available()
returns 0 that all data has been sent by the client when, in fact, sometimes it has, and sometimes it has not (so sometimes you get a "good request" and other times not :).
So you need something better than available()
to determine wheter or not the client has sent all of the data.
Checking for EOF when you invoke read()
(see R4an's answer [2]) isn't suitable either. It should be clear why this is the case - the only time read()
is supposed to return EOF (-1) is when the socket is closed. This isn't supposed to happen until you've forwarded the request to the target proxy, received a response and sent that response to the client, but we know it can also exceptionally be closed by the client. In fact you're seeing this behaviour when you run the sample code - the proxy hangs until the stop button is clicked in the browser, causing the client to close the connection prematurely.
The correct answer, which you now know, is to do some parsing of the HTTP and use that to determine the state of the connection.
Notes
[1] It's beyond a proof of concept proxy, but since it was touched on already, if the HTTP connection is "keep-alive" the server will keep the connection open and wait on another request from the client
[2] There's an error in this code that causes the readClientData mangle the data:
byte[] buffer = new byte[16 * 1024];
while((bytesRead = inStream.read(buffer)) >= 0) { // -1 on EOF
def bytesRead = inStream.read(buffer,0,bytesRead);
actualBuffer << new String(buffer)
}
The second inStream.read()
invocation completely overwrites the data read by the first invocation of inStream.read()
. Also bytesRead is being redefined here (not familiar enough with Groovy to know whether or not this would be an error). This line should either read:
bytesRead = bytesRead + inStream.read(buffer,bytesRead,buffer.length()-bytesRead);
or be removed entirely.