I am writing a simple proxy in Java. I am having trouble reading the entirety of a given request into a byte array. Specifically, in the following loop, the call to 'read' blocks even though the client has sent all the data that it will (that is, the end of stream is never reached). As I can't be sure that it is time to start writing output until I've read the entirety of the input, this is causing a bit of trouble. If I kill the connection to the server, the end of stream is finally reached, and everything goes off without a hitch (all of the data from the client, in this case Firefox requesting www.google.com, has been read by the server, and it is able to process it as required, though obviously it can't send anything back to the client).

public static void copyStream(InputStream is, OutputStream os) throws IOException
{
    // BUFFER_SIZE is a class constant (not shown)
    byte[] buffer = new byte[BUFFER_SIZE];
    int read;
    // Copy until read() reports end of stream by returning -1
    while ((read = is.read(buffer, 0, BUFFER_SIZE)) != -1)
    {
        os.write(buffer, 0, read);
    }
}

The InputStream comes directly from the client socket (via getInputStream(), wrapped in a buffered stream); the OutputStream is a ByteArrayOutputStream.

What am I doing wrong?

+4  A: 

Typically in HTTP, the Content-Length header indicates how much data you're supposed to read from the stream: it tells you how many bytes follow the empty line (actually \r\n\r\n) that marks the end of the HTTP headers. See W3C for more info.
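A minimal sketch of reading one request this way, assuming the body is not sent with Transfer-Encoding: chunked. The names HttpRequestReader and readRequest are hypothetical. It reads the headers up to the \r\n\r\n terminator, then exactly Content-Length more bytes, so it never waits for end of stream. A request carrying neither Content-Length nor Transfer-Encoding has no body (RFC 7230, section 3.3.3), which covers a typical browser GET like the Firefox request in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class HttpRequestReader {

    private static final byte[] END_OF_HEADERS = {'\r', '\n', '\r', '\n'};

    /** Reads one request (headers plus Content-Length body) without waiting for EOF. */
    public static byte[] readRequest(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // Read byte by byte until "\r\n\r\n"; 'matched' counts terminator bytes seen in a row.
        int matched = 0;
        while (matched < END_OF_HEADERS.length) {
            int b = in.read();
            if (b == -1) {
                throw new EOFException("stream ended before the end of the headers");
            }
            out.write(b);
            if (b == END_OF_HEADERS[matched]) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0; // a stray '\r' may start a new terminator
            }
        }

        // Look for Content-Length; without it (and without chunking) the body is empty.
        String headers = new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        int contentLength = 0;
        for (String line : headers.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                contentLength = Integer.parseInt(line.substring(colon + 1).trim());
            }
        }

        // Read exactly contentLength body bytes, looping because read() may return fewer.
        byte[] body = new byte[contentLength];
        int off = 0;
        while (off < contentLength) {
            int n = in.read(body, off, contentLength - off);
            if (n == -1) {
                throw new EOFException("stream ended before the end of the body");
            }
            off += n;
        }
        out.write(body, 0, contentLength);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String raw = "POST /form HTTP/1.1\r\nHost: example.com\r\nContent-Length: 5\r\n\r\nhello"
                + "GET /next HTTP/1.1\r\n"; // start of a second request on the same connection
        InputStream in = new ByteArrayInputStream(raw.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(new String(readRequest(in), StandardCharsets.ISO_8859_1));
    }
}
```

Because the question already wraps the socket stream in a buffered stream, the single-byte read() calls in the header loop are cheap.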

If there is no Content-Length header sent, you could try interrupting the read after a certain amount of time passes with no data sent over the connection, although that's definitely not preferable.
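A sketch of that timeout fallback using Socket.setSoTimeout: each blocking read() gives up with a SocketTimeoutException once no data has arrived for the given number of milliseconds, and the socket remains valid afterwards. The names IdleTimeoutReader and readUntilIdle, and the 500 ms value in main, are illustrative assumptions; silence is still only a guess that the client is done.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class IdleTimeoutReader {

    /** Reads until the peer closes the stream or no data arrives for idleMillis ms. */
    public static byte[] readUntilIdle(Socket socket, int idleMillis) throws IOException {
        socket.setSoTimeout(idleMillis); // each blocking read() now gives up after idleMillis
        InputStream in = socket.getInputStream();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (SocketTimeoutException e) {
            // No data for idleMillis: assume (hope) the client has sent everything.
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            OutputStream clientOut = client.getOutputStream();
            clientOut.write("GET / HTTP/1.1\r\n\r\n".getBytes(StandardCharsets.ISO_8859_1));
            clientOut.flush();
            // The client keeps the connection open, as a keep-alive browser would.
            byte[] data = readUntilIdle(accepted, 500);
            System.out.println(data.length + " bytes read before the idle timeout");
        }
    }
}
```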

(I'm assuming that you're going to be processing the data you're reading somehow; otherwise you could just write out each byte as you read it.)

David Zaslavsky
Exactly right. At the socket level, unless one party explicitly closes the connection, the stream will never end, since the connection can be reused for several requests (see: Connection: Keep-Alive).
Allain Lalonde
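To illustrate the comment above: read() on a socket stream returns -1 only after the peer closes the connection or shuts down its output. The sketch below (class and method names are hypothetical) reuses the question's loop and has the client call Socket.shutdownOutput(), which ends the server's input stream while still leaving the server free to write a reply back. A keep-alive browser does not do this, so a proxy cannot rely on it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class HalfCloseDemo {

    // Same loop as copyStream in the question: it stops only when read() returns -1.
    static void copyStream(InputStream is, OutputStream os) throws IOException {
        byte[] buffer = new byte[4096];
        int read;
        while ((read = is.read(buffer)) != -1) {
            os.write(buffer, 0, read);
        }
    }

    /** Sends msg from a client socket, half-closes it, and returns what the server side read. */
    static String sendAndHalfClose(String msg) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            client.getOutputStream().write(msg.getBytes(StandardCharsets.ISO_8859_1));
            // Without this call, copyStream below would block forever, as in the question.
            client.shutdownOutput(); // the server side now sees end of stream
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            copyStream(accepted.getInputStream(), received);
            return new String(received.toByteArray(), StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sendAndHalfClose("GET / HTTP/1.1\r\n\r\n"));
    }
}
```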
Alas, this is what I suspected... It sure makes what I was doing uglier; it would be nice if the first thing in the request were the size.
Zach Snow
Wow, nice catch. Your method SHOULD work for almost everything EXCEPT HTTP connections, I'd suspect. There's plenty of file/stream copying code that looks exactly like what you have there.
John Gardner
+1  A: 

HTTP 1.1, supported by all modern browsers, has a feature called "keep-alive", or "persistent connections": by default, clients may reuse an HTTP 1.1 connection to a server for several requests (see http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html). So if you point Firefox at http://www.google.com, the connection it opens will remain open for a while, even after the first request has been completed. Your application therefore cannot know whether all the data has been sent without a basic understanding of the HTTP protocol.

You can partly circumvent that by using a timeout on the connection, hoping that the client is not stuck somewhere and that silence actually means the end of the data block. Another way would be to rewrite the server's response headers to advertise your proxy as HTTP 1.0 compliant rather than 1.1, thus forbidding the client to use persistent connections.
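A rough sketch of the header-rewriting idea, working on a response header block held as a String (status line plus header lines, without the terminating empty line). Besides changing the version in the status line, it drops any Connection and Keep-Alive headers and adds Connection: close; that last step is an assumption going beyond this answer, chosen because "close" is HTTP/1.1's explicit signal that the connection will be closed after the current response. The names ResponseHeaderRewriter and downgradeToHttp10 are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ResponseHeaderRewriter {

    /** Rewrites "HTTP/1.1 ..." headers to advertise HTTP/1.0 and a non-persistent connection. */
    static String downgradeToHttp10(String headerBlock) {
        String[] lines = headerBlock.split("\r\n");
        List<String> out = new ArrayList<>();
        // Status line, e.g. "HTTP/1.1 200 OK" becomes "HTTP/1.0 200 OK"
        out.add(lines[0].replaceFirst("^HTTP/1\\.1 ", "HTTP/1.0 "));
        for (int i = 1; i < lines.length; i++) {
            String name = lines[i].split(":", 2)[0].trim();
            // Drop headers that would advertise or tune a persistent connection.
            if (name.equalsIgnoreCase("Connection") || name.equalsIgnoreCase("Keep-Alive")) {
                continue;
            }
            out.add(lines[i]);
        }
        out.add("Connection: close"); // assumption: explicit close signal added by the proxy
        return String.join("\r\n", out);
    }

    public static void main(String[] args) {
        String original = "HTTP/1.1 200 OK\r\nConnection: keep-alive\r\nContent-Type: text/html";
        System.out.println(downgradeToHttp10(original));
    }
}
```

This only limits each connection to one exchange; the proxy still has to find where that first request ends, since the client keeps its side open while waiting for the response.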

Varkhan
+1  A: 

Keep in mind that not all messages will have a Content-Length header; some may use Transfer-Encoding: chunked instead, where the body is sent as a series of chunks, each prefixed with its length in hexadecimal, and a zero-length chunk marks the end.
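A minimal sketch of decoding such a body, assuming the headers have already been read and announced Transfer-Encoding: chunked (ChunkedBodyReader, readLine and readChunkedBody are hypothetical names). Each chunk is a hexadecimal size line, optionally followed by ;extensions, then that many bytes and a CRLF; a size of 0 ends the body, and optional trailer lines follow up to an empty line. It returns the decoded bytes; a proxy forwarding the message unchanged would copy the raw size lines as well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedBodyReader {

    /** Reads one line terminated by "\n" (dropping a preceding "\r") as ISO-8859-1 text. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                int len = sb.length();
                if (len > 0 && sb.charAt(len - 1) == '\r') {
                    sb.setLength(len - 1);
                }
                return sb.toString();
            }
            sb.append((char) b);
        }
        throw new EOFException("stream ended in the middle of a line");
    }

    /** Decodes a chunked body, stopping after the last chunk and any trailers. */
    static byte[] readChunkedBody(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // ignore chunk extensions
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                break; // zero-length chunk: end of body
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n == -1) {
                    throw new EOFException("stream ended inside a chunk");
                }
                off += n;
            }
            body.write(chunk, 0, size);
            readLine(in); // consume the CRLF after the chunk data
        }
        // Skip optional trailer headers up to the final empty line.
        String trailer;
        do {
            trailer = readLine(in);
        } while (!trailer.isEmpty());
        return body.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String raw = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
        InputStream in = new ByteArrayInputStream(raw.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(new String(readChunkedBody(in), StandardCharsets.ISO_8859_1)); // prints Wikipedia
    }
}
```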

Phil M