I am working on building a simple proxy which will log certain requests that are passed through it. The proxy does not need to interfere with the traffic being passed through it (at this point in the project), so I am trying to do as little parsing of the raw request/response as possible during the process (the request and response are pushed off to a queue to be logged outside of the proxy).

My sample works fine, except that I cannot reliably tell when the "response" is complete, so I have connections left open for longer than needed. The relevant code is below:

var request = getRequest(url);
byte[] buffer;
int bytesRead = 1;
var dataSent = false;
var timeoutTicks = DateTime.Now.AddMinutes(1).Ticks;

Console.WriteLine("   Sending data to address: {0}", url);
Console.WriteLine("   Waiting for response from host...");
using (var outboundStream = request.GetStream()) {
   while (request.Connected && (DateTime.Now.Ticks < timeoutTicks)) {
      while (outboundStream.DataAvailable) {
         dataSent = true;
         buffer = new byte[OUTPUT_BUFFER_SIZE];
         bytesRead = outboundStream.Read(buffer, 0, OUTPUT_BUFFER_SIZE);

         if (bytesRead > 0) { _clientSocket.Send(buffer, bytesRead, SocketFlags.None); }

         Console.WriteLine("   pushed {0} bytes to requesting host...", bytesRead);
      }

      if (request.Connected) { Thread.Sleep(0); }
   }
}

Console.WriteLine("   Finished with response from host...");
Console.WriteLine("   Disconnecting socket");
_clientSocket.Shutdown(SocketShutdown.Both);

My question is whether there is an easy way to tell that the response is complete without parsing headers. Given that this response could be anything (encoded, encrypted, gzip'ed, etc.), I don't want to have to decode the actual response to get the length and determine whether I can disconnect my socket.

A: 

Using blocking I/O and multiple threads might be your answer. Specifically:

// Blocks until the server has sent the whole response, then disposes everything.
using (var response = request.GetResponse())
using (var stream = response.GetResponseStream())
using (var reader = new StreamReader(stream))
{
    data = reader.ReadToEnd();
}

This is for textual data; binary handling is similar.
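For the binary case, a minimal sketch might look like the following. The names `StreamRelay`, `Relay`, and `bufferSize` are made up for illustration and are not part of any library. It copies in fixed-size chunks, so the whole response is never held in memory, but it still blocks the calling thread, and `Read` only returns 0 once the remote side closes the stream, so on a keep-alive connection it would not return until the close or a timeout.

```csharp
using System;
using System.IO;

public static class StreamRelay
{
    // Copies input to output in fixed-size chunks and returns the total byte count.
    // Stream.Read returns 0 only when the end of the stream has been reached.
    public static long Relay(Stream input, Stream output, int bufferSize = 8192)
    {
        var buffer = new byte[bufferSize];
        long total = 0;
        int bytesRead;
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, bytesRead);
            total += bytesRead;
        }
        return total;
    }
}
```

Assuming the same `stream` as above, this could be called as `StreamRelay.Relay(stream, clientStream)`, where `clientStream` is whatever stream leads back to the requesting host.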

sukru
I do not know the size of the incoming data, and I will have a lot of clients (upwards of thousands at a time) so I do not want to block on the response completely, or hold the whole response in memory until it completes.
GrayWizardx
There are situations where you might never know the actual data size. Furthermore, even the server might not have that information (e.g., when the response is being streamed from a CGI script). Thus there is no "one size fits all" solution to your problem. You either have to implement some sort of timeout/limit mechanism, or wait for every request to finish completely (or be timed out by the system).
sukru
+2  A: 

If you make an HTTP/1.0 request instead of 1.1, the server should close the connection as soon as it has finished sending the response, since it doesn't need to keep the connection open for another request.

Other than that, you really need to parse the Content-Length header in the response to get the best value.

David
I am passing the request through directly so I have no control over the type of request being made. I am just silently copying the data to an offline queue for analysis later. It would be a good option if I had that level of control.
GrayWizardx
You have the data available to change the request to use HTTP/1.0, you just have to be able to dynamically modify the user's request. It would probably be easier to just look for the request content-length.
David
+1  A: 

As David pointed out, connections should remain open for a period of time. You should not close connections unless the client side does that (or if the keep alive interval expires).

Changing to HTTP/1.0 will not work since you are a server and it's the client that will specify HTTP/1.1 in the request. Sure, you can send an error message with HTTP/1.0 as the version and hope that the client changes to 1.0, but it seems inefficient.

HTTP messages look like this:

REQUEST LINE
HEADERS
(empty line)
BODY

The only way to know when a response is done is to search for the Content-Length header. Simply search for "Content-Length:" in the response buffer and extract everything up to the line feed. (Trim the extracted value before converting it to an int.)
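A rough sketch of that search, assuming the start of the response has been read into a byte buffer, could look like this. `HeaderScan` and `FindContentLength` are hypothetical names, and returning -1 for "not found or headers incomplete" is an assumption made for the example.

```csharp
using System;
using System.Text;

public static class HeaderScan
{
    // Returns the Content-Length value from the header block, or -1 if the
    // header block is incomplete, the header is absent, or its value is malformed.
    public static long FindContentLength(byte[] buffer, int count)
    {
        // Decode only the bytes actually read; body bytes are ignored below.
        string text = Encoding.ASCII.GetString(buffer, 0, count);

        // The header block ends at the first empty line.
        int headerEnd = text.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        if (headerEnd < 0) return -1; // headers not fully received yet

        const string name = "Content-Length:";
        string[] lines = text.Substring(0, headerEnd)
                             .Split(new[] { "\r\n" }, StringSplitOptions.None);
        foreach (string line in lines)
        {
            // Header names are case-insensitive.
            if (line.StartsWith(name, StringComparison.OrdinalIgnoreCase))
            {
                string value = line.Substring(name.Length).Trim();
                return long.TryParse(value, out long length) && length >= 0 ? length : -1;
            }
        }
        return -1;
    }
}
```

Once the length is known, the response is complete after that many body bytes have been relayed past the empty line, without looking at the body content itself.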

The other alternative is to use the parser in my webserver to get all headers. It should be quite easy to use just the parser and nothing more from the library.

jgauffin
I know the format of the HTTP message; I was trying to avoid having to search it at all. Also, as per the RFC for HTTP, Content-Length need only be specified if it is *known* ahead of time and there is no Transfer-length header (I believe I read that right), and either way they specify the length of the body *before* encoding. I will take a look at your code either way. Thanks for the reference.
GrayWizardx
In HTTP/1.0, sure, Content-Length does not need to be specified (the connection is closed once the body has been transferred). But in HTTP/1.1 it is required, since the connection can remain open (for other requests). There is one exception: when the Transfer-Encoding is chunked. But then each chunk has its own length that you need to parse.
jgauffin
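For the chunked case mentioned in the last comment, each chunk starts with a size line in hexadecimal, optionally followed by `;` and chunk extensions, and a chunk of size 0 marks the end of the body. A minimal sketch of parsing one such size line is shown here; `ChunkedScan` and `ParseChunkSize` are hypothetical names, and the -1 return for invalid input is an assumption of the example.

```csharp
using System;
using System.Globalization;

public static class ChunkedScan
{
    // Parses one chunk-size line such as "1a2f" or "1a2f;name=value"
    // (without the trailing CRLF). Returns the chunk length in bytes,
    // or -1 if the line does not start with a valid hex number.
    public static int ParseChunkSize(string sizeLine)
    {
        // Anything after ';' is a chunk extension and is ignored here.
        int semicolon = sizeLine.IndexOf(';');
        string hex = (semicolon >= 0 ? sizeLine.Substring(0, semicolon) : sizeLine).Trim();

        bool ok = int.TryParse(hex, NumberStyles.AllowHexSpecifier,
                               CultureInfo.InvariantCulture, out int size);
        return ok && size >= 0 ? size : -1;
    }
}
```

A proxy would read the size line, relay that many bytes plus the trailing CRLF, and repeat until a size of 0 is seen.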