Hi All,

I'm writing an HTTP client using the .NET TcpClient / Sockets.

So far, the client handles both Content-Length and chunked responses by iterating through the NetworkStream response (after writing a GET request to the TcpClient), parsing the headers and retrieving the relevant message body bytes / chunked bytes. To do this it uses the NetworkStream ReadByte method.

This all works fine, but performance is a key consideration of the application so I would like to make it as quick and efficient as possible.

Initially this will involve swapping ReadByte for Read for the message body (based on Content-Length) or chunked message body byte retrieval into an appropriately sized buffer, using ReadByte in all other areas (such as reading the Headers, Chunk sizes etc).

I'm interested to know thoughts on better / different ways to do this to achieve optimum performance? Obviously the main problem with HTTP is not knowing the length of the response stream unless it is parsed as it is retrieved.

There are specific reasons why I'm not using more abstract classes (e.g. HttpWebRequest) for this (I need better control at the socket level).

Many Thanks,

Chris

A: 

I suggest using a process with a medium sized buffer. Repeatedly fill the buffer until the response stream ends. When the buffer is full, or the stream ends, attach that buffer content onto the string (or whatever you're using to store the message).
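As a rough illustration of the buffer loop described above, here is a minimal sketch. The class and method names (`BufferedReader`, `ReadToEnd`) and the 8192-byte default are my own assumptions, not anything from the question. It relies on `Stream.Read` returning as soon as some data is available (possibly fewer bytes than requested) and returning 0 once the other side has closed the connection.

```csharp
using System;
using System.IO;

public static class BufferedReader
{
    // Hypothetical helper: repeatedly fills a fixed-size buffer and appends
    // whatever was read to a growing MemoryStream until the stream ends.
    public static byte[] ReadToEnd(Stream stream, int bufferSize = 8192)
    {
        var buffer = new byte[bufferSize];
        using (var collected = new MemoryStream())
        {
            int n;
            // Read may return fewer bytes than requested; 0 means end of stream.
            while ((n = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                collected.Write(buffer, 0, n);
            }
            return collected.ToArray();
        }
    }
}
```

Note that this loop only finishes when the stream ends, so on a keep-alive connection you would still need Content-Length or chunk sizes to know where the response stops.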

If you want to read an important bit of information early in the stream, read just enough of the stream to see that. (In other words, you don't need to fill the buffer on the first pass if you don't want to.)

You should also consider using an event system to signal the presence of new data, which has been shaped in such a way that the main part of your process doesn't need to know anything about where the data came from or how you are buffering it.

Edit

In response to your comment question, if you have one connection that you are trying to reuse for multiple requests, you would create a thread that reads from it over and over. When it finds data, it uses the event to push it out for the main part of your program to handle. I don't have a sample handy, but you should be able to find several with a few Bing or Google searches.
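One possible shape for such a reader thread, as a minimal sketch only: `StreamPump`, `DataReceived`, and `Closed` are hypothetical names I made up for illustration. A background thread keeps calling `Read` and raises an event with a copy of each block, so the consumer never touches the stream or the buffer directly.

```csharp
using System;
using System.IO;
using System.Threading;

public sealed class StreamPump
{
    private readonly Stream _stream;
    private readonly int _bufferSize;

    // Raised on the reader thread with a private copy of each block read.
    public event Action<byte[]> DataReceived;
    // Raised once when Read returns 0 (the stream has ended).
    public event Action Closed;

    public StreamPump(Stream stream, int bufferSize = 4096)
    {
        _stream = stream;
        _bufferSize = bufferSize;
    }

    // Starts the background reader and returns the thread so callers can Join it.
    public Thread Start()
    {
        var thread = new Thread(ReadLoop) { IsBackground = true };
        thread.Start();
        return thread;
    }

    private void ReadLoop()
    {
        var buffer = new byte[_bufferSize];
        int n;
        while ((n = _stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Copy so handlers can keep the data after the buffer is reused.
            var copy = new byte[n];
            Buffer.BlockCopy(buffer, 0, copy, 0, n);
            DataReceived?.Invoke(copy);
        }
        Closed?.Invoke();
    }
}
```

The handlers run on the reader thread, so anything slow in them delays the next read; handing the blocks to a queue is one way around that.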

John Fisher
Thanks John. If I use a buffer however, I presume I need to use Read() which will block when the end of the stream is exceeded. I need to move on straight away when the last byte is read to doing further parsing of the response etc? Also, do you have any examples / links to a similar event based system? Thanks!
Chris
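You can prevent the blocking if you send "Connection: close" as part of the request. The server then closes the connection after the response, so Read returns 0 at the end of the body instead of waiting.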
Matthew Whited
Unfortunately re-using connections / keep alives are also important...
Chris
Sorry, I'm still not clear on this. If I create a buffer from the start and use Read() to read the stream into this, there is the possibility that the receive stream is smaller than the buffer length, e.g. myStream.Read(myReadBuffer, 0, myReadBuffer.Length); and it will therefore block until timeout / the connection is closed. I don't want this as I want to process the response data asap....
Chris
....Could you please confirm what the best way to read an HTTP response is in order to read to the end of the stream as fast as possible? For responses containing a "Content-Length" header, will this be ReadByte() in order to get to the end of the header, then Read into a byte array the size of the Content-Length? And for chunked responses, the same but then ReadByte to get the chunk size and Read into a byte[] array the size of the relevant chunk?
Chris
You can call Read, passing a number smaller than myReadBuffer.Length. This will let you read some minimum number of bytes, that should let you find the length of the document before reading in the rest.
John Fisher
Chris, it sounds like you want to read very small chunks until you find the content-length. Once you've found it, then read it and continue filling your buffer with .Read(buffer, alreadyReadBytes, buffer.Length - alreadyReadBytes)
John Fisher
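A minimal sketch of the Content-Length approach discussed in these comments, under stated assumptions: the class and method names are hypothetical, and header parsing is deliberately simplistic (no folded headers, no validation). The key point is the loop in `ReadExactly`: a single `Read` may return fewer bytes than requested, so it keeps calling `Read(buffer, alreadyRead, count - alreadyRead)` until the body is complete.

```csharp
using System;
using System.IO;
using System.Text;

public static class ContentLengthReader
{
    // Reads one byte at a time until the blank line (CRLF CRLF) ending the headers.
    public static string ReadHeaders(Stream stream)
    {
        var sb = new StringBuilder();
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            sb.Append((char)b);
            int len = sb.Length;
            if (len >= 4 && sb[len - 4] == '\r' && sb[len - 3] == '\n'
                         && sb[len - 2] == '\r' && sb[len - 1] == '\n')
            {
                return sb.ToString();
            }
        }
        throw new EndOfStreamException("Connection closed before the headers ended.");
    }

    // Returns the Content-Length value, or -1 if the header is absent.
    public static int ParseContentLength(string headers)
    {
        foreach (var line in headers.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries))
        {
            int colon = line.IndexOf(':');
            if (colon > 0 && line.Substring(0, colon).Trim()
                    .Equals("Content-Length", StringComparison.OrdinalIgnoreCase))
            {
                return int.Parse(line.Substring(colon + 1).Trim());
            }
        }
        return -1;
    }

    // Reads exactly 'count' bytes, looping because Read may return less than asked.
    public static byte[] ReadExactly(Stream stream, int count)
    {
        var buffer = new byte[count];
        int alreadyRead = 0;
        while (alreadyRead < count)
        {
            int n = stream.Read(buffer, alreadyRead, count - alreadyRead);
            if (n == 0) throw new EndOfStreamException("Connection closed mid-body.");
            alreadyRead += n;
        }
        return buffer;
    }
}
```

Because the body read stops at exactly Content-Length bytes, it never asks for data past the end of the response, so it does not block waiting on a kept-alive connection. If byte-at-a-time header reads turn out to be a bottleneck, wrapping the NetworkStream in a BufferedStream may be worth measuring.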
Great - thanks John!
Chris
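For the chunked case raised in the comments above, the same idea applies per chunk: read the size line byte by byte, then `Read` the chunk data in bulk. The following is a sketch only, with hypothetical names (`ChunkedReader`, `ReadChunkedBody`); it ignores chunk extensions and discards any trailer headers.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

public static class ChunkedReader
{
    // Reads up to the next LF and returns the line without its CRLF.
    private static string ReadLine(Stream stream)
    {
        var sb = new StringBuilder();
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            if (b == '\n')
            {
                if (sb.Length > 0 && sb[sb.Length - 1] == '\r') sb.Length--;
                return sb.ToString();
            }
            sb.Append((char)b);
        }
        throw new EndOfStreamException("Connection closed mid-line.");
    }

    // Decodes a Transfer-Encoding: chunked body positioned just after the headers.
    public static byte[] ReadChunkedBody(Stream stream)
    {
        using (var body = new MemoryStream())
        {
            while (true)
            {
                // Size line is hexadecimal, optionally followed by ";extension".
                string sizeLine = ReadLine(stream);
                int semicolon = sizeLine.IndexOf(';');
                if (semicolon >= 0) sizeLine = sizeLine.Substring(0, semicolon);
                int size = int.Parse(sizeLine.Trim(), NumberStyles.HexNumber,
                                     CultureInfo.InvariantCulture);
                if (size == 0) break; // last chunk

                var chunk = new byte[size];
                int read = 0;
                while (read < size)
                {
                    int n = stream.Read(chunk, read, size - read);
                    if (n == 0) throw new EndOfStreamException("Connection closed mid-chunk.");
                    read += n;
                }
                body.Write(chunk, 0, size);
                ReadLine(stream); // consume the CRLF that follows the chunk data
            }
            // Skip optional trailer headers up to the terminating blank line.
            while (ReadLine(stream).Length > 0) { }
            return body.ToArray();
        }
    }
}
```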
A: 

BTW I am curious - you mentioned you need fine-grained control over the Socket/Connection, and that is why you are not using HttpWebRequest.

Can you tell us those reasons?

feroze
I want to simulate a number (let's say 200) of web clients, and to do this I want each client running as a thread which manages its own connection, keeping it open for the duration of a number of different requests, pausing etc. Each client will be hitting the same end point (very similar functionality to standard load test tools). I don't believe HttpWebRequest can achieve this, as connections are handled via the service point, which stops this level of granular control?
Chris
Sorry, I should also say that I need to capture information such as time to first byte, number of re-connects etc...
Chris
Chris, you can achieve what you want with HttpWebRequest. If you want each thread to have its own connection, then set the ConnectionGroupName of the request object to something unique. Then its connection won't be shared with other requests. You can close the connection by calling CloseConnectionGroup on the request's ServicePoint, passing the connection group name used by the request. If I were you, I would first try to use HttpWebRequest and see if it suits my purpose. If it doesn't, then I would roll my own.
feroze
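A minimal sketch of the ConnectionGroupName suggestion, with assumptions called out: the helper names and the "client-N" naming scheme are hypothetical, and the URL is whatever end point you are testing. These are the .NET Framework-era APIs (newer .NET versions mark WebRequest and ServicePoint as obsolete), so treat it as an illustration of the idea rather than a recommendation.

```csharp
using System;
using System.Net;

public static class PerClientRequests
{
    // Builds a request whose connection is pooled under a per-client group name,
    // so simulated clients do not share each other's connections.
    public static HttpWebRequest Create(string url, int clientId)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.KeepAlive = true;
        request.ConnectionGroupName = "client-" + clientId; // hypothetical naming scheme
        return request;
    }

    // Closes the connections belonging to this request's group.
    public static void CloseConnections(HttpWebRequest request)
    {
        request.ServicePoint.CloseConnectionGroup(request.ConnectionGroupName);
    }
}
```

Each simulated client would reuse the same group name for all of its requests and call `CloseConnections` when it wants to force a reconnect.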