Can anyone point out the flaw in this code? I'm retrieving some HTML with TcpClient. NetworkStream.Read() never seems to finish when talking to an IIS server. If I go through the Fiddler proxy instead, it works fine, but when talking directly to the target server the Read() loop doesn't exit until the connection fails with an exception like "the remote server has closed the connection".

internal TcpClient Client { get; set; }

/// bunch of other code here...

try
{

NetworkStream ns = Client.GetStream();
StreamWriter sw = new StreamWriter(ns);

sw.Write(request);
sw.Flush();

byte[] buffer = new byte[1024];

int read=0;

try
{
    while ((read = ns.Read(buffer, 0, buffer.Length)) > 0)
    {
        response.AppendFormat("{0}", Encoding.ASCII.GetString(buffer, 0, read));
    }
}
catch //(SocketException se)
{

}
finally
{
    Close();
}

Update

In the debugger, I can see the entire response coming through immediately and being appended to my StringBuilder (response). It just appears that the connection isn't being closed when the server is done sending the response, or my code isn't detecting it.

Conclusion

As has been said here, it's best to take advantage of what the protocol offers (in the case of HTTP, the Content-Length header) to determine when a transaction is complete. However, I've found that not all pages have Content-Length set. So, I'm now using a hybrid solution:

  1. For ALL transactions, set the request's Connection header to "close", so that the server is discouraged from keeping the socket open. This improves the chances that the server will close the connection when it is through responding to your request.

  2. If Content-Length is set, use it to determine when a request is complete.

  3. Otherwise, set the NetworkStream's ReadTimeout property to a large but reasonable value, such as 1 second (1000 ms). Then loop on NetworkStream.Read() until either a) the timeout occurs, or b) you read fewer bytes than you asked for.
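The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code I actually shipped: `HybridReader` and `ReadResponse` are made-up names, it assumes the request was already sent with `Connection: close`, and it does not handle chunked transfer encoding. It also leaves out the "fewer bytes than requested" check, since a short read can happen in the middle of a response.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class HybridReader
{
    // Reads one HTTP response (headers plus body) and returns the raw bytes.
    // Stops when Content-Length is satisfied, the server closes the
    // connection, or a read times out.
    public static byte[] ReadResponse(Stream stream)
    {
        var received = new MemoryStream();
        var buffer = new byte[1024];
        int headerEnd = -1;   // offset just past the blank line that ends the headers
        long expected = -1;   // total bytes expected; known only if Content-Length is set

        try
        {
            while (expected < 0 || received.Length < expected)
            {
                int read = stream.Read(buffer, 0, buffer.Length);
                if (read == 0) break;   // server closed the connection
                received.Write(buffer, 0, read);

                if (headerEnd < 0)
                {
                    // ASCII decoding keeps one char per byte, so string
                    // indexes line up with byte offsets.
                    string text = Encoding.ASCII.GetString(
                        received.GetBuffer(), 0, (int)received.Length);
                    int idx = text.IndexOf("\r\n\r\n", StringComparison.Ordinal);
                    if (idx >= 0)
                    {
                        headerEnd = idx + 4;
                        Match m = Regex.Match(
                            text.Substring(0, headerEnd),
                            @"^Content-Length:[ \t]*(\d+)",
                            RegexOptions.IgnoreCase | RegexOptions.Multiline);
                        if (m.Success)
                            expected = headerEnd + long.Parse(m.Groups[1].Value);
                    }
                }
            }
        }
        catch (IOException)
        {
            // With ReadTimeout set on a NetworkStream, a stalled read ends up
            // here; return whatever has arrived so far.
        }
        return received.ToArray();
    }
}
```

With a connected TcpClient, usage would look something like setting `Client.GetStream().ReadTimeout = 1000;` and then calling `HybridReader.ReadResponse(Client.GetStream())`.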

Thanks to everyone for their excellent and detailed responses.

+2  A: 

Read the response until you reach a double CRLF. What you now have is the Response headers. Parse the headers to read the Content-Length header which will be the count of bytes left in the response.

Here is a regular expression that can catch the Content-Length header.

David's Updated Regex

Content-Length: (?<1>\d+)\r\n

Content-Length

Note

If the server does not properly set this header I would not use it.
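As a rough sketch of how that could look in code, assuming you have already read up to the double CRLF and hold the header block as a string (`HeaderParsing` and `GetContentLength` are hypothetical names, and the pattern is a slightly more tolerant variant of the regex above that ignores case and extra whitespace):

```csharp
using System;
using System.Text.RegularExpressions;

static class HeaderParsing
{
    // Matches a Content-Length header line; HTTP header names are case-insensitive.
    static readonly Regex ContentLengthPattern = new Regex(
        @"^Content-Length:[ \t]*(\d+)[ \t]*\r?$",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);

    // Returns the declared body length, or null when the header is absent.
    public static long? GetContentLength(string headerBlock)
    {
        Match m = ContentLengthPattern.Match(headerBlock);
        return m.Success ? long.Parse(m.Groups[1].Value) : (long?)null;
    }
}
```

A null result is the case to watch for: it means the server did not send the header and you need another way to find the end of the body.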

ChaosPandion
+1 for clarity and the regex. Thanks.
David Lively
+1 for picking up on the `content-length` issue at the same time and putting in an example. Seems so often there's a deeper issue behind the question.
Aaronaught
See also: http://en.wikipedia.org/wiki/Chunked_transfer_encoding Check for this if there is no Content-Length header.
Foole
A: 

I may be wrong, but it looks like your call to Write is writing (under the hood) to the stream ns (via StreamWriter). Later, you're reading from the same stream (ns). I don't quite understand why you are doing this.

Anyway, you may need to use Seek on the stream, to move to the location where you want to start reading. I'd guess that it seeks to the end after writing. But as I said, I'm not really sure if this is a useful answer!

Tomas Petricek
That's how `NetworkStream` works. Attempting to seek on one will always throw a `NotSupportedException`.
Aaronaught
Tomas, NetworkStream is bound to a buffered IP channel. Writing sends data to the server, Reading attempts to read from a receive buffer. .Seek() doesn't make sense in that context.
David Lively
Thanks for the clarification! Glad you got a better answer!
Tomas Petricek
+5  A: 

Contrary to what the documentation for NetworkStream.Read implies, the stream obtained from a TcpClient does not simply return 0 for the number of bytes read when there is no data available - it blocks.

If you look at the documentation for TcpClient, you will see this line:

The TcpClient class provides simple methods for connecting, sending, and receiving stream data over a network in synchronous blocking mode.

Now my guess is that if your Read call is blocking, it's because the server has decided not to send any data back. This is probably because the initial request is not getting through properly.

My first suggestion would be to eliminate the StreamWriter as a possible cause (i.e. buffering/encoding nuances), and write directly to the stream using the NetworkStream.Write method. If that works, make sure that you're using the correct parameters for the StreamWriter.

My second suggestion would be not to depend on the result of a Read call to break the loop. The NetworkStream class has a DataAvailable property that is designed for this. The correct way to write a receive loop is:

NetworkStream netStream = client.GetStream();
int read = 0;
byte[] buffer = new byte[1024];
StringBuilder response = new StringBuilder();
do
{
    read = netStream.Read(buffer, 0, buffer.Length);
    response.Append(Encoding.ASCII.GetString(buffer, 0, read));
}
while (netStream.DataAvailable);
Aaronaught
Again, the request works fine when going through the Fiddler proxy. I can see the entire response coming through and being appended to my StringBuilder (response). It just appears that the connection isn't being closed when the server is done sending the response, or my code isn't detecting it. Argh.
David Lively
@David: See my update, I added an example of how to write the loop using `DataAvailable` instead of simply blocking on every read. If this fails as well, it means that you are not getting any response from the server when you don't go through Fiddler.
Aaronaught
@Aaronaught Even when going direct (not through Fiddler), I *AM* receiving the *ENTIRE* response that I expect (I can see this in the debugger). I just don't get any sort of indication that the transaction is complete once that happens. Also, doesn't DataAvailable just indicate that there is data in the receive buffer? If that's the case, a false value for DataAvailable may not necessarily indicate that the transaction is complete, just that no data is yet available, or that the server is taking its time before sending the next chunk.
David Lively
@David: Please, humour me and try it. Yes, `DataAvailable` means that there is data in the receive buffer, but `Read` is a **blocking call**. Your code is probably working by accident because Fiddler closes the socket prematurely (I've had this issue with Fiddler before) - a real server is *not* obligated to close the socket right away and in fact should not always do this - sometimes the connection needs to remain open. The way your code is written, it will *always* loop forever unless it is interrupted, and you can't control that factor.
Aaronaught
@Aaronaught I agree that DataAvailable will work under light load. My concern is that, when the target server is getting hammered, DataAvailable will temporarily be false while awaiting a yet-to-be-transmitted chunk. If Fiddler is closing the connection prematurely, how do browsers or other applications handle this situation? High server latency could easily make DataAvailable fail.
David Lively
@David: That is exactly the reason why the HTTP protocol has a `content-length` header. All a browser has to do is read enough data to grab that header, then it knows exactly how much more data it needs to read. Chunked works differently but that's way beyond the scope of this question. So if you're trying to use a `TcpClient` with HTTP (why not use a `WebRequest` instead?), then the only way to be sure is to check the content-length. If you have no idea how much data is coming back, you either need to rely on `DataAvailable` or wait for some predetermined timeout.
Aaronaught
@Aaronaught - Bingo
ChaosPandion
@Aaronaught This code needs to be protocol-agnostic. The fact that this particular layer uses HTTP is irrelevant. Also, WebRequest does some freaky stuff with cookies (concatenating multiple set-cookie headers, which effectively breaks when any cookie value has a comma). Sorry for the frustration. If I were sending, say, an image or a ZIP file, how would I know when to stop reading data?!
David Lively
@David, There is **no such thing** as a "protocol-agnostic way" of reading the exact amount of data that is and ever will be available from a simple stream of bytes, unless the stream has a known length (which a `NetworkStream` does not). This logic is always part of the underlying protocol. HTTP uses `content-length` to get around this limitation, and the newer HTTP 1.1 can use chunked encoding, where each chunk is prefixed with its size and a zero-length chunk marks the end of the body. It's one or the other. Welcome to the wonderful world of network programming. ;)
Aaronaught
@David - Many *mini-protocols* I have designed send the size of the data in the first 4-8 bytes. Since you are dealing with IIS this is not an option.
ChaosPandion
@Aaronaught That's making more sense now. I suppose when requesting any item via HTTP the content-length header will be present, and hopefully correct. Answer accepted. Thanks.
David Lively
@ChaosPandion I've done the same with binary protocols, but mostly so I could tell if the socket was prematurely closed. I hadn't actually written myself into a situation where the expected response length wasn't immediately available until now.
David Lively
A: 

Two Suggestions...

  1. Have you tried using the DataAvailable property of NetworkStream? It should return true if there is data to be read from the stream.

    while (ns.DataAvailable)
    {
     //Do stuff here
    }

  2. Another option would be to set ReadTimeout to a low value (in milliseconds) so you don't end up blocking for a long time. It can be done like this:

    ns.ReadTimeout = 100;
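
When the timeout expires, Read does not return 0; it throws. A minimal sketch of a loop that treats a timeout as "no more data", assuming the timeout surfaces as an IOException wrapping a SocketException with the TimedOut error code (`TimeoutReader` and `ReadUntilIdle` are names invented for this example):

```csharp
using System.IO;
using System.Net.Sockets;
using System.Text;

static class TimeoutReader
{
    // Reads until the peer closes the connection or no data arrives
    // for timeoutMs milliseconds.
    public static string ReadUntilIdle(NetworkStream ns, int timeoutMs)
    {
        ns.ReadTimeout = timeoutMs;
        var response = new StringBuilder();
        var buffer = new byte[1024];
        try
        {
            int read;
            while ((read = ns.Read(buffer, 0, buffer.Length)) > 0)
                response.Append(Encoding.ASCII.GetString(buffer, 0, read));
        }
        catch (IOException ex) when (ex.InnerException is SocketException se
                                     && se.SocketErrorCode == SocketError.TimedOut)
        {
            // The read timed out: nothing arrived within timeoutMs.
            // Other I/O errors are not caught and propagate to the caller.
        }
        return response.ToString();
    }
}
```

A low value such as 100 ms makes this return quickly but can cut off a slow server, so the right number depends on how much latency you are willing to tolerate.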
thorkia
I'm concerned that when the target IIS server is under heavy load, this could cause me to prematurely close the socket. I think that DataAvailable indicates that there is data in the receive buffer; if it is false, the server may still be rendering data to be sent. Setting a low Timeout could cause the same issue.
David Lively
+1  A: 

Not sure if this is helpful or not but with HTTP 1.1 the underlying connection to the server might not be closed so maybe the stream doesn't get closed either? The idea being that you can reuse the connection to send a new request. I think you have to use the content-length. Alternatively use the WebClient or WebRequest classes instead.
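
If you stay with TcpClient, one way to discourage the server from keeping the connection alive is to send a `Connection: close` header with the request. A small sketch of building such a request by hand (`RequestBuilder` and `BuildGet` are hypothetical names, and a real request would likely carry more headers):

```csharp
using System.Text;

static class RequestBuilder
{
    // Builds a minimal HTTP/1.1 GET request that asks the server to close
    // the connection once it has sent the response.
    public static string BuildGet(string host, string path)
    {
        var sb = new StringBuilder();
        sb.Append("GET ").Append(path).Append(" HTTP/1.1\r\n");
        sb.Append("Host: ").Append(host).Append("\r\n");
        sb.Append("Connection: close\r\n");
        sb.Append("\r\n");   // blank line ends the header block
        return sb.ToString();
    }
}
```

With this header the server should close the socket after the response, at which point a blocking Read returns 0 and a read loop can exit normally.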

Timbo
Adding the "Connection: close" header fixed this, and it appears to be working for just about everything. Good call.
David Lively