+5  A: 

Please correct me if this summary is incorrect:

  • Your multi-threaded client starts a thread that connects to the server, issues just one HTTP GET, and then that thread closes.
  • When you say 1, 2, 5, 10, or 50 threads, you're just referring to how many concurrent threads you allow; each thread itself only handles one request.
  • Your client takes between 2 and 5 minutes to download over 1000 images.
  • Firefox and Opera will download an equivalent data set in 40 seconds.

I suggest that the server rate-limits HTTP connections, either in the web server daemon itself, in a server-local firewall, or, most likely, in a dedicated firewall.

You are actually abusing the web service by not re-using HTTP connections for more than one request, and the timeouts you experience are because your SYN flood is being clamped.

Firefox and Opera are probably using between 4 and 8 connections to download all of the files.

If you redesign your code to re-use the connections, you should achieve similar performance.
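For illustration, here is a minimal sketch (in Python, with a placeholder host and image paths that are not taken from the question) of what connection re-use looks like: several GETs go out over a single TCP connection instead of opening a fresh connection for every image.

```python
# Sketch only: re-using one HTTP/1.1 persistent connection for several GETs.
# "example.com" and the thumbnail paths below are placeholders.
import http.client

paths = ["/thumbs/img001.jpg", "/thumbs/img002.jpg", "/thumbs/img003.jpg"]

conn = http.client.HTTPConnection("example.com")   # one TCP connection
for path in paths:
    conn.request("GET", path)       # HTTP/1.1 keeps the connection alive by default
    resp = conn.getresponse()
    data = resp.read()              # drain the body before issuing the next request
    print(path, resp.status, len(data), "bytes")
conn.close()
```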

MattH
@MattH: +1 In general, I suspect that you're correct about the timeouts being due to a firewall perceiving a `SYN FLOOD`. However, I would disagree with your point about abuse for two reasons. First, the `HTTP/1.1` standard **does not** require clients to support persistent connections; clients have the option of using the `Connection: close` header. Second, the site lists all 1300+ thumbnails on a single web page. If you're not prepared to serve that many images to a single client over non-persistent connections, then break it up into multiple pages. I will likely try persistent connections.
Robert S. Barnes
@MattH: There is a thread pool of N threads; each thread has exactly one TCP connection open at a time, for N concurrent TCP connections. Pipelining is not currently used, so each connection is used to download one file.
Robert S. Barnes
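As a hedged sketch of how the N-thread design described above could be combined with connection re-use, the workers below each hold one persistent connection and pull image paths from a shared queue instead of opening a new connection per file. The host, paths, and thread count are illustrative assumptions, not details from the question.

```python
# Sketch only: N worker threads, each re-using ONE persistent connection
# for many GETs. Host, paths, and N_THREADS are placeholders.
import http.client
import queue
import threading

HOST = "example.com"          # placeholder host
N_THREADS = 5                 # placeholder pool size
work = queue.Queue()
for i in range(1, 1301):      # e.g. 1300 thumbnail paths
    work.put(f"/thumbs/img{i:04d}.jpg")

def worker():
    conn = http.client.HTTPConnection(HOST)   # one connection per thread
    while True:
        try:
            path = work.get_nowait()
        except queue.Empty:
            break
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()            # drain the body so the connection can be re-used
        work.task_done()
    conn.close()

threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```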
@Robert: You don't have to be malicious in order to abuse something, and you don't have to be a network analyst or a server administrator in order to put up a web page with a thousand images on it. The HTTP/1.1 spec says that implementations `SHOULD` implement persistence. Opening a thousand connections per minute to a web server is inefficient (as you've discovered while asking this question) and still IMO abusive. Your assertion that "hosting a page with 1300 images" implies "the server MUST support 1300 simultaneous connection attempts from a single host" won't hold up with any server administrators.
MattH
@MattH: You make a reasonable point. I'm still trying to understand more specifically what's happening, and right now I suspect that, due to the small file sizes and the connection-setup overhead, I'm suffering from tinygrams eating my bandwidth.
Robert S. Barnes
@MattH: You were mostly correct in your answer. Take a look at my summary of results if you're interested.
Robert S. Barnes