views: 75

answers: 3

If a web server is going to serve out, say, 100 GB per day, would it be better for it to do so in 10,000 10 MB sessions or in 200,000 500 kB sessions?

The reason for this question is that I'm wondering whether there would be any advantage, disadvantage, or neither for sites that mirror content so that clients can exploit HTTP's start-in-the-middle feature (byte-range requests) to download files in segments from many servers. (IIRC, this is a little like how BitTorrent works.)
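For concreteness, the "start in the middle" part is just splitting a file into byte ranges and requesting each range separately (each range could go to a different mirror via an HTTP `Range: bytes=start-end` header). A minimal sketch of the range-planning step, using the sizes from the question:

```python
def plan_segments(total_size, segment_size):
    """Split a download of total_size bytes into (start, end) byte ranges
    suitable for HTTP Range headers ("bytes=start-end", end inclusive)."""
    ranges = []
    start = 0
    while start < total_size:
        end = min(start + segment_size, total_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

# A 10 MB file fetched in 500 kB pieces needs 20 range requests,
# i.e. 19 extra connect/transfer/disconnect cycles vs. one big download.
segments = plan_segments(10_000_000, 500_000)
print(len(segments))               # 20
print(segments[0], segments[-1])   # (0, 499999) (9500000, 9999999)
```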

A: 

There's usually some overhead associated with each session - the cost of creating a TCP connection, for example, or the HTTP headers (if you haven't already counted them in the 100 GB) - so ostensibly it'd be better to use larger sessions. But also think about it from the clients' point of view: by using multiple smaller sessions they can run downloads in parallel and possibly get their content faster. So the actual "best" setup will probably be some intermediate session size, which of course depends on what kind of content you're serving (large files or small ones, streaming or static) and how important you think speed is. There's no one-size-fits-all answer.
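To put rough numbers on that per-session overhead (the byte figures below are illustrative assumptions, not measurements):

```python
# Assume ~500 bytes of HTTP request/response headers plus ~200 bytes of
# TCP handshake/teardown packets per session. Both numbers are guesses
# for illustration only.
OVERHEAD = 500 + 200  # bytes per session

def overhead_fraction(payload_bytes):
    """Fraction of total bytes on the wire that is overhead."""
    return OVERHEAD / (OVERHEAD + payload_bytes)

print(overhead_fraction(10_000_000))  # 10 MB sessions: negligible
print(overhead_fraction(500_000))     # 500 kB sessions: still ~0.14%
```

Even with twenty times as many sessions, the wire overhead stays well under a percent, which is why the parallelism argument for the client tends to dominate.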

David Zaslavsky
So it won't help the server, but it might help the business.
BCS
+2  A: 

I suppose that not only the size of a session matters; its duration is also very important. I would guess that the 200,000 small sessions will be shorter-lived than the 10,000 big ones. As soon as a session completes, its resources are freed to be used again, so I would optimize for sessions lasting as short a time as possible.

Also keep in mind that a single server can't handle more than a couple of hundred concurrent sessions (100-200 is a safe number).
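Little's law makes this concrete: average concurrent sessions = session arrival rate × average session duration. A sketch with the question's numbers, assuming (as in the comment below) that each client pulls roughly 1 MByte/s:

```python
SECONDS_PER_DAY = 86_400
CLIENT_RATE = 1_000_000  # bytes/s per client - an assumption

def avg_concurrency(sessions_per_day, session_bytes):
    """Little's law: average concurrency = arrival rate * duration."""
    arrival_rate = sessions_per_day / SECONDS_PER_DAY   # sessions/s
    duration = session_bytes / CLIENT_RATE              # s per session
    return arrival_rate * duration

small = avg_concurrency(200_000, 500_000)      # 500 kB sessions
large = avg_concurrency(10_000, 10_000_000)    # 10 MB sessions
print(small, large)  # both ~1.16: same total bytes, same average load
```

Since both splits move the same 100 GB/day, the average concurrency comes out identical; shorter sessions just free each slot sooner. Either way, the load is far below the 100-200 concurrent-session limit mentioned above.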

kgiannakakis
The numbers sound right - a broadband client can download at roughly 1 MByte/s, so with dual GBit NICs you could probably sustain 150 MByte/s.
MSalters
A: 

100 GB/day is not a number you need to worry about: it's an average of slightly over 1 MByte/s. To put that in perspective, you could push that over old Ethernet. Your actual peak throughput will be a lot higher, perhaps 10 MByte/s, but that's still no problem at all for a modern server. So I don't think you need those multiple servers, which removes that reason for splitting the 10 MB downloads. And if you're downloading from a single server anyway, why would you insert 19 disconnect-and-reconnect sequences in the middle of a 10 MB download?
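The arithmetic behind that average:

```python
# 100 GB/day spread evenly over 86,400 seconds:
bytes_per_day = 100 * 1000**3
avg_bytes_per_sec = bytes_per_day / 86_400

# ~1.157 MByte/s average; classic 10 Mbit Ethernet tops out
# around 1.25 MByte/s, so even that link could carry the average.
print(avg_bytes_per_sec / 1_000_000)
```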

MSalters
I just picked numbers out of the air. Scale them up until it's a concern.
BCS
Well - such things don't scale trivially. If your users are geographically dispersed, you scale up by putting servers all around the world. Each customer would then download from a local server, but still from only one. "Start in the middle" is then good for failover, but not for the normal case.
MSalters