I'm trying to weigh the pros and cons of setting the Content-Length HTTP header versus using chunked encoding to return [possibly] large files from my server. One or the other is needed to be compliant with the HTTP 1.1 spec when using persistent connections. I see the advantages of the Content-Length header being:

  • Download dialogs can show accurate progress bar
  • Client knows upfront if the file may/may not be too large for them to ingest

The downside is having to calculate the size before you return the object, which isn't always practical and could add to server/database utilization. The downsides of chunked encoding are the small overhead of adding the chunk size before each chunk and the loss of an accurate download progress bar. Any thoughts? Any other HTTP considerations for either method that I may not have thought of?
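
For concreteness, here is a rough sketch of the two approaches, assuming a Java servlet environment; `knownLength()` and `openSource()` are hypothetical placeholders for however the size and the data are actually obtained:

```java
// A rough sketch only, assuming a Java servlet environment; knownLength()
// and openSource() are hypothetical placeholders.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DownloadServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("application/octet-stream");

        long length = knownLength(); // -1 if the size can't be determined cheaply
        if (length >= 0) {
            // Size known up front: advertise it, so clients get a determinate progress bar.
            response.setHeader("Content-Length", String.valueOf(length));
        }
        // If no Content-Length is set, an HTTP 1.1 container on a persistent
        // connection falls back to Transfer-Encoding: chunked.

        try (InputStream in = openSource(); OutputStream out = response.getOutputStream()) {
            byte[] buffer = new byte[8192];
            for (int read; (read = in.read(buffer)) != -1; ) {
                out.write(buffer, 0, read);
            }
        }
    }

    private long knownLength() { return -1; }                                          // placeholder
    private InputStream openSource() { return new ByteArrayInputStream(new byte[0]); } // placeholder
}
```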

+4  A: 

Use Content-Length, definitely. The server utilization from this will be almost nonexistent and the benefit to your users will be large.

For dynamic content, it's also quite simple to add compressed response support (gzip). That requires output buffering, which in turn gives you the content length. (This isn't practical for file downloads or for already compressed content such as sound or images.)
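
As a rough illustration of the buffering idea, here is a minimal sketch assuming a Java servlet; `renderPage()` is a hypothetical method standing in for whatever produces the dynamic content:

```java
// A minimal sketch of "buffer, then gzip, then you know the length",
// assuming a Java servlet; renderPage() is a hypothetical placeholder.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

import javax.servlet.http.HttpServletResponse;

public class GzipBufferingExample {

    public void sendCompressed(HttpServletResponse response) throws IOException {
        // Buffer the whole compressed response body in memory first.
        // (In practice, check the request's Accept-Encoding header before gzipping.)
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(new GZIPOutputStream(buffer), StandardCharsets.UTF_8)) {
            out.write(renderPage()); // hypothetical: build the dynamic HTML
        }

        byte[] body = buffer.toByteArray();
        response.setContentType("text/html; charset=UTF-8");
        response.setHeader("Content-Encoding", "gzip");
        // Buffering is what makes the exact Content-Length available.
        response.setHeader("Content-Length", String.valueOf(body.length));
        response.getOutputStream().write(body);
    }

    private String renderPage() {
        return "<html><body>Hello</body></html>"; // placeholder content
    }
}
```

The trade-off is holding the whole response in memory, which is why this only makes sense for dynamic pages and not for large files.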

Consider also adding support for partial content/byte-range serving - that is, the capability to resume downloads. See here for a byte-range example (the example is in PHP, but it's applicable in any language). You need Content-Length when serving partial content.
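
As a rough Java equivalent of that idea, here is a minimal single-range sketch assuming a servlet environment; it handles only a simple `Range: bytes=start-end` request for a local file and skips validation such as If-Range or multiple ranges:

```java
// A minimal single-range sketch; not a complete byte-range implementation.
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ByteRangeExample {

    public void serve(File file, HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        long length = file.length();
        long start = 0;
        long end = length - 1;

        String range = request.getHeader("Range"); // e.g. "bytes=7340032-"
        if (range != null && range.startsWith("bytes=")) {
            String[] parts = range.substring("bytes=".length()).split("-", 2);
            if (!parts[0].isEmpty()) {
                start = Long.parseLong(parts[0]);
            }
            if (parts.length > 1 && !parts[1].isEmpty()) {
                end = Long.parseLong(parts[1]);
            }
            response.setStatus(HttpServletResponse.SC_PARTIAL_CONTENT); // 206
            response.setHeader("Content-Range", "bytes " + start + "-" + end + "/" + length);
        }

        response.setHeader("Accept-Ranges", "bytes");
        // Content-Length describes the part being sent, not the whole file.
        response.setHeader("Content-Length", String.valueOf(end - start + 1));

        try (RandomAccessFile raf = new RandomAccessFile(file, "r");
             OutputStream out = response.getOutputStream()) {
            raf.seek(start);
            byte[] buffer = new byte[8192];
            long remaining = end - start + 1;
            while (remaining > 0) {
                int read = raf.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (read == -1) break;
                out.write(buffer, 0, read);
                remaining -= read;
            }
        }
    }
}
```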

Of course, those are not silver bullets: for streaming media, it's pointless to use output buffering or a response size; for large files, output buffering doesn't make sense, but Content-Length and byte serving make a lot of sense (restarting a failed download becomes possible).

Personally, I serve Content-Length whenever I know it; for file download, checking the filesize is insignificant in terms of resources. Result: user has a determinate progress bar (and dynamic pages download faster thanks to gzip).

Piskvor
I don't see how byte-range serving (basically: "resume downloads") is beneficial in this particular case. It requires, after all, that the content length is known beforehand; you could then just as well set the Content-Length header.
BalusC
@BalusC: Content-Length is a **prerequisite** for byte-serving. Typical use case: user is downloading a 10MB file over her WiFi connection, signal drops 7MB into the download. Without resume, she has to download the whole 10MB again, which is quite annoying for her; with resume, there's only 3 MB left to go. Most modern browsers support this.
Piskvor
Yes, I know. Maybe you didn't understand me? I'm just saying that I don't see how this is related to the "Content-Length" vs. "Transfer-Encoding: chunked" question. By the way, the OP's post history tells me that his main language is Java; in that case this `FileServlet` example may be more useful: http://balusc.blogspot.com/2009/02/fileservlet-supporting-resume-and.html
BalusC
@BalusC: last sentence of question: "Any other HTTP considerations for both methods that I may not have thought of?" When using Content-Length, it is possible to add this functionality; whereas with Transfer-Encoding: chunked, this is not possible.
Piskvor
Yes, that's true. BTW: gzip doesn't require output buffering. It's sent with chunked encoding by default, at least in Java servlet containers.
BalusC
@BalusC: I didn't know that, thanks for the tip.
Piskvor
+2  A: 

If the content length is known beforehand, then I would certainly prefer it over sending in chunks. If it concerns static files on the local disk file system or in a database, then any self-respecting programming language and RDBMS provides a way to get the content length beforehand. You should make use of it.
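
For illustration, a minimal sketch of both cases, assuming Java with JDBC; the `files` table and `content` column are made-up names, and the exact SQL length function varies per RDBMS:

```java
// A minimal sketch of determining the content length up front; table and
// column names are hypothetical.
import java.io.File;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ContentLengthLookup {

    // Static file on the local disk file system: the size is one call away.
    public long lengthOfFile(String path) {
        return new File(path).length();
    }

    // BLOB stored in a database: most RDBMSes can report the length in SQL
    // (LENGTH, OCTET_LENGTH, DATALENGTH, ... depending on the vendor), so the
    // content itself never has to be read just to size it.
    public long lengthOfBlob(Connection connection, long id) throws SQLException {
        String sql = "SELECT LENGTH(content) FROM files WHERE id = ?";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setLong(1, id);
            try (ResultSet resultSet = statement.executeQuery()) {
                return resultSet.next() ? resultSet.getLong(1) : -1;
            }
        }
    }
}
```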

On the other hand, if the content length really is unpredictable beforehand (e.g. when your intent is to zip several files together and send them as one), then sending it in chunks may be faster than buffering it in the server's memory or writing it to the local disk file system first. But this does impact the user experience negatively, because the download progress is unknown. The impatient may then abort the download and move along.
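
A minimal sketch of that zip-on-the-fly case, again assuming a Java servlet: since no Content-Length is set, the container will fall back to Transfer-Encoding: chunked.

```java
// A minimal sketch of zipping several files together while streaming;
// the "bundle.zip" filename is just an example.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import javax.servlet.http.HttpServletResponse;

public class ZipStreamingExample {

    public void streamZip(List<File> files, HttpServletResponse response) throws IOException {
        response.setContentType("application/zip");
        response.setHeader("Content-Disposition", "attachment; filename=\"bundle.zip\"");
        // The compressed size is only known once the last entry is written, so
        // no Content-Length is set and the download progress stays indeterminate.

        try (ZipOutputStream zip = new ZipOutputStream(response.getOutputStream())) {
            byte[] buffer = new byte[8192];
            for (File file : files) {
                zip.putNextEntry(new ZipEntry(file.getName()));
                try (InputStream in = new FileInputStream(file)) {
                    for (int read; (read = in.read(buffer)) != -1; ) {
                        zip.write(buffer, 0, read);
                    }
                }
                zip.closeEntry();
            }
        }
    }
}
```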

Another benefit of knowing the content length beforehand is the ability to resume downloads. I see in your post history that your main programming language is Java; you can find here an article with more technical background information and a Java Servlet example that does exactly that.

BalusC