For Internet caching, an update heuristic is to hold the document for a time that is proportional to the known lifetime of the object. If we follow a typical 60% rule, and we receive a response as follows:

HTTP/1.0 200 OK
Date: Tue, 23 Jun 2009 09:23:24
Server: Apache/1.3.0 (Unix)
Last-Modified: Mon, 8 Jun 2009 09:23:24
Content-Type: text/html

Until when should we cache this object?

+1  A: 

Technically, you can cache it forever, because it has no set expiry. What a user agent should do when it wants to display the cached content is issue another request with an If-Modified-Since header, which allows the server to return a nice, short 304 Not Modified response.
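A minimal sketch of that revalidation step, assuming the cache has stored the document's Last-Modified time and body. The function names `revalidation_headers` and `resolve_response` are hypothetical, and the header times from the question are assumed to be GMT:

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def revalidation_headers(last_modified):
    """Build the conditional-request header from a stored Last-Modified time (UTC)."""
    return {"If-Modified-Since": format_datetime(last_modified, usegmt=True)}

def resolve_response(status, cached_body, response_body):
    """Pick the body to display: a 304 means the cached copy is still valid."""
    if status == 304:
        return cached_body
    return response_body

# Last-Modified value from the response in the question (assumed GMT).
stored = datetime(2009, 6, 8, 9, 23, 24, tzinfo=timezone.utc)
headers = revalidation_headers(stored)
```

The headers would be sent with an ordinary GET; only the status check differs from a normal fetch.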

Another way to look at your question is "I don't want to re-request it every time, what's a good heuristic to trigger those re-requests?". One suggestion would be an interval based on a Fibonacci sequence, so that recently updated documents are re-requested often, but as they age, the re-requests become less frequent.
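One possible reading of the Fibonacci suggestion, as a rough sketch: the wait between successive checks grows as 1, 1, 2, 3, 5, ... units while the document keeps coming back unchanged. The one-hour unit and the name `fibonacci_delays` are assumptions for illustration only:

```python
from datetime import timedelta
from itertools import islice

def fibonacci_delays(unit=timedelta(hours=1)):
    """Yield ever-longer waits between re-requests: 1, 1, 2, 3, 5, ... units."""
    a, b = 1, 1
    while True:
        yield unit * a
        a, b = b, a + b

# Waits before the first six checks of a document that stays unchanged.
delays = list(islice(fibonacci_delays(), 6))
```

When a check finds the document has changed, a cache using this scheme would start a fresh generator so that checks become frequent again.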

The HTTP/1.1 spec (RFC 2616) leaves this open. Section 13.2.2, "Heuristic Expiration", has this to say:

Since origin servers do not always provide explicit expiration times, HTTP caches typically assign heuristic expiration times, employing algorithms that use other header values (such as the Last-Modified time) to estimate a plausible expiration time. The HTTP/1.1 specification does not provide specific algorithms, but does impose worst-case constraints on their results. Since heuristic expiration times might compromise semantic transparency, they ought to used cautiously, and we encourage origin servers to provide explicit expiration times as much as possible.
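For the response in the question, a Last-Modified heuristic can be sketched as follows. This assumes the "60% rule" means treating the response as fresh for 60% of its age at fetch time (Date minus Last-Modified); that interpretation and the name `heuristic_expiry` are assumptions, not something the spec prescribes:

```python
from datetime import datetime

HTTP_DATE = "%a, %d %b %Y %H:%M:%S"  # format of the headers in the question (no zone given)

def heuristic_expiry(date, last_modified, fraction=0.6):
    """Return the time until which the response is treated as fresh."""
    age_at_fetch = date - last_modified      # how long the object had gone unchanged
    return date + age_at_fetch * fraction    # keep it for a fraction of that age

date = datetime.strptime("Tue, 23 Jun 2009 09:23:24", HTTP_DATE)
last_modified = datetime.strptime("Mon, 8 Jun 2009 09:23:24", HTTP_DATE)
expires = heuristic_expiry(date, last_modified)
print(expires)  # 2009-07-02 09:23:24
```

Under that assumption the object was 15 days old when fetched, so it would be cached for 9 days, until 2 Jul 2009 09:23:24, and revalidated after that.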

Paul Dixon