I've been thinking about batch reads and writes in a RESTful environment, and I've come to realize I have broader questions about HTTP caching. (Below I use commas (",") to delimit multiple record IDs, but that detail isn't essential to the discussion.)

I started with this problem:

1. Single GET invalidated by batch update

GET /farms/123         # get info about Old MacDonald's Farm
PUT /farms/123,234,345 # update info on Old MacDonald's Farm and some others
GET /farms/123

How does a caching server in between the client and the Farms server know to invalidate its cache of /farms/123 when it sees the PUT?

Then I realized this was also a problem:

2. Batch GET invalidated by single (or batch) update

GET /farms/123,234,345 # get info about a few farms
PUT /farms/123         # update Old MacDonald's Farm
GET /farms/123,234,345

How does the cache know to invalidate the multiple-farm GET when it sees the PUT go by?

So I figured that the problem was really just with batch operations. Then I realized that any relationship could cause a similar problem. Let's say a farm has zero or one owners, and an owner can have zero or one farms.

3. Single GET invalidated by update to a related record

GET /farms/123   # get info about Old MacDonald's Farm
PUT /farmers/987 # Old MacDonald sells his farm and buys another one
GET /farms/123

How does the cache know to invalidate the single GET when it sees the PUT go by?

Even if you change the models to be more RESTful, using relationship models, you get the same problem:

GET    /farms/123           # get info about Old MacDonald's Farm
DELETE /farm_ownerships/456 # Old MacDonald sells his farm...
POST   /farm_ownerships     # and buys another one
GET    /farms/123

In both versions of #3, the first GET should return something like (in JSON):

{
  "farm": {
    "id": 123,
    "name": "Shady Acres",
    "size": "60 acres",
    "farmer_id": 987
  }
}

And the second GET should return something like:

{
  "farm": {
    "id": 123,
    "name": "Shady Acres",
    "size": "60 acres",
    "farmer_id": null
  }
}

But it can't! Not even if you use ETags appropriately. You can't expect the caching server to inspect the contents for ETags -- the contents could be encrypted. And you can't expect the server to notify the caches that records should be invalidated -- caches don't register themselves with servers.

So are there headers I'm missing? Things that indicate a cache should do a HEAD before any GETs for certain resources? I suppose I could live with double-requests for every resource if I can tell the caches which resources are likely to be updated frequently.
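
(The closest thing I've found so far is marking each response as always requiring revalidation, e.g. something like:

HTTP/1.1 200 OK
Cache-Control: no-cache

which, if I'm reading the spec right, lets a cache store the response but forces it to check back with the origin server before reusing it -- essentially the double-request pattern I describe above.)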

And what about the problem of one cache receiving the PUT and knowing to invalidate its cache and another not seeing it?

+2  A: 

The HTTP protocol supports a request header called If-Modified-Since, which basically lets a caching server ask the web server whether the item has changed since it was last fetched. HTTP also supports Cache-Control headers in server responses, which tell cache servers what to do with the content (never cache this, assume it expires in one day, and so on).
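
A rough sketch of what that looks like on the wire, reusing the farm URI from the question (the dates and max-age are just illustrative values):

# original response from the Farms server
HTTP/1.1 200 OK
Cache-Control: max-age=3600
Last-Modified: Tue, 10 Mar 2009 08:00:00 GMT

# later, the cache revalidates instead of re-fetching the whole body
GET /farms/123 HTTP/1.1
If-Modified-Since: Tue, 10 Mar 2009 08:00:00 GMT

HTTP/1.1 304 Not Modified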

Also, you mentioned encrypted responses. HTTP cache servers cannot cache SSL traffic, because doing so would require them to decrypt the pages as a "man in the middle." That would be technically challenging (decrypt the page, store it, and re-encrypt it for the client) and would also break the page's security, causing "invalid certificate" warnings on the client side. It is technically possible for a cache server to do it, but it causes more problems than it solves and is a bad idea. I doubt any cache servers actually do this type of thing.

SoapBox
+5  A: 

Cache servers are supposed to invalidate the entity referred to by the URI on receipt of a PUT (but as you've noticed, this doesn't cover all cases).
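
Roughly, reusing the URIs from the question (a sketch of what a spec-following cache does):

GET /farms/123         # cache stores the response under the key /farms/123
PUT /farms/123         # cache invalidates its entry for /farms/123
PUT /farms/123,234,345 # cache invalidates /farms/123,234,345 only -- /farms/123 stays cached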

Aside from this, you could use Cache-Control headers on your responses to limit or prevent caching, and handle the conditional request headers that ask whether the URI has been modified since it was last fetched.

This is still a really complicated issue and in fact is still being worked on (e.g. see http://www.ietf.org/internet-drafts/draft-ietf-httpbis-p6-cache-05.txt)

Caching within proxies doesn't really apply if the content is encrypted (at least with SSL), so that shouldn't be an issue (still may be an issue on the client though).

frankodwyer
The original question doesn't mention cache servers; I think it was about the browser's local cache.
Karl
No, my original question states, "How does a caching server in between the client and the Farms server know to invalidate its cache of /farms/123 when it sees the PUT?" I meant both cache servers and local caches.
James A. Rosen
Re: SSL: see my comment about encrypted content over unencrypted channels.
James A. Rosen
+1  A: 

Unfortunately HTTP caching is based on exact URIs, and you can't achieve sensible behaviour in your case without forcing clients to do cache revalidation.

If you had:

GET /farm/123
POST /farm_update/123

You could use the Content-Location header to indicate that the second request modified the resource returned by the first one. AFAIK you can't do that with multiple URIs, and I haven't checked whether this works at all in popular clients.
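
Something like this on the response to the update (a sketch only -- as I said, I'm not sure caches actually act on it):

POST /farm_update/123 HTTP/1.1

HTTP/1.1 200 OK
Content-Location: /farm/123   # tells caches the body describes /farm/123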

The practical solution is to make pages expire quickly and handle If-Modified-Since or ETag validation with a 304 Not Modified status.
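
Roughly (the ETag value and max-age are just examples):

# response that expires quickly
HTTP/1.1 200 OK
Cache-Control: max-age=60
ETag: "farm-123-v1"

# after 60 seconds the cache (or client) revalidates
GET /farm/123 HTTP/1.1
If-None-Match: "farm-123-v1"

HTTP/1.1 304 Not Modified   # unchanged, so the cached copy is reused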

porneL
A: 

In re: SoapBox's answer:

  1. I think If-Modified-Since is the two-stage GET I suggested at the end of my question. It seems like an OK solution where the content is large, i.e. where the savings from not re-sending the content outweigh the overhead of the extra requests. That isn't true in my Farms example, since each farm's information is short. (See the sketch after this list.)

  2. It is perfectly reasonable to build a system that sends encrypted content over an unencrypted (HTTP) channel. Imagine a service-oriented architecture where updates are infrequent and GETs are (a) frequent, (b) need to be extremely fast, and (c) must be encrypted. You would build a server that requires a From header (or, equivalently, an API key in the request parameters) and sends back an asymmetrically encrypted version of the content for the requester. Asymmetric encryption is slow, but if properly cached it beats the combined cost of the SSL handshake (asymmetric encryption) and symmetric content encryption. Adding a cache in front of this server would dramatically speed up GETs.

  3. A caching server could reasonably cache HTTPS GETs for a short period of time. My bank might put a Cache-Control max-age of about 5 minutes on my account home page and recent transactions. I'm not terribly likely to spend a long time on the site, so sessions won't be very long, and I'll probably end up hitting my account's main page several times while I'm looking for that check I recently sent off to SnorgTees.
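
To illustrate point 1, the revalidation traffic for a farm record would look roughly like this (the ETag value is hypothetical):

GET /farms/123 HTTP/1.1
If-None-Match: "abc123"    # validator from the earlier 200 response

HTTP/1.1 304 Not Modified  # saves re-sending ~100 bytes of JSON,
                           # but still costs a round trip to the origin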

James A. Rosen
If-Modified-Since doesn't increase the number of requests.
porneL
I'm pretty sure it does. If the cache could figure out what entries were current, it wouldn't have to send the If-Modified-Since request. You're right that it doesn't _double_ the number. It's dependent on the ratio of reads to writes.
James A. Rosen
If-Modified-Since doesn't double the requests -- the server just responds with either the resource (if it has changed) or a "Not Modified" response, in which case the client is supposed to use the version it already has.
Rowland Shaw
You're both right -- it doesn't double the number. But RFC 2616 §13.2.1 ¶1 says, "HTTP caching works best when caches can entirely avoid making requests to the origin server." That's what I'm aiming for.
James A. Rosen
As I delve in, I see more and more that HTTP caching was built with the idea of caches reaching back to verify via If-Modified-Since. This seems like a lot of overhead, but it does seem to answer all of my problems.
James A. Rosen
It's impossible for a caching server to cache HTTPS GETs, since the SSL channel is opaque to the server. In fact it doesn't even see them as normal HTTP; they are done with the CONNECT method, which essentially punches a socket connection through the proxy.
frankodwyer
(actually I should add that there are some commercial proxies that can do some ugly spoofing of a CA to get around the SSL certificate warnings, but this is a really horrible solution and requires the proxy to be treated as a trusted CA)
frankodwyer
@frankodwyer -- I guess I always thought proxies could see the headers on SSL traffic. I'll take hat in hand on #3. Good comments.
James A. Rosen
My personal opinion is that any banking web application should ***NOT*** cache anything. If it's money-related, it's critical, and if it's a bank, it should be able to afford the hardware to serve all the uncached requests.
Andrei Rinea
+1  A: 

You can't cache dynamic content (without drawbacks), because... it's dynamic.

Karsten