I have finalized a small PHP application that can serve many documents. These documents must be cacheable by clients and proxies.

Since proxies can cache my results, I must be extra careful, because the documents I serve can have different MIME types (content negotiation based on $_SERVER['HTTP_ACCEPT']) and different languages (resolved in this order: $_POST value / $_GET value / URL / PHP session value / $_COOKIE value / $_SERVER['HTTP_ACCEPT_LANGUAGE'] / default script value).

In short, the same URL can be served with several MIME types and in several languages (question changed: see edit below).

To help proxies cache, I use the "Vary: Accept" header in combination with the ETag header. The ETag is an MD5 hash of the current language and the last-modified timestamp.

I always:

  • Send an Expires header
  • Send a Cache-Control header
  • Send a Last-Modified header
  • Send a Content-Type header
  • Send an ETag header (based on current language and Last-Modified timestamp)
  • Send a Content-Language
  • Send a "Vary: Accept" header if the document is XHTML
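
The header list above can be sketched roughly as follows. This is only an illustration: the function names `buildCacheHeaders` and `sendHeaders`, the parameters, and the exact way the language and timestamp are concatenated before hashing are assumptions, not the question's actual code.

```php
<?php
// Hypothetical helper: builds the response headers listed above.
// $lastModified and $now are Unix timestamps; $maxAge is in seconds.
function buildCacheHeaders(string $language, int $lastModified, string $contentType, int $maxAge, int $now): array
{
    // ETag as described in the question: MD5 of language + Last-Modified timestamp.
    // Plain concatenation is an assumption; the original code is not shown.
    $etag = '"' . md5($language . $lastModified) . '"';

    $headers = [
        'Expires'          => gmdate('D, d M Y H:i:s', $now + $maxAge) . ' GMT',
        'Cache-Control'    => 'public, max-age=' . $maxAge,
        'Last-Modified'    => gmdate('D, d M Y H:i:s', $lastModified) . ' GMT',
        'ETag'             => $etag,
        'Content-Language' => $language,
        'Content-Type'     => $contentType . '; charset=UTF-8',
    ];

    // The question sends "Vary: Accept" only when the document is XHTML.
    if ($contentType === 'application/xhtml+xml') {
        $headers['Vary'] = 'Accept';
    }

    return $headers;
}

// Sends each header; must be called before any output is written.
function sendHeaders(array $headers): void
{
    foreach ($headers as $name => $value) {
        header($name . ': ' . $value);
    }
}
```

A call such as `sendHeaders(buildCacheHeaders('en-us', filemtime($file), 'application/xhtml+xml', 10800, time()))` would then go at the top of the script, where `$file` is whatever document is being served.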

Now for my question: is this enough to help caching on proxies and clients? Did I miss a header?

To help you, here’s the HTTP response header for a test page (on my local environment):

"
Date             Wed, 30 Dec 2009 18:56:26 GMT
Server           Apache/2.0.63 (Win32) PHP/5.1.0
X-Powered-By     PHP/5.1.0
Set-Cookie       Tests=697daqbmple2e1daq2dg74ur96; path=/
Expires          Wed, 30 Dec 2009 21:56:26 GMT
Cache-Control    public, max-age=10800
Last-Modified    Mon, 28 Dec 2009 15:11:49 GMT
Etag             "44fa50be4638161a596e4b75d6ab7a94"
Vary             Accept
Content-Language en-us
Content-Length   3043
Keep-Alive       timeout=15, max=100
Connection       Keep-Alive
Content-Type     application/xhtml+xml; charset=UTF-8
"

EDIT: OK, I understand that serving one document with several MIME types and in different languages (which can come from so many sources, see above) is just plain bad design. If you want to do this, just use a "private" cache (no caching on proxies)... Am I correct?

If each language has its own URL (but each URL can still be served with several MIME types), is my current implementation OK for a "public" cache (caching on clients and proxies)?

+3  A: 

Since your output also depends on things a proxy cannot know, like session data, wouldn't it be easier to send a (non-cacheable) redirect to the actual content? That content would be fixed for a given URL (with parameters) and therefore much easier to cache. I know this involves an extra round trip, but it's probably much less error-prone and would also cause fewer problems with proxies that don't completely understand or support all your header combinations.

Also, I'm guessing that, if you have two clients going through the same proxy but with different language cookies, your current method would return two different ETags for the same URL, which would make the proxy update its copy each time it sees the other client.
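
A minimal sketch of such a redirect in PHP, under some assumptions: the `/en-us/page` URL layout and the helper names `languageUrl`, `redirectHeaders` and `sendRedirect` are hypothetical, and `Cache-Control: no-store` is one way (not the only one) to keep caches from storing the redirect.

```php
<?php
// Hypothetical helper: maps a resolved language to its own URL.
// The "/<language>/<page>" layout is an assumption for illustration.
function languageUrl(string $language, string $page): string
{
    return '/' . rawurlencode($language) . '/' . rawurlencode($page);
}

// Returns the header lines for a redirect that caches must not store.
function redirectHeaders(string $targetUrl): array
{
    return [
        'Location: ' . $targetUrl,
        'Cache-Control: no-store',
    ];
}

// Sends a 302 redirect and stops the script; call before any output.
function sendRedirect(string $targetUrl): void
{
    foreach (redirectHeaders($targetUrl) as $line) {
        header($line, true, 302);
    }
    exit;
}
```

The script would resolve the language from POST, GET, session, cookie and so on, then call `sendRedirect(languageUrl($language, $page))`. The page behind the language-specific URL can then be sent with public caching headers.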

Wim
The session/cookie only contain the language and this is sent in the ETag AND Content-Language headers...
AlexV
But how will a proxy know which language to serve to its client?
Wim
I understand more now... How would you implement the non-cacheable redirect in PHP?
AlexV
Wim
Thanks for your help. If each language has its own URL, are my headers appropriate for pages that can be cached by proxies and clients (public)? See the edit on the original question.
AlexV
If the contents of Accept uniquely define the MIME type, then I think yes. Is this an AJAX/custom client where you can control the Accept header to request a specific document version, or a browser?
Wim
A: 

If you're truly using the same URL, then you need to modify your code to use a GET request so that each document has its own URL, instead of sending the document name/ID in a POST request.

document.php?id=1234

Unless you do this, it's unlikely a proxy will cache the document. Also, for security reasons, make sure you don't literally send the full filename in your GET request. You don't want to fall victim to simple exploits like:

document.php?file=index.php

If you don't have these documents indexed in a database and you're sending the filename, make sure to strip forward slashes and backslashes, and create a whitelist of allowed file extensions (blacklisting isn't a good idea, but use it if you have no other option).
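
As a rough sketch of that advice: the function name `safeDocumentName` and the extension list are hypothetical. It uses `basename()` to drop any directory part rather than deleting slashes character by character, which has the same effect on the resulting name.

```php
<?php
// Returns a safe file name, or null when the request should be rejected.
// The default extension list is a hypothetical example.
function safeDocumentName(string $requested, array $allowedExtensions = ['html', 'xhtml', 'txt']): ?string
{
    // Normalise backslashes, then keep only the last path component.
    $name = basename(str_replace('\\', '/', $requested));

    // Whitelist check on the (lower-cased) extension.
    $extension = strtolower(pathinfo($name, PATHINFO_EXTENSION));
    if ($name === '' || !in_array($extension, $allowedExtensions, true)) {
        return null;
    }

    return $name;
}
```

With this in place, `document.php?file=index.php` is rejected because `php` is not in the whitelist, and any `../` prefix is discarded before the file is looked up.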

TravisO
I think I'm trying to solve nonexistent problems :) Yes, it can be served in many languages on the same URL, but in that case DO NOT USE a "public" cache, just use a "private" cache.
AlexV
A: 

If the response varies both on "Accept" and "Accept-Language", then both need to be mentioned in the "Vary" response header.
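
In terms of syntax, the Vary value is a comma-separated list of request header names. A small sketch (the helper name `varyHeader` is made up for illustration):

```php
<?php
// Builds a Vary header line from the request header names
// that influence the response.
function varyHeader(array $requestHeaderNames): string
{
    return 'Vary: ' . implode(', ', array_unique($requestHeaderNames));
}

// Usage, before any output is sent:
// header(varyHeader(['Accept', 'Accept-Language']));
```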

Julian Reschke
It's based on the current language which is (cascade) set by (in this order): $_POST value / $_GET value / URL / PHP session value / $_COOKIE value / $_SERVER['HTTP_ACCEPT_LANGUAGE'] / default script value
AlexV
If it varies on data other than in the request headers, you'll need to state "Vary: *", or take it out and set Cache-Control accordingly. Otherwise intermediaries will be confused.
Julian Reschke
A: 

I believe you should be fine in principle -- adding the Vary header means that caches should hold multiple instances of your data, keyed by ETag.

I would note, though, that you don't only vary on Accept, you also vary on Cookie and Accept-Language. Varying by cookie means that the proxy will have to validate every request, but should be able to use an If-None-Match header to let the server indicate which (already cached) ETag should be used.
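
On the server side, the If-None-Match check could look roughly like this. It is a sketch under assumptions: the function name `etagMatches` is hypothetical, and splitting on commas only works for ETags that contain no commas themselves (true for MD5-based tags like the one in the question).

```php
<?php
// Returns true when the If-None-Match request header matches the current ETag.
function etagMatches(?string $ifNoneMatch, string $currentEtag): bool
{
    if ($ifNoneMatch === null) {
        return false;
    }
    // "*" matches any current representation.
    if (trim($ifNoneMatch) === '*') {
        return true;
    }
    // Weak comparison: ignore a leading "W/" prefix on either side.
    $current = preg_replace('/^W\//', '', $currentEtag);
    foreach (explode(',', $ifNoneMatch) as $candidate) {
        $candidate = preg_replace('/^W\//', '', trim($candidate));
        if ($candidate === $current) {
            return true;
        }
    }
    return false;
}

// Usage, before sending the body ($etag is the value sent in the ETag header):
// if (etagMatches($_SERVER['HTTP_IF_NONE_MATCH'] ?? null, $etag)) {
//     http_response_code(304);
//     exit;
// }
```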

Andrew Aylett
I think you are right. What's the syntax of the Vary header when you have many conditions like this?
AlexV
Look at http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html section 14.44. Re-reading your comments, I think you may have to use Vary: *, as the language may change inside a session without the cookies (or any other headers or URL components) changing.
Andrew Aylett