I have noticed that some browsers (in particular, Firefox and Opera) are very zealous in using cached copies of .css and .js files, even between browser sessions. This leads to a problem when you update one of these files but the user's browser keeps on using the cached copy.

The question is: what is the most elegant way of forcing the user's browser to reload the file when it has changed?

Ideally the solution would not force the browser to reload the file on every visit to the page. I will post my own solution as an answer, but I am curious if anyone has a better solution and I'll let your votes decide.

Update: After allowing discussion here for a while, I have found John Millikin and da5id's suggestion to be useful. It turns out there is a term for this: auto-versioning. I have posted a new answer below which is a combination of my original solution and John's suggestion.

Another idea which was suggested by SCdF would be to append a bogus query string to the file. (Some Python code to automatically use the timestamp as a bogus query string was submitted by pi.) However, there is some discussion as to whether or not the browser would cache a file with a query string. (Remember, we want the browser to cache the file and use it on future visits. We only want it to fetch the file again when it has changed.) Since it is not clear what happens with a bogus query string, I am not accepting that answer.

+54  A: 

Update: Rewritten to incorporate suggestions from John Millikin and da5id. This solution is written in PHP, but should be easily adapted to other languages.

Update 2: Incorporating comments from Nick Johnson that the original .htaccess regex can cause problems with files like json-1.3.js. Solution is to only rewrite if there are exactly 10 digits at the end. (Because 10 digits covers all timestamps from 9/9/2001 to 11/20/2286.)

First, we use the following rewrite rule in .htaccess:

RewriteEngine on
RewriteRule ^(.*)\.[\d]{10}\.(css|js)$ $1.$2 [L]
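As a rough check of the ten-digit rule, the same pattern can be tried out in Python. This is only a sketch: `strip_version` and `utc_from_epoch` are hypothetical helper names, and Python's `re` module stands in for Apache's regex engine, which should behave the same way for this simple pattern.

```python
import re
from datetime import datetime, timedelta, timezone

# Same pattern as the RewriteRule above, applied to a request path.
VERSIONED = re.compile(r"^(.*)\.[\d]{10}\.(css|js)$")

def strip_version(path):
    """Drop a 10-digit segment before .css/.js; leave any other path alone."""
    return VERSIONED.sub(r"\1.\2", path)

def utc_from_epoch(seconds):
    """Convert a Unix timestamp to a UTC datetime using plain arithmetic."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=seconds)

print(strip_version("css/base.1221534296.css"))  # css/base.css
print(strip_version("js/json-1.3.js"))           # js/json-1.3.js (untouched)
print(utc_from_epoch(1000000000).date())         # 2001-09-09
print(utc_from_epoch(9999999999).date())         # 2286-11-20
```

The last two lines show where the date range quoted above comes from: the smallest and largest ten-digit timestamps.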

Now, we write the following PHP function:

/**
 *  Given a file, e.g. /css/base.css, replaces it with a string containing the
 *  file's mtime, e.g. /css/base.1221534296.css.
 *  
 *  @param $file  The file to be loaded.  Must be an absolute path (i.e.
 *                starting with slash).
 */
function auto_version($file)
{
  if(strpos($file, '/') !== 0 || !file_exists($_SERVER['DOCUMENT_ROOT'] . $file))
    return $file;

  $mtime = filemtime($_SERVER['DOCUMENT_ROOT'] . $file);
  return preg_replace('{\\.([^./]+)$}', ".$mtime.\$1", $file);
}

Now, wherever you include your CSS, change it from this:

<link rel="stylesheet" href="/css/base.css" type="text/css" />

To this:

<link rel="stylesheet" href="<?=auto_version('/css/base.css')?>" type="text/css" />

This way, you never have to modify the link tag again, and the user will always see the latest CSS. The browser will be able to cache the CSS file, but when you make any changes to your CSS the browser will see this as a new URL, so it won't use the cached copy.

This can also work with images, favicons, and javascript. Basically anything that is not dynamically generated.
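For readers not using PHP, here is a minimal Python sketch of the same idea. It is an adaptation, not the code from this answer: the `document_root` parameter is an assumption that stands in for `$_SERVER['DOCUMENT_ROOT']`.

```python
import os
import re

def auto_version(url_path, document_root):
    """Return url_path with the file's mtime inserted before the extension.

    document_root is an assumed parameter; the PHP version reads it from
    $_SERVER['DOCUMENT_ROOT'] instead.
    """
    # Only absolute URL paths are handled, as in the PHP version.
    if not url_path.startswith("/"):
        return url_path
    fs_path = os.path.join(document_root, url_path.lstrip("/"))
    if not os.path.isfile(fs_path):
        return url_path
    mtime = int(os.path.getmtime(fs_path))
    # /css/base.css -> /css/base.<mtime>.css
    return re.sub(r"\.([^./]+)$",
                  lambda m: ".%d.%s" % (mtime, m.group(1)),
                  url_path)
```

A template would then emit `auto_version('/css/base.css', '/var/www')` (the path is only an example) wherever the stylesheet URL is needed.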

Kip
That's basically what is done on the site I work for.
Echo
+7  A: 

You can just put ?foo=1234 at the end of your css / js import, changing 1234 to be whatever you like. Have a look at the SO html source for an example.

The idea there being that the ? parameters are discarded / ignored on the request anyway and you can change that number when you roll out a new version.


Note: There is some argument with regard to exactly how this affects caching. I believe the general gist of it is that GET requests, with or without parameters, should be cacheable, so the above solution should work.

However, it is down to both the web server to decide if it wants to adhere to that part of the spec and the browser the user uses, as it can just go right ahead and ask for a fresh version anyway.

SCdF
This will *prevent* caching, because requests with GET parameters may not be cached (per the HTTP spec)
John Millikin
Egads you're right, I need to read more carefully. Post updated.
SCdF
Nonsense. The query string (a.k.a. GET parameters) is part of the URL. URLs with query strings can, and will, be cached. This is a good solution.
troelskn
@troelskn: The HTTP 1.1 spec says otherwise (with respect to GET and HEAD requests with query params): caches MUST NOT treat responses to such URIs as fresh unless the server provides an explicit expiration time. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.9
Michael Johnson
Updated the answer. As an aside, reading HTTP specs before 8am is not recommended.
SCdF
I tried the query string type of versioning with all major browsers and they DO cache the file, specs or not. However, I think it's better to use the style.TIMESTAMP.css format without abusing query strings anyway because there's still the possibility that caching proxy software WILL NOT cache the file.
TomA
Worth noting, for whatever reason, that Stackoverflow itself uses the query string method.
jason
+12  A: 

I've heard this called "auto versioning". The most common method is to include the static file's mtime somewhere in the URL, and strip it out using rewrite handlers or URL confs:

See also:

John Millikin
Thanks, I guess this was another case where my idea has been discussed, I just didn't know what it was called so I never found it on Google searches.
Kip
+1  A: 

You can force "session-wide caching" if you add the session ID as a spurious parameter of the js/css file:

<link rel="stylesheet" href="myStyles.css?ABCDEF12345sessionID" />
<script language="javascript" src="myCode.js?ABCDEF12345sessionID"></script>

If you want a version-wide caching you could add some code to print the file date or similar. If you're using Java you can use a custom-tag to generate the link in an elegant way.

<link rel="stylesheet" href="myStyles.css?20080922_1020" />
<script language="javascript" src="myCode.js?20080922_1120"></script>
helios
+3  A: 

Simple Client-side Technique

In general, caching is good. So there are a couple of techniques, depending on whether you're fixing the problem for yourself as you develop a website, or whether you're trying to control the cache in a production environment.

General visitors to your website won't have the same experience that you're having when you're developing the site. Since the average visitor comes to the site less frequently (maybe only a few times each month, unless you're a Google or hi5 Networks), they are less likely to have your files in cache, and that may be enough. If you want to force a new version into the browser, you can always add a query string to the request, and bump up the version number when you make major changes:

<script src="/myJavascript.js?version=4"></script>

This will ensure that everyone gets the new file. It works because the browser looks at the URL of the file to determine whether it has a copy in cache. If your server isn't set up to do anything with the query string, it will be ignored, but the name will look like a new file to the browser.

On the other hand, if you're developing a website, you don't want to change the version number every time you save a change to your development version. That would be tedious.

So while you're developing your site, a good trick would be to automatically generate a query string parameter:

<!-- Development version: -->
<script>document.write('<script src="/myJavascript.js?dev=' + Math.floor(Math.random()*100) + '"><\/script>');</script>

Adding a query string to the request is a good way to version a resource, but for a simple website this may be unnecessary. And remember, caching is a good thing.

It's also worth noting that the browser isn't necessarily stingy about keeping files in cache. Browsers have policies for this sort of thing, and they are usually playing by the rules laid down in the HTTP specification. When a browser makes a request to a server, part of the response is an Expires header: a date which tells the browser how long the file should be kept in cache. The next time the browser comes across a request for the same file, it sees that it has a copy in cache and looks to the Expires date to decide whether it should be used.

So believe it or not, it's actually your server that is making that browser cache so persistent. You could adjust your server settings and change the EXPIRES headers, but the little technique I've written above is probably a much simpler way for you to go about it. Since caching is good, you usually want to set that date far into the future (a "Far-future Expires Header"), and use the technique described above to force a change.
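To make the "far-future Expires" idea concrete, here is a small Python sketch that builds such a header. The function name `far_future_headers` and the one-year default are assumptions for illustration. The `Cache-Control: max-age` line is an addition not mentioned above; it is the HTTP/1.1 way of expressing the same lifetime.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def far_future_headers(now=None, days=365):
    """Build response headers that let clients cache a file for `days` days."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(days=days)
    return {
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(expires, usegmt=True),
        # Assumed extra: the equivalent lifetime in seconds for HTTP/1.1 caches.
        "Cache-Control": "max-age=%d" % (days * 86400),
    }

print(far_future_headers(datetime(2008, 9, 16, tzinfo=timezone.utc)))
```

With headers like these in place, the versioned URL is the only thing that makes a browser fetch the file again.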

If you're interested in more info on HTTP or how these requests are made, a good book is "High Performance Web Sites" by Steve Souders. It's a very good introduction to the subject.

keparo
A: 

Changing the filename will work. But that's not usually the simplest solution.

An HTTP cache-control header of 'no-cache' doesn't always work, as you've noticed. The HTTP 1.1 spec allows wiggle-room for user-agents to decide whether or not to request a new copy. (It's non-intuitive if you just look at the names of the directives. Go read the actual HTTP 1.1 spec for cache... it makes a little more sense in context.)

In a nutshell, if you want iron-tight cache-control use

Cache-Control: no-cache, no-store, must-revalidate

in your response headers.

pcorcoran
Problem with this approach is that it generates a round trip to the server for all such content. This is not good.
AnthonyWJones
This solution isn't perfect but it works for all situations, including static web pages. And if you are only doing this for a limited number of files, say your CSS files, then it shouldn't add a significant amount of time to the page load.
Bill
+4  A: 

Don't use foo.css?version=1! Browsers aren't supposed to cache URLs with GET variables. According to http://www.thinkvitamin.com/features/webapps/serving-javascript-fast, though IE and Firefox ignore this, Opera and Safari don't! Instead, use foo.v1234.css, and use rewrite rules to strip out the version number.

airrob
First of all, browsers don't cache; that's a function of HTTP. Why would HTTP care about the structure of a URI? Is there an official reference to a spec that states that HTTP caching should understand the semantics of a URI so that it won't cache items with a query string?
AnthonyWJones
A web browser includes the functionality of caching objects (check your browser's cache directory). HTTP is a protocol that includes directives from servers to clients (proxies, browsers, spiders etc.) suggesting cache control.
ΤΖΩΤΖΙΟΥ
+8  A: 

Instead of changing the version manually, I would recommend you use an MD5 hash of the actual CSS file.

So your URL would be something like

http://mysite.com/css/[md5_hash_here]/style.css

You could still use the rewrite rule to strip out the hash, but the advantage is that now you can set your cache policy to "cache forever", since if the URL is the same, that means that the file is unchanged.

You can then write a simple shell script that would compute the hash of the file and update your tag (you'd probably want to move it to a separate file for inclusion).

Simply run that script every time CSS changes and you're good. The browser will ONLY reload your files when they are altered. If you make an edit and then undo it, there's no pain in figuring out which version you need to return to in order for your visitors not to re-download.
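The answer mentions a shell script; as a rough equivalent, here is a Python sketch of the hashing step. The function name `hashed_url` and its default arguments are hypothetical, and updating the tag itself is left out.

```python
import hashlib

def hashed_url(fs_path, url_dir="/css", filename="style.css"):
    """Return a URL of the form /css/<md5 of file contents>/style.css."""
    with open(fs_path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return "%s/%s/%s" % (url_dir, digest, filename)
```

Because the hash depends only on the file contents, editing a file and then undoing the edit yields the original URL again, so visitors who already cached that version do not re-download it.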

levik
+1  A: 

I recently solved this using Python. Here is the code (it should be easy to adapt to other languages):

import os

def import_tag(pattern, name, **kw):
    if name[0] == "/":
        name = name[1:]
    # Additional HTML attributes
    attrs = ' '.join(['%s="%s"' % item for item in kw.items()])
    try:
        # Get the file's modification time
        mtime = os.stat(os.path.join('/documentroot', name)).st_mtime
        include = "%s?%d" % (name, mtime)
        # this is the same as sprintf(pattern, attrs, include) in other
        # languages
        return pattern % (attrs, include)
    except OSError:
        # In case of error (e.g. the file does not exist) return the
        # include without the added query parameter.
        return pattern % (attrs, name)

def script(name, **kw):
    return import_tag("""<script type="text/javascript" """ +\
        """ %s src="/%s"></script>""", name, **kw)

def stylesheet(name, **kw):
    return import_tag('<link rel="stylesheet" type="text/css" ' +\
        '%s href="/%s">', name, **kw)

This code basically appends the file's timestamp as a query parameter to the URL. Calling the following function

stylesheet("/main.css")

will result in

<link rel="stylesheet" type="text/css"  href="/main.css?1221842734">

The advantage, of course, is that you never have to change your HTML again; touching the CSS file will automatically trigger a cache invalidation. It works very well and the overhead is not noticeable.

pi
+1  A: 

Say you have a file available at:

/styles/screen.css

you can either append a query parameter with version information onto the URI, e.g.:

/styles/screen.css?v=1234

or you can prepend version information, e.g.:

/v/1234/styles/screen.css

IMHO the second method is better for CSS files because they can refer to images using relative URLs which means that if you specify a background-image like so:

body {
    background-image: url('images/happy.gif');
}

its URL will effectively be:

/v/1234/styles/images/happy.gif

This means that if you update the version number used, the browser will treat this as a new resource and not use a cached version. If you base your version number on the Subversion/CVS/etc. revision, this means that changes to images referenced in CSS files will be noticed. That isn't guaranteed with the first scheme: the URL images/happy.gif relative to /styles/screen.css?v=1235 is /styles/images/happy.gif, which doesn't contain any version information.
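The difference between the two schemes can be checked with Python's standard URL resolution, which follows the same relative-reference rules browsers use. The host `example.com` and the helper name `resolve` are assumptions for illustration only.

```python
from urllib.parse import urljoin, urlsplit

def resolve(stylesheet_url, relative_ref):
    """Resolve a reference found in a stylesheet and return the resulting path."""
    return urlsplit(urljoin(stylesheet_url, relative_ref)).path

# Version in the path: the image URL inherits it.
print(resolve("http://example.com/v/1234/styles/screen.css", "images/happy.gif"))
# /v/1234/styles/images/happy.gif

# Version in the query string: it is dropped during resolution.
print(resolve("http://example.com/styles/screen.css?v=1235", "images/happy.gif"))
# /styles/images/happy.gif
```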

I have implemented a caching solution using this technique with Java servlets and simply handle requests to /v/* with a servlet that delegates to the underlying resource (i.e. /styles/screen.css). In development mode I set caching headers that tell the client to always check the freshness of the resource with the server (this typically results in a 304 if you delegate to Tomcat's DefaultServlet and the .css, .js, etc. file hasn't changed) while in deployment mode I set headers that say "cache forever".

wrumsby
A: 

I suggest implementing the following process:

  • version your css/js files whenever you deploy, something like: screen.1233.css (the number can be your SVN revision if you use a versioning system)

  • minify them to optimize loading times

Dan
A: 

If you are using jQuery, its Ajax requests have an option called cache; setting it to false appends a timestamp parameter to the request URL. This is not a complete answer, I know, but it might save you some time.

Miau
A: 

My method is simply to put the link element in a server-side include:

<!--#include virtual="/includes/css-element.txt"-->

where the contents of css-element.txt are

<link rel="stylesheet" href="mycss.css"/>

so the day you want to link to my-new-css.css or whatever, you just change the include.

AmbroseChapel
A: 

I've created a website on my school's server which has an extremely aggressive proxy cache. I noticed that css I had fixed nearly 2 months ago was still being served via my school's proxy server.

I didn't want to disable the cache completely for these files, but I wanted to be able to still 'trick' the proxy server into reloading the files at least once in a while. The solution? Append the timestamp of the beginning of the current day to the end of the css file as a GET query (/style.css?39087309309) which tricked the proxy server, but allowed the end user to cache the files for at least a day.
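A minimal Python sketch of the "start of the current day" query string might look like this. The function names are hypothetical, and local server time is assumed for the day boundary.

```python
from datetime import datetime

def start_of_day_timestamp(now=None):
    """Unix timestamp of local midnight at the start of the given day."""
    now = now or datetime.now()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return int(midnight.timestamp())

def daily_url(path, now=None):
    """Append the start-of-day timestamp so the URL changes once per day."""
    return "%s?%d" % (path, start_of_day_timestamp(now))

print(daily_url("/style.css"))
```

Every request on the same day produces the same URL, so caches can reuse the file until midnight, after which the URL changes and the file is fetched again.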

Once we are finished with development and testing, I'll finalize the file names, or clear our school's proxy server.

David Wees
+2  A: 

Interesting post. Having read all the answers here, and given that I have never had any problems with "bogus" query strings (I am unsure why everyone is so reluctant to use them), I guess the solution (which removes the need for Apache rewrite rules as in the accepted answer) is to compute a short hash of the CSS file contents (instead of the file datetime) and use it as a bogus query string.

This would result in the following:

<link rel="stylesheet" href="/css/base.css?[hash-here]" type="text/css" />

Of course the datetime solutions also get the job done in the case of editing a CSS file but I think it is about the css file content and not about the file datetime, so why get these mixed up?

Michiel
+1  A: 

The RewriteRule needs a small update for js or css files that contain a dot notation versioning at the end. E.g. json-1.3.js.

I added a dot negation class [^.] to the regex so .number. is ignored.

RewriteRule ^(.*)\.[^.][\d]+\.(css|js)$ $1.$2 [L]
Nick Johnson
Thanks for the input! Since I wrote this post I've been burned by this too. My solution was to only rewrite if the last part of the filename contains exactly ten digits. (10 digits covers all timestamps from 9/9/2001 to 11/20/2286.) I've updated my answer to include this regex: `^(.*)\.[\d]{10}\.(css|js)$ $1.$2`
Kip