I used to think that caching is browser-driven, and that the browser doesn't request the same file again if it thinks the data is being repeated. But after reading some text on the web, I realize that it is the website that tells the browser which files should not be requested twice.

Can anyone clarify this for me?

A:

That's correct. It's controlled by the HTTP Cache-Control and Expires headers.

The first one tells the client the caching strategy. The second one tells the client when the cached response expires (i.e. for how long to adhere to the caching strategy before obtaining a new response and/or throwing the cached response away).
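To make the interplay of the two headers concrete, here is a minimal Python sketch of how a cache might decide whether a stored response is still fresh. It assumes header names are already lower-cased and only interprets max-age, no-cache and no-store; the function names are hypothetical. Following HTTP/1.1, max-age takes precedence over Expires, and an invalid Expires value such as 0 counts as already expired.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers: dict[str, str]) -> float:
    """Seconds a stored response stays fresh (simplified).

    Assumes lower-cased header names; only max-age, no-cache and
    no-store are interpreted.
    """
    directives = [
        d.strip().lower()
        for d in headers.get("cache-control", "").split(",")
        if d.strip()
    ]
    if "no-store" in directives or "no-cache" in directives:
        return 0.0  # never reuse without going back to the server
    for d in directives:
        if d.startswith("max-age="):
            # max-age wins over Expires when both are present.
            try:
                return float(int(d.split("=", 1)[1]))
            except ValueError:
                return 0.0
    expires = headers.get("expires")
    if expires is None:
        return 0.0  # no explicit lifetime (real caches may apply heuristics)
    try:
        expires_at = parsedate_to_datetime(expires)
        date = parsedate_to_datetime(headers["date"])
        return max(0.0, (expires_at - date).total_seconds())
    except (KeyError, TypeError, ValueError):
        # Invalid dates, notably "Expires: 0", mean "already expired".
        return 0.0


def is_fresh(headers: dict[str, str], age_seconds: float) -> bool:
    """True if a response that is age_seconds old may be served from cache."""
    return age_seconds < freshness_lifetime(headers)
```

With the no-cache headers from the example further down, freshness_lifetime returns 0, so every reuse requires a trip back to the server.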

The web server usually sends a default set of these headers. You can set or override them permanently in the server configuration, or per request in PHP using the header() function. The following example instructs the client never to cache the response.

header('Cache-Control: no-cache, no-store, must-revalidate');
header('Pragma: no-cache');
header('Expires: 0');

The Pragma header is there to ensure compatibility with old HTTP 1.0 clients, which don't support Cache-Control yet (it was introduced in HTTP 1.1).

When the cached response has expired and it also contains a Last-Modified and/or ETag header, the client can fire a conditional GET request with If-Modified-Since and/or If-None-Match. If those conditions show that the resource has not changed, the server sends back a 304 "Not Modified" response without any content. In that case, the client is allowed to keep the cached content and only update its stored headers.
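A rough server-side sketch of that check in Python, assuming the handler already knows the resource's current ETag and Last-Modified time (a timezone-aware datetime truncated to whole seconds, since HTTP dates have one-second resolution). The function names are illustrative only. As in HTTP/1.1, If-None-Match takes precedence, and If-Modified-Since is only consulted when it is absent.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _opaque(tag: str) -> str:
    """Drop a weak-validator prefix so W/"x" and "x" compare equal."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match: str, etag: str) -> bool:
    """Weak comparison, as used for If-None-Match.

    Simplification: splits on commas, so ETags containing commas
    are not handled.
    """
    if if_none_match.strip() == "*":
        return True
    return any(_opaque(c) == _opaque(etag) for c in if_none_match.split(","))


def conditional_get_status(request_headers: dict[str, str], etag: str,
                           last_modified: datetime) -> int:
    """Return 304 if the client's cached copy is still valid, else 200.

    request_headers uses lower-cased names; last_modified must be
    timezone-aware and truncated to whole seconds.
    """
    if_none_match = request_headers.get("if-none-match")
    if if_none_match is not None:
        # If-None-Match takes precedence; If-Modified-Since is then ignored.
        return 304 if etag_matches(if_none_match, etag) else 200
    if_modified_since = request_headers.get("if-modified-since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: send the full response
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)  # HTTP dates are GMT
        if last_modified <= since:
            return 304  # not modified since the client's copy
    return 200
```

A handler would send an empty 304 when this returns 304, and the full response (with current ETag/Last-Modified headers) otherwise.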

BalusC
Can we specifically tell which resources should be cached and which should not?
Shubham
@Shubham: yes, that can be done on a file-extension basis in the web server config and/or on a per-request basis in the server-side programming language. The details depend on the web server used; searching its documentation for the keyword "caching" is usually enough. Here's an Apache HTTPD-targeted example: http://httpd.apache.org/docs/trunk/en/caching.html
BalusC
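For the per-request route, a sketch of an application-level policy table in Python. The extensions and max-age values are hypothetical examples, not recommendations from the Apache docs; in Apache itself this is typically done with mod_expires or mod_headers instead.

```python
import posixpath

# Hypothetical policy table: long-lived caching for static assets,
# revalidation for HTML pages. Adjust the values to your own site.
CACHE_POLICIES = {
    ".css": "public, max-age=31536000",
    ".js": "public, max-age=31536000",
    ".png": "public, max-age=31536000",
    ".jpg": "public, max-age=31536000",
    ".html": "no-cache",
}
DEFAULT_POLICY = "no-store"  # anything unlisted (e.g. dynamic data) is not cached


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value from the file extension of a URL path
    (path only, without a query string)."""
    ext = posixpath.splitext(path)[1].lower()
    return CACHE_POLICIES.get(ext, DEFAULT_POLICY)
```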
*Cache-Control* is not just for clients but also for the server and any web cache along the route between client and server. And clients don’t send a HEAD request but a GET request. Otherwise they would need to send another GET request if the cached representation is stale.
Gumbo
@Gumbo: you're right with regard to HEAD vs GET. The average web client prefers a conditional GET over HEAD. Answer updated.
BalusC
A: 

Some ways to control caching for a site:

Programmatically, by setting HTTP headers (CGI scripts, etc.)

Via <meta http-equiv> tags in the HTML

Web server config files (httpd.conf, web.config)

This will vary depending on the web server type, e.g. Apache, ISA, etc.
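As an illustration of the first option, a minimal Python CGI-style sketch: a CGI program prints header lines, a blank line, then the body, and the web server turns those lines into the HTTP response headers. The function name and the one-hour max-age are assumptions for the example.

```python
import sys


def build_cgi_response(body: str, max_age: int = 3600) -> str:
    """Build CGI output whose headers let clients cache it for max_age seconds."""
    # CGI output: header lines, a blank line, then the body. The web server
    # turns the header lines into the actual HTTP response headers.
    headers = [
        "Content-Type: text/html; charset=utf-8",
        f"Cache-Control: public, max-age={max_age}",
    ]
    return "\n".join(headers) + "\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_cgi_response("<p>Hello</p>\n"))
```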

Good Resource: http://en.wikipedia.org/wiki/Web_cache

Codex73
A:

The If-Modified-Since/If-Unmodified-Since HTTP request headers can be used to request a page if the condition given in them passes.
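On the client side, the values for those request headers come from the cached response itself. A minimal Python sketch, assuming the cached response headers are kept in a dict with lower-cased names (the helper name is hypothetical). It only builds the If-None-Match/If-Modified-Since pair used to revalidate a cached copy, not If-Unmodified-Since, which serves a different purpose (failing the request if the resource has changed).

```python
def conditional_request_headers(cached_headers: dict[str, str]) -> dict[str, str]:
    """Build validator headers for revalidating a cached response.

    cached_headers: response headers stored with the cached copy,
    with lower-cased names.
    """
    validators = {}
    if "etag" in cached_headers:
        # Echo the stored ETag back; the server answers 304 if it still matches.
        validators["If-None-Match"] = cached_headers["etag"]
    if "last-modified" in cached_headers:
        # Echo the stored date back verbatim rather than reformatting it.
        validators["If-Modified-Since"] = cached_headers["last-modified"]
    return validators
```

The resulting dict can be passed as the headers argument of urllib.request.Request when re-fetching the page.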

The ETag response header can be used by the browser to tell if the data in a page has changed. These tags can be retrieved via a HEAD request.
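One common way for a server to produce an ETag is to hash the response body, so the tag changes whenever the content does. A Python sketch under that assumption; the SHA-256 choice and the 32-character truncation are arbitrary, and servers are free to use other schemes (Apache, for example, derives its default ETag from file metadata).

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Strong ETag: a quoted hash of the exact response bytes, so any
    change to the content produces a different tag."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'
```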

The Expires, Cache-Control, and Pragma response headers can be used by the server to let the browser know when it should fetch a new copy of the page instead of using the cached one.

Ignacio Vazquez-Abrams
A: 

If you're interested in gaining greater control than HTTP permits, you could start using a cache manifest (manifest.cache). This is a file that the browser fetches, containing a list of resources; as long as the manifest itself has not been modified, the browser doesn't re-request the listed files at all.

See: http://diveintohtml5.org/offline.html#manifest

While this was supported by all "modern" web browsers at the time of writing (the feature has since been deprecated and removed from major browsers), a browser that doesn't support it will just work as normal, i.e. rely purely on your HTTP headers, which others have described in their answers.

lucideer
A: 

This might help in understanding how caching is handled, at least in Internet Explorer:

Caching Improvements in Internet Explorer 9

inklink28