views:

33

answers:

1

Hi guys As you know checking process of web pages content is a little different from static pages or personal files on our machines because content of Dynamic web pages are changed on each request. So if we are going to use checksums to identifying changes, We'll fail! very simple example is when site owner are use Google Ads on him website; on each request Ads are different from previous. Also if we are going to cache only on period time, also We'll fail, because some web pages aren't updated every years but some are every minutes (if not seconds).

So what is better approach to solve this issue? (Thanks)

UPDATE

Another option is use of LastModified http-header! but this is strong approach?

+2  A: 

Browsers do this automatically with the help of the several caching mechanisms that HTTP provides. The two mechanisms most obviously useful for determining whether a page has changed is the concept of Entity Tags and the Last Modified HTTP header. These mechanisms allow the browser to make conditional requests to a web site, eg. fetch a page only if it has been changed.

Quoting RFC 2616 on HTTP 1.1:

3.11 Entity Tags

Entity tags are used for comparing two or more entities from the same requested resource. HTTP/1.1 uses entity tags in the ETag (section 14.19), If-Match (section 14.24), If-None-Match (section 14.26), and If-Range (section 14.27) header fields. The definition of how they are used and compared as cache validators is in section 13.3.3. An entity tag consists of an opaque quoted string, possibly prefixed by a weakness indicator.

The key point here is that the ETag is a cache validator. If a browser has a cached version of a page (called a resource in the RFC), it can use the ETag to determine whether the cached page is still valid, ie. if the page hasn't changed on the server.

And about the modification date:

14.25 If-Modified-Since

The If-Modified-Since request-header field is used with a method to make it conditional: if the requested variant has not been modified since the time specified in this field, an entity will not be returned from the server; instead, a 304 (not modified) response will be returned without any message-body.

The key point here is that the server may know when a page has been modified, and may then inform the client.

If you open a HTTP monitor (such as Fiddler for Windows) and watch your browser communicate with web sites, you'll see the use of these mechanisms first-hand when the browser makes conditional requests.

To specifically address your question about the Last Modified header, this header in itself won't work for the majority of pages you'll find. But in combination with the ETag it can get you started.

bzlm
Your descriptions are simple, but very useful! Thanks again ;)
Sadegh
@Sadegh Do let us know if you have more questions.
bzlm