Using PHP, how can I accurately test whether a remote website supports the "If-Modified-Since" HTTP header?

From what I have read, if the remote file you GET has been modified since the date specified in the header request - it should return a 200 OK status. If it hasn't been modified, it should return a 304 Not Modified.

Therefore my question is, what if the server doesn't support "If-Modified-Since" but still returns a 200 OK?

There are a few tools out there that check if your website supports "If-Modified-Since" so I guess I'm asking how they work.

Edit:

I have performed some testing using Curl, sending the following;

$ch = curl_init();
// 60*60*60*60 seconds is 150 days, so this date is well in the future
curl_setopt($ch, CURLOPT_HTTPHEADER, array("If-Modified-Since: ".gmdate('D, d M Y H:i:s \G\M\T',time()+60*60*60*60)));
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true); // HEAD request: headers only
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_FORBID_REUSE, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 4);
curl_setopt($ch, CURLOPT_TIMEOUT, 4);
$response = curl_exec($ch);

i.e. with a date in the future, google.com returns;

HTTP/1.0 304 Not Modified
Date: Fri, 05 Feb 2010 16:11:54 GMT
Server: gws
X-XSS-Protection: 0
X-Cache: MISS from .
Via: 1.0 .:80 (squid)
Connection: close

and if I send;

$ch = curl_init();
// 60*60*60*60 seconds is 150 days, so this date is well in the past
curl_setopt($ch, CURLOPT_HTTPHEADER, array("If-Modified-Since: ".gmdate('D, d M Y H:i:s \G\M\T',time()-60*60*60*60)));
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true); // HEAD request: headers only
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_FORBID_REUSE, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 4);
curl_setopt($ch, CURLOPT_TIMEOUT, 4);
$response = curl_exec($ch);

i.e. a date in the past, google.com returns;

HTTP/1.0 200 OK
Date: Fri, 05 Feb 2010 16:09:12 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Server: gws
X-XSS-Protection: 0
X-Cache: MISS from .
Via: 1.0 .:80 (squid)
Connection: close

If I then send both to bbc.co.uk (which doesn't support it);

The future one returns;

HTTP/1.1 200 OK
Date: Fri, 05 Feb 2010 16:12:51 GMT
Server: Apache
Set-Cookie: BBC-UID=84bb66bc648318e367bdca3ad1d48cf627005b54f090f211a2182074b4ed92c40ForbSoft%20Web%20Diagnostics%20%28URL%20Validator%29; expires=Tue, 04-Feb-14 16:12:51 GMT; path=/; domain=bbc.co.uk;
Accept-Ranges: bytes
Cache-Control: max-age=0
Expires: Fri, 05 Feb 2010 16:12:51 GMT
Pragma: no-cache
Content-Length: 111677
Content-Type: text/html

The date in the past returns;

HTTP/1.1 200 OK
Date: Fri, 05 Feb 2010 16:14:01 GMT
Server: Apache
Set-Cookie: BBC-UID=841b66ec44232cd91e81e88a014a3c5e50ed4e20c0e07174c4ff59675cd2fa210ForbSoft%20Web%20Diagnostics%20%28URL%20Validator%29; expires=Tue, 04-Feb-14 16:14:01 GMT; path=/; domain=bbc.co.uk;
Accept-Ranges: bytes
Cache-Control: max-age=0
Expires: Fri, 05 Feb 2010 16:14:01 GMT
Pragma: no-cache
Content-Length: 111672
Content-Type: text/html

So my question still stands.

+2  A: 

If the entity returns a "Last-Modified" header, then it supports it. Makes sense really.

More info: http://httpd.apache.org/docs/2.2/caching.html (A Brief Guide to Conditional Requests)

Obviously only static pages/files will have that header. With dynamic content (asp, php, etc.) there is no way to know from the headers (unless the site handles caching manually, e.g. like this), and in my experience the entity may or may not support If-Modified-Since.

Maybe you can just do two requests, one after the other, with the second sending an If-Modified-Since header, and then check whether the second response is a 304 or a 200.

EDIT - hurikhan77 points out an important caveat: testing one URL, for example the root of the site, for this capability does not guarantee that the rest of the site does or doesn't support it too.

Infinity
Yes, it does make sense and thanks very much for sending me that link
Webbo
This is not exactly true: Replace "server" by "entity" and it will fit.
hurikhan77
+1  A: 

I have performed some testing on this and it appears to work as follows;

If you send an If-Modified-Since header with a date that is in the past (5 minutes before the current time should do it) then sites such as google.com, w3.org and mattcutts.com will return an "HTTP/1.1 304 Not Modified" header. Sites such as yahoo.com, bbc.co.uk and stackoverflow.com always return an "HTTP/1.1 200 OK".

The "Last-Modified" header has nothing to do with "If-Modified-Since" because the whole point of sending back a "HTTP/1.1 304 Not Modified" header is that you don't have to send the body with it (thus saving bandwidth - which is the whole point behind this).

Therefore, the answer to my question is that if a site doesn't return a "HTTP/1.1 304 Not Modified" header when you send an "If-Modified-Since 5 mins ago" header, the site doesn't support the "If-Modified-Since" request properly.
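For what it's worth, here is a minimal sketch of the "5 mins ago" probe described above. The function names are my own (hypothetical), not from any library, and the interpretation is only a heuristic: a 200 can also mean the resource really did change in the last five minutes.

```php
<?php
// Hypothetical helper: format a Unix timestamp as an If-Modified-Since header line.
function buildIfModifiedSince(int $timestamp): string
{
    // HTTP dates are always expressed in GMT, hence gmdate() and the literal "GMT".
    return 'If-Modified-Since: ' . gmdate('D, d M Y H:i:s \G\M\T', $timestamp);
}

// Hypothetical helper: interpret the status code returned for the conditional request.
function interpretConditionalStatus(int $status): string
{
    if ($status === 304) {
        return 'honoured';            // server answered the condition
    }
    if ($status === 200) {
        return 'ignored-or-changed';  // header ignored, or content changed very recently
    }
    return 'inconclusive';            // redirects, errors, timeouts, etc.
}

// Usage (assumes the curl extension and a $ch handle set up as in the question):
//   curl_setopt($ch, CURLOPT_HTTPHEADER, array(buildIfModifiedSince(time() - 300)));
//   curl_exec($ch);
//   echo interpretConditionalStatus((int) curl_getinfo($ch, CURLINFO_HTTP_CODE));
```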

If I am incorrect, please say so and provide testing to show.

Edit: I forgot to add that a good test is to make a normal HEAD request to the domain (e.g. w3.org), grab the "Last-Modified" date and then make another request with "If-Modified-Since:" set to that date. This will test that both the "Last-Modified" value and the "If-Modified-Since" request are supported. Please note: just because the server sends back a "Last-Modified" date doesn't mean it supports "If-Modified-Since".
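A rough sketch of that two-request test, assuming the curl extension is available. All function names and the result labels ('supported', 'not-supported', 'untestable') are hypothetical choices of mine. It only tests the one URL you give it, not the whole site.

```php
<?php
// Hypothetical helper: pull the Last-Modified value out of a raw header dump.
function extractLastModified(string $rawHeaders): ?string
{
    // When redirects are followed, curl returns one header block per hop; keep the last match.
    if (preg_match_all('/^Last-Modified:\s*(.+?)\s*$/mi', $rawHeaders, $m)) {
        return end($m[1]);
    }
    return null;
}

// Hypothetical helper: turn the two observations into a verdict.
function classifyConditionalSupport(?string $lastModified, int $conditionalStatus): string
{
    if ($lastModified === null) {
        return 'untestable'; // nothing to echo back in If-Modified-Since
    }
    return $conditionalStatus === 304 ? 'supported' : 'not-supported';
}

// HEAD request returning array(status code, raw headers).
function headRequest(string $url, array $headers = array()): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_NOBODY         => true,  // headers only
        CURLOPT_HEADER         => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_TIMEOUT        => 4,
        CURLOPT_HTTPHEADER     => $headers,
    ));
    $raw = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return array($status, $raw === false ? '' : $raw);
}

// First request grabs Last-Modified, second one sends it back as If-Modified-Since.
function probeIfModifiedSince(string $url): string
{
    list(, $raw) = headRequest($url);
    $lastModified = extractLastModified($raw);
    if ($lastModified === null) {
        return classifyConditionalSupport(null, 0);
    }
    list($status) = headRequest($url, array('If-Modified-Since: ' . $lastModified));
    return classifyConditionalSupport($lastModified, $status);
}
```

You would call it as probeIfModifiedSince('http://www.w3.org/') and compare the result against the three labels.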

Webbo
I'm glad you found a solution, but I did mention that in my answer as a more "practical" way to infer the capability of the server, versus the more "theoretical" header approach. Quoting myself: "Maybe you can just do two requests, one followed by another, sending an If-Modified-Since header, and then verify if the second request is a 304 or a 200."
Infinity
@Infinity - If you read my answer you'll see yours is barking up a different tree, but I can see what you mean by the "practical" approach, which is ultimately where I took it.
Webbo
+1  A: 

Hi, regarding the first answer above, I'd like to note that conditional requests make as much sense on dynamic content as they do on static content. If the code that generates the dynamic content knows that the backend entity (e.g. a database item) has not changed, it should send a 304 in response to a conditional request.
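As an illustration only, a dynamic PHP page could do this roughly as follows. The function names are hypothetical, and $lastChanged is assumed to be a Unix timestamp the application already knows (e.g. the row's updated-at time).

```php
<?php
// Hypothetical helper: true when the client's copy is still current.
function isNotModified(?string $ifModifiedSince, int $lastChanged): bool
{
    if ($ifModifiedSince === null) {
        return false; // unconditional request
    }
    $since = strtotime($ifModifiedSince);
    if ($since === false) {
        return false; // unparseable date: treat the request as unconditional
    }
    return $lastChanged <= $since;
}

// Hypothetical wrapper: send a 304 with no body, or a 200 with Last-Modified and the page.
function sendConditionalResponse(int $lastChanged, callable $renderBody): void
{
    $header = isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ? $_SERVER['HTTP_IF_MODIFIED_SINCE'] : null;
    if (isNotModified($header, $lastChanged)) {
        http_response_code(304); // no body, which is the bandwidth saving
        return;
    }
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s \G\M\T', $lastChanged));
    echo $renderBody();
}
```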

Jan

Jan Algermissen