The current content of this Google Docs page is:

[image: screenshot of the current page content]

However, when reading this page with the following PHP fopen() script, I get an older, cached version:

[image: screenshot of the older, cached output]

I've tried two solutions proposed in this question (appending a random query attribute and using POST), and I also tried clearstatcache(), but I always get the cached version of the web page.

What do I have to change in the following script so that fopen() returns the current version of the web page?

<?php
$url = 'http://docs.google.com/View?id=dc7gj86r_32g68627ff&rand=' . getRandomDigits(10);

echo $url . '<hr/>';
echo loadFile($url);

function loadFile($sFilename) {
    clearstatcache();
    if (floatval(phpversion()) >= 4.3) {
        $sData = file_get_contents($sFilename);
    } else {
        if (!file_exists($sFilename)) return -3;

        $opts = array('http' =>
          array(
            'method'  => 'POST',
            'content'=>''
          )
        );
        $context  = stream_context_create($opts);                

        $rHandle = fopen($sFilename, 'r', $context);
        if (!$rHandle) return -2;

        $sData = '';
        while(!feof($rHandle))
            $sData .= fread($rHandle, filesize($sFilename));
        fclose($rHandle);
    }
    return $sData;
}

function getRandomDigits($numberOfDigits) {
 $r = "";
 for($i=1; $i<=$numberOfDigits; $i++) {
  $nr=rand(0,9);
  $r .=  $nr;
 }
 return $r;
}

?>
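
One detail of the script above: on any PHP version from 4.3 onward the file_get_contents() branch runs, and that call is made without the stream context, so the POST options are never actually sent. A minimal sketch of passing a context to file_get_contents() follows. The helper name buildNoCacheContext() and the choice of request headers are assumptions for illustration, and request headers like these cannot force a server to regenerate a copy it has stored itself.

```php
<?php
// Hypothetical helper (not part of the original script): build an HTTP
// stream context that asks caches along the way not to serve a stored copy.
function buildNoCacheContext() {
    return stream_context_create(array(
        'http' => array(
            'method'  => 'GET',
            'header'  => "Cache-Control: no-cache\r\nPragma: no-cache\r\n",
            'timeout' => 15,
        ),
    ));
}

// Usage (performs a network request, so it is left commented out):
// $sData = file_get_contents($url, false, buildNoCacheContext());
```

The third argument of file_get_contents() is the context; the second (use_include_path) is left at false.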

ADDED: taking out the $opts and $context gives me a cached page as well:

function loadFile($sFilename) {
    if (floatval(phpversion()) >= 4.3) {
        $sData = file_get_contents($sFilename);
    } else {
        if (!file_exists($sFilename)) return -3;              

        $rHandle = fopen($sFilename, 'r');
        if (!$rHandle) return -2;

        $sData = '';
        while(!feof($rHandle))
            $sData .= fread($rHandle, filesize($sFilename));
        fclose($rHandle);
    }
    return $sData;
}

ADDED: this curl script which sends a Firefox user agent returns the cached version as well:

<?php
$url = 'http://docs.google.com/View?id=dc7gj86r_32g68627ff';
//$user_agent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)';
$user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)';
$ch = curl_init();
//curl_setopt($ch, CURLOPT_COOKIEJAR, "/tmp/cookie");
//curl_setopt($ch, CURLOPT_COOKIEFILE, "/tmp/cookie");
curl_setopt($ch, CURLOPT_URL, $url ); 
curl_setopt($ch, CURLOPT_FAILONERROR, 1); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1); 
curl_setopt($ch, CURLOPT_TIMEOUT, 15);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_VERBOSE, 0);
echo curl_exec($ch);
?>
+1  A: 

I also get this:

Test One;http://docs.google.com/View?id=dc7gj86r_30dzgzbjch
Test Two;http://docs.google.com/View?id=dc7gj86r_31dbssfrzx

The "caching" must be happening at Google Docs or, more probably, it's your fault (wrong URL?).


Response headers:

Set-Cookie: ******
Content-Type: text/html; charset=UTF-8
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Date: Sun, 02 May 2010 03:30:29 GMT
X-Frame-Options: ALLOWALL
Content-Encoding: gzip
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Content-Length: 3987
Server: GSE
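
To look at headers like these from a script instead of a browser, something along the following lines may help. This is only a sketch: parseHeaderLines() and forbidsCaching() are hypothetical helper names, and the parsing is deliberately naive (a repeated header name overwrites the earlier value).

```php
<?php
// Hypothetical helper: turn raw header lines into a lowercase-keyed map
// so that the cache directives are easy to inspect.
function parseHeaderLines(array $lines) {
    $headers = array();
    foreach ($lines as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue; // no colon: a status line, not a header field
        }
        $name = strtolower(trim(substr($line, 0, $pos)));
        $headers[$name] = trim(substr($line, $pos + 1));
    }
    return $headers;
}

// Returns true when Cache-Control contains no-cache or no-store.
function forbidsCaching(array $headers) {
    $cc = isset($headers['cache-control']) ? strtolower($headers['cache-control']) : '';
    return strpos($cc, 'no-store') !== false || strpos($cc, 'no-cache') !== false;
}

// Two of the header lines quoted above, used as sample input.
$sample = array(
    'Cache-Control: no-cache, no-store, max-age=0, must-revalidate',
    'Pragma: no-cache',
);
var_dump(forbidsCaching(parseHeaderLines($sample)));
```

For a live check, the lines can come from get_headers($url), which returns an array of header lines or false on failure. With headers like the ones listed here, neither a browser nor a proxy should be keeping a copy, which points toward the document itself rather than an HTTP cache.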
Alix Axel
Yes, I bet you are using Internet Explorer when you get the cached version, am I right? I get the CURRENT version in Firefox and Chrome but the CACHED version in IE8, as shown here: http://stackoverflow.com/questions/2742258/why-do-firefox-chrome-show-a-different-page-than-ie8. Since Firefox and Chrome (WebKit?) can indeed get the current version, surely there must be some option in fopen() to force Google Docs to give a current version as well.
Edward Tanguay
@Edward: Nope, running Firefox 3.6.3 here and I've never visited that URL before with any browser whatsoever. Trust me: **it's your fault**.
Alix Axel
@Edward: Have you thought of hitting **CTRL+F5 in Firefox and Chrome**?!
Alix Axel
My FF and Chrome consistently give me the current version (I just updated it again); the SAME URL in IE8 gives me the old version, as do the curl script and fopen() script above. Yes, I've tried CTRL+F5 in all browsers, but FF/Chrome give the new version and IE8 the old version. What kind of headers could Google Docs be sending that makes the versions in various browsers so inconsistent?
Edward Tanguay
Interesting: on a second machine both FF and IE8 show the old version. What is going on? Why would the new version of the document appear consistently in only two browsers on one machine?
Edward Tanguay
@Edward: Just copied the response headers. IE 8.0 also gives me the exact same output. Firefox sometimes keeps the cache even if you hit CTRL+F5; try restarting the browser, and if that doesn't work, install the Web Developer Toolbar, disable the cache under the "disable" menu, and hit refresh again. If that doesn't work either, check the URL and the document revisions on Google Docs.
Alix Axel
@Edward: That's what I've been trying to say for the last half hour... The version you keep referring to as old... isn't! It's the version cached by FF and Chrome that's old!
Alix Axel
But this is the opposite problem: only two browsers (FF/Chrome) on one machine consistently give the new version. All other browsers (IE), and FF/IE on my second computer, show me the old version. So I don't need to press CTRL+F5 in Firefox on my first machine, since it is already showing me the most up-to-date version. It seems that Google Docs is giving the latest version only to one browser/cookie instance on this one machine, and perhaps FF/Chrome share that cookie?
Edward Tanguay
Yes, but I just added a timestamp to the document and republished it, and FF/Chrome on my first machine SHOW THAT CHANGE. Therefore, they are showing the most current version, and all other browsers (IE8 on the first machine, the browsers on my second computer, and your browsers) are showing a cached version from last week. Very odd.
Edward Tanguay
@Edward: That would be an interesting test, but @AlReece45 already nailed it with his last answer.
Alix Axel
YES!! OK, you were right: IT'S MY FAULT! I simply forgot to check the box "Automatically re-publish when changes are made" when I first published this document. Since FF and Chrome seem to share the cookie, they were showing me the updated version as the "published document", so I thought it had indeed been published. OK, lesson learned: always check the box. Thanks for your persistence.
Edward Tanguay
@Edward: No problem. =)
Alix Axel
+1  A: 

Try making sure your browser isn't caching the information. I'm not seeing any cache headers or anything. Your webserver might be adding something, or your browser might be assuming it's cached. Try including the time with the output so you can make sure the request was generated at the correct time.
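
As a rough sketch of that last suggestion (stampOutput() is a hypothetical helper name, and the HTML comment format is just one way to do it), the fetched body can be prefixed with the time of the request and a fingerprint of the content:

```php
<?php
// Hypothetical helper: prepend the request time (UTC) and an MD5 fingerprint
// of the body, so a stale browser copy of the script's own output is easy
// to tell apart from a stale document.
function stampOutput($body, $timestamp) {
    $when = gmdate('Y-m-d H:i:s', $timestamp) . ' UTC';
    $hash = md5($body);
    return "<!-- fetched $when, md5 $hash -->\n" . $body;
}

// Usage with the loadFile() function from the question (network request,
// so it is left commented out):
// echo stampOutput(loadFile($url), time());
```

If the timestamp changes on every reload while the fingerprint stays the same, the script really is running each time and the remote page itself is unchanged.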

I used fopen years ago for data that updated quite often. Never ran into a cache problem with fopen. In fact, I would be disappointed if the PHP developers added a web cache to fopen as it would ruin most of the valid use-cases AND it isn't in their documentation. I'll go and look at the PHP source code just to make sure.

Can you update the document so that some of us may try reproducing?

AlReece45
Thanks, I updated the document again, but the fopen() script has nothing to do with my browser. I'm sure I could find some setting in Internet Explorer to clear its cache so that it displays the current version as my Firefox and Chrome do, but what I need to get working is the script. Obviously Google Docs is doing some kind of selective caching. What I want to find out is why it gives my Firefox browser a new version but gives my (just added) curl script above, which sends a Firefox user agent, an old version. What other settings in curl/fopen can I change to force a new version?
Edward Tanguay
See my other answer. Basically, this was to make sure the problem wasn't your computer caching the output of the script you made (rather than the Google Docs page).
AlReece45
+2  A: 

I have successfully reproduced this. Google IS caching when you aren't the owner of the published web document. When I logged out, it gave me the old version.

After I unpublished it and republished it, I could no longer reproduce the issue. Make sure that you keep publishing the document under "Share as Web Page" when you update it.

Just to make sure, check in a browser that isn't logged in (or your script). If it doesn't update: unpublish and publish again. It did not change the URL for me.
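
The logged-out check can also be scripted. The sketch below assumes you have typed some recognisable marker (for example a timestamp) into the document; publishedCopyHasMarker() is a hypothetical helper, and the fetch function is passed in so the logic can be tried without a network connection.

```php
<?php
// Hypothetical helper: report whether an anonymously fetched page contains
// a marker string. $fetch is any callable that takes a URL and returns the
// body as a string, or false on failure.
function publishedCopyHasMarker($url, $marker, $fetch) {
    $html = call_user_func($fetch, $url);
    if ($html === false) {
        return null; // request failed, nothing can be concluded
    }
    return strpos($html, $marker) !== false;
}

// Usage (network request, left commented out). file_get_contents() sends
// no cookies unless told to, so it sees what a logged-out visitor sees.
// The marker text here is made up for illustration.
// var_dump(publishedCopyHasMarker($url, 'edited 2010-05-02 12:00', 'file_get_contents'));
```

A result of false right after an edit suggests the published copy has not been refreshed, which is the cue to unpublish and publish again.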

AlReece45
+1, nice. His Firefox and Chrome browsers have cookies set. It had to be something related to that or to Google, or in this case both.
Alix Axel