I have a web page x.php (in a password protected area of my web site) which has a form and a button which uses the POST method to send the form data and opens x.php#abc. This works pretty well.

However, if the user navigates back in Internet Explorer 7, all the fields in the original x.php are cleared and everything must be typed in again. I cannot save the posted information in a session, so I am trying to understand how to get IE7 to behave the way I want.

I've searched the web and found answers which suggest that the HTTP headers should contain explicit caching information. So far, I've tried this:

session_name("FOO");
session_start();
header("Pragma: public");
header("Expires: Fri, 7 Nov 2008 23:00:00 GMT");
header("Cache-Control: public, max-age=3600, must-revalidate");
header("Last-Modified: Thu, 30 Oct 2008 17:00:00 GMT");

and variations thereof, without success. Looking at the returned headers with a tool such as Wireshark shows me that Apache is indeed sending my headers.

So my question is: what am I doing wrong?

+2  A: 

Firefox does this kind of caching. As I understand your question, you want IE7 to behave the way Firefox does. I think that's not possible.

Firefox and IE7 differ in the way they interpret the back button.

Firefox will display the DOM tree of the previous page as it was when the page was left. That is, all the form data will still be in the form's input fields. But you won't see an onload event upon hitting the back button.

IE7 will render the page again from the response it originally received from the server. Thus the form is empty (unless the server originally sent default values), but you'll see an onload event.
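
As a rough illustration of the difference described above: browsers that keep the page alive in memory expose this through the `pageshow` event, whose `persisted` flag is true when the page was restored rather than reloaded. This is only a sketch; the helper name `classifyPageShow` is made up for the example, and IE7 does not support this event, so it helps with observing the behaviour, not with changing what IE7 does.

```javascript
// Classify a pageshow event: `persisted` is true when the browser restored
// the page from its in-memory back/forward cache instead of loading it again.
// classifyPageShow is a hypothetical helper name, not a browser API.
function classifyPageShow(event) {
  return event && event.persisted ? "restored" : "loaded";
}

// Only wire up the listener when running in a browser that supports it.
if (typeof window !== "undefined" && window.addEventListener) {
  window.addEventListener("pageshow", function (event) {
    console.log("pageshow:", classifyPageShow(event));
  });
}
```

On a "restored" page no onload fires, which matches the Firefox behaviour described above; a "loaded" page goes through the normal load sequence.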

mkoeller
A: 

I poked around and this is quite a hard problem. It's also a major pain in the ass for dynamically modified content. You visit the page, JavaScript augments it according to your instructions, you go to the next page and come back, and JavaScript has forgotten everything. And there's no way to simply update the page server-side, because the page comes out of the cache.

So I devised a back-button-cache-breaker.

It's evil and bad for the web, but it lets pages behave the way people expect them to instead of magically warping all over the place.

<script type="text/javascript">//<!-- <![CDATA[
(function(){
    if( document.location.hash === "" )
    {
        document.location.hash="_";
    }
    else
    {
      var l = document.location;
      // l.host (unlike l.hostname) keeps a non-default port, if there is one
      var myurl = ( l.protocol + "//" + l.host + l.pathname + l.search);
      document.location = myurl;
    }
})();
//]]> --></script>

This does a bit of magic: it detects whether the page you are /currently/ viewing was loaded from cache.

If you're there for the first time, it detects "no hash" and adds "#_" to the page URL. If you come back to the page (i.e. not via a direct link to it), the URL already carries the #_, so the script removes it, and removing it triggers a page reload.

Kent Fredric
+3  A: 

IE will retain form contents on a back button click automatically, as long as:

  • you haven't broken caching with a no-cache pragma or similar
  • the form fields in question weren't dynamically created by script

You seem to have the caching in hand, so I'm guessing the latter may apply. (As mkoeller says, Firefox avoids this problem if the page is within the last few back clicks by keeping the page itself alive for longer than it's on the screen. However, this is optional, and Firefox will revert to the same behaviour as IE and other browsers once you've browsed a few pages ahead and it has expired the old one.)

If you're creating your own form fields from script onload, then the browser has no way of knowing that the new input control is ‘the same’ as the old instance, so it can't fill it in with the previously-submitted value. In this case if you want it to play well with the back button you have to start storing data on the client.

Then you have to use some sort of state-keying so that each set of data is tied to exactly one instance of the page, otherwise going through multiple instances of the same form or having two browsers tabs open on the form at once will severely confuse your script.

And then you are starting to collect a lot of data if they're big forms, and if the client-side storage mechanism you're using is cookies you can start to lose data, as well as sending a load of unnecessary state nonsense with every HTTP request. Other client-side storage mechanisms are available but they're browser-specific.

In short: doing dynamically-generated forms nicely is a huge pain and it's probably best avoided if you can. Having a hidden form on the page that a script makes visible, thus allowing browsers to do their field-remembering-magic instead of giving you the task, is typically much easier.
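
To make the state-keying idea above a little more concrete, here is a minimal sketch. All names (`makeStateKey`, `saveFormState`, `loadFormState`, `createMemoryStore`) are invented for the example, and the `store` argument is assumed to be any object with `setItem`/`getItem`; browsers that provide `sessionStorage` expose that shape, but older ones such as IE7 do not.

```javascript
// Hypothetical helper: builds a storage key unique to one instance of a page,
// so two tabs showing the same form do not overwrite each other's data.
function makeStateKey(formName, instanceId) {
  return "formstate:" + formName + ":" + instanceId;
}

// Save a plain object of field values under the instance's key.
function saveFormState(store, key, values) {
  store.setItem(key, JSON.stringify(values));
}

// Load the values back, or null when nothing was saved for this instance.
function loadFormState(store, key) {
  var raw = store.getItem(key);
  return raw === null || raw === undefined ? null : JSON.parse(raw);
}

// In-memory stand-in for a real client-side store, used here for illustration.
function createMemoryStore() {
  var data = {};
  return {
    setItem: function (k, v) { data[k] = String(v); },
    getItem: function (k) {
      return Object.prototype.hasOwnProperty.call(data, k) ? data[k] : null;
    }
  };
}

// Usage: two instances of the same form keep separate state.
var store = createMemoryStore();
saveFormState(store, makeStateKey("order", "tab-1"), { name: "Alice" });
saveFormState(store, makeStateKey("order", "tab-2"), { name: "Bob" });
```

The instance id would have to come from somewhere stable across a back navigation (for example a value the server embeds in the page); how to generate it is left open here.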

bobince
Nice answer, but in my case, the problem must be somewhere else: my form is produced server-side and there is no JavaScript at all on the page, no `onload` handling, etc. So I really don't understand what's going on!
Pierre
Hmm... well, me neither then - do you have a publicly available URL we can look at?
bobince
Yes, http://www.epsitec.ch/xxx/buy/full-a -- wait a minute, if I add the trailing .htm to that URL, everything works fine! So my problem was caused by a rewrite rule which somehow confuses IE7's caching!?
Pierre
The full-a version is returning three extra headers that .htm does not: 'Content-Location', 'Vary' and 'TCN'. I suspect 'Vary' may be breaking the caching, as IE has had horrific problems with this header in the past. Is it actually mod_rewrite putting it there? It looks more like mod_negotiation.
bobince
A: 

You could use autocomplete="off" in your fields. This way the values won't be cached by the browser so the values won't be filled in the form when the user clicks the back button.

Bogdan
Note of course, the "autocomplete='off'" doesn't validate as XHTML, but it can be programmatically appended as a property at runtime via JavaScript and it will still work.
Kent Fredric
Sorry, you did not read the question properly: my fields get cleared when I hit the back button. What I want is that IE should retain the data typed in by the user.
Pierre
A: 

While trying to narrow down the problem further, I found its cause. I was using URLs which were being rewritten by Apache (i.e. I always accessed my page as http://foo.com/page which is mapped by Apache to http://foo.com/page.htm). Using the real URLs solved the problem and made IE7 happy, as long as I specify the proper HTTP headers (Cache-Control, Expires, etc.).

Here is what I do in the PHP code to output headers which seem to make all browsers happy with the cache:

function emitConditionalGet($timestamp)
{
    // See also http://www.mnot.net/cache_docs/
    // and code sample http://simonwillison.net/2003/Apr/23/conditionalGet/

    $gmdate_exp    = gmdate('D, d M Y H:i:s', time() + 1) . ' GMT';
    $last_modified = gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
    $etag          = '"'.md5($last_modified).'"';

    // If the client provided any of the if-modified-since or if-none-match
    // infos, take them into account:

    $if_modified_since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
                       ? stripslashes($_SERVER['HTTP_IF_MODIFIED_SINCE']) : false;
    $if_none_match     = isset($_SERVER['HTTP_IF_NONE_MATCH'])
                       ? stripslashes($_SERVER['HTTP_IF_NONE_MATCH'])     : false;

    if (!$if_modified_since && !$if_none_match)
    {
        return;  // the client does not cache anything
    }

    if ($if_none_match && $if_none_match != $etag)
    {
        return;  // ETag mismatch: the page changed!
    }
    if ($if_modified_since && $if_modified_since != $last_modified)
    {
        return;  // if-modified-since mismatch: the page changed!
    }

    // Nothing changed since last time client visited this page.

    header("HTTP/1.0 304 Not Modified");
    header("Last-Modified: $last_modified");
    header("ETag: $etag");
    header("Cache-Control: private, max-age=1, must-revalidate");
    header("Expires: $gmdate_exp");
    header("Pragma: private, cache");
    header("Content-Type: text/html; charset=utf-8");
    exit;
}

function emitDefaultHeaders($timestamp)
{
    $gmdate_exp    = gmdate('D, d M Y H:i:s', time() + 1) . ' GMT';
    $last_modified = gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
    $etag          = '"'.md5($last_modified).'"';

    header("Last-Modified: $last_modified");
    header("ETag: $etag");
    header("Cache-Control: private, max-age=1, must-revalidate");
    header("Expires: $gmdate_exp");
    header("Pragma: private, cache");
    header("Content-Type: text/html; charset=utf-8");
}

function getTimestamp()
{
    // Find out when this page's contents last changed; in a static system,
    // this would be the file time of the backing HTML/PHP page. Add your
    // own logic here:
    // $SCRIPT_FILENAME is not defined inside a function; use the superglobal.
    return filemtime($_SERVER['SCRIPT_FILENAME']);
}

// ...

$timestamp = getTimestamp();
emitConditionalGet($timestamp);
emitDefaultHeaders($timestamp);
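
For readers who want the decision logic of emitConditionalGet in isolation, here is a rough restatement in JavaScript. It is a sketch, not a port of the whole function: the names `isNotModified` and `httpDate` are invented for the example, the ETag is passed in rather than computed, and request headers are assumed to arrive as an object with lower-case names.

```javascript
// Format a millisecond timestamp as an HTTP date,
// e.g. "Thu, 30 Oct 2008 17:00:00 GMT".
function httpDate(ms) {
  return new Date(ms).toUTCString();
}

// Return true when the request can be answered with "304 Not Modified",
// following the same three checks as the PHP code above.
function isNotModified(requestHeaders, lastModified, etag) {
  var ifModifiedSince = requestHeaders["if-modified-since"] || null;
  var ifNoneMatch = requestHeaders["if-none-match"] || null;

  // The client sent no validators: it has nothing cached.
  if (!ifModifiedSince && !ifNoneMatch) return false;
  // ETag mismatch: the page changed.
  if (ifNoneMatch && ifNoneMatch !== etag) return false;
  // If-Modified-Since mismatch: the page changed.
  if (ifModifiedSince && ifModifiedSince !== lastModified) return false;

  return true;
}

// Usage with made-up values:
var lastModified = httpDate(Date.UTC(2008, 9, 30, 17, 0, 0));
var etag = '"example-etag"';
var unchanged = isNotModified({ "if-none-match": etag }, lastModified, etag);
```

Like the PHP version, this compares the If-Modified-Since value as an exact string rather than as a date, which is stricter than necessary but simple.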
Pierre
A: 

What are the "correct" HTTP header settings by the way? I have the same problem, though my webpages have the PHP extension instead of HTML.

[edit] Oops, my response was meant for Pierre, below.

posfan12
Hi posfan12, I've added a piece of sample PHP to my answer. Hope this helps.
Pierre