I'm a developer for a marketing team, and one feature that often gets requested is: can we go back and see what our site (or a particular page) looked like on a given date?
Are there any good solutions for this request?
Have a look at the Wayback Machine. It's not perfect, but there are some embarrassing old sites still in there that I worked on :)
Have you looked at the Wayback Machine at archive.org?
http://www.archive.org/web/web.php
If that doesn't meet your needs, maybe you could automate something with your source control repository that could pull a version for a specific date.
Source control should be able to solve your request in house. Label things appropriately and have an internal server to deploy that label to, and you should have no issue. If you have an automated deployment tool and choose your labels wisely, it should be relatively simple to write an app that checks out your source at label X and deploys it, with the user only having to enter the label. If your labels were something like the date, they would just have to enter the date in the correct format and wait 5 minutes for the deploy.
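As a rough illustration of the "enter the label, get a deploy" idea, here is a minimal sketch. It assumes Subversion with tags named by date (YYYY-MM-DD) under a `tags/` directory; the repository URL, the deploy root, and the function name are all hypothetical placeholders, not anything from a real setup.

```python
import re
from datetime import datetime

# Hypothetical values: replace with your own repository and internal server path.
REPO_URL = "https://svn.example.com/repo"
DEPLOY_ROOT = "/var/www/archive"


def label_to_export_command(label, repo_url=REPO_URL, deploy_root=DEPLOY_ROOT):
    """Return the `svn export` argument list for a date-style label like 2008-09-15."""
    # Only accept the exact YYYY-MM-DD shape so user input cannot alter the command.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", label):
        raise ValueError(f"label must look like YYYY-MM-DD, got {label!r}")
    # strptime raises ValueError for impossible dates such as 2008-02-30.
    datetime.strptime(label, "%Y-%m-%d")
    # Assumption: each label is a tag at <repo>/tags/<label>.
    return ["svn", "export", f"{repo_url}/tags/{label}", f"{deploy_root}/{label}"]
```

The returned list could be handed to `subprocess.run(cmd, check=True)` by whatever deployment tool you already have. A site that needs a build step or a database would need more than a plain export.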
Depending on your pages and exactly what you are asking for, you might consider putting copies of the pages in source control.
This probably won't work if your content is in a database, but if they are just HTML pages that you are changing over time, then SCM would be the normal way to do this. The Wayback Machine that everyone mentions is great, but this solution is more company specific, allowing you to capture every nuance of changes over time. You have no control over the Wayback Machine (to my knowledge).
In Subversion, you can set up hooks and automate this. In fact, this might even work if you are using content from a database...
Similar to what others have suggested, (assuming a dynamic website) I would use output caching to generate the web page's code, and then use Subversion to track the changes.
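A minimal sketch of that idea, assuming the rendered HTML is already available as a string (from your output cache or an HTTP fetch) and that a Subversion working copy exists on disk. The function names and paths are hypothetical; `svn add --force` and `svn commit -m` are standard Subversion commands.

```python
from pathlib import Path


def save_rendered_page(working_copy, relative_path, html):
    """Write rendered HTML into the working copy; return True if the file changed."""
    target = Path(working_copy) / relative_path
    # Skip the write when the stored copy is identical, so unchanged pages
    # do not show up as modifications.
    if target.exists() and target.read_text(encoding="utf-8") == html:
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(html, encoding="utf-8")
    return True


def commit_commands(message):
    """Commands to run inside the working copy after saving pages."""
    return [
        ["svn", "add", "--force", "."],    # pick up any newly created files
        ["svn", "commit", "-m", message],  # record the snapshot
    ]
```

A nightly job could call `save_rendered_page` for each page and, if any call returns True, run the two commands with `subprocess.run(cmd, cwd=working_copy, check=True)`.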
Using the Wayback Machine is probably only a last resort, such as if an individual asks to see a webpage from before you set this system up. One cannot rely on the Wayback Machine to contain everything that one needs.
My suggestion would be to simply run wget over the site every night and store that on archive.yourdomain.com. Add a control to each page, for those with the appropriate permissions, that passes the URL of the current page to a date picker. Once a date is chosen, load archive.yourdomain.com/YYYYMMDD/original_url.
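Here is a rough sketch of the two pieces: the nightly wget call and the mapping from a live URL plus a chosen date to the archived URL. The archive root directory, the http scheme, and the function names are assumptions; the wget options used are standard long-form flags.

```python
from datetime import date
from urllib.parse import urlsplit

ARCHIVE_HOST = "archive.yourdomain.com"  # host name taken from the answer above


def wget_command(site_url, snapshot_day, archive_root="/srv/archive"):
    """Build the nightly mirror command; archive_root is a hypothetical path."""
    target = f"{archive_root}/{snapshot_day:%Y%m%d}"
    return [
        "wget",
        "--mirror",               # recursive download with timestamping
        "--page-requisites",      # also fetch images, CSS and scripts
        "--convert-links",        # rewrite links so the copy browses locally
        "--no-host-directories",  # do not nest files under a host-name folder
        "--directory-prefix", target,
        site_url,
    ]


def archived_url(current_url, snapshot_day):
    """Map a live page URL and a picked date to its archived location."""
    path = urlsplit(current_url).path or "/"
    # Query strings are dropped in this sketch; dynamic URLs would need more care.
    return f"http://{ARCHIVE_HOST}/{snapshot_day:%Y%m%d}{path}"
```

For example, `archived_url("http://www.yourdomain.com/products/index.html", date(2008, 9, 15))` gives `http://archive.yourdomain.com/20080915/products/index.html`, which is what the date picker control would redirect to.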
Letting users browse the entire site without broken links on archive.yourdomain.com might require some URL rewriting, or copying the archived copy of the site from some repository to the root of archive.yourdomain.com. To save disk space, that might be the best option. Store the wget copies zipped, then extract the date the user requests. There are some issues with this, such as how to deal with multiple users wanting to view multiple archived pages from different dates at the same time.
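One possible way around the multiple-users issue is to extract each requested date into its own directory rather than into the site root, so different dates can sit side by side. A minimal sketch, assuming one zip per night named YYYYMMDD.zip (the naming scheme, directories, and function name are hypothetical):

```python
import zipfile
from pathlib import Path


def ensure_extracted(zip_dir, serve_root, day):
    """Extract the snapshot for `day` (YYYYMMDD) once and return its directory."""
    target = Path(serve_root) / day
    # Each date gets its own folder, so users viewing different dates
    # never overwrite each other's files.
    if not target.exists():
        with zipfile.ZipFile(Path(zip_dir) / f"{day}.zip") as archive:
            archive.extractall(target)
    return target
```

This does not guard against two requests extracting the same date at the same instant, and extracted folders would need to be cleaned up periodically to get the disk space back.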
I'd suggest that running wget over your site each night is superior to retrieving it from source control, since you would obtain the page as it was shown to WWW visitors, complete with any dynamically served content, errors, omissions, random rotated ads, etc.
EDIT: You could store the wget output in source control, but I'm not sure what that would buy you over zipping it up on a file system somewhere outside source control. Also note this plan would use up large amounts of disk space over time, assuming a website of any size.