I am writing a Web application that has a user interface for editing documents. What is the best way to implement a history feature like Wikipedia's where edits to a document can be viewed?
Use a version control system as your basis: save every version into a VCS, which will store the history compactly (often as deltas). You can then use its diff feature to get the differences, though you will have to parse the output. In Git, for instance, you can compare two revisions by passing their hashes as arguments to git diff.
That is, if you are not willing to use an existing system.
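As a rough sketch of the Git approach, the snippet below shells out to git diff from Python. The function names, the repository path and the revision hashes are all hypothetical; it assumes the git executable is on the PATH and that each saved document version has already been committed to the repository.

```python
import subprocess


def build_diff_command(rev_a, rev_b, path=None):
    """Build the argument list for `git diff` between two revisions."""
    cmd = ["git", "diff", rev_a, rev_b]
    if path is not None:
        # "--" separates revision arguments from file paths.
        cmd += ["--", path]
    return cmd


def git_diff(repo_dir, rev_a, rev_b, path=None):
    """Run `git diff` inside repo_dir and return its output as text.

    Raises subprocess.CalledProcessError if git exits with an error,
    for example when a revision does not exist.
    """
    result = subprocess.run(
        build_diff_command(rev_a, rev_b, path),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A call might look like git_diff("/srv/docs-repo", "abc123", "def456", "page.txt"), where every argument is a placeholder. The returned text is a unified diff that still has to be parsed before it can be shown in a Web page.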
You will have to store the current document and archive earlier versions to compare against. Typically the main document lives in the database, and on each save the previous version is written to a separate archive database or service.
Then you can pull the current version and the latest archived version and compare them with a diff algorithm.
Python has a diff module, difflib (http://docs.python.org/library/difflib.html), and a directory and file comparison module, filecmp (http://docs.python.org/library/filecmp.html#module-filecmp).
Many other languages also have diff algorithm implementations.
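For example, comparing the current text with the latest archived copy using difflib could look roughly like this. The function name, the labels and the sample strings are made up for illustration; only difflib.unified_diff comes from the standard library.

```python
import difflib


def diff_versions(old_text, new_text, old_label="archived", new_label="current"):
    """Return a unified diff between two versions of a document."""
    # unified_diff works on sequences of lines; keep the line endings
    # so the pieces can simply be joined back together.
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_label, tofile=new_label)
    )


archived = "Hello world\nThis is the first draft.\n"
current = "Hello world\nThis is the second draft.\n"
print(diff_versions(archived, current))
```

Removed lines come back prefixed with "-" and added lines with "+", so the result can be styled line by line. difflib also provides HtmlDiff if a ready-made side-by-side HTML table is preferable.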
You could instead store only the deltas on each change and reconstruct versions from them, as Subversion does with its Berkeley DB backend, but for simplicity I recommend saving a full copy of the content on each save and then comparing the latest versions, or whichever ones the user selects.
Without knowing what framework and whatnot you are using this is a difficult question to answer well.
Are you using a database for your storage? Let's say you have a pages table in your database; why not create a pages_versions table for holding old revisions? When saving anything to the pages table, first insert a copy into pages_versions. Retrieving the old versions will then be no more difficult than loading data through any other one-to-many relationship. You can beautify the data with a colourised diff or whatnot at this point.
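Here is a minimal sketch of that copy-before-save idea, using Python's built-in sqlite3 module with an in-memory database. Only the pages and pages_versions table names come from the description above; the column names (title, body, page_id, saved_at) and the helper functions are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        body  TEXT NOT NULL
    );
    CREATE TABLE pages_versions (
        id       INTEGER PRIMARY KEY,
        page_id  INTEGER NOT NULL REFERENCES pages(id),
        body     TEXT NOT NULL,
        saved_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
""")


def save_page(conn, page_id, new_body):
    """Archive the current body, then overwrite it, in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO pages_versions (page_id, body) "
            "SELECT id, body FROM pages WHERE id = ?",
            (page_id,),
        )
        conn.execute("UPDATE pages SET body = ? WHERE id = ?", (new_body, page_id))


def old_versions(conn, page_id):
    """Return the archived bodies of a page, oldest first."""
    rows = conn.execute(
        "SELECT body FROM pages_versions WHERE page_id = ? ORDER BY id",
        (page_id,),
    )
    return [body for (body,) in rows]


conn.execute("INSERT INTO pages (id, title, body) VALUES (1, 'Home', 'first draft')")
save_page(conn, 1, "second draft")
save_page(conn, 1, "third draft")
print(old_versions(conn, 1))  # ['first draft', 'second draft']
```

Doing the copy and the update in a single transaction means a failed save cannot leave an archived row without the matching update.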
I believe some frameworks now have support for using a version control system as a storage backend so that may also be worth investigating.
Are you referring to the back-end setup, or the front-end with the individual changes highlighted?
I can't help you with the front-end bit, but...
If it's the back-end, what you need is:
- a 'documents' table with say, id and title columns.
- a 'versions' table with columns for document_id (FK), body_text, edit_date, author, version
- In your application, a new document reference is first created in the documents table, then the data is stored as a new version in the versions table. When a user updates an old document, a new version is created with the same document reference in document_id.
(I think I've probably not explained this very well, so sorry about that!)
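In case it helps, here is roughly what that layout could look like with Python's sqlite3 module. The table and column names follow the list above; the extra id column on versions, the UNIQUE constraint and the helper function names are my own assumptions, so treat it as a sketch rather than a finished design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    CREATE TABLE versions (
        id          INTEGER PRIMARY KEY,
        document_id INTEGER NOT NULL REFERENCES documents(id),
        body_text   TEXT NOT NULL,
        edit_date   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        author      TEXT NOT NULL,
        version     INTEGER NOT NULL,
        UNIQUE (document_id, version)
    );
""")


def create_document(conn, title, body_text, author):
    """Create the document reference, then store its text as version 1."""
    with conn:
        cur = conn.execute("INSERT INTO documents (title) VALUES (?)", (title,))
        document_id = cur.lastrowid
        conn.execute(
            "INSERT INTO versions (document_id, body_text, author, version) "
            "VALUES (?, ?, ?, 1)",
            (document_id, body_text, author),
        )
    return document_id


def update_document(conn, document_id, body_text, author):
    """Store an edit as a new version row; older rows are never modified."""
    with conn:
        (latest,) = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM versions WHERE document_id = ?",
            (document_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO versions (document_id, body_text, author, version) "
            "VALUES (?, ?, ?, ?)",
            (document_id, body_text, author, latest + 1),
        )
    return latest + 1


def history(conn, document_id):
    """Return (version, author, body_text) tuples, oldest first."""
    return conn.execute(
        "SELECT version, author, body_text FROM versions "
        "WHERE document_id = ? ORDER BY version",
        (document_id,),
    ).fetchall()


doc_id = create_document(conn, "Release notes", "Initial text", "alice")
update_document(conn, doc_id, "Revised text", "bob")
for version, author, body_text in history(conn, doc_id):
    print(version, author, body_text)
```

The current text of a document is simply the row with the highest version number, and any two rows can be handed to a diff tool for the front-end.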
BTW, if you're using Rails there are several plug-ins which will do most of this for you. Acts_As_Versioned is the first which comes to mind.