I am writing a Web application that has a user interface for editing documents. What is the best way to implement a history feature like Wikipedia's where edits to a document can be viewed?

+1  A: 

Use a version control system as your basis: save every version into a VCS, which stores changes compactly, typically as deltas. You can then use its diff feature to get the differences, although you will have to parse the output. In git, for instance, you can get the diff between two revisions by passing their hashes as arguments to git-diff.

That is, if you are not willing to use an existing system.
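
As a rough sketch of the git-diff approach, assuming Python on the server and a repository that already holds each saved version as a commit, the call could be wrapped like this. The function names, the repository path and the revision hashes are hypothetical; only `git -C`, `git diff` and `--no-color` are real git options.

```python
import subprocess


def build_git_diff_command(repo_dir, old_rev, new_rev, path=None):
    """Build the argument list for `git diff` between two revisions."""
    cmd = ["git", "-C", repo_dir, "diff", "--no-color", old_rev, new_rev]
    if path is not None:
        # Limit the diff to a single document inside the repository.
        cmd += ["--", path]
    return cmd


def git_diff(repo_dir, old_rev, new_rev, path=None):
    """Run git diff and return its unified-diff output as text.

    Raises subprocess.CalledProcessError if git exits with an error,
    for example when a revision hash does not exist.
    """
    result = subprocess.run(
        build_git_diff_command(repo_dir, old_rev, new_rev, path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

The returned text is an ordinary unified diff, so lines starting with `+` or `-` (other than the `+++`/`---` headers) are the ones to highlight in the history view.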

Makis
A simpler solution is to use CVS, which is rock solid, tiny and compact. You can also find binaries of it for a lot of platforms. Other options are SVN (Subversion) and Mercurial (for Python guys).
daitangio
git is much faster, very tiny and compact. I don't know how cvs would make this any simpler.
Makis
+3  A: 

Well, you will have to store the current document and archive each change so that you have something to compare against. Typically the main document is the one in the database, and on each save the older version is copied to a separate archive database or service.

Then you can pull the current version and the latest archived version and compare them with a diff algorithm.

Python has a diff algorithm tool, difflib: http://docs.python.org/library/difflib.html. It also has a directory and file comparison tool, filecmp: http://docs.python.org/library/filecmp.html#module-filecmp

Many other languages also have diff algorithm implementations.
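
A minimal sketch of the comparison step with difflib, assuming both versions are available as plain strings. The function name `diff_versions` and the labels "archived" and "current" are made up for illustration.

```python
import difflib


def diff_versions(old_text, new_text):
    """Return a unified diff between an archived and a current version."""
    # keepends=True keeps the newlines so the diff can be joined back as-is.
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines, new_lines, fromfile="archived", tofile="current"
    )
    return "".join(diff)


if __name__ == "__main__":
    print(diff_versions("Hello world\nSecond line\n", "Hello world\nChanged line\n"))
```

For a side-by-side HTML view, difflib also provides `HtmlDiff`, which may be closer to what a Wikipedia-style history page needs.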

You could store only the deltas on each change and reconstruct versions from them, much as Subversion does with Berkeley DB, but for simplicity I recommend saving a full copy of the content each time and then comparing the latest versions, or whichever ones the user selects.
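
To give an idea of what the delta option involves, here is a small sketch built on difflib's `SequenceMatcher`. It is only an illustration of the idea, not how Subversion or Berkeley DB store data, and `make_delta` and `apply_delta` are hypothetical names. A delta here is the list of line ranges that differ, together with their replacement lines.

```python
import difflib


def make_delta(source_lines, target_lines):
    """Record only the regions where target differs from source.

    Each entry is (start, end, replacement): replace source[start:end]
    with the replacement lines to obtain the target.
    """
    matcher = difflib.SequenceMatcher(
        a=source_lines, b=target_lines, autojunk=False
    )
    return [
        (i1, i2, target_lines[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(source_lines, delta):
    """Rebuild the target version from the source lines and a delta."""
    result = []
    position = 0
    for start, end, replacement in delta:
        result.extend(source_lines[position:start])  # unchanged lines
        result.extend(replacement)                   # changed region
        position = end
    result.extend(source_lines[position:])           # unchanged tail
    return result


if __name__ == "__main__":
    current = ["title", "first paragraph", "second paragraph"]
    previous = ["title", "first paragraph (draft)", "second paragraph"]
    # Keep the current version in full, plus a delta back to the previous one.
    delta = make_delta(current, previous)
    assert apply_delta(current, delta) == previous
```

Keeping the newest version in full and storing deltas that lead back to older ones means the common case, showing the current document, needs no reconstruction. The extra bookkeeping is the complexity the recommendation above is trying to avoid.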

Ryan Christensen
Yes, this is simpler if you do the storing yourself. The obvious problem, of course, is space requirements. If there aren't too many pages and too many revisions, this works fine.
Makis
True, it would be project-specific. Building delta storage like Berkeley DB, or using BDB itself, is much more complex than a simple full-content diff. But agreed, there is a balance there. Obviously for apps like Subversion or git you want deltas stored; for a simple to-do task app, as an example, full content is enough.
Ryan Christensen
A: 

Without knowing what framework and whatnot you are using this is a difficult question to answer well.

Are you using a database for your storage? Let's say you have a pages table in your database, why not create a pages_versions table for holding old revisions?

When saving anything to the pages table, first insert a copy of the existing row into pages_versions. Retrieving the old versions will then be no more difficult than loading data through any other one-to-many relationship. You can beautify the data with a colourised diff or whatnot at this point.
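
A minimal sketch of that copy-on-save idea, assuming SQLite via Python's sqlite3 module. Only the table names pages and pages_versions come from the text above; the columns and function names are assumptions for illustration.

```python
import sqlite3


def create_schema(conn):
    """Create the live table and the archive table (columns are assumed)."""
    conn.executescript(
        """
        CREATE TABLE pages (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            body  TEXT NOT NULL
        );
        CREATE TABLE pages_versions (
            id          INTEGER PRIMARY KEY,
            page_id     INTEGER NOT NULL REFERENCES pages(id),
            body        TEXT NOT NULL,
            archived_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        """
    )


def save_page(conn, page_id, new_body):
    """Archive the existing body, then overwrite it with the new one."""
    with conn:  # one transaction: both statements commit or roll back together
        conn.execute(
            "INSERT INTO pages_versions (page_id, body) "
            "SELECT id, body FROM pages WHERE id = ?",
            (page_id,),
        )
        conn.execute(
            "UPDATE pages SET body = ? WHERE id = ?", (new_body, page_id)
        )


def old_versions(conn, page_id):
    """Return archived bodies for a page, oldest first."""
    rows = conn.execute(
        "SELECT body FROM pages_versions WHERE page_id = ? ORDER BY id",
        (page_id,),
    )
    return [row[0] for row in rows]
```

Doing the archive insert and the update in a single transaction avoids ending up with an overwritten page whose previous version was never saved.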

I believe some frameworks now have support for using a version control system as a storage backend so that may also be worth investigating.

toholio
A: 

Are you referring to the back-end setup, or the front-end with the individual changes highlighted?

I can't help you with the front-end bit, but...

If it's the back-end, what you need is:

  1. a 'documents' table with say, id and title columns.
  2. a 'versions' table with columns for document_id (FK), body_text, edit_date, author, version
  3. In your application, a new document reference is first created in the documents table, then the data is stored as a new version in the versions table. When a user updates an old document, a new version is created with the same document reference in document_id.
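
The three steps above could be sketched roughly as follows, assuming SQLite through Python's sqlite3 module. The table and column names follow the list; the function names, the uniqueness constraint and the timestamp default are my own assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE documents (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE versions (
    id          INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    body_text   TEXT NOT NULL,
    edit_date   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    author      TEXT NOT NULL,
    version     INTEGER NOT NULL,
    UNIQUE (document_id, version)
);
"""


def create_document(conn, title, body_text, author):
    """Create the document reference, then store its body as version 1."""
    with conn:
        cursor = conn.execute(
            "INSERT INTO documents (title) VALUES (?)", (title,)
        )
        document_id = cursor.lastrowid
        conn.execute(
            "INSERT INTO versions (document_id, body_text, author, version) "
            "VALUES (?, ?, ?, 1)",
            (document_id, body_text, author),
        )
    return document_id


def update_document(conn, document_id, body_text, author):
    """Store an edit as a new version row; earlier rows are never changed."""
    with conn:
        (latest,) = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM versions "
            "WHERE document_id = ?",
            (document_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO versions (document_id, body_text, author, version) "
            "VALUES (?, ?, ?, ?)",
            (document_id, body_text, author, latest + 1),
        )
    return latest + 1


def history(conn, document_id):
    """Return (version, author, body_text) tuples, oldest first."""
    return conn.execute(
        "SELECT version, author, body_text FROM versions "
        "WHERE document_id = ? ORDER BY version",
        (document_id,),
    ).fetchall()
```

The current text of a document is simply the row with the highest version number, and any two rows from `history` can be handed to a diff routine for display.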

(I think I've probably not explained this very well, so sorry about that!)

BTW, if you're using Rails there are several plug-ins which will do most of this for you. Acts_As_Versioned is the first which comes to mind.

Chris