tags:

views:

98

answers:

1

Hi all,

I was just wondering for sites like stackoverflow and wikipedia, they stores history of edits indefinitely and allows user to roll back the edits. Can someone recommend any resources/books/articles regarding how to do this using any suitable technology (such as databases etc)

Thanks a lot!

+3  A: 

There are a number of options; the simplest, of course, being to simply record all versions independently. For a site like stackoverflow, where posts aren't usually edited very many times, this is appropriate. However for something like wikipedia, one needs to be more clever to save space.

In the case of wikipedia, pages are initially stored with each version seperate, in the text table. Periodically, a number of older revisions are compressed together, then packed into a single field. Since there will be a lot of repetition, you save a lot of space this way.

You might also want to look into how some version control systems do it - for example, subversion uses skip deltas, where revisions are stored as a difference from a revision halfway down the history. This means that one will have to examine at most O(lg n) revisions to reconstruct one's revision of interest.

Git, on the other hand, uses something more similar to wikipedia's approach. Revisions are stored as individually compressed 'loose' objects at first, then periodically git takes all of the loose objects, sorts them according to a somewhat complex heuristic, then builds compressed deltas between 'nearby' objects and dumps the result as a packfile. The number of revisions that need to be read to reconstruct a file is bounded by an argument to the pack building process. This has the interesting property that deltas can be built between objects that are unrelated, in some cases.

bdonlan