I want to build a revision control system for a wiki to learn Python.

I would like to use a method that stores only the latest version of each page plus the deltas needed to rebuild older versions, but I am open to other ideas.
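
As a rough illustration of that idea (a sketch only, not a finished design): keep the newest text in full and, on every save, record a "reverse delta" that rebuilds the previous text from the new one. The names `PageHistory`, `make_reverse_delta`, and `apply_delta` are made up for this example; `difflib.SequenceMatcher` is from the Python standard library.

```python
from difflib import SequenceMatcher


def make_reverse_delta(new_lines, old_lines):
    """Return a list of operations that rebuilds old_lines from new_lines."""
    ops = []
    matcher = SequenceMatcher(a=new_lines, b=old_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: only remember which slice of the new text to copy.
            ops.append(("copy", i1, i2))
        elif j2 > j1:
            # "replace" or "insert": keep the old lines that the new text lacks.
            ops.append(("insert", old_lines[j1:j2]))
        # A pure "delete" contributes nothing to the old text, so it is skipped.
    return ops


def apply_delta(new_lines, delta):
    """Rebuild the older revision from the newer one and its reverse delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(new_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class PageHistory:
    """Hypothetical store: full latest text plus one reverse delta per save."""

    def __init__(self, text):
        self.latest = text.splitlines()
        self.deltas = []  # deltas[i] rebuilds revision i from revision i + 1

    def save(self, text):
        new_lines = text.splitlines()
        self.deltas.append(make_reverse_delta(new_lines, self.latest))
        self.latest = new_lines

    def get(self, steps_back=0):
        if not 0 <= steps_back <= len(self.deltas):
            raise ValueError("no such revision")
        lines = self.latest
        # Walk backwards from the newest revision, one delta at a time.
        start = len(self.deltas) - steps_back
        for delta in reversed(self.deltas[start:]):
            lines = apply_delta(lines, delta)
        return "\n".join(lines)
```

With this layout the current page is always a plain read, and only older revisions cost extra work. The trade-off is that reaching a very old revision means applying every delta in between.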

Do you know of any references/tutorials/books on how to build revision control?

I am only beginning to learn this area, so any help is appreciated. I am open to examples in languages besides Python as well.

+1  A: 

You can look at the revision system in DokuWiki. It is written in PHP, but it can give you an idea of how to organize the files and which features to expect (comparison between old revisions, or compression when revisions are stored).

When you edit a page, DokuWiki creates a revision with the old document. The old versions can be viewed by clicking the Old Revisions button. On the page shown, revisions can be compared with the diff tool.

Revisions are stored within the attic dir, within the configured savedir.

The compression configuration option specifies whether the pages will be saved as compressed files.

The default system path is <dokuwiki>/data/attic/<namespace>/<attic file>.

To remove any of the contents of the attic you can manually remove the corresponding files and sub-directories from the system.
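
A minimal Python sketch of the same idea might look like the following. It is not DokuWiki's code: the `pages`/`attic` directory names, the `<page>.<timestamp>.txt[.gz]` file naming, and the functions `save_page` and `read_revision` are assumptions made for illustration only.

```python
import gzip
import time
from pathlib import Path


def save_page(data_dir, page, new_text, compress=True, now=None):
    """Write new_text as the current page and move the old text to the attic.

    Returns the path of the archived old revision, or None for a new page.
    """
    data_dir = Path(data_dir)
    current = data_dir / "pages" / f"{page}.txt"
    current.parent.mkdir(parents=True, exist_ok=True)

    archived = None
    if current.exists():
        # Timestamp in the file name tells revisions of one page apart.
        stamp = int(now if now is not None else time.time())
        suffix = ".txt.gz" if compress else ".txt"
        archived = data_dir / "attic" / f"{page}.{stamp}{suffix}"
        archived.parent.mkdir(parents=True, exist_ok=True)
        old_text = current.read_text(encoding="utf-8")
        if compress:
            with gzip.open(archived, "wt", encoding="utf-8") as f:
                f.write(old_text)
        else:
            archived.write_text(old_text, encoding="utf-8")

    current.write_text(new_text, encoding="utf-8")
    return archived


def read_revision(path):
    """Read an archived revision, compressed or not."""
    path = Path(path)
    if path.suffix == ".gz":
        with gzip.open(path, "rt", encoding="utf-8") as f:
            return f.read()
    return path.read_text(encoding="utf-8")
```

Because every old revision is an ordinary file, cleaning up history is just deleting files, as described above.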

VonC
A: 

My literate-Scheme wiki in Python has its own version-control system built in (see snapshot.py and diff.py). Minuses: the only documentation is the commented code, and I never got around to making a user interface to the VC functionality. Pluses: there's not much code to read, and it shouldn't be hard to follow. Snapshots include the entire wiki, not separate histories for individual pages.

For practicality, you're probably best off with an existing system like git, bzr, or hg.

Darius Bacon
+1  A: 

The popular Bazaar VCS is written entirely in Python. Like git, however, it is a distributed VCS, so it might be overkill if you're just looking for a simple revision control system.

codelogic
+2  A: 

If and when I organize an intermediate programming course, one of the major projects will be a revision control system. I think it would be an excellent way to teach many important programming concepts and techniques that are applicable throughout students' careers.

  • Revision control system usage. After all, the best way to learn how to use a system is to program it.
  • Design and implementation of a command-line-based mini-language. Also, parsing complex command-line input.
  • Filesystem operations such as checking modification dates and stream-based editing
  • Algorithms such as longest common subsequence for implementing a Diff tool
  • Efficient database storage using file deltas
  • The difference between centralized (e.g. CVS, SVN) and distributed (e.g. Git, Mercurial) revision control systems
  • Network usage for operations on a remote repository
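
To give a flavour of the diff item in the list above, here is a small sketch of a line-based diff built on a longest-common-subsequence table. It is the textbook dynamic-programming version, which takes time and memory proportional to the product of the two lengths. Real tools use faster algorithms, and the function names `lcs_table` and `diff` are just for this example.

```python
def lcs_table(a, b):
    """table[i][j] holds the LCS length of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def diff(a, b):
    """Return (marker, line) pairs: '  ' kept, '- ' only in a, '+ ' only in b."""
    table = lcs_table(a, b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(("  ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping a[i] loses nothing from the best common subsequence.
            out.append(("- ", a[i]))
            i += 1
        else:
            out.append(("+ ", b[j]))
            j += 1
    # Whatever is left on either side is a pure deletion or insertion.
    out.extend(("- ", line) for line in a[i:])
    out.extend(("+ ", line) for line in b[j:])
    return out


if __name__ == "__main__":
    old = ["a", "b", "c"]
    new = ["a", "c", "d"]
    for marker, line in diff(old, new):
        print(marker + line)
```

Dropping the `+ ` entries from the result gives back the old text, and dropping the `- ` entries gives the new one, which is the property a delta store relies on.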

I commend you for choosing such an interesting project as a way to learn Python. I am sure that you will learn a lot. Since you are looking for resources, I suggest tackling each of the above topics individually. For example, there are many online tutorials covering network architectures.

Good luck!

Brett Daniel
+1  A: 

Here are a few resources I found helpful when I was trying to do a similar project:

  • MediaWiki database schema - much more complex than you need, in all likelihood. The top left of the yellow box may be the most useful.
  • If you want to implement the diff algorithm, too, the Wikipedia article and the linked longest common subsequence article should offer a reasonable guideline.
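
For a small wiki, a much-reduced page/revision layout is probably enough. The sketch below uses SQLite from the Python standard library. The table and column names are my own simplification, not MediaWiki's actual schema, and it stores full text per revision rather than deltas.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS page (
    id         INTEGER PRIMARY KEY,
    title      TEXT UNIQUE NOT NULL,
    latest_rev INTEGER              -- id of the newest revision, if any
);
CREATE TABLE IF NOT EXISTS revision (
    id        INTEGER PRIMARY KEY,
    page_id   INTEGER NOT NULL REFERENCES page(id),
    parent_id INTEGER REFERENCES revision(id),  -- previous revision, if any
    author    TEXT NOT NULL,
    comment   TEXT NOT NULL DEFAULT '',
    created   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    body      TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_revision(conn, title, body, author, comment=""):
    """Add a revision and point the page at it; returns the new revision id."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("INSERT OR IGNORE INTO page (title) VALUES (?)", (title,))
        page_id, parent_id = conn.execute(
            "SELECT id, latest_rev FROM page WHERE title = ?", (title,)
        ).fetchone()
        cur = conn.execute(
            "INSERT INTO revision (page_id, parent_id, author, comment, body) "
            "VALUES (?, ?, ?, ?, ?)",
            (page_id, parent_id, author, comment, body),
        )
        conn.execute(
            "UPDATE page SET latest_rev = ? WHERE id = ?", (cur.lastrowid, page_id)
        )
    return cur.lastrowid


def history(conn, title):
    """Newest-first list of (revision id, author, comment) for a page."""
    return conn.execute(
        "SELECT r.id, r.author, r.comment FROM revision r "
        "JOIN page p ON p.id = r.page_id WHERE p.title = ? ORDER BY r.id DESC",
        (title,),
    ).fetchall()
```

Keeping `latest_rev` on the page row makes reading the current text a single lookup, and the `parent_id` chain is what an "Old Revisions" view would walk.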

Of course, there's a good chance you're done with the project by now.

Nikhil Chelliah
I wish I was done with the project! I haven't started yet!
Brandon