This is a problem I've come across for a couple different projects I'm working on.

Since the projects are still under stealth development, and the problem itself is of potentially broader interest and agnostic to them, I'm going to anonymize it - but FWIW, I work primarily with Ruby/Rails and MySQL.

Scenario

I want to have a system for document editing that's like a wiki, but has these features:

  • full VCS type behavior
    • multiple asynchronous users
    • fork & merge
    • blame & diff
    • versioning, commits, & rollbacks
    • offline editing on working copies / local repositories
    • efficient & robust storage
      • could be in MySQL or an independent repo, but no storing full flat text for each document
  • file-level fork, merge, etc ability
  • maximal transparency to users (they're very not hackers)
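
To make the "no full flat text per document" requirement concrete, here is a minimal sketch of storing a revision as a delta against its parent. It only trims the common prefix and suffix, which is far cruder than what Git's packfiles or a real diff library would do. The `Delta` struct and both helper names are hypothetical, made up for illustration.

```ruby
# Hypothetical delta format: keep only the changed middle of a revision,
# plus how many leading/trailing characters it shares with its parent.
Delta = Struct.new(:prefix_len, :suffix_len, :middle)

def make_delta(parent, child)
  max = [parent.length, child.length].min

  # Length of the shared prefix.
  prefix = 0
  prefix += 1 while prefix < max && parent[prefix] == child[prefix]

  # Length of the shared suffix, never overlapping the prefix.
  suffix = 0
  while suffix < max - prefix && parent[-1 - suffix] == child[-1 - suffix]
    suffix += 1
  end

  Delta.new(prefix, suffix, child[prefix, child.length - prefix - suffix])
end

def apply_delta(parent, delta)
  head = parent[0, delta.prefix_len]
  tail = parent[parent.length - delta.suffix_len, delta.suffix_len]
  head + delta.middle + tail
end
```

For a letter where only a phone number changes, the stored `middle` is just the new digits, and `apply_delta(parent, make_delta(parent, child))` gives back `child`.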

I don't need decentralized repositories, since all of my apps involve authoritative servers, but nor do I mind having 'em.

Unlike a good wiki, the documents that I'm dealing with are frequently redundant (with arbitrary minor changes) and have fairly complex edit and merge histories.

Existing solutions

The first requirement is very capably filled by, e.g., GitWiki.

Unfortunately, Git (and all other VCSs I know of) only does repository-level forks; it doesn't allow for file-level forks.

What is a file/document level fork anyway?

Suppose that I write a bunch of recommendation letters for a lot of people and have them persistently available. (A bit of a contrived example, but bear with me.)

Realistically, what I have is three levels of changes:

  • a base recommendation letter template that applies to all of them
  • a more specific letter for a particular person
  • a fully specific letter for a particular person to a particular job

Now suppose that I want to change my contact info, or I have new information about a particular person. Conceptually, this should be as simple as:

  1. edit the base letter
  2. (auto) merge that into all letters that inherit from it
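
As a rough illustration of step 2, here is a section-level three-way merge. It assumes (and this is only an assumption for the sketch) that a letter is a Hash of section name to text, and that each child remembers the base version it was forked from. `merge3` and `propagate` are hypothetical names, and a real system would presumably merge at line or word granularity instead.

```ruby
# Three-way merge at section granularity.
#   old_base: the base letter as it was when the child forked from it
#   new_base: the base letter after my edit
#   child:    the more specific letter
# Returns [merged_letter, conflicting_section_names].
def merge3(old_base, new_base, child)
  merged = {}
  conflicts = []

  (old_base.keys | new_base.keys | child.keys).each do |key|
    o = old_base[key]
    n = new_base[key]
    c = child[key]

    if c == o
      merged[key] = n          # child never touched it: take the base's new text
    elsif n == o || n == c
      merged[key] = c          # base unchanged here, or both made the same change
    else
      merged[key] = c          # both changed it differently: keep child, flag it
      conflicts << key
    end
  end

  [merged.compact, conflicts]  # compact drops sections deleted on the winning side
end

# Merge one base edit into every letter that inherits from it.
# children: { name => letter }
def propagate(old_base, new_base, children)
  children.transform_values { |child| merge3(old_base, new_base, child) }
end
```

With this, changing only the contact section of the base flows into every child that left it alone, while a child that rewrote its own contact section comes back flagged as a conflict rather than being silently overwritten.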

Transclusion

This could also be considered as using version control for a more flexible / robust sort of transclusion system - the difference being that unlike this simple example, parts of another document might be copied with arbitrary edits.

Another simplified example might help to explain this transclusion issue.

Suppose that our "documents" are posts on a forum. One common thing is to quote others' posts in full.

The quote itself is redundant, and in my use, it's useful for me to know who quoted whom/what, etc.

So in the simple case, what we can do is that when someone quotes another, in their editor appears something like: [quote msg=1234 v=1]What's the best recipe for mince pie?[/quote] My mom always used to...

In most cases, the user will just leave it alone and add their answers below. We can then do a simple transclusion, drop the redundant text, and replace it with a pointer to the quoted message - with the benefit of being able to display UI to indicate that the quoted text has been edited since it was quoted, and to easily see either version.

One step more complex, the user will edit it to be an excerpt, e.g. [quote msg=1234 v=1]mince pie?[/quote] Parsnip!. Again we can do just a pointer, this time adding an offset and length for transclusion, and UI to expand the excerpt.

However, our transclusion breaks if the user does something like: [quote msg=1234 v=1]What's the best recipe for stupid pie?[/quote] Fixed! lol. We can handle this by including the entire "fixed" text (in less stupid cases, this might be e.g. a helpful summarization or rephrasing) along with our pointer. The UI can then switch to the "uncorrected" version, auto-highlight the diff, etc.
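
The three cases above (untouched quote, excerpt, edited quote) could be told apart mechanically. Here is a minimal sketch, assuming stored messages are available as a Hash keyed by `[msg_id, version]`. That store, `QUOTE_RE`, and `classify_quote` are all hypothetical, and only the first quote in a post is handled.

```ruby
# Matches the quote markup used in the examples above.
QUOTE_RE = %r{\[quote msg=(\d+) v=(\d+)\](.*?)\[/quote\]}m

# messages: { [msg_id, version] => original text } (hypothetical store)
# Returns a Hash describing what to persist instead of the quoted text,
# or nil if the post contains no quote.
def classify_quote(post, messages)
  m = QUOTE_RE.match(post)
  return nil unless m

  msg_id = m[1].to_i
  version = m[2].to_i
  quoted = m[3]
  original = messages.fetch([msg_id, version])
  ref = { msg: msg_id, v: version }

  if quoted == original
    ref.merge(kind: :full)                       # plain pointer
  elsif (offset = original.index(quoted))
    ref.merge(kind: :excerpt, offset: offset, length: quoted.length)
  else
    ref.merge(kind: :edited, text: quoted)       # pointer plus the changed text
  end
end
```

The `:edited` branch is exactly the weak spot described next: it stores the changed text wholesale instead of treating it as a revision of the quoted message.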

Unfortunately, just storing the new text - rather than handling it as a real revision - means that we abandon the whole revision-control thing, and in larger documents (e.g. a program, book chapter, etc), this is insufficient.

More complexity

A more complex scenario could include, for instance:

  • "development branches" of some particular document (and thereby its dependents) being edited - and versioned - in parallel with the "production branch"
  • complex fork/merge histories (e.g. a child document is edited, expanded at length, and then partially re-merged into its parent and siblings to the extent that it's relevant to them)
  • non-hierarchical document relationships (e.g. you fork my recommendation letter template and make your own tweaks; I then merge back some of your tweaks into mine; in the meantime, someone else has forked mine again...)
  • etc.
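
One possible way to model these relationships is a single revision graph where a revision may have several parents (a merge), and a parent may belong to a different document (a file-level fork). The sketch below assumes unique revision ids. `Revision`, `ancestors`, and `merge_bases` are hypothetical names. Recording a partial re-merge as a plain parent link is a simplification, since the link alone doesn't say which parts were taken.

```ruby
# A revision of some document. parents may point into other documents.
Revision = Struct.new(:id, :doc, :parents)

# All ancestors of rev (including rev itself), as { id => revision }.
def ancestors(rev)
  seen = {}
  stack = [rev]
  until stack.empty?
    r = stack.pop
    next if seen.key?(r.id)
    seen[r.id] = r
    stack.concat(r.parents)
  end
  seen
end

# Common ancestors of a and b that are not themselves ancestors of
# another common ancestor: the candidates to use as a three-way merge base.
def merge_bases(a, b)
  anc_b = ancestors(b)
  common = ancestors(a).values.select { |r| anc_b.key?(r.id) }
  common.reject do |r|
    common.any? { |other| other.id != r.id && ancestors(other).key?(r.id) }
  end
end

# The recommendation-letter example from the list above:
mine1  = Revision.new("mine-1",  "my-template",    [])
yours1 = Revision.new("yours-1", "your-template",  [mine1])          # you fork mine
yours2 = Revision.new("yours-2", "your-template",  [yours1])         # your tweaks
mine2  = Revision.new("mine-2",  "my-template",    [mine1])
mine3  = Revision.new("mine-3",  "my-template",    [mine2, yours2])  # I merge yours back
third1 = Revision.new("third-1", "third-template", [mine3])          # someone forks mine again

puts merge_bases(mine2, yours2).map(&:id).inspect
puts merge_bases(yours2, third1).map(&:id).inspect
```

The base found this way is what a three-way merge between two documents would diff against, regardless of which document each revision nominally belongs to.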

I am primarily dealing with real text documents, but I think this can be considered to apply equally to code.

The questions:

  • Does anything already do this, or similar?
  • What would be a good way to design such a system?
  • What components would be good candidates to use? (the way Git and Sinatra are used in GitWiki)
  • Is there something I'm overlooking?
+1  A: 

GitHub's gists are similar. They use the ideas of Git pretty directly. Certainly worth looking at.

teepark
AFAICT, gists are set up as essentially their own, individual one-file repositories. This is an interesting way to handle it, but seems like a hack. I wonder if there's some way to extend this technique to handle multi-document repositories?
Sai Emrys
+1  A: 

I think git is a good basis for this. The file level merging/forking operations could be implemented on top of it. You might want to take a look at my git-wiki fork. It could be extended for your purposes.

Daniel
I linked to that fork in the question. :-P
Sai Emrys