Does anyone know of an API (PHP preferable, but I'd be interested in any language) for creating wiki-like data storage?

How about any resources on rolling your own plaintext wiki? How do other plaintext wikis handle the format of the text file?

I understand I can use Markdown or Textile for the formatting. But what I'm most interested in is how to approach the plaintext storage of multi-user edits.

I'm writing a web application that is primarily database driven. I want at least one text field of this database to be in a wiki-like format. Specifically, this text can be edited by multiple users with the ability to roll back to any version. Think the wiki/bio section of Last.FM (almost the entire site is strictly structured by a database except for this one section per artist).

So far, my approach of taking apart MediaWiki and wedging it into a database seems like overkill. I'm thinking it would be much easier to roll my own plaintext wiki, and store this file in the database's appropriate text field.

A: 

Here's a list of all 12 wikis on WikiMatrix that are written in PHP and do their storage using text files. Perhaps one of them will have a storage method you can adapt into the database:

http://www.wikimatrix.org/search.php?sid=1760

Chad Birch
A: 

It sounds like you are essentially just looking for version control. If that is the case, you may want to look into a diff algorithm.

Here is the Wikipedia Diff page.

I did a quick Google search for "php diff", but nothing really stood out as a decent example, though I only have basic PHP knowledge.
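
To make the idea concrete, here is a minimal sketch in Python rather than PHP, using the standard library's difflib module. The function name revision_diff and the v1/v2 labels are just illustrative choices, not anything from an existing wiki package.

```python
import difflib


def revision_diff(old_text: str, new_text: str) -> str:
    """Return a unified diff describing how old_text became new_text."""
    # Compare line by line, keeping line endings so the output joins cleanly.
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile="v1", tofile="v2")
    )


if __name__ == "__main__":
    v1 = "The band formed in 1999.\nThey released two albums.\n"
    v2 = "The band formed in 1998.\nThey released two albums.\n"
    print(revision_diff(v1, v2))
```

A diff like this is mostly useful for showing users what changed between two revisions; whether you also store diffs instead of full copies is a separate decision.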

Philip T.
+10  A: 

So, basically this is a "how do I version text information in my DB" question.

Well, the simplest way is to copy the data.

Create a "versions" table that holds the old versions of the data, and link it back to your main table.

create table docs (
    id integer primary key not null,
    version integer not null,
    create_date date,
    change_date date,
    create_user_id integer not null references users(id),
    change_user_id integer references users(id),
    text_data text
);

create table versions (
    id integer primary key not null,
    doc_id integer not null references docs(id),
    version integer,
    change_date date,
    change_user integer not null references users(id),
    text_data text
);

Whenever you update your original document, you copy the old text value into this table along with its user and change date, and then bump the version on the document.

-- Read the current row before it is overwritten.
select version, change_date, change_user_id, text_data
    into l_version, l_change_date, l_change_user, l_text_data
from docs where id = l_doc_id;

-- Archive it; newid stands for a freshly generated versions.id.
insert into versions values (newid, l_doc_id, l_version,
    l_change_date, l_change_user, l_text_data);

-- Overwrite the document and bump its version.
update docs set version = version + 1, change_date = current_date,
    change_user_id = cur_user, text_data = l_new_text where id = l_doc_id;
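
The same copy-then-update steps can be sketched as runnable code. This is only an illustration, assuming Python with SQLite; the users table and the create_* columns are left out to keep it short, and the helper names save_edit and rollback_to are made up for this example.

```python
import sqlite3

# Trimmed-down version of the two tables (no users table, no create_* columns).
SCHEMA = """
create table docs (
    id integer primary key,
    version integer not null,
    change_date text,
    change_user_id integer,
    text_data text
);
create table versions (
    id integer primary key,
    doc_id integer not null references docs(id),
    version integer,
    change_date text,
    change_user integer,
    text_data text
);
"""


def save_edit(conn, doc_id, user_id, new_text):
    """Archive the current row in versions, then overwrite docs."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "insert into versions "
            "(doc_id, version, change_date, change_user, text_data) "
            "select id, version, change_date, change_user_id, text_data "
            "from docs where id = ?",
            (doc_id,),
        )
        conn.execute(
            "update docs set version = version + 1, "
            "change_date = datetime('now'), change_user_id = ?, text_data = ? "
            "where id = ?",
            (user_id, new_text, doc_id),
        )


def rollback_to(conn, doc_id, version, user_id):
    """Restore an old version by saving its text as a new edit."""
    row = conn.execute(
        "select text_data from versions where doc_id = ? and version = ?",
        (doc_id, version),
    ).fetchone()
    if row is None:
        raise ValueError(f"no version {version} for doc {doc_id}")
    save_edit(conn, doc_id, user_id, row[0])
```

Treating a rollback as just another edit means history is never rewritten: rolling back to version 1 produces a new version whose text matches version 1.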

You could even do this in a trigger if your DB supports those.
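
As a rough sketch of the trigger variant, assuming SQLite (trigger syntax differs between databases) and driven from Python so it can be run as-is. The trigger name archive_doc is invented for this example, and the tables are trimmed down in the same way as the pseudocode's essentials.

```python
import sqlite3

SCHEMA_WITH_TRIGGER = """
create table docs (
    id integer primary key,
    version integer not null default 1,
    change_date text,
    change_user_id integer,
    text_data text
);
create table versions (
    id integer primary key,
    doc_id integer not null references docs(id),
    version integer,
    change_date text,
    change_user integer,
    text_data text
);
-- Whenever a document's text is updated, archive the outgoing row.
create trigger archive_doc after update of text_data on docs
begin
    insert into versions (doc_id, version, change_date, change_user, text_data)
    values (old.id, old.version, old.change_date, old.change_user_id,
            old.text_data);
end;
"""


def open_db():
    """Create an in-memory database with both tables and the trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_WITH_TRIGGER)
    return conn


if __name__ == "__main__":
    conn = open_db()
    conn.execute(
        "insert into docs (id, change_user_id, text_data) values (1, 7, 'first')"
    )
    # The application only issues the update; the trigger does the archiving.
    conn.execute(
        "update docs set version = version + 1, change_date = datetime('now'), "
        "change_user_id = 8, text_data = 'second' where id = 1"
    )
    print(conn.execute("select version, change_user, text_data from versions").fetchall())
```

In this sketch the application still bumps the version number itself; the trigger only guarantees that no update to the text can skip the archive step.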

The fault with this method is that it stores a full copy of the data for every version (so if you have a large document, every version stays large). You can mitigate that by using something like diff(1) and patch(1).

For example:

diff version2.txt version1.txt > difffile

Then you can store that difffile as "version 1".

In order to recover version 1 from version 2, you grab the version 2 data, run patch on it using the diff file data, and that gives you v1.

If you want to go from v3 to v1, you need to do this twice (once to get v2, and then again to get v1).

This lowers your storage burden, but increases your processing (obviously), so you'll have to judge how you want to do this.
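
If shelling out to diff and patch is not convenient, the same reverse-delta idea can be sketched in-process. The following is an illustration only, assuming Python and its standard difflib module; the JSON delta format and the function names are invented here and are not compatible with patch(1).

```python
import difflib
import json


def make_reverse_delta(new_text: str, old_text: str) -> str:
    """Encode, as JSON, how to turn new_text back into old_text."""
    new_lines = new_text.splitlines(keepends=True)
    old_lines = old_text.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, new_lines, old_lines, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Shared lines are referenced by position, not stored again.
            ops.append(["copy", i1, i2])
        else:
            # Only lines that exist solely in the old text are stored.
            ops.append(["put", old_lines[j1:j2]])
    return json.dumps(ops)


def apply_delta(text: str, delta: str) -> str:
    """Rebuild the older text from the newer text plus a stored delta."""
    lines = text.splitlines(keepends=True)
    out = []
    for op in json.loads(delta):
        if op[0] == "copy":
            out.extend(lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return "".join(out)


def restore(current_text: str, deltas_newest_first: list) -> str:
    """Walk back through several versions, one delta at a time."""
    text = current_text
    for delta in deltas_newest_first:
        text = apply_delta(text, delta)
    return text
```

Going from v3 back to v1 then means calling restore with the v3 text and the two stored deltas (v3 to v2, then v2 to v1), which is the two-step process described above.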

Will Hartung
nice approach, I'll look into this!
AK
Wonderfully simple and efficient compared to mediawiki http://upload.wikimedia.org/wikipedia/commons/4/41/Mediawiki-database-schema.png
Cherian
By the way, why do you need change_date and change_user_id in the docs table? Can't those be inferred from the versions table?
Cherian
Sure, the only issue there is that as presented, the docs table holds the current version of the text, while the versions table holds the older versions. So, for version 1, the docs table will have 1 row and the versions table will have 0 rows, so in that case you won't be able to capture the correct user. If you store all docs (including the current version) in the versions table, then you can remove create_date, create_user_id, change_date, and change_user_id from the docs table. Or you can simply make the docs table a link to the latest version in the versions table. All sorts of options here.
Will Hartung
+1  A: 

Will's huge answer is right on, but I think it can be summed up: you need to store the versions, and then you need to store the metadata (the who, what, and when of the data).

But your question was about resources on Wiki-like versioning. I have none (well, one: Will's answer above). However, about the storage of Wikis, I have one. Check out the comparison matrix from DokuWiki. I know. You're thinking "what do I care what brand of DB different Wikis use?" Because DokuWiki uses plain text files. You can open them and they are indeed plain. So that's one approach, and they've got some interesting arguments as to why DBMS are not the best way to go. They don't even hold much metadata: most of the stuff is done through the flat files themselves.

The point of DokuWiki for you is that maybe it's a relatively simple problem (depending on how well you want to solve it) :)

Yar