On a wiki-style website, what can I do to prevent or mitigate write-write conflicts while still allowing the site to run quickly and keeping the site easy to use?

The problem I foresee is this:

  1. User A begins editing a file
  2. User B begins editing the file
  3. User A finishes editing the file
  4. User B finishes editing the file, accidentally overwriting all of User A's edits

Here were some approaches I came up with:

  • Have some sort of check-out / check-in / locking system (although I don't know how to prevent people from keeping a file checked out "too long", and I don't want users to be frustrated by not being allowed to make an edit)
  • Have some sort of diff system that shows any other changes made when a user commits their changes and allows some sort of merge (but I'm worried this will be hard to create and would make the site "too hard" to use)
  • Notify users of concurrent edits while they are making their changes (some sort of AJAX?)

Any other ways to go at this? Any examples of sites that implement this well?

+16  A: 

Remember the version number (or ID) of the revision the user started editing. Before writing, read the entry again and check whether its version is still the same.

In case of a conflict, inform the user who was trying to write that the entry was changed in the meantime, and support them with a diff.

Most wikis do it this way. MediaWiki, Usemod, etc.

stesch
If the concern is performance, it might be worth mentioning that caching (e.g. with memcached) could avoid the need to hit the database to retrieve the last-modified date of pages.
Robin
This will create a race condition if more than 1 save occurs "at once."
Andy
@Andy: Good point. Changed it to version number. Could be some ID, too.
stesch
+1  A: 

Using a locking mechanism will probably be the easiest to implement. Each article could have a lock field associated with it and a lock time. If the lock is older than some set value, you'd consider it invalid and remove it when checking out the article for edit. You could also keep track of open locks and remove them on session close. You'd also need to implement some concurrency control in the database (autogenerated timestamps, perhaps) so that you can make sure you are checking in an update to the version that you checked out, just in case two people were able to edit the article at the same time. Only the one with the correct version would be able to successfully check in an edit.
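A minimal sketch of the expiring lock described above, assuming an in-memory dict in place of the lock field and lock time columns. The names `ArticleLocks` and `LOCK_TIMEOUT`, the 15-minute value, and the injectable clock are all made up for illustration.

```python
import time

LOCK_TIMEOUT = 15 * 60  # seconds; an assumed value, tune it for your site


class ArticleLocks:
    """Hypothetical in-memory stand-in for per-article lock columns."""

    def __init__(self, timeout=LOCK_TIMEOUT, clock=time.time):
        self.timeout = timeout
        self.clock = clock  # injectable so the expiry can be tested
        self._locks = {}    # article_id -> (owner, time taken or refreshed)

    def acquire(self, article_id, user):
        """Return True if `user` now holds the lock on `article_id`."""
        now = self.clock()
        held = self._locks.get(article_id)
        if held is not None:
            owner, since = held
            if owner != user and now - since < self.timeout:
                return False  # someone else holds a still-valid lock
        # No lock, an expired lock, or our own lock: take or refresh it.
        self._locks[article_id] = (user, now)
        return True

    def release(self, article_id, user):
        """Drop the lock, but only if `user` is the one holding it."""
        held = self._locks.get(article_id)
        if held is not None and held[0] == user:
            del self._locks[article_id]


# Usage with a fake clock so the timeout can be shown without waiting.
now = [0.0]
locks = ArticleLocks(timeout=900, clock=lambda: now[0])
first = locks.acquire("HomePage", "alice")   # alice checks the page out
second = locks.acquire("HomePage", "bob")    # refused while alice's lock is fresh
now[0] = 901                                 # more than 15 minutes later
third = locks.acquire("HomePage", "bob")     # alice's lock has expired
```

Note that once bob takes over the stale lock, alice may still submit her form, which is why the version check on check-in is still needed.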

You might also be able to find a difference engine that you could just use to construct differences, though displaying them in a wiki editor may be problematic -- actually displaying the differences is probably harder than constructing the diff. You'd rely on the versioning system to detect when you needed to reject an edit and perform a diff.

tvanfosson
WikiDot has a lock-time of 15 minutes idle time. As soon as you write something, the time is reset to 15 minutes.
Georg
+4  A: 

In MediaWiki, the server accepts the first change. When the second edit is saved, an edit conflict page comes up, and the second person merges the two changes together. See Wikipedia: Help:Edit Conflicts

Adrian Archer
+5  A: 

Three-way merging: The first thing to point out is that most concurrent edits, particularly on longer documents, are to different sections of the text. As a result, by noting which revision Users A and B acquired, we can do a three-way merge, as detailed by Bill Ritcher of Guiffy Software. A three-way merge can identify where the edits have been made from the original, and unless they clash it can silently merge both edits into a new article. Ideally, at this point carry out the merge and show User B the new document so that she can choose to further revise it.

Collision resolution: This leaves you with the scenario where both editors have edited the same section. In this case, merge everything else and offer the text of the three versions to User B - that is, include the original - with either User A's version in the textbox or User B's. That choice depends on whether you think the default should be to accept the latest (the user just clicks Save to retain their version) or to force the editor to edit twice to get their changes in (they have to re-apply their changes to User A's version of the section).
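To make the two steps above concrete, here is a rough line-based sketch using Python's standard difflib. It is a simplified diff3-style merge written for illustration, not the algorithm from the Guiffy article. The function names are made up, and keeping User B's text on a clash is just one of the two defaults discussed above.

```python
from difflib import SequenceMatcher


def _match_map(base, other):
    """Map each base line index to its matching index in `other`, if any."""
    mapping = {}
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping


def three_way_merge(base, a, b):
    """Merge line lists `a` and `b`, both edited from `base`.

    Returns (merged, conflicts). Each conflict is a (base, a, b) tuple of the
    clashing line chunks; the merged text keeps b's chunk there by default.
    """
    in_a, in_b = _match_map(base, a), _match_map(base, b)
    # Sync points: base lines that survive in both edits, plus an end marker.
    syncs = [(i, in_a[i], in_b[i]) for i in range(len(base))
             if i in in_a and i in in_b]
    syncs.append((len(base), len(a), len(b)))

    merged, conflicts = [], []
    lo = la = lb = 0
    for i, ia, ib in syncs:
        base_chunk, a_chunk, b_chunk = base[lo:i], a[la:ia], b[lb:ib]
        if a_chunk == base_chunk:      # only B changed here (or nobody did)
            merged += b_chunk
        elif b_chunk == base_chunk:    # only A changed here
            merged += a_chunk
        elif a_chunk == b_chunk:       # both made the same change
            merged += a_chunk
        else:                          # both changed it differently
            conflicts.append((base_chunk, a_chunk, b_chunk))
            merged += b_chunk
        if i < len(base):
            merged.append(base[i])     # the shared line itself
        lo, la, lb = i + 1, ia + 1, ib + 1
    return merged, conflicts


base = ["Intro", "History", "Usage", "Credits"]
user_a = ["Intro (revised)", "History", "Usage", "Credits"]
user_b = ["Intro", "History", "Usage", "Credits", "License"]
merged, conflicts = three_way_merge(base, user_a, user_b)
```

Here the two edits touch different parts of the page, so `conflicts` comes back empty and `merged` contains both changes. When both users rewrite the same line, the clash is returned in `conflicts` with all three versions, which is what you would show to User B.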

Using three-way merging like this avoids lock-outs, which are very difficult to handle well on the web (how long do you let them have the lock?), and the aggravating 'you might want to look again' scenario, which only works well for forum-style responses. It also retains the post-respond style of the web.

If you want to Ajax it up a bit, dynamically 3-way merge User A's version into User B's version while they are editing it, and notify them. Now that would be impressive.

Phil H
A: 

Your problem (lost update) is solved best using Optimistic Concurrency Control.

One implementation is to add a version column to each editable entity of the system. On user edit you load the row and display the HTML form to the user. A hidden field holds the version, let's say 3. The update query needs to look something like:

update articles set ..., version=4 where id=14 and version=3;

If the number of rows affected is 0, then someone has already updated article 14. All you need to do then is decide how to deal with the situation. Some common solutions:

  1. last commit wins
  2. first commit wins
  3. merge conflicting updates
  4. let the user decide
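For what it's worth, the versioned update can be tried out with Python's built-in sqlite3 module. This is only a sketch: the `articles` table layout, the `body` column and the `save_article` helper are assumptions for the example, and what you do after a failed save is one of the four choices above.

```python
import sqlite3


def save_article(conn, article_id, expected_version, new_body):
    """Try to save an edit; return True on success, False on a version conflict."""
    cur = conn.execute(
        "UPDATE articles SET body = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_body, article_id, expected_version),
    )
    conn.commit()
    # rowcount is the number of rows the UPDATE actually changed.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, body TEXT, version INTEGER)"
)
conn.execute("INSERT INTO articles VALUES (14, 'original text', 3)")

# Users A and B both loaded version 3 into their edit forms.
a_saved = save_article(conn, 14, 3, "edit by A")  # matches version 3, succeeds
b_saved = save_article(conn, 14, 3, "edit by B")  # version is now 4, no row matches
current = conn.execute(
    "SELECT body, version FROM articles WHERE id = 14"
).fetchone()
```

Because the check and the write happen in one statement, two saves arriving at the same moment cannot both succeed.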

Instead of an incrementing int/long version you can use a timestamp, but this is not recommended because:

retrieving the current time from the JVM isn't necessarily safe in a clustered environment, where nodes may not be time synchronized.

(quote from Java Persistence with Hibernate)

There is some more info in the Hibernate documentation.

cherouvim
+1  A: 

In Gmail, if we are writing a reply to a mail and someone else sends a reply while we are still typing, a popup appears indicating that there is a new update, and the update itself appears as another post without a page reload. This approach would suit your needs. It would be even better if you could use Ajax to show the new post, with a link to a diff of what was just updated, while User B is still busy typing his entry.
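On the server side, the check behind such a notification could be as small as the sketch below. The `edit_status` function, the shape of the revision records and the JSON field names are all invented for the example; it assumes the revisions are listed oldest first and that the editing page polls with the version it originally loaded.

```python
import json


def edit_status(revisions, loaded_version):
    """Build the JSON an editing page could poll for while the user types."""
    # Revisions are assumed to be ordered oldest first.
    newer = [r for r in revisions if r["version"] > loaded_version]
    return json.dumps({
        "changed": bool(newer),
        "latest_version": revisions[-1]["version"] if revisions else loaded_version,
        "newer_authors": [r["author"] for r in newer],
    })


revisions = [
    {"version": 3, "author": "A"},
    {"version": 4, "author": "C"},
]
# User B opened the editor on version 3; someone has saved version 4 since.
status = json.loads(edit_status(revisions, 3))
```

The page would call this every so often and show the popup once `changed` is true.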

Ravi Chhabra
Nice. I'd like to see this on Wikipedia.
bzlm
+1  A: 

As Ravi (and others) have said, you could use an AJAX approach and inform the user when another change is in progress. When an edit is submitted, just indicate the textual differences and let the second user work out how to merge the two versions.

However, I'd like to add on with something new you could try in addition to that: Open a chat dialog between the editors while they're doing their edits. You could use something like embedded Gabbly for that, for instance.

The best conflict resolution is direct dialog, I say.

Daddy Warbox
A: 

At my office, we have a policy that all data tables contain 4 fields:

  • CreatedBy
  • CreatedDate
  • LastUpdateBy
  • LastUpdateDate

That way there is a nice audit trail on who has done what to the records, at least most recently.

But most importantly, it becomes easy enough to compare the LastUpdateDate of the record being edited on the screen (which requires you to store it on the page, in a cookie, or wherever) with the value in the database. If the values don't match, you can decide what to do from there.

Dillie-O