views:

91

answers:

4

I need to create an article system where no articles are ever deleted. Even more, they want all article edits, once the original is saved, to create new articles rather than overwrite the old ones.

So, person A writes article 1 and saves it. Person B edits article 1.

What is the best way to organize this database? My thoughts are:

  • articles
  • Articles
  • id
  • old_id (id of article before edit, so that user can 'undo')
  • deleted (boolean, if deleted it won't show up in system unless user views 'trash')
  • title
  • article
  • created (redundant?)
  • updated (date of last update)
  • user_id (id of user who last updated article)

so, when a user edits an article, it creates a new article record but saves the old article id as a reference.

is this the best way to go about it or is there a more efficient way?

+2  A: 

You can have the same article appear in the same table, but with a different ID value. You can determine the active article based upon the highest ID, or by setting a boolean value such as 'isActive.'

This is similar to how SO and Wikipedia handle revision-history.

PK | ID  |        Title         |        Text       |   Edited   | Edited By
-----------------------------------------------------------------------------
1  | 128 | History of Computin  | The History of... | 2009/07/10 | Jon Sampson
2  | 128 | History of Computing | The History of... | 2009/07/11 | John Smith

In this case, I can see that Article 128 was edited. The newest version is #2, by John Smith.

Jonathan Sampson
Is it worth storing the articles in a new table or should I leave them in the current table? I don't think the system will be under a lot of stress, but I would still like it to be efficient.
Brad
Keep it in the same table. Easy to rollback if you need to.
Jonathan Sampson
I would go the boolean isActive route. This way you could keep a later revision even if it is rejected and not selected as the "active" article. I think it could be more efficiently indexed and accessed than selecting the max of a particular article id. You could also start out by making your current table a view, filtering out on your bool isActive flag, and if performance becomes an issue you could go through the hassle of managing a true "current_table".
RC
+1  A: 

The way wikipedia does it is it has one table, IIRC called "cur" that contains only the current version of every article and a second table that contains all previous versions.

The advantage is that lookups on "cur" may be faster if the table is smaller and it doesn't have to check the key and some flag indicating whether it is the latest or not.

ʞɔıu
+1  A: 
TABLE Revisions:
 ID
 FK_Articles (Articles.ID)
 Status (active, deleted, etc)
 Content
 Metadata (timestamp, user data, etc)

TABLE Articles:
 ID
 FK_Revisions (Revisions.ID - current revision of the article)
sangretu
+1  A: 

Yeah, I'd go for that. A status column, with "A" (active), "D" (deleted), "P" (pending editor approval) and so on.

You might want to add a feature for when an article should go live, as many systems might need to wait until a certain time before showing the article, but the article is uploaded beforehand.

You probably want to keep a static article identifier, so that URLs with that identifier continue to work even if the article is updated. Probably worth have a table linking the static article identifier with the currently active article id (foreign key).

If you will have bazillions of articles, then it might be worthwhile having a system that moves old revisions of articles from the articles table to an archived articles table.

JeeBee