views:

138

answers:

3

I have a CMS system that stores data across tables like this:

Entries Table
+----+-------+------+--------+--------+
| id | title | text | index1 | index2 |
+----+-------+------+--------+--------+

Entries META Table
+----+----------+-------+-------+
| id | entry_id | value | param |
+----+----------+-------+-------+

Files Table
+----+----------+----------+
| id | entry_id | filename |
+----+----------+----------+

Entries-to-Tags Table
+----+----------+--------+
| id | entry_id | tag_id |
+----+----------+--------+

Tags Table
+----+-----+
| id | tag |
+----+-----+

I am in trying to implement a revision system, a bit like SO has. If I was just doing it for the Entries Table I was planning to just keep a copy of all changes to that table in a separate table. As I have to do it for at least 4 tables (the TAGS table doesn't need to have revisions) this doesn't seem at all like an elegant solution.

How would you guys do it?

Please notice that the Meta Tables are modeled in EAV (entity-attribute-value).

Thank you in advance.

+5  A: 

Have a look at this question: http://stackoverflow.com/questions/323065/how-to-version-control-a-record-in-a-database

Why not have a separate history_table for each table (as per the accepted answer on the linked question)? That simply has a compound primary key of the original tables' PK and the revision number. You will still need to store the data somewhere after all.

Edward Williams
@Edward Williams your answer is very interesting. Specially `Media Wiki` database scheme. It creates an extra layer `table -> revisions keys -> revisions data` keeping the `data table` as small as possible while providing a revisions history and being able to navigate between revisions back and forth, like SO does. I still wonder if there is a more *elegant* way to store it.
Frankie
+3  A: 

Hi am currently working on solution to similar problem, I am solving it by splitting my tables into two, a control table and a data table. The control table will contain a primary key and reference into the data table, the data table will contain auto increment revision key and the control table's primary key as a foreign key.

taking your entries table as an example

Entries Table
+----+-------+------+--------+--------+
| id | title | text | index1 | index2 |
+----+-------+------+--------+--------+

becomes

entries             entries_data
+----+----------+   +----------+----+--------+------+--------+--------+
| id | revision |   | revision | id |  title | text | index1 | index2 |
+----+----------+   +----------+----+--------+------+--------+--------+

to query

select * from entries join entries_data on entries.revision = entries_data.revision;

instead of updating the entries_data table you use an insert statement and then update the entries table's revision with the new revision of the entries table.

The advantage of this system is that you can move to different revisions simply by changing the revision property within the entries table. The disadvantage is you need to update your queries. I am currently integrating this into an ORM layer so the developers don't have worry about writing SQL anyway. Another idea I am toying with is for there to be a centralised revision table which all the data tables use. This would allow you to describe the state of the database with a single revision number, similar to how subversion revision numbers work.

ejrowley
Why did someone downvote ejrowley's answer? I'm still looking for the best possible solution and will give credit where credit is due. But this seams like a viable solution, no?
Frankie
@ejrowley even though your system is not exactly what I'm going for; I believe the `Media Wiki` table-scheme in Edward Williams to be a closer answer in terms of both performance and availability; I believe you're answer is in the line of Edwards answer with better schematics so I'm accepting it. Thanks!
Frankie
+1  A: 

For one of our projects we went the following way:

Entries Table
+----+-----------+---------+
| id | date_from | date_to |
+----+--------_--+---------+

EntryProperties Table
+----------+-----------+-------+------+--------+--------+
| entry_id | date_from | title | text | index1 | index2 |
+----------+-----------+-------+------+--------+--------+

Pretty much complicated, still allows to keep track of full object's lifecycle. So for querying active entities we were going for:

SELECT 
entry_id, title, text, index1, index2
FROM
Entities INNER JOIN EntityProperties
ON Entities.id = EntityProperties.entity_id
AND Entities.date_to IS NULL
AND EntityProperties.date_to IS NULL

The only concern was for a situation with entity being removed (so we put a date_to there) and then restored by admin. Using given scheme there's no way to track such kind of tricks.

Overall downside of any attempt like that is obvious - you've to write tons of TSQL where non-versioned DBs will go for something like select A join B.

Vitaly