views:

225

answers:

7

Hi everyone,

I have made few projects (CMS and EC system) that required to have some data versioned.

Usually I come with that kind of schema

+--------------+
+ foobar       +
+--------------+
+ foobar_id    +
+ version      +
+--------------+

it worked great but I am wondering if there is a better to way do it. The main problem with that solution you have to always use subquery to get the latest version.

i.e.:

SELECT * FROM foobar WHERE foobar_id = 2 and version = (SELECT MAX(version) FROM foobar f2 WHERE f2 = 2)

This render most of the queries more complicate and also have some performance drawbacks.

So it would be nice if you share your experience creating versioned table and what kind of pro and cons of each method.

Thanks

+5  A: 

I prefer to have historical data in another table. I would make foobar_history or something similar and make a FK to foobar_id. This will stop you from having to use a subquery all together. This has the added advantage of not polluting your primary data table with the tons of historical data you probably don't want to see 99% of the time you're accessing it.

You will likely want to make a trigger for updating this data though, as it would require you to copy the current data in to _history and then do the update.

jcm
@jcm: thanks JCM yeah since approach seems a lot better to me actually . I would have to rework lot of code but that's no prob. thanks.
RageZ
We do this and use a Trigger to store the CURRENT record into the History table on UPDATE and DELETE - with additional column in History table for whether Update or Delete and the Date/Time of the action. Note that we do NOT store the current record in the History table (i.e. no trigger on INSERT) which I do see many auditing systems do, but strikes me that's using a lot fo disk space for data you already have in the main table. Having a PK on the history table of Main_ID + AuditDate has not been unique for us, so you may want to consider an IDENTITY column also
Kristen
My only doubt though is that you have a VERSION column. If every edit makes a new version the a History table is fine, if the user / some process defines when a new version has been "achieved" then that needs some further logic. You could still ahve a version column, every previous change in History, and query the most-recent-history record with a given version number - including the main table in case the most recent copy of the current version is requested :)
Kristen
Yes I agree with Kristen. Current record does NOT go in history table. You have a nullable editor/updatetime in the data table that will be populated if a user changes the values (they could be populated on insert, depends how your app works). Then these will be pushed to the history table when it is changed again.
jcm
Also, re what Kristen said about the version column. I agree that it probably isn't a good idea. Simply order by update time and put the null value (the original, if applies) at the bottom. You can generate "version numbers" upon display if you like.
jcm
+1  A: 

You can simplify the query by using a view over your table which filters to the latest version. This only makes the queries look nicer you still have the performance overhead.

Carsten
+2  A: 

The cleanest solution in my opinion would be to have a History table for each table that requires versioned. In other words, have a foobar table, and then a foobar_History table, with a trigger on foobar that will write existing data to the History table with a timestamp and user that changed the data. Older data is easily queryably, sorted by timestamp descending, and you know that the data in the main table is always the latest version.

baldy
@baldy: yeah that make sense and actually make a lot of things easier like I can use `INSERT INTO SELECT` to copy the data, plus I don't have to have `max` everywhere. Ok let's do it this way, I gonna have to rework lot of code but I should have thought before ;-)
RageZ
A: 

If you had used Oracle you could use analytic functions

select * from ( SELECT a.* , row_number() over (partition by foobar_id order by version desc) rn FROM foobar a WHERE foobar_id = 2 ) where rn = 1

len
+2  A: 

I used to work on a system with historical data, and we had a boolean to indicate which one was the latest version of the data. Of course you need to maintain the consitency of the flag at the applicative level. Then you can create indexes that use the flag and if you provide it in the where clause it's fast.

Pro:

  • easy to understand
  • does not require major change to your (existing) database schema
  • no need to copy old data in another table, only flag is updated.

Cons:

  • flag need to be maintained at applicative level

Otherwise, you can rely on a separate history table, as suggested in several answers.

Pro:

  • clean sepration of history from actual data
  • possible to have a db-level cascade delete between actual data and its history, in case the entity is removed

Cons:

  • need 2 queries (or a union) if you want the complete history (that is, old data + current data)
  • the row that corresponds to the latest version of the data will be updated. I heard that update are slower than insert, depending on the "size" of the data that changed.

What is best will depend from your use case. I had to deal with a document management system where we wanted to be able to version document. But we also had feature like reverting to old version. It was easier to use a boolean to speed up just the operation that required the last one. If you have real historical data (which never change) probably a dedicated history table is better.

Does the concept of history fit in your domain model? If no, then you have a db schema that differs from your conceptual domain model. If at the domain level, the actual data and the old data need to be handled the same way, having two tables complicates the design. Just consider the case you need to return the complete history (old + new). The easiest solution would be to have one class for each table, but then you can't return a list as easily as if you have only one table. But if these are two distinct concepts, then it's fine to have history be first-class in your design.

I would also recommend this article by M. Fowler also interesting when it comes to dealing with temporal data: Patterns for things that change with time

ewernli
+1  A: 

Common technique is to add a column version_status for current/expired. Also a note, if you keep new and old records in the same table, you should have a business (natural) key for your entity, something like name + pin, because the primary key will change (increment) with each row.

TABLE foobar(foobar_id PK, business_key, version, version_status, .....)

SELECT * 
FROM foobar 
WHERE business_key = 'myFoobar3' AND version_status = 'current'

When deciding to keep the record history in the same table -- or move it to a separate one -- consider other tables which have the foobar_id as a foreign key. When issuing a new version, should existing foreign keys point to the new PK or to the old PK? If you want to keep history of relationships, you would probably want to keep everything in the same table. If only the new version is important, you may consider to move expired rows to another table -- though it is not necessary.

Damir Sudarevic
A: 

It depends on how many of your tables require versioning, and if you've got a transactional ore reporting system.

If just a few transactional tables - the way that you're doing it is fine as long as the performance issues aren't too significant. You can make the querying easier by adding a column for current_row and a trigger that updates the prior row to make it non-current.

But if you've got a lot of tables or the extra rows are slowing down some of your queries then I'd do as others suggest and use history tables as well as history triggers. Note that you can generate that code to make it easier to develop & maintain.

If you're in the reporting world then there's a lot other options I won't address here. You can find the options given in detail in data warehousing data modeling books.

KenFar