views:

200

answers:

1

I am planning to implement a Type-2 SCD in PostgreSQL. The downside of SCDs is that you cannot have foreign keys referencing thos tables if used as is often seen. In other words, I often see referential integrity being dealt with within the application code. This strikes me as bad practice as it could be done directly within the database. Adding a handful of triggers could even hide this implementation detail from the app-coders.

I came up with the following schemas. Are these okay?

One-to-Many

--
-- One-to-Many
--
BEGIN;

   CREATE TABLE document(
      id serial not null,
      revision integer not null default 1,
      title varchar(30),
      primary key (id, revision)
   );

   CREATE TABLE page(
      id serial not null,
      title varchar(30),
      document_id integer not null,
      document_revision integer not null,
      foreign key (document_id, document_revision) references document(id, revision)
   );


   -- Insert the first revision
   INSERT INTO document (title) VALUES ('my first document');
   INSERT INTO page (title, document_id, document_revision) VALUES ('my first page', 1, 1);

   -- DEBUG: display
   SELECT * FROM document d inner join page p ON ( d.id = p.document_id and d.revision = p.document_revision );

   -- "update" the document, by inserting a new revision
   INSERT INTO document (id, revision, title) VALUES (1, 2, 'my first document, edited');

   -- update the references
   UPDATE page SET document_revision = 2 WHERE document_id = 1;

   -- DEBUG: display
   SELECT * FROM document d inner join page p ON ( d.id = p.document_id and d.revision = p.document_revision );

ROLLBACK;

Many-to-One

--
-- Many-to-One
--
BEGIN;

   CREATE TABLE page(
      id serial not null primary key,
      title varchar(30)
   );

   CREATE TABLE document(
      id serial not null,
      revision integer not null default 1,
      title varchar(30),
      page_id integer references page(id),
      primary key (id, revision)
   );

   -- Insert initial revision
   INSERT INTO page (title) VALUES ('my first page');
   INSERT INTO document (title, page_id) VALUES ('my first document', 1);
   INSERT INTO document (title, page_id) VALUES ('my second document', 1);

   -- DEBUG: display
   SELECT * FROM page p inner join document d on (p.id = d.page_id);

   -- destroy the link "from" the old revision
   UPDATE document SET page_id = NULL WHERE id=1;

   -- Add a new revision, referencing the page
   INSERT INTO document ( id, revision, title, page_id ) VALUES ( 1, 2, 'My First Document, edited', 1 );

   -- DEBUG: display
   SELECT * FROM page p inner join document d on (p.id = d.page_id);
   SELECT * FROM document;

ROLLBACK;

Many-to-Many

--
-- Many-to-Many
--
BEGIN;
   CREATE TABLE page(
      id serial not null primary key,
      title varchar(30)
   );

   CREATE TABLE document(
      id serial not null,
      revision integer not null default 1,
      title varchar(30),
      primary key (id, revision)
   );

   CREATE TABLE page_contains_document(
      page_id integer not null references page(id),
      document_id integer not null,
      document_revision integer not null,
      foreign key (document_id, document_revision) references document( id, revision )
   );

   -- Insert initial revision
   INSERT INTO page (title) VALUES ('My First page');
   INSERT INTO document (title) VALUES ('My Fist Document');
   INSERT INTO page_contains_document (page_id, document_id, document_revision) VALUES (1, 1, 1);

   -- DEBUG: display
   SELECT p.title, d.title, d.revision FROM page p INNER JOIN page_contains_document pcd ON (p.id = pcd.page_id) INNER JOIN document d ON (d.id = pcd.document_id and d.revision = pcd.document_revision);

   -- Add a new document revision
   INSERT INTO document (id, revision, title) VALUES (1, 2, 'My Fist Document, edited');

   -- update the reference
   UPDATE page_contains_document SET document_revision=2 WHERE document_id=1;

   -- DEBUG: display
   SELECT p.title, d.title, d.revision FROM page p INNER JOIN page_contains_document pcd ON (p.id = pcd.page_id) INNER JOIN document d ON (d.id = pcd.document_id and d.revision = pcd.document_revision);

ROLLBACK;
A: 

OK. You seem to have missed the point of type 2 SCDs entirely.

The should hold all the data in one table bracketed by dates (not revision numbers!).

so you could have:

   id   ,  name    ,  valid_from, valid_to
  1111  , MyBook   , 1st March 2009, 31st Dec 9999

After an update:
  1111  , Mybook   , 1st March 2009, 20th June 2009
  1111  , Mybook   , 21st June 2008, 31st Dec 9999

A similar structure with valid from and valid to dates should exist in the "pages" database.

The whold point is that now you can either get the latest version with:

select * from books where valid_to = '9999-12-31'

Or get the version that was valid on the first of April

select * from books where valid_to >= '2009-04-01' and valid_from <= '2009-04-01'

Also within you page structure you only need to store the updated pages. You dont need a new copy of all the pages for every revision.

James Anderson
Not entirely. I agree that the solution is not perfect as there is no simple way to determine the latest revision. But that's an impl. detail and is trivial to solve. The point of my q. was a different one so I added some more details to the first paragraph.
exhuma
The definition of SCD type 2 is to use effective from date and to date columns to build a complete history of the entity.Your scheme will work but its not SCD2.
James Anderson
The SCD2 definition states nothing about the "validity" columns being required. It states only that "for each change a new row is inserted". Granted, you need a discriminator to determine the "current" value. But how that is done is not dictated. In fact, using a "version" column is likely just as common as using valid-from/-to columns. Implementation differs, but in the end, it is still "Type-2". - As reference I can cite the Wikipedia article on SCDs and "Building and Managing the Meta Data Repository" by David Marco, ISBN 0-471-35523-2 - I don't have my DB book at hand so I can't cite it.
exhuma
I just discussed this with one of our other database experts. And he made a fair point. Giving the use of valid-from/-to columns a very solid "raison d'être". With those columns present, you can make a query to a specific point in time (f. ex.: "What did my data look like on 31st of December 1999"). With revision numbers only, you cannot do that. So then, the only thing left is decide is: "Do I need such queries". Or: "What information do I want to extract from my data?"
exhuma