A: 

Just a thought: if the valid time blocks could be coded as a number or something similar, creating a UNIQUE index on Id+TimeBlock would be blazingly fast and resolve all table-lock problems.
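
A minimal sketch of that idea, assuming a hypothetical table with the period coded as a numeric time block (the table and column names here are illustrative, not from the question):

-- Hypothetical layout: one row per (id, time_block), where time_block is
-- e.g. the number of days since 1970-01-01.
CREATE TABLE product_validity (
   id         char(9) NOT NULL,
   time_block integer NOT NULL
);

-- Duplicate (id, time_block) pairs then fail immediately on insert,
-- without any explicit table locking.
CREATE UNIQUE INDEX product_validity_id_block ON product_validity (id, time_block);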

Locking is managed by PostgreSQL itself. A SELECT acquires an ACCESS SHARE lock, which means you can query the table but not perform updates.
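
As an illustration, the lock taken by a plain SELECT can be observed from within the same transaction via pg_locks (the table name is taken from the question's schema):

BEGIN;
SELECT * FROM medicinal_products LIMIT 1;

-- Shows an AccessShareLock held by this backend on the table just queried.
SELECT locktype, relation::regclass AS table_name, mode, granted
FROM pg_locks
WHERE pid = pg_backend_pid()
  AND relation = 'medicinal_products'::regclass;

COMMIT;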

A radical solution which might help you is to use a cache like Ehcache or memcached to store the id/timeblock info and not use PostgreSQL at all. Many caches can be persisted, so they would survive a server restart, and they do not exhibit this locking behavior.

Peter Tillemans
I think this is actually what I ended up writing in my answer: creating a unique index on (id, applicable_date), where applicable_date is a TimeBlock in your terms, i.e. the number of days since 1970-01-01. Although it's not enough to just create the index: the index needs to store each valid combination of (id, timeblock), which has to be generated somehow.
araqnid
You do not need to generate the timeblocks if they can be calculated. Example: for a timeblock of an hour, calculate the number of hours since 2000-01-01 midnight. The index will take care of the rest as the table is being filled. Postgres has very nice functions to slice and dice dates/times any way you want.
Peter Tillemans
That's fine if a single row in the data table corresponds to a single timeblock, but in the OP's case it doesn't: he's inserting extents that cover multiple days, and simply adding one of the days/timeblocks covered isn't sufficient.
araqnid
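
For what it's worth, a minimal sketch of the hour-bucket calculation suggested in the comments above (the some_table/ts names are hypothetical):

-- Whole hours since 2000-01-01 00:00, usable as a numeric time block.
SELECT floor(extract(epoch FROM (ts - timestamp '2000-01-01 00:00')) / 3600)::integer AS time_block
FROM some_table;
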
A: 

Why can't you use a UNIQUE constraint? It will be much faster (it's an index) and easier.

Frank Heikens
The valid time is stored using two DATE columns, and I can't use a UNIQUE constraint because I have to ensure that periods don't overlap.
Hobbes
The upcoming release 9.0 has a solution for that, exclusion constraints: http://www.postgresql.org/docs/9.0/static/ddl-constraints.html#AEN2530
Frank Heikens
I know, I've also read the presentation of that feature here: http://www.pgcon.org/2010/schedule/events/201.en.html. But in the meantime...
Hobbes
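
For later reference, a sketch of the exclusion-constraint approach mentioned in the comments above. This uses the daterange type, which only arrived in 9.2 (on 9.0/9.1 a contrib period type would be needed, and CREATE EXTENSION itself requires 9.1+); column names follow the schema used in the answer below:

-- btree_gist lets a GiST exclusion constraint mix = (on aic_code) with && (overlap).
CREATE EXTENSION btree_gist;

ALTER TABLE medicinal_products
  ADD CONSTRAINT medicinal_products_no_overlap
  EXCLUDE USING gist (
    aic_code WITH =,
    daterange(vs, ve, '[]') WITH &&
  );
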
+1  A: 

One solution is to have a second table to use for detecting clashes, and populate that with a trigger. Using the schema you added into the question:

CREATE TABLE medicinal_product_date_map(
   aic_code char(9) NOT NULL,
   applicable_date date NOT NULL,
   UNIQUE(aic_code, applicable_date));

(note: this is the second attempt due to misreading your requirement the first time round. hope it's right this time).

Some functions to maintain this table:

CREATE FUNCTION add_medicinal_product_date_range(aic_code_in char(9), start_date date, end_date date)
RETURNS void STRICT VOLATILE LANGUAGE sql AS $$
  INSERT INTO medicinal_product_date_map
  SELECT $1, $2 + offs
  FROM generate_series(0, $3 - $2) AS s(offs)
$$;
CREATE FUNCTION clr_medicinal_product_date_range(aic_code_in char(9), start_date date, end_date date)
RETURNS void STRICT VOLATILE LANGUAGE sql AS $$
  DELETE FROM medicinal_product_date_map
  WHERE aic_code = $1 AND applicable_date BETWEEN $2 AND $3
$$;

And populate the table first time with:

SELECT count(add_medicinal_product_date_range(aic_code, vs, ve))
FROM medicinal_products;

Now create triggers to populate the date map after changes to medicinal_products: after insert, call add_; after update, call clr_ (with the old values) and add_ (with the new values); after delete, call clr_.

CREATE FUNCTION sync_medicinal_product_date_map()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  IF TG_OP = 'UPDATE' OR TG_OP = 'DELETE' THEN
    PERFORM clr_medicinal_product_date_range(OLD.aic_code, OLD.vs, OLD.ve);
  END IF;
  IF TG_OP = 'UPDATE' OR TG_OP = 'INSERT' THEN
    PERFORM add_medicinal_product_date_range(NEW.aic_code, NEW.vs, NEW.ve);
  END IF;
  RETURN NULL;
END;
$$;
CREATE TRIGGER sync_date_map
  AFTER INSERT OR UPDATE OR DELETE ON medicinal_products
  FOR EACH ROW EXECUTE PROCEDURE sync_medicinal_product_date_map();

The uniqueness constraint on medicinal_product_date_map will trap any products being added with the same code on the same day:

steve@steve@[local] =# INSERT INTO medicinal_products VALUES ('1','A','2010-01-01','2010-04-01');
INSERT 0 1
steve@steve@[local] =# INSERT INTO medicinal_products VALUES ('1','A','2010-03-01','2010-06-01');
ERROR:  duplicate key value violates unique constraint "medicinal_product_date_map_aic_code_applicable_date_key"
DETAIL:  Key (aic_code, applicable_date)=(1        , 2010-03-01) already exists.
CONTEXT:  SQL function "add_medicinal_product_date_range" statement 1
SQL statement "SELECT add_medicinal_product_date_range(NEW.aic_code, NEW.vs, NEW.ve)"
PL/pgSQL function "sync_medicinal_product_date_map" line 6 at PERFORM

This depends on the values being checked having a discrete space, which is why I asked about dates vs. timestamps. Although timestamps are technically discrete, since PostgreSQL only stores them with microsecond resolution, adding an entry to the map table for every microsecond the product is applicable is not practical.

Having said that, you could probably also get away with something better than a full-table scan to check for overlapping timestamp intervals, with some trickery around looking only for the first interval not after (or not before) the new one. However, for easily discretised spaces I prefer this approach, which in my experience is also handy for other things (e.g. reports that need to quickly find which products are applicable on a certain day).
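
For reference, a sketch of that kind of overlap check run directly against the master table (it assumes an index on (aic_code, vs); $1/$2/$3 stand for the candidate row's code, start and end):

-- An existing period [vs, ve] overlaps the candidate [$2, $3] exactly when
-- it starts no later than $3 and ends no earlier than $2.
SELECT 1
FROM medicinal_products
WHERE aic_code = $1
  AND vs <= $3
  AND ve >= $2
LIMIT 1;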

I also like this approach because it feels right to leverage the database's uniqueness-constraint mechanism this way. I also feel it will be more reliable in the context of concurrent updates to the master table: without locking the table against concurrent updates, it would be possible for a validation trigger to see no conflict and allow inserts in two concurrent sessions that are then found to conflict once both transactions' effects are visible.

araqnid
OK, I got it. That's an interesting solution. However, I have to store about 30,000 products. Suppose they stay on the market for 15 years on average: in that case the map table will be filled with about 164 million rows, won't it? Won't this affect performance?
Hobbes
Well, a table with that number of rows will have *some* effect on your database, of course. This extra table won't even be touched except when the products table is being updated, though. (I didn't put code into the trigger to check whether an update touches the relevant columns, but it could do that too.)
araqnid
A very quick experiment on my dev machine, even with a small installation (default 32MB buffer cache, for example), suggests that this sort of size is no problem: it takes a while to build the table initially, but there doesn't seem to be any noticeable penalty for individual product updates.
araqnid
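
As a footnote to the column-check point a couple of comments up, a minimal sketch of an early exit that could be added at the top of sync_medicinal_product_date_map() so updates that leave the mapped columns unchanged skip the map maintenance entirely:

  -- Early exit for updates that don't touch the columns mirrored in the map.
  IF TG_OP = 'UPDATE'
     AND NEW.aic_code IS NOT DISTINCT FROM OLD.aic_code
     AND NEW.vs IS NOT DISTINCT FROM OLD.vs
     AND NEW.ve IS NOT DISTINCT FROM OLD.ve THEN
    RETURN NULL;  -- nothing relevant changed; leave the map alone
  END IF;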