views:

830

answers:

2

I am working with a postgreSQL database that gets updated in batches. I need to know when the last time that the database (or a table in the database)has been updated or modified, either will do.

I saw that someone on the postgeSQL forum had suggested that to use logging and query your logs for the time. This will not work for me as that I do not have control over the clients codebase.

+1  A: 

See the following article:

MySQL versus PostgreSQL: Adding a 'Last Modified Time' Column to a Table http://www.pointbeing.net/weblog/2008/03/mysql-versus-postgresql-adding-a-last-modified-column-to-a-table.html

Robert Harvey
+3  A: 

You can write a trigger to run every time an insert/update is made on a particular table. The common usage is to set a "created" or "last_updated" column of the row to the current time- but you could also update the time in a central location if you don't want to change the existing tables.

So for example a typical way is:

CREATE FUNCTION stamp_updated() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
  NEW.last_updated := now();
  RETURN NEW;
END
$$;
-- repeat for each table you need to track:
ALTER TABLE sometable ADD COLUMN last_updated TIMESTAMP;
CREATE TRIGGER sometable_stamp_updated
  BEFORE INSERT OR UPDATE ON sometable
  FOR EACH ROW EXECUTE stamp_updated();

Then to find the last update time, you need to select "MAX(last_updated)" from each table you are tracking and take the greatest of those, e.g.:

SELECT MAX(max_last_updated) FROM (
  SELECT MAX(last_updated) AS max_last_updated FROM sometable
  UNION ALL
  SELECT MAX(last_updated) FROM someothertable
) updates

For tables with a serial (or similarly-generated) primary key, you can try avoid the sequential scan to find the latest update time by using the primary key index--- or you create indices on last_updated.

-- get timestamp of row with highest id
SELECT last_updated FROM sometable ORDER BY sometable_id DESC LIMIT 1

Note that this can give slightly wrong results in case of IDs not being quite sequential, but how much accuracy do you need? (Bear in mind that transactions mean that rows can become visible to you in a different order to them being created)

An alternative approach to avoid adding 'updated' columns to each table is to have a central table to store update timestamps in. For example:

CREATE TABLE update_log(table_name text PRIMARY KEY, updated timestamp NOT NULL DEFAULT now());
CREATE FUNCTION stamp_update_log() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
  INSERT INTO update_log(table_name) VALUES(TG_TABLE_NAME);
  RETURN NEW;
END
$$;
-- Repeat for each table you need to track:
CREATE TRIGGER sometable_stamp_update_log
 AFTER INSERT OR UPDATE ON sometable
 FOR EACH STATEMENT EXECUTE stamp_update_log();

This will give you a table with a row for each table update: you can then just do:

SELECT MAX(updated) FROM update_log

To get the last update time. (You could split this out by table if you wanted). This table will of course just keep growing: either create an index on 'updated' (which should make getting the latest one pretty fast) or truncate it periodically if that fits with your use case, (e.g. take an exclusive lock on the table, get the latest update time, then truncate it if you need to periodically check if changes have been made).

An alternative approach- which might be what the folks on the forum meant- is to set 'log_statement = mod' in the database configuration (either globally for the cluster, or on the database or user you need to track) and then all statements that modify the database will be written to the server log. You'll then need to write something outside the database to scan the server log, filtering out tables you aren't interested in etc.

araqnid