Hi.

I have two databases for logging stuff, which I want to check for synchronization.

The approved solution is to periodically (let's say hourly) send a SELECT to both, generate a hash of each result set, and compare the two hashes. If they match, great; otherwise, raise an alarm.

Currently I'm doing it with a bash script:

log_table="SELECT column1, column2, column3 FROM log_table WHERE to_char(timestamp, '$ts_format') = '$tx_moment'"
PSQL="psql -t -q -h $_gp_host -U $_gp_user -d log_schema -c"
echo "$(${PSQL} "${tx_fix}${log_table}" | sort | cksum)"
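
The comparison itself then looks roughly like this (a sketch; $_gp_host2, naming the second server, is a hypothetical variable, and I assume the same credentials work on both):

# Run the identical query against both servers and compare the checksums.
# $_gp_host2 is assumed to point at the second logging database.
PSQL2="psql -t -q -h $_gp_host2 -U $_gp_user -d log_schema -c"
sum1=$(${PSQL}  "${tx_fix}${log_table}" | sort | cksum)
sum2=$(${PSQL2} "${tx_fix}${log_table}" | sort | cksum)
if [ "$sum1" != "$sum2" ]; then
    echo "log_table out of sync at $tx_moment" >&2   # hook the alarm in here
fi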

I would like to do the cksum/hash on the Postgres side, because currently the script downloads the whole result set (which can be 25 MB or more) and generates the checksum on the client side.

Google didn't help.

Any suggestions?

Thanks.

+1  A: 

You could use md5:

 log_table="
SELECT 
  md5(column1 || column2 || column3) AS hash,
  column1, column2, column3
FROM log_table where to_char(timestamp, '$ts_format') = '$tx_moment'";
Frank Heikens
Hmm, thanks. This will not resolve the problem I've described, but at least it can reduce the size of the downloaded result set.
KullDox
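
If the raw columns are not needed on the client, a variant of the query above that returns only the per-row digests shrinks the transfer to 32 hex characters per row (a sketch; note that || yields NULL as soon as one operand is NULL):

SELECT md5(column1 || column2 || column3) AS hash
FROM log_table
WHERE to_char(timestamp, '$ts_format') = '$tx_moment';
-- If any column is nullable, wrap it in coalesce(columnN, '') inside md5(),
-- otherwise rows containing a NULL all hash to NULL.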
+1  A: 

If you want to do a hash over all of it at once, that's going to use up a lot of memory server-side as well. And once you hit 1 GB it won't work at all, since a single string in PostgreSQL can't be longer than that.

Perhaps something like this will work. It basically hashes each row, and then hashes the concatenation of those hashes. It will still break when the combined length of the hashes goes above 1 GB; you'll need to write a custom md5 aggregate to get around that (see the sketch after the aggregate definition below).

SELECT md5(concat(md5(column1 || column2 || column3))) FROM log_table WHERE ...

This requires that you have created the custom aggregate concat like this:

CREATE AGGREGATE concat (
    BASETYPE = text,
    SFUNC = textcat,
    STYPE = text,
    INITCOND = ''
);
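
To sidestep the 1 GB limit entirely, the aggregate state can be folded into a fixed-size digest at each step instead of being concatenated. A sketch in the same old-style aggregate syntax; md5_step and md5_agg are names made up here:

CREATE FUNCTION md5_step(text, text) RETURNS text AS
    'SELECT md5($1 || $2)'
LANGUAGE sql IMMUTABLE;

CREATE AGGREGATE md5_agg (
    BASETYPE = text,
    SFUNC = md5_step,
    STYPE = text,
    INITCOND = ''
);

-- The running state is always a 32-character digest, so memory use stays
-- constant. The result depends on row order, so both servers must feed the
-- rows in the same order; hence the ORDER BY in the subquery:
SELECT md5_agg(row_hash)
FROM (SELECT md5(column1 || column2 || column3) AS row_hash
      FROM log_table
      WHERE to_char(timestamp, '$ts_format') = '$tx_moment'
      ORDER BY 1) AS ordered_hashes;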
Magnus Hagander