tags:

views:

824

answers:

7

I have a database with a few dozen tables interlinked with foreign keys. Under normal circumstances, I want the default ON DELETE RESTRICT behavior for those constraints. But when trying to share a snapshot of the database with a consultant, I needed to remove some sensitive data. I wish that my memory of a DELETE FROM Table CASCADE command hadn't been pure hallucination.

What I ended out doing was dumping the database, writing a script to process the dump by adding ON DELETE CASCADE clauses too all the foreign key constraints, restoring from that, performing my deletes, dumping again, removing the ON DELETE CASCADEs, and finally restoring again. That was easier than writing the deletion query I'd have needed to do this in SQL -- removing whole slices of the database isn't a normal operation, so the schema isn't exactly adapted to it.

Does anyone have a better solution for the next time something like this comes up?

A: 

You may want to look into using schemas with PostgreSQL. I've done this in past projects to allow different groups of people or developers to have their own data. Then you can use your scripts to create multiple copies of your database for just such situations.

Tony Lenzi
A: 

@Tony: No, schemas can be useful, and indeed, we use them to partition data in our database. But I'm talking about trying to scrub sensitive data before letting a consultant have a copy of the db. I want that data gone.

A: 

I don't think you'd need to process the dump file like that. Do a streaming dump/restore, and process that. Something like:

createdb -h scratchserver scratchdb
createdb -h scratchserver sanitizeddb

pg_dump -h liveserver livedb --schema-only | psql -h scratchserver sanitizeddb
pg_dump -h scratchserver sanitizeddb | sed -e "s/RESTRICT/CASCADE/" | psql -h scratchserver scratchdb

pg_dump -h liveserver livedb --data-only | psql -h scratchserver scratchdb
psql -h scrachserver scratchdb -f delete-sensitive.sql

pg_dump -h scratchserver scratchdb --data-only | psql -h scratchserver sanitizeddb
pg_dump -Fc -Z9 -h scratchserver sanitizedb > sanitizeddb.pgdump

where you store all your DELETE sqls in delete-sensitive.sql. The sanitizeddb database/steps can be removed if you don't mind the consultant getting a db with CASCADE foreign keys instead of RESTRICT foreign keys.

There might also be better ways depending on how often you need to do this, how big the database is, and what percentage of data is sensitive, but I can't think of a simpler way to do it once or twice for a reasonably sized database. You'd need a different database after all, so unless you already have a slony cluster, can't avoid the dump/restore cycle, which might be time consuming.

angch
+1  A: 

You do not need to dump and restore. You should be able to just drop the constraint, rebuild it with cascade, do your deletes, drop it again, and the rebuild it with restrict.

CREATE TABLE "header"
(
  header_id serial NOT NULL,
  CONSTRAINT header_pkey PRIMARY KEY (header_id)
);

CREATE TABLE detail
(
  header_id integer,
  stuff text,
  CONSTRAINT detail_header_id_fkey FOREIGN KEY (header_id)
      REFERENCES "header" (header_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
);
insert into header values(1);
insert into detail values(1,'stuff');
delete from header where header_id=1;
alter table detail drop constraint detail_header_id_fkey;
alter table detail add constraint detail_header_id_fkey FOREIGN KEY (header_id)
      REFERENCES "header" (header_id) on delete cascade;
delete from header where header_id=1;
alter table detail add constraint detail_header_id_fkey FOREIGN KEY (header_id)
      REFERENCES "header" (header_id) on delete restrict;
Grant Johnson
+1  A: 

You could create the foreign key constraints as DEFERRABLE. Then you would be able to temporarily disable them while you scrub the data and re-enable them when you are done. Have a look at this question.

agnul
A: 

TRUNCATE table CASCADE;

I'm a Postgres novice, so I'm not sure what the trade-off is for TRUNCATE vs. DROP.

A: 

TRUNCATE just removes the data from table and leaves the structure

Tim Davis