ansaurus

Question

How to optimize a slow delete query (deleting rows that are not used in another table) in PostgreSQL

Answer 1

A:

The problem seems to be that you are accessing your data table three times instead of once.

Your query should be:

DELETE FROM string s
WHERE (SELECT count(*)
       FROM data
       WHERE object_id = s.id OR property_id = s.id OR value_id = s.id) = 0;

I haven't tried it yet, but I can modify it if you let me know what happens.
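A minimal sketch of that single-scan version. The column names `object_id`, `property_id` and `value_id` come from the query above; the `str` column, the temporary tables and the sample rows are assumptions made only so the example runs on its own:

```sql
-- Hypothetical minimal schema: "string" holds the values,
-- "data" references them through three columns.
CREATE TEMP TABLE string (id integer PRIMARY KEY, str text);
CREATE TEMP TABLE data (
    id          integer PRIMARY KEY,
    object_id   integer,
    property_id integer,
    value_id    integer
);

INSERT INTO string VALUES (1, 'obj'), (2, 'prop'), (3, 'val'), (4, 'unused');
INSERT INTO data VALUES (10, 1, 2, 3);

-- One subquery per string row: the OR covers all three columns,
-- so "data" is not queried three separate times.
DELETE FROM string s
WHERE (SELECT count(*)
       FROM data
       WHERE object_id = s.id OR property_id = s.id OR value_id = s.id) = 0;

-- Rows 1, 2 and 3 remain; row 4 was not referenced and is gone.
SELECT id FROM string ORDER BY id;
```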

mezzie 2010-10-20 18:03:16

I have tried that. It still does basically the same thing (at least the speed is almost the same). EXPLAIN (for SELECT * FROM ...) gives me:

Seq Scan on string s (cost=0.00..82770498.75 rows=353 width=119)
  Bitmap Heap Scan on data (cost=92.42..1166.00 rows=2376 width=0)
    BitmapOr (cost=92.42..92.42 rows=2376 width=0)

Ago 2010-10-21 00:49:42

It seems that I cannot format my comment?

Ago 2010-10-21 00:51:53
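Plain `EXPLAIN` only reports estimated costs. `EXPLAIN ANALYZE` reports actual row counts and timings per plan node, but it really executes the statement, so for a DELETE it is usually wrapped in a transaction that is rolled back. A sketch against hypothetical temporary tables (the schema and rows are assumptions):

```sql
-- Hypothetical tables and rows, only so the sketch is self-contained.
CREATE TEMP TABLE string (id integer PRIMARY KEY, str text);
CREATE TEMP TABLE data (
    id          integer PRIMARY KEY,
    object_id   integer,
    property_id integer,
    value_id    integer
);
INSERT INTO string VALUES (1, 'used'), (2, 'unused');
INSERT INTO data VALUES (10, 1, 1, 1);

BEGIN;
-- ANALYZE executes the DELETE and prints real timings and row counts;
-- BUFFERS adds shared-buffer hit/read counts for each plan node.
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM string s
WHERE (SELECT count(*)
       FROM data
       WHERE object_id = s.id OR property_id = s.id OR value_id = s.id) = 0;
-- Undo the delete: the plan has been printed, the rows are untouched.
ROLLBACK;
```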

Answer 2

A:

You want EXISTS.

delete
from string s
where 1=1
and (select count(id) from data where object_id = s.id) = 0

is more correctly written as:

delete from string s
where not exists ( select * from data d where d.object_id = s.id)

You don't actually want to count; you just want to know whether a matching row exists in the sub table.
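This answer's example checks only `object_id`, while the query in the first answer also checks `property_id` and `value_id`. A sketch of the `NOT EXISTS` form for all three columns, on hypothetical temporary tables. Writing one `NOT EXISTS` per column, instead of one subquery joined with `OR`, lets PostgreSQL probe a separate index for each column; the three indexes are an assumption, since the question does not show which indexes exist:

```sql
-- Hypothetical schema and rows.
CREATE TEMP TABLE string (id integer PRIMARY KEY, str text);
CREATE TEMP TABLE data (
    id          integer PRIMARY KEY,
    object_id   integer,
    property_id integer,
    value_id    integer
);
-- Assumed indexes: without them every probe below scans all of "data".
CREATE INDEX data_object_id_idx   ON data (object_id);
CREATE INDEX data_property_id_idx ON data (property_id);
CREATE INDEX data_value_id_idx    ON data (value_id);

INSERT INTO string VALUES (1, 'obj'), (2, 'prop'), (3, 'val'), (4, 'unused');
INSERT INTO data VALUES (10, 1, 2, 3);

-- Delete a string row only if no "data" row references it in any column.
DELETE FROM string s
WHERE NOT EXISTS (SELECT 1 FROM data d WHERE d.object_id   = s.id)
  AND NOT EXISTS (SELECT 1 FROM data d WHERE d.property_id = s.id)
  AND NOT EXISTS (SELECT 1 FROM data d WHERE d.value_id    = s.id);
```

Unlike `count(...) = 0`, each `NOT EXISTS` can be planned as an anti-join, so the database never has to count every matching row.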

Aside from that, note that all of this would be handled for you if you were using foreign keys. That should be your next step after getting this code working.

Andy Lester 2010-10-20 19:39:11

Can you be more specific about how I would handle it with foreign keys? I understand that I could use FKs for the opposite check: deleting "data" rows when a "string" row is deleted. I guess I could write a trigger that, for each deleted "data" row, tries to delete orphaned "string" rows. But I guess 10000 x executing trigger might have more penalty than running 1 delete query.

Ago 2010-10-21 01:16:16
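The trigger Ago describes does not have to run once per deleted row. Since PostgreSQL 10, a statement-level `AFTER DELETE` trigger can read all rows removed by the statement through a transition table, so it fires once per `DELETE`. A sketch on hypothetical temporary tables; `delete_orphan_strings`, `data_delete_orphans` and `old_rows` are made-up names, the function itself is a regular (non-temporary) object, and `EXECUTE FUNCTION` requires PostgreSQL 11 (version 10 spells it `EXECUTE PROCEDURE`):

```sql
-- Hypothetical schema.
CREATE TEMP TABLE string (id integer PRIMARY KEY, str text);
CREATE TEMP TABLE data (
    id          integer PRIMARY KEY,
    object_id   integer,
    property_id integer,
    value_id    integer
);

-- Hypothetical cleanup function: only ids that appeared in the deleted rows
-- are candidates, and a candidate is removed only if nothing still uses it.
CREATE OR REPLACE FUNCTION delete_orphan_strings() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    DELETE FROM string s
    WHERE s.id IN (SELECT object_id   FROM old_rows
                   UNION SELECT property_id FROM old_rows
                   UNION SELECT value_id    FROM old_rows)
      AND NOT EXISTS (SELECT 1 FROM data d WHERE d.object_id   = s.id)
      AND NOT EXISTS (SELECT 1 FROM data d WHERE d.property_id = s.id)
      AND NOT EXISTS (SELECT 1 FROM data d WHERE d.value_id    = s.id);
    RETURN NULL;  -- ignored for AFTER ... FOR EACH STATEMENT triggers
END;
$$;

-- Fires once per DELETE statement, however many rows it removes.
CREATE TRIGGER data_delete_orphans
AFTER DELETE ON data
REFERENCING OLD TABLE AS old_rows
FOR EACH STATEMENT
EXECUTE FUNCTION delete_orphan_strings();

INSERT INTO string VALUES (1, 'shared'), (2, 'only-in-10'), (3, 'only-in-11');
INSERT INTO data VALUES (10, 1, 2, 2), (11, 1, 3, 3);

-- Removing row 10 orphans string 2; string 1 is still used by row 11.
DELETE FROM data WHERE id = 10;
```

Only ids taken from the deleted rows are re-checked, so the work can track the size of the delete rather than the size of the whole `string` table.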

Using "exists" gives me a better cost estimate in EXPLAIN: 16k vs. 234M (with count). When I deleted about 10k rows from "data" and then ran the delete on "string", the "exists" query ran for about 80 seconds (compared to 180 seconds for the "count" query). BUT I noticed that when I commit my "data" deletion and run the second delete after that, my "count" query runs in 1.5 seconds, whereas the "exists" query still takes 80 seconds. I'd prefer to do both deletions in one transaction, but 1.5 seconds sounds pretty good :) I might even go with it until there's a better solution.

Ago 2010-10-21 01:22:52
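Ago would prefer to run both deletions in one transaction. A sketch of that shape on hypothetical temporary tables; the `WHERE id = 10` criterion on `data` is a placeholder, since the thread does not say which rows are removed. Because both statements run in the same transaction, the second DELETE already sees the rows removed by the first:

```sql
-- Hypothetical schema and rows.
CREATE TEMP TABLE string (id integer PRIMARY KEY, str text);
CREATE TEMP TABLE data (
    id          integer PRIMARY KEY,
    object_id   integer,
    property_id integer,
    value_id    integer
);
INSERT INTO string VALUES (1, 'shared'), (2, 'only-in-10'), (3, 'only-in-11');
INSERT INTO data VALUES (10, 1, 2, 2), (11, 1, 3, 3);

BEGIN;
-- Placeholder criterion for the "data" rows being removed.
DELETE FROM data WHERE id = 10;

-- Sees the uncommitted delete above, so string 2 is now unreferenced.
DELETE FROM string s
WHERE NOT EXISTS (SELECT 1 FROM data d WHERE d.object_id   = s.id)
  AND NOT EXISTS (SELECT 1 FROM data d WHERE d.property_id = s.id)
  AND NOT EXISTS (SELECT 1 FROM data d WHERE d.value_id    = s.id);
COMMIT;
```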

Ago: Read up on what foreign key constraints do in your DBMS docs. Short version: If you have child tables with foreign key constraints to parent tables, then you can say, for example "if I delete the parent record, cascade delete the children, too". In your case, you'd want the constraint to say "If I try to delete the parent, but children exist, then don't delete the parent."

Andy Lester 2010-10-21 03:28:54
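The rule "if I try to delete the parent, but children exist, then don't delete the parent" corresponds to a foreign key with `ON DELETE RESTRICT` (the default `NO ACTION` also rejects the delete). A minimal sketch on hypothetical temporary tables; the sample rows are made up for illustration:

```sql
-- Hypothetical schema: every reference column points at string(id).
CREATE TEMP TABLE string (id integer PRIMARY KEY, str text);
CREATE TEMP TABLE data (
    id          integer PRIMARY KEY,
    -- RESTRICT: refuse to delete a string row that is still referenced.
    object_id   integer REFERENCES string (id) ON DELETE RESTRICT,
    property_id integer REFERENCES string (id) ON DELETE RESTRICT,
    value_id    integer REFERENCES string (id) ON DELETE RESTRICT
);

INSERT INTO string VALUES (1, 'obj'), (2, 'prop'), (3, 'val'), (4, 'unused');
INSERT INTO data VALUES (10, 1, 2, 3);

-- An unreferenced row can be deleted as usual.
DELETE FROM string WHERE id = 4;

-- A referenced row cannot: the constraint raises foreign_key_violation.
DO $$
BEGIN
    DELETE FROM string WHERE id = 1;
    RAISE EXCEPTION 'expected a foreign key violation';
EXCEPTION
    WHEN foreign_key_violation THEN
        RAISE NOTICE 'delete of string 1 rejected: still referenced by data';
END;
$$;
```

The constraint only rejects such deletes; it does not by itself find and remove unreferenced rows.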

Ago: You say "But I guess 10000 x executing trigger might have more penalty than running 1 delete query" but FKs take care of all of that for you. They are an extremely powerful part of using a relational database. Read up on them.

Andy Lester 2010-10-21 03:29:47

From the PostgreSQL manual: "A foreign key constraint specifies that the values in a column (or a group of columns) must match the values appearing in some row of another table." As I understand it, the constraint only works one way (and not the way I'd like).

Ago 2010-10-21 11:13:56
