I have two tables (shown as name(fields)):

data(object_id, property_id, value_id)

and

string(id, value)

All the actual values are stored in the "string" table; "data" only holds ids that refer to the corresponding strings.

For example, I have:

data(1,2,3)
data(1,4,5)
data(6,4,7)

string(1, 'car')
string(2, 'color')
string(3, 'red')
string(4, 'make')
string(5, 'audi')
string(6, 'car2')
string(7, 'toyota')

What I want is that when I delete some rows from the data table, all orphaned rows in the string table are deleted as well:

If I delete data(6,4,7), the strings with ids 6 and 7 should be deleted because they are no longer used; id 4 is still used by another data row and is therefore kept.
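A minimal sketch of this setup, using Python's built-in sqlite3 module as an in-memory stand-in for PostgreSQL (an assumption for illustration; the SQL is plain enough to run on both). The helper names make_db and orphan_ids are hypothetical. It lists the string ids that no data row references in any of its three columns, before and after deleting data(6,4,7):

```python
import sqlite3

def make_db():
    """Build the question's two tables (hypothetical helper) with the example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE string (id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE data (object_id INTEGER, property_id INTEGER, value_id INTEGER);
    """)
    conn.executemany("INSERT INTO string VALUES (?, ?)", [
        (1, 'car'), (2, 'color'), (3, 'red'), (4, 'make'),
        (5, 'audi'), (6, 'car2'), (7, 'toyota'),
    ])
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)",
                     [(1, 2, 3), (1, 4, 5), (6, 4, 7)])
    return conn

def orphan_ids(conn):
    """Ids in string that no data row references in any of its three columns."""
    rows = conn.execute("""
        SELECT s.id FROM string s
        WHERE NOT EXISTS (
            SELECT 1 FROM data d
            WHERE d.object_id = s.id OR d.property_id = s.id OR d.value_id = s.id)
        ORDER BY s.id
    """).fetchall()
    return [r[0] for r in rows]

conn = make_db()
print(orphan_ids(conn))  # [] : every string is still referenced
conn.execute("DELETE FROM data WHERE object_id = 6 AND property_id = 4 AND value_id = 7")
print(orphan_ids(conn))  # [6, 7] : id 4 is still used by data(1,4,5)
```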

My question: how do I write an efficient delete query for the string table?

Currently I have something like this (it works, but it is very slow):

delete  
from string s  
where 1=1  
and (select count(id) from data where object_id = s.id) = 0  
and (select count(id) from data where property_id = s.id) = 0  
and (select count(id) from data where value_id = s.id) = 0

I have also tried the following, which, depending on the number of orphans, is sometimes 10-20% faster:

delete from string  
where (id not in (select usedids.id from (select object_id as id from data  
    union  
    select property_id as id from data  
    union  
    select value_id as id from data) as usedids)  
);

I have about 100k rows in each table. If I delete about 6,000 rows from the data table, cleaning up the string table takes about 3 minutes. Every column has an index, and I also have foreign key constraints.
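A small sketch that runs both of the question's attempts against the example rows (with data(6,4,7) already deleted) to confirm they remove the same strings. It uses Python's sqlite3 module as a stand-in for PostgreSQL, which is an assumption, and it says nothing about speed on 100k rows. count(id) is written as count(*) here, since data has no id column of its own; the names SCHEMA, COUNT_DELETE, NOT_IN_DELETE and remaining_ids are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE string (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE data (object_id INTEGER, property_id INTEGER, value_id INTEGER);
INSERT INTO string VALUES (1,'car'),(2,'color'),(3,'red'),(4,'make'),
                          (5,'audi'),(6,'car2'),(7,'toyota');
INSERT INTO data VALUES (1,2,3),(1,4,5);  -- data(6,4,7) already deleted
"""

# First attempt: three correlated count subqueries per string row.
COUNT_DELETE = """
DELETE FROM string
WHERE (SELECT count(*) FROM data WHERE object_id = string.id) = 0
  AND (SELECT count(*) FROM data WHERE property_id = string.id) = 0
  AND (SELECT count(*) FROM data WHERE value_id = string.id) = 0
"""

# Second attempt: collect every used id once, then delete the rest.
NOT_IN_DELETE = """
DELETE FROM string
WHERE id NOT IN (SELECT object_id FROM data
                 UNION SELECT property_id FROM data
                 UNION SELECT value_id FROM data)
"""

def remaining_ids(delete_sql):
    """Run one cleanup query on a fresh copy of the example data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(delete_sql)
    return [r[0] for r in conn.execute("SELECT id FROM string ORDER BY id")]

print(remaining_ids(COUNT_DELETE))   # [1, 2, 3, 4, 5]
print(remaining_ids(NOT_IN_DELETE))  # [1, 2, 3, 4, 5]
```

One caveat on the NOT IN form: if any of the data columns can hold NULL, NOT IN against a set containing NULL is never true, so that query would delete nothing.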

A: 

The problem seems to be that you are accessing your data table three times (three subqueries per string row) instead of once.

Your query should be:

DELETE FROM string s WHERE (SELECT count(*) FROM data WHERE object_id = s.id OR property_id = s.id OR value_id = s.id) = 0;

I haven't tried it yet, but I can adjust it if you let me know what happens.
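A minimal sketch of this single-subquery version, using Python's sqlite3 as a stand-in for PostgreSQL (an assumption) and the example rows with data(6,4,7) already deleted. The DELETE is written without a table alias here, so the correlated reference is string.id rather than s.id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE string (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE data (object_id INTEGER, property_id INTEGER, value_id INTEGER);
INSERT INTO string VALUES (1,'car'),(2,'color'),(3,'red'),(4,'make'),
                          (5,'audi'),(6,'car2'),(7,'toyota');
INSERT INTO data VALUES (1,2,3),(1,4,5);  -- data(6,4,7) already deleted
""")

# One correlated subquery per string row instead of three; it still counts
# every matching data row before comparing the total with 0.
conn.execute("""
DELETE FROM string
WHERE (SELECT count(*) FROM data
       WHERE object_id = string.id
          OR property_id = string.id
          OR value_id = string.id) = 0
""")

remaining = [r[0] for r in conn.execute("SELECT id FROM string ORDER BY id")]
print(remaining)  # [1, 2, 3, 4, 5]
```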

mezzie
I have tried that. It still does basically the same thing (at least the speed is almost the same). EXPLAIN (for select * from ...) gives me:
Seq Scan on string s (cost=0.00..82770498.75 rows=353 width=119)
Bitmap Heap Scan on data (cost=92.42..1166.00 rows=2376 width=0)
BitmapOr (cost=92.42..92.42 rows=2376 width=0)
Ago
It seems that I cannot format my comment?
Ago
A: 

You want EXISTS.

delete  
from string s  
where 1=1  
and (select count(id) from data where object_id = s.id) = 0  

should actually be written as

delete from string s
where not exists ( select * from data d where d.object_id = s.id)

You don't actually want to count the matching rows; you just want to know whether at least one exists.
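The snippet above only checks object_id. A sketch of the full check across all three columns, with one NOT EXISTS per column so each can use that column's index and stop at the first matching row. It uses Python's sqlite3 as a stand-in for PostgreSQL (an assumption) and the example rows with data(6,4,7) already deleted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE string (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE data (object_id INTEGER, property_id INTEGER, value_id INTEGER);
INSERT INTO string VALUES (1,'car'),(2,'color'),(3,'red'),(4,'make'),
                          (5,'audi'),(6,'car2'),(7,'toyota');
INSERT INTO data VALUES (1,2,3),(1,4,5);  -- data(6,4,7) already deleted
""")

# A string row is deleted only if no data row uses its id in any column.
conn.execute("""
DELETE FROM string
WHERE NOT EXISTS (SELECT 1 FROM data WHERE object_id   = string.id)
  AND NOT EXISTS (SELECT 1 FROM data WHERE property_id = string.id)
  AND NOT EXISTS (SELECT 1 FROM data WHERE value_id    = string.id)
""")

remaining = [r[0] for r in conn.execute("SELECT id FROM string ORDER BY id")]
print(remaining)  # [1, 2, 3, 4, 5]
```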


Aside from that, note that all of this would be handled for you if you were using foreign keys. That should be your next step after getting this code working.

Andy Lester
Can you be more specific about how I would handle this with foreign keys? As I understand it, I could use FKs for the opposite check: deleting "data" rows based on "string" deletions. I guess I could write a trigger that, for each deleted "data" row, tries to delete the orphaned "string" rows. But I suspect that executing a trigger 10,000 times would cost more than running a single delete query.
Ago
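A minimal sketch of the trigger idea from this comment, in SQLite syntax via Python's sqlite3 (an assumption; in PostgreSQL the trigger body would have to live in a separate PL/pgSQL function). The trigger name data_cleanup is hypothetical. For each deleted data row it only re-checks the three string ids that row referenced, rather than scanning the whole string table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE string (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE data (object_id INTEGER, property_id INTEGER, value_id INTEGER);
INSERT INTO string VALUES (1,'car'),(2,'color'),(3,'red'),(4,'make'),
                          (5,'audi'),(6,'car2'),(7,'toyota');
INSERT INTO data VALUES (1,2,3),(1,4,5),(6,4,7);

-- After each deleted data row, drop whichever of its three string ids
-- are no longer referenced by any remaining data row.
CREATE TRIGGER data_cleanup AFTER DELETE ON data
FOR EACH ROW
BEGIN
    DELETE FROM string
    WHERE id IN (OLD.object_id, OLD.property_id, OLD.value_id)
      AND NOT EXISTS (SELECT 1 FROM data
                      WHERE object_id = string.id
                         OR property_id = string.id
                         OR value_id = string.id);
END;
""")

def string_ids():
    return [r[0] for r in conn.execute("SELECT id FROM string ORDER BY id")]

conn.execute("DELETE FROM data WHERE object_id = 6 AND property_id = 4 AND value_id = 7")
print(string_ids())  # [1, 2, 3, 4, 5]
conn.execute("DELETE FROM data WHERE object_id = 1 AND property_id = 4 AND value_id = 5")
print(string_ids())  # [1, 2, 3]
```

Whether this beats one bulk delete after 10,000 row deletions is something only a measurement on the real data can tell.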
Using EXISTS gives me a better cost estimate in EXPLAIN: 16k vs. 234M with count. When I deleted about 10k rows from "data" and then ran the delete on "string", the EXISTS query ran for about 80 seconds (compared to 180 seconds for the count query). BUT I noticed that when I commit the "data" deletion first and run the second delete after that, the count query runs in 1.5 seconds, whereas the EXISTS query still takes 80 seconds. I'd prefer to do both deletions in one transaction, but 1.5 seconds sounds pretty good :) I might even go with it until a better solution turns up.
Ago
Ago: Read up on what foreign key constraints do in your DBMS docs. Short version: If you have child tables with foreign key constraints to parent tables, then you can say, for example "if I delete the parent record, cascade delete the children, too". In your case, you'd want the constraint to say "If I try to delete the parent, but children exist, then don't delete the parent."
Andy Lester
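A minimal sketch of the behavior this comment describes, using Python's sqlite3 as a stand-in for PostgreSQL (an assumption; SQLite only enforces foreign keys after PRAGMA foreign_keys = ON). With plain REFERENCES constraints and no ON DELETE action, deleting a string that is still referenced is rejected, while deleting a data row leaves its now-unreferenced strings in place. The helper try_delete is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE string (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE data (
    object_id   INTEGER REFERENCES string(id),
    property_id INTEGER REFERENCES string(id),
    value_id    INTEGER REFERENCES string(id)
);
INSERT INTO string VALUES (1,'car'),(2,'color'),(3,'red'),(4,'make'),
                          (5,'audi'),(6,'car2'),(7,'toyota');
INSERT INTO data VALUES (1,2,3),(1,4,5),(6,4,7);
""")

def try_delete(string_id):
    """Try to delete one string row; report whether the FK allowed it."""
    try:
        conn.execute("DELETE FROM string WHERE id = ?", (string_id,))
        return "deleted"
    except sqlite3.IntegrityError:  # FOREIGN KEY constraint failed
        return "blocked"

print(try_delete(4))  # blocked: data rows still reference id 4
conn.execute("DELETE FROM data WHERE object_id = 6 AND property_id = 4 AND value_id = 7")
# The constraint does not remove the orphans 6 and 7 by itself.
print(conn.execute("SELECT count(*) FROM string").fetchone()[0])  # 7
print(try_delete(7))  # deleted: nothing references id 7 any more
```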
Ago: You worry that executing a trigger 10,000 times might cost more than running one delete query, but FKs take care of all of that for you. They are an extremely powerful part of using a relational database. Read up on them.
Andy Lester
From the PostgreSQL manual: "A foreign key constraint specifies that the values in a column (or a group of columns) must match the values appearing in some row of another table." As I understand it, the constraint only works one way (and not the way I'd like).
Ago