views:

513

answers:

3

Sorry for a pretty specific question.

I have a table (see bottom), and when I try to delete a lot of records from it, my PostgreSQL 8.2.5 spends 98% of the time doing the parent-child constraint. I'm trying to figure out what index should I add to make it go fast. I have to say that all columns on this table have either 0 or null as the parent_block_id: it's rudimentary.

I've tried adding different indexes: just (parent_block_id); WHERE parent_block_id = 0; WHERE parent_block_id IS NULL; WHERE parent_block_id != 0. Neither of those resulted in a serious perfomance benefit.

varshavka=> explain analyze delete from infoblocks where template_id = 112;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Seq Scan on infoblocks  (cost=0.00..1234.29 rows=9 width=6) (actual time=13.271..40.888 rows=40000 loops=1)
   Filter: (template_id = 112)
 Trigger for constraint $1: time=4051.219 calls=40000
 Trigger for constraint $2: time=1616.194 calls=40000
 Trigger for constraint cs_ibrs: time=2810.144 calls=40000
 Trigger for constraint cs_ibct: time=4026.305 calls=40000
 Trigger for constraint cs_ibbs: time=3517.640 calls=40000
 Trigger for constraint cs_ibreq: time=774344.010 calls=40000
 Total runtime: 790760.168 ms
(9 rows)



varshavka=> \d infoblocks
                                      Table "public.infoblocks"
     Column      |            Type             |                      Modifiers
-----------------+-----------------------------+------------------------------------------------------
 id              | integer                     | not null default nextval(('IB_SEQ'::text)::regclass)
 parent_block_id | integer                     |
 nm_id           | integer                     | default 0
 template_id     | integer                     | not null
 author_id       | integer                     |
 birthdate       | timestamp without time zone | not null
Indexes:
    "infoblocks_pkey" PRIMARY KEY, btree (id)
    "zeroparent" btree (parent_block_id) WHERE parent_block_id <> 0
Foreign-key constraints:
    "$2" FOREIGN KEY (nm_id) REFERENCES newsmakers(nm_id) ON DELETE RESTRICT
    "$5" FOREIGN KEY (author_id) REFERENCES users(user_id) ON DELETE RESTRICT
    "cs_ibreq" FOREIGN KEY (parent_block_id) REFERENCES infoblocks(id) ON DELETE CASCADE
+2  A: 

Hi,

Do you have tried adding an index to template_id?

Regards.

ATorras
this is the real problem, its doing a sequential scan, and only one of them.
Kent Fredric
put a hash index on it if they're ident fields not intended to be sorted or used for anything other than lookup, btree if order is meaningful.
Kent Fredric
The problem is not the seq scan, the problem is that Trigger for constraint cs_ibreq: time=**774344.010** calls=40000
alamar
yes, but the constraint appears to be checked for every item in the sequential scan, not the subset of results that match the criteria.
Kent Fredric
also, newsmakers(nm_id) , users(user_id) and infoblocks(id) will all want indexes otherwise the dependant tables will *also* incur sequential scans, and that'll be death slow.
Kent Fredric
I agree with Kent.Could it be a BEFORE DELETE FOR EACH ROW? Docs says _for each row_ are processed for all affected rows (just one). More info: http://www.postgresql.org/docs/8.1/interactive/sql-createtrigger.html
ATorras
Maybe the ON DELETE CASCADE has some impact... What is the code of the constraint cs_ibreq?
ATorras
@Kent: *Why* would the constraint be checked for *every* item in the sequential scan? I can't believe that the PostgreSQL developers would really allow something that wasteful...
j_random_hacker
+2  A: 

If you can handle blocking everyone else out for a while, maybe drop the constraint cs_ibreq, do the delete, and then re-add the constraint?

Possibly because there is only one non-null value for parent_block_id, it's not using the index when checking the constraint? Although that seems a bit odd.

araqnid
+1, but note that this won't work in the general case when some records have a nonzero, non-null parent_block_id field.
j_random_hacker
+1 This tip is often used in Oracle DBs.
ATorras
With such index, the result is more or less the same.
alamar
+2  A: 

First of all: the first (zeroth!) thing you should do when noticing ugly query times is make sure that you have VACUUM ANALYZEd recently.

If you just need a one-off deletion, then see araqnid's answer. But if you need something that will continue to work in the future when some rows have a nonzero, non-null parent_block_id field, read on.

I'm guessing that PostgreSQL doesn't combine the deletions caused by ON DELETE CASCADE into a single query -- the fact that the EXPLAIN output shows these as triggers suggests that each child row deletion will in fact be performed separately. Presumably each row will be found using indexed lookup on parent_block_id, but that's still going to be much slower than a single sweep through the table.

So, you could probably get a big speedup by changing the ON DELETE CASCADE to ON DELETE RESTRICT, and manually compiling a list of all deletions that need to be performed in a temporary table, then deleting them all at once. This approach will be very fast if the maximum depth of your hierarchy is small. Here's some pseudocode:

# Insert the top-level rows as "seed" rows.
INSERT INTO rows_to_delete
    SELECT id, 0 FROM infoblocks WHERE template_id = 112

# Gather all rows that are children of any row at depth curLevel,
# advancing curLevel until no more children are found.
curLevel = 0
while (nRowsReturnedFromLastInsert > 0) {
    INSERT INTO rows_to_delete
        SELECT ib.id, rtd.level + 1
        FROM infoblocks ib
        JOIN rows_to_delete rtd ON (ib.parent_block_id = rtd.id)
        WHERE rtd.level = curLevel

    curLevel = curLevel + 1
}

DELETE FROM infoblocks
    JOIN rows_to_delete rtd ON (infoblocks.id = rtd.id)

(I'm not sure, but you may in fact need to use ON DELETE NO ACTION instead of ON DELETE RESTRICT for the final DELETE to succeed -- it's not clear to me whether a single DELETE statement is allowed to delete a parent and all its descendents when ON DELETE RESTRICT is in effect. If that's unacceptable for some reason, you could always loop through multiple DELETE statements, first deleting the bottommost level, then the next-bottommost and so on.)

j_random_hacker
Thanks for a hint, I'll try it tomorrow. As for VACUUM ANALYZE, I did that, it helped ~tenfold in absolute numbers but didn't change the underlying O(N^2) a bit.
alamar
You were absolutely right, ON DELETE RESTRICT magically destroyed teh problem!
alamar