ansaurus

Question

PosgreSQL Query Optimization and the Postmaster Process'

Answer 1

+1 A:

To your second question; you could try creating a new table with just the records you need with a CREATE TABLE AS statement; if the new table is sufficiently small, it might be faster- but it might not help either.

alex 2009-01-05 21:14:22

Actually, this is what I am trying for it seems like my best shot. I will post the results if it works. Thx!

Nicholas Leonard 2009-01-05 21:52:12

Answer 2

+1 A:

Hi again,

Indeed, I decided to CREATE a temporary table to speed up query execution:

CREATE TABLE temp_to_delete AS(
    (SELECT DISTINCT pl_from FROM pagelinks) 
     EXCEPT 
    (SELECT page_id FROM page));
DELETE FROM pagelinks USING temp_to_delete 
    WHERE pagelinks.pl_from IN (temp_to_delete.pl_from);

Surprisingly, this query completed in about 4 hours while the initial query had remained active for about 14hrs before I decided to kill it. More specifically, the DELETE returned:

Query returned successfully: 31340904 rows affected, 4415166 ms execution time.

As for the first part of my question, it seems that the postmaster process indeed keeps some info in cache; when another query requires info not in the cache and some memory (RAM), the cache is emptied. And the postmasters are indeed but a pool of process'.

It has also occurred to me that the gnome-system-monitor is a myth for it gives incomplete information and is worthless in informational value. It is mostly due to this application that I have been so confused lately; for example, it does not consider the memory usage of other users (like the postgres user!) and even tells me that I have 12 GB of RAM left when this is so untrue. Hence, I tried out a couple of system monitors for I like to know how postgreSQL is using its resources, and it seems that xosview is indeed a valid tool.

Hope this helps!

Nicholas Leonard 2009-01-06 00:48:45

Answer 3

A:

Your postmaster process will stay there as long as the connection to the client is open. Does pgadmin close the connection ? I don't know.

Memory used could be shared_buffers (check your config settings) or not.

Now, the query. For big maintenance operations like this, feel free to set work_mem to something large like a few GB. You look like you got lots of RAM, so use it.

set work_mem to '4GB'; EXPLAIN DELETE FROM pagelinks WHERE pl_from NOT IN (SELECT page_id FROM page);

It should seq scan page, hash it, and seq scan pagelinks, peeking in the hash to check for page_ids. It should be quite fast (much faster than 4 hours !) but you need a large work_mem for the hash.

But since you delete a significant portion of your table, it might be faster to do it like this :

CREATE TABLE pagelinks2 AS SELECT a.* FROM pagelinks a JOIN pages b ON a.pl_from = b.page_id;

(you could use a simple JOIN instead of IN)

You can also add an ORDER BY on this query, and your new table will be nicely ordered on disk for optimal access later.

peufeu 2009-10-27 23:25:30

ansaurus

tags:

views:

answers:

PosgreSQL Query Optimization and the Postmaster Process'

related questions