Hi there,
I have been running an UPDATE on a table containing 250 million rows with 3 index'; this UPDATE uses another table containing 30 million rows. It has been running for about 36 hours now. I am wondering if their is a way to find out how close it is to being done for if it plans to take a million days to do its thing, I will kill it; yet if it only needs another day or two, I will let it run. Here is the command-query:
UPDATE pagelinks SET pl_to = page_id
FROM page
WHERE
(pl_namespace, pl_title) = (page_namespace, page_title)
AND
page_is_redirect = 0
;
The EXPLAIN is not the issue here and I only mention the big table's having multiple indexes in order to somewhat justify how long it takes to UPDATE it. But here is the EXPLAIN anyway:
Merge Join (cost=127710692.21..135714045.43 rows=452882848 width=57)
Merge Cond: (("outer".page_namespace = "inner".pl_namespace) AND ("outer"."?column4?" = "inner"."?column5?"))
-> Sort (cost=3193335.39..3219544.38 rows=10483593 width=41)
Sort Key: page.page_namespace, (page.page_title)::text
-> Seq Scan on page (cost=0.00..439678.01 rows=10483593 width=41)
Filter: (page_is_redirect = 0::numeric)
-> Sort (cost=124517356.82..125285665.74 rows=307323566 width=46)
Sort Key: pagelinks.pl_namespace, (pagelinks.pl_title)::text"
-> Seq Scan on pagelinks (cost=0.00..6169460.66 rows=307323566 width=46)
Now I also sent a parallel query-command in order to DROP one of pagelinks' indexes; of course it is waiting for the UPDATE to finish (but I felt like trying it anyway!). Hence, I cannot SELECT anything from pagelinks for fear of corrupting the data (unless you think it would be safe to kill the DROP INDEX postmaster process?).
So I am wondering if their is a table that would keep track of the amount of dead tuples or something for It would be nice to know how fast or how far the UPDATE is in the completion of its task.
Thx (PostgreSQL is not as intelligent as I thought; it needs heuristics)