views:

178

answers:

1

I have a program to use the IMDB database and am having very slow performance on my query. It appears that it doesn't use my where condition until after it materializes everything. I looked around for hints to use but nothing seems to work. Here is my query:

SELECT * 
 FROM name as n1
  FULL JOIN aka_name
ON n1.id = aka_name.person_id
 FULL JOIN cast_info as t2
ON n1.id = t2.person_id
 FULL JOIN person_info as t3
ON n1.id = t3.person_id
 FULL JOIN char_name as t4
ON t2.person_role_id = t4.id
 FULL JOIN role_type as t5
ON t2.role_id = t5.id
 FULL JOIN title as t6
ON t2.movie_id = t6.id
 FULL JOIN aka_title as t7 
ON t6.id = t7.movie_id
 FULL JOIN complete_cast as t8
ON t6.id = t8.movie_id
 FULL JOIN kind_type as t9
ON t6.kind_id = t9.id
 FULL JOIN movie_companies as t10
ON t6.id = t10.movie_id
 FULL JOIN movie_info as t11
ON t6.id = t11.movie_id
 FULL JOIN movie_info_idx as t19
ON t6.id = t19.movie_id
 FULL JOIN movie_keyword as t12
ON t6.id = t12.movie_id
 FULL JOIN movie_link as t13
ON t6.id = t13.linked_movie_id
 FULL JOIN link_type as t14
ON t13.link_type_id = t14.id
 FULL JOIN keyword as t15
ON t12.keyword_id = t15.id
 FULL JOIN company_name as t16
ON t10.company_id = t16.id
 FULL JOIN company_type as t17
ON t10.company_type_id = t17.id
 FULL JOIN comp_cast_type as t18
ON t8.status_id = t18.id
 WHERE n1.id = 2003

Very table is related to each other on the join via foreign-key constraints and have indexes for all the mentioned columns.

The query plan details:

"Hash Left Join  (cost=5838187.01..13756845.07 rows=15579622 width=835) (actual time=146879.213..146891.861 rows=20 loops=1)"
"  Hash Cond: (t8.status_id = t18.id)"
"  ->  Hash Left Join  (cost=5838185.92..13542624.18 rows=15579622 width=822) (actual time=146879.199..146891.833 rows=20 loops=1)"
"        Hash Cond: (t10.company_type_id = t17.id)"
"        ->  Hash Left Join  (cost=5838184.83..13328403.29 rows=15579622 width=797) (actual time=146879.165..146891.781 rows=20 loops=1)"
"              Hash Cond: (t10.company_id = t16.id)"
"              ->  Hash Left Join  (cost=5828372.95..10061752.03 rows=15579622 width=755) (actual time=146426.483..146429.756 rows=20 loops=1)"
"                    Hash Cond: (t12.keyword_id = t15.id)"
"                    ->  Hash Left Join  (cost=5825164.23..6914088.45 rows=15579622 width=731) (actual time=146372.411..146372.529 rows=20 loops=1)"
"                          Hash Cond: (t13.link_type_id = t14.id)"
"                          ->  Merge Left Join  (cost=5825162.82..6699867.24 rows=15579622 width=715) (actual time=146372.366..146372.472 rows=20 loops=1)"
"                                Merge Cond: (t6.id = t13.linked_movie_id)"
"                                ->  Merge Left Join  (cost=5684009.29..6378956.77 rows=15579622 width=699) (actual time=144019.620..144019.711 rows=20 loops=1)"
"                                      Merge Cond: (t6.id = t12.movie_id)"
"                                      ->  Merge Left Join  (cost=5182403.90..5622400.75 rows=8502523 width=687) (actual time=136849.731..136849.809 rows=20 loops=1)"
"                                            Merge Cond: (t6.id = t19.movie_id)"
"                                            ->  Merge Left Join  (cost=4974472.00..5315778.48 rows=8502523 width=637) (actual time=134972.032..134972.099 rows=20 loops=1)"
"                                                  Merge Cond: (t6.id = t11.movie_id)"
"                                                  ->  Merge Left Join  (cost=1830064.81..2033131.89 rows=1341632 width=561) (actual time=63784.035..63784.062 rows=2 loops=1)"
"                                                        Merge Cond: (t6.id = t10.movie_id)"
"                                                        ->  Nested Loop Left Join  (cost=1417360.29..1594294.02 rows=1044480 width=521) (actual time=59279.246..59279.264 rows=1 loops=1)"
"                                                              Join Filter: (t6.kind_id = t9.id)"
"                                                              ->  Merge Left Join  (cost=1417359.22..1429787.34 rows=1044480 width=507) (actual time=59279.222..59279.224 rows=1 loops=1)"
"                                                                    Merge Cond: (t6.id = t8.movie_id)"
"                                                                    ->  Merge Left Join  (cost=1405731.84..1414378.65 rows=1044480 width=491) (actual time=59121.773..59121.775 rows=1 loops=1)"
"                                                                          Merge Cond: (t6.id = t7.movie_id)"
"                                                                          ->  Sort  (cost=1346206.04..1348817.24 rows=1044480 width=416) (actual time=58095.230..58095.231 rows=1 loops=1)"
"                                                                                Sort Key: t6.id"
"                                                                                Sort Method:  quicksort  Memory: 17kB"
"                                                                                ->  Hash Left Join  (cost=172406.29..456387.53 rows=1044480 width=416) (actual time=57969.371..58095.208 rows=1 loops=1)"
"                                                                                      Hash Cond: (t2.movie_id = t6.id)"
"                                                                                      ->  Hash Left Join  (cost=104700.38..256885.82 rows=1044480 width=358) (actual time=49981.493..50006.303 rows=1 loops=1)"
"                                                                                            Hash Cond: (t2.role_id = t5.id)"
"                                                                                            ->  Hash Left Join  (cost=104699.11..242522.95 rows=1044480 width=343) (actual time=49981.441..50006.250 rows=1 loops=1)"
"                                                                                                  Hash Cond: (t2.person_role_id = t4.id)"
"                                                                                                  ->  Hash Left Join  (cost=464.96..12283.95 rows=1044480 width=269) (actual time=0.071..0.087 rows=1 loops=1)"
"                                                                                                        Hash Cond: (n1.id = t3.person_id)"
"                                                                                                        ->  Nested Loop Left Join  (cost=0.00..49.39 rows=7680 width=160) (actual time=0.051..0.066 rows=1 loops=1)"
"                                                                                                              ->  Nested Loop Left Join  (cost=0.00..17.04 rows=3 width=119) (actual time=0.038..0.041 rows=1 loops=1)"
"                                                                                                                    ->  Index Scan using name_pkey on name n1  (cost=0.00..8.68 rows=1 width=39) (actual time=0.022..0.024 rows=1 loops=1)"
"                                                                                                                          Index Cond: (id = 2003)"
"                                                                                                                    ->  Index Scan using aka_name_idx_person on aka_name  (cost=0.00..8.34 rows=1 width=80) (actual time=0.010..0.010 rows=0 loops=1)"
"                                                                                                                          Index Cond: ((aka_name.person_id = 2003) AND (n1.id = aka_name.person_id))"
"                                                                                                              ->  Index Scan using cast_info_idx_pid on cast_info t2  (cost=0.00..10.77 rows=1 width=41) (actual time=0.011..0.020 rows=1 loops=1)"
"                                                                                                                    Index Cond: ((t2.person_id = 2003) AND (n1.id = t2.person_id))"
"                                                                                                        ->  Hash  (cost=463.26..463.26 rows=136 width=109) (actual time=0.010..0.010 rows=0 loops=1)"
"                                                                                                              ->  Index Scan using person_info_idx_pid on person_info t3  (cost=0.00..463.26 rows=136 width=109) (actual time=0.009..0.009 rows=0 loops=1)"
"                                                                                                                    Index Cond: (person_id = 2003)"
"                                                                                                  ->  Hash  (cost=42697.62..42697.62 rows=2442362 width=74) (actual time=49305.872..49305.872 rows=2442362 loops=1)"
"                                                                                                        ->  Seq Scan on char_name t4  (cost=0.00..42697.62 rows=2442362 width=74) (actual time=14.066..22775.087 rows=2442362 loops=1)"
"                                                                                            ->  Hash  (cost=1.12..1.12 rows=12 width=15) (actual time=0.024..0.024 rows=12 loops=1)"
"                                                                                                  ->  Seq Scan on role_type t5  (cost=0.00..1.12 rows=12 width=15) (actual time=0.012..0.014 rows=12 loops=1)"
"                                                                                      ->  Hash  (cost=31134.07..31134.07 rows=1573507 width=58) (actual time=7841.225..7841.225 rows=1573507 loops=1)"
"                                                                                            ->  Seq Scan on title t6  (cost=0.00..31134.07 rows=1573507 width=58) (actual time=21.507..2799.443 rows=1573507 loops=1)"
"                                                                          ->  Materialize  (cost=59525.80..63203.88 rows=294246 width=75) (actual time=812.376..984.958 rows=192075 loops=1)"
"                                                                                ->  Sort  (cost=59525.80..60261.42 rows=294246 width=75) (actual time=812.363..922.452 rows=192075 loops=1)"
"                                                                                      Sort Key: t7.movie_id"
"                                                                                      Sort Method:  external merge  Disk: 24880kB"
"                                                                                      ->  Seq Scan on aka_title t7  (cost=0.00..6646.46 rows=294246 width=75) (actual time=24.652..164.822 rows=294246 loops=1)"
"                                                                    ->  Materialize  (cost=11627.38..12884.43 rows=100564 width=16) (actual time=123.819..149.086 rows=41907 loops=1)"
"                                                                          ->  Sort  (cost=11627.38..11878.79 rows=100564 width=16) (actual time=123.807..138.530 rows=41907 loops=1)"
"                                                                                Sort Key: t8.movie_id"
"                                                                                Sort Method:  external merge  Disk: 3136kB"
"                                                                                ->  Seq Scan on complete_cast t8  (cost=0.00..1549.64 rows=100564 width=16) (actual time=0.013..10.744 rows=100564 loops=1)"
"                                                              ->  Materialize  (cost=1.08..1.15 rows=7 width=14) (actual time=0.016..0.029 rows=7 loops=1)"
"                                                                    ->  Seq Scan on kind_type t9  (cost=0.00..1.07 rows=7 width=14) (actual time=0.011..0.013 rows=7 loops=1)"
"                                                        ->  Materialize  (cost=412704.52..437969.09 rows=2021166 width=40) (actual time=3420.356..4278.545 rows=1028995 loops=1)"
"                                                              ->  Sort  (cost=412704.52..417757.43 rows=2021166 width=40) (actual time=3420.349..3953.483 rows=1028995 loops=1)"
"                                                                    Sort Key: t10.movie_id"
"                                                                    Sort Method:  external merge  Disk: 90960kB"
"                                                                    ->  Seq Scan on movie_companies t10  (cost=0.00..35214.66 rows=2021166 width=40) (actual time=13.271..566.893 rows=2021166 loops=1)"
"                                                  ->  Materialize  (cost=3144407.19..3269057.42 rows=9972019 width=76) (actual time=65485.672..70083.219 rows=5039009 loops=1)"
"                                                        ->  Sort  (cost=3144407.19..3169337.23 rows=9972019 width=76) (actual time=65485.667..68385.550 rows=5038999 loops=1)"
"                                                              Sort Key: t11.movie_id"
"                                                              Sort Method:  external merge  Disk: 735512kB"
"                                                              ->  Seq Scan on movie_info t11  (cost=0.00..212815.19 rows=9972019 width=76) (actual time=15.750..15715.608 rows=9972019 loops=1)"
"                                            ->  Materialize  (cost=207925.01..219867.92 rows=955433 width=50) (actual time=1483.989..1785.636 rows=429401 loops=1)"
"                                                  ->  Sort  (cost=207925.01..210313.59 rows=955433 width=50) (actual time=1483.983..1654.165 rows=429401 loops=1)"
"                                                        Sort Key: t19.movie_id"
"                                                        Sort Method:  external merge  Disk: 31720kB"
"                                                        ->  Seq Scan on movie_info_idx t19  (cost=0.00..15047.33 rows=955433 width=50) (actual time=7.284..221.597 rows=955433 loops=1)"
"                                      ->  Materialize  (cost=501605.39..537645.64 rows=2883220 width=12) (actual time=5823.040..6868.242 rows=1597396 loops=1)"
"                                            ->  Sort  (cost=501605.39..508813.44 rows=2883220 width=12) (actual time=5823.026..6477.517 rows=1597396 loops=1)"
"                                                  Sort Key: t12.movie_id"
"                                                  Sort Method:  external merge  Disk: 78888kB"
"                                                  ->  Seq Scan on movie_keyword t12  (cost=0.00..44417.20 rows=2883220 width=12) (actual time=11.672..839.498 rows=2883220 loops=1)"
"                                ->  Materialize  (cost=141143.93..152995.81 rows=948150 width=16) (actual time=1916.356..2253.004 rows=478358 loops=1)"
"                                      ->  Sort  (cost=141143.93..143514.31 rows=948150 width=16) (actual time=1916.344..2125.698 rows=478358 loops=1)"
"                                            Sort Key: t13.linked_movie_id"
"                                            Sort Method:  external merge  Disk: 29632kB"
"                                            ->  Seq Scan on movie_link t13  (cost=0.00..14607.50 rows=948150 width=16) (actual time=27.610..297.962 rows=948150 loops=1)"
"                          ->  Hash  (cost=1.18..1.18 rows=18 width=16) (actual time=0.020..0.020 rows=18 loops=1)"
"                                ->  Seq Scan on link_type t14  (cost=0.00..1.18 rows=18 width=16) (actual time=0.010..0.012 rows=18 loops=1)"
"                    ->  Hash  (cost=1537.10..1537.10 rows=91010 width=24) (actual time=54.055..54.055 rows=91010 loops=1)"
"                          ->  Seq Scan on keyword t15  (cost=0.00..1537.10 rows=91010 width=24) (actual time=0.006..14.703 rows=91010 loops=1)"
"              ->  Hash  (cost=4585.61..4585.61 rows=245461 width=42) (actual time=445.269..445.269 rows=245461 loops=1)"
"                    ->  Seq Scan on company_name t16  (cost=0.00..4585.61 rows=245461 width=42) (actual time=12.037..309.961 rows=245461 loops=1)"
"        ->  Hash  (cost=1.04..1.04 rows=4 width=25) (actual time=0.013..0.013 rows=4 loops=1)"
"              ->  Seq Scan on company_type t17  (cost=0.00..1.04 rows=4 width=25) (actual time=0.009..0.010 rows=4 loops=1)"
"  ->  Hash  (cost=1.04..1.04 rows=4 width=13) (actual time=0.006..0.006 rows=4 loops=1)"
"        ->  Seq Scan on comp_cast_type t18  (cost=0.00..1.04 rows=4 width=13) (actual time=0.002..0.003 rows=4 loops=1)"
"Total runtime: 147055.016 ms"

Is there anyway to force the name.id = 2003 before it tries to join all the tables together? As you can see, the end result is 4 tuples but it seems like it should be a fast join by using the available index after it limited it down with the name clause, although very complex.

A: 

It's going to be hard to optimize a query with that many FULL JOINs. Are you sure that's really what you mean - and not INNER JOIN, or at least LEFT JOIN? FULL JOINs cannot be reordered, and there are a lot of other optimizations that are also impossible, since they are designed to return things regardless of which table they're in. Not having looked at the schema specifically, but from the query it looks like it could be at least LEFT JOINs in most of those places. Do you really want rows from char_name even if there is no matching row in name etc?

Magnus Hagander
Thanks for the input. The LEFT JOIN did help but still took 80 seconds. Yes, I do want everything joined, even if no data exists, for a project I'm doing to build a text search implementation/proff of concept. I see why it takes forever now and decided to changed my program to generate a materialized view query based upon the left joins. This produced about 10,000 tuples per item I'm looking at (* 22 million) so I'm probably going to use a smaller DB to experiment with.