Is there a way for me to force a specific join order in Postgres?
I've got a query that looks like this. I've eliminated a bunch of stuff that was in the real query, but this simplification demonstrates the issue. What's left shouldn't be too cryptic: Using a role/task security system, I'm trying to determine whether a given user has privileges to perform a given task.
select task.taskid
from userlogin
join userrole using (userloginid)
join roletask using (roleid)
join task using (taskid)
where loginname='foobar'
and taskfunction='plugh'
But I realized that the program already knows the value of userlogin, so it seemed the query could be made more efficient by skipping the lookup on userlogin and just filling in the userloginid, like this:
select task.taskid
from userrole
join roletask using (roleid)
join task using (taskid)
where userloginid=42
and taskfunction='plugh'
When I did that -- eliminating a table from the query and hard-coding the value retrieved from that table instead -- the explain plan time went up! In the original query, Postgres read userlogin then userrole then roletask then task. But in the new query, it decided to read roletask first, and then join to userrole, even though this required doing a full-file scan on roletask.
Full explain plans are:
Version 1:
Hash Join (cost=12.79..140.82 rows=1 width=8)
Hash Cond: (roletask.taskid = task.taskid)
-> Nested Loop (cost=4.51..129.73 rows=748 width=8)
-> Nested Loop (cost=4.51..101.09 rows=12 width=8)
-> Index Scan using idx_userlogin_loginname on userlogin (cost=0.00..8.27 rows=1 width=8)
Index Cond: ((loginname)::text = 'foobar'::text)
-> Bitmap Heap Scan on userrole (cost=4.51..92.41 rows=33 width=16)
Recheck Cond: (userrole.userloginid = userlogin.userloginid)
-> Bitmap Index Scan on idx_userrole_login (cost=0.00..4.50 rows=33 width=0)
Index Cond: (userrole.userloginid = userlogin.userloginid)
-> Index Scan using idx_roletask_role on roletask (cost=0.00..1.50 rows=71 width=16)
Index Cond: (roletask.roleid = userrole.roleid)
-> Hash (cost=8.27..8.27 rows=1 width=8)
-> Index Scan using idx_task_taskfunction on task (cost=0.00..8.27 rows=1 width=8)
Index Cond: ((taskfunction)::text = 'plugh'::text)
Version 2:
Hash Join (cost=96.58..192.82 rows=4 width=8)
Hash Cond: (roletask.roleid = userrole.roleid)
-> Hash Join (cost=8.28..104.10 rows=9 width=16)
Hash Cond: (roletask.taskid = task.taskid)
-> Seq Scan on roletask (cost=0.00..78.35 rows=4635 width=16)
-> Hash (cost=8.27..8.27 rows=1 width=8)
-> Index Scan using idx_task_taskfunction on task (cost=0.00..8.27 rows=1 width=8)
Index Cond: ((taskfunction)::text = 'plugh'::text)
-> Hash (cost=87.92..87.92 rows=31 width=8)
-> Bitmap Heap Scan on userrole (cost=4.49..87.92 rows=31 width=8)
Recheck Cond: (userloginid = 42)
-> Bitmap Index Scan on idx_userrole_login (cost=0.00..4.49 rows=31 width=0)
Index Cond: (userloginid = 42)
(Yes, I know that in both cases the costs are low and the difference doesn't look like it would matter. But this is after I eliminated a bunch of additional work from the query to simplify what I have to post. The real query still isn't outrageous, but I'm more interested in the principle.)