I'm trying to optimize a slow query that was generated by the Django ORM. It is a many-to-many query. It takes over 1 min to run.

The tables have a good amount of data, but they aren't huge (400k rows in sp_article and 300k rows in sp_article_categories).

# The Django ORM call: categories.article_set.filter(post_count__lte=50)

EXPLAIN ANALYZE SELECT * 
                  FROM "sp_article" 
            INNER JOIN "sp_article_categories" ON ("sp_article"."id" = "sp_article_categories"."article_id") 
                WHERE ("sp_article_categories"."category_id" = 1081  
                  AND "sp_article"."post_count" <= 50 )

Nested Loop  (cost=0.00..6029.01 rows=656 width=741) (actual time=0.472..25.724 rows=1266 loops=1)
  ->  Index Scan using sp_article_categories_category_id on sp_article_categories  (cost=0.00..848.82 rows=656 width=12) (actual time=0.015..1.305 rows=1408 loops=1)
        Index Cond: (category_id = 1081)
  ->  Index Scan using sp_article_pkey on sp_article  (cost=0.00..7.88 rows=1 width=729) (actual time=0.014..0.015 rows=1 loops=1408)
        Index Cond: (sp_article.id = sp_article_categories.article_id)
        Filter: (sp_article.post_count <= 50)
Total runtime: 26.536 ms

I have indexes on:

sp_article_categories.article_id (type: btree)
sp_article_categories.category_id
sp_article.post_count (type: btree)

Any suggestions on how I can tune this to speed the query up?

Thanks!

A: 

Put an index on sp_article_categories.category_id
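
In PostgreSQL that would be something like the following (the index name is just illustrative):

CREATE INDEX sp_article_categories_category_id
    ON sp_article_categories (category_id);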

Steven
Already have that one. I forgot to include it in my post...
erikcw
A: 

From a pure SQL perspective, a join is more efficient when the base table has fewer rows and the WHERE conditions are applied to it before it is joined to the other table.

So see if you can get Django to select from the categories first, then filter the category_id before joining to the article table.

Pseudo-code follows:

SELECT *
  FROM sp_article_categories c
 INNER JOIN sp_article a
    ON c.category_id = 1081
   AND c.article_id = a.id
 WHERE a.post_count <= 50

And put an index on category_id like Steven suggests.

Randolph Potter
Didn't seem to make a difference: SELECT * FROM sp_article_categories c INNER JOIN sp_article a ON c.category_id = 1081 AND c.article_id = a.id WHERE a.post_count <= 50;
erikcw
You may need a composite index on the join table, so that category_id is included alongside article_id in the btree index.
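In raw SQL that would be something like (index name is illustrative):

CREATE INDEX sp_article_categories_category_article
    ON sp_article_categories (category_id, article_id);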
Randolph Potter
A: 

You can use field names instead of * too.

select [fields] from....
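
Applied to the query in the question, that would look roughly like this (any column names other than id and post_count are placeholders):

SELECT a.id, a.post_count
  FROM sp_article a
 INNER JOIN sp_article_categories c ON c.article_id = a.id
 WHERE c.category_id = 1081
   AND a.post_count <= 50;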

pedrorezende
I'm using field names in the actual code; I just used * to keep things short in the post. It doesn't seem to make a difference performance-wise when I benchmark it.
erikcw
A: 

Hi Erik!

I assume you have run ANALYZE on the database to get fresh statistics.

It seems that the join between sp_article.id and sp_article_categories.article_id is costly. What data type is the article id? If it isn't numeric, you should perhaps consider making it numeric (integer or bigint, whatever suits your needs). In my experience it can make a big difference in performance. Hope it helps.
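
One way to check the column types is a quick query against the standard information_schema views, for example:

SELECT table_name, column_name, data_type
  FROM information_schema.columns
 WHERE table_name IN ('sp_article', 'sp_article_categories')
   AND column_name IN ('id', 'article_id');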

Cheers! // John

John P
+1  A: 

You've provided the vital information here: the EXPLAIN ANALYZE output. It isn't showing a one-minute runtime, though; it's showing about 26 milliseconds. So either that isn't the query that's actually being run, or the problem is elsewhere.

The only difference between EXPLAIN ANALYZE and a real application is that the result rows aren't actually returned to the client. You would need a lot of data to slow that down to over a minute, though.

The other suggestions are all off the mark, since they ignore the fact that the query isn't slow. You have the relevant indexes (both sides of the join are using an index scan) and the planner is perfectly capable of filtering on the category table first (that's the whole point of having a half-decent query planner).

So you first need to figure out what exactly is slow...
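
For example, a rough way to time the full fetch from Django and see exactly which SQL is being run (a sketch only; it assumes DEBUG = True so that connection.queries is populated, and reuses the categories object from the question):

import time
from django.db import connection, reset_queries

reset_queries()
start = time.time()
# Force the lazy queryset to execute and fetch every row, as a real page would.
articles = list(categories.article_set.filter(post_count__lte=50))
elapsed = time.time() - start
print("fetched %d rows in %.3f s" % (len(articles), elapsed))
print(connection.queries[-1])  # the SQL Django actually ran, and how long it took

If the fetch itself turns out to be fast, the slowness is somewhere else in the view or template.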

Richard Huxton