tags:

views:

661

answers:

4

I've got a simple query (postgresql if that matters) that retrieves all items for some_user excluding the ones she has on her wishlist:

select i.* 
from core_item i 
left outer join core_item_in_basket b on (i.id=b.item_id and b.user_id=__some_user__)
where b.on_wishlist is null;

The above query runs in ~50000ms (yep, the number is correct). If I remove the "b.on_wishlist is null" condition or make it "b.on_wishlist is not null", the query runs in some 50ms (quite a change).

The query has more joins and conditions but this is irrelevant as only this one slows it down.

Some info on the database size:

  • core_items has ~ 10.000 records
  • core_user has ~5.000 records
  • core_item_in_basket has ~2.000
  • records (of which some 50% has on_wishlist = true, the rest is null)

I don't have any indexes (except for ids and foreign keys) on those two tables.

The question is: what should I do to make this run faster? I've got a few ideas myself to check out this evening, but I'd like you guys to help if possible, as well.

Thanks!

+5  A: 

try using not exists:

select i.* 
from   core_item i 
where  not exists (select * from core_item_in_basket b where i.id=b.item_id and b.user_id=__some_user__)
Mladen Prajdic
But that is neglecting the "b.on_wishlist is null" condition which seems to be the slowing factor here.
Christian Hang
ah then just add that to the where in the exists. also in your join try adding that is null into the join on condititon
Mladen Prajdic
No, this one doesn't help either. PostgreSQL learned to auto-optimize stuff like that in years.
michuk
+1  A: 

Have you tried adding an index on on_wishlist?

It seems that this column needs to be checked for every row in the query. If your tables are that big, this might have quite a significant impact on the query speed.

As you put the on_wishlist condition in the where clause, which will cause it (depending on the what the query planer decides) to be evaluated after the join has been performed, that comparison has to be done for potentially every row resulting from the join. Both the core_items and core_item_in_basket tables are pretty big, and you don't have an index for that column, so there is very little for the query optimizer to do, which probably leads to the excessive query time.

The size of core_user should have no influence (as it is not referenced in the query).

Christian Hang
No, indexes on wishlist make little sense as it can only be true or null
michuk
+1  A: 

You might want to explain more about the purpose of this query - as some techniques make and some don't make sense, depending on use case.

How often are you running it?

Is it run for only 1 user, or you run it for all users in some kind of loop?

Do: explain analyze and put the output on explain.depesz.com so you will see why it is so slow.

depesz
@depesz Sure, I will in some 2h when I get back home as I hate writing queries on my mobile over ssh.As a side note, I actually wanted to ask you directly about this one, but decided not to bother you with all the PSQL issues I have :)
michuk
So, answering your questions:a) The purpose of the query is to get X next movies to be rated but excluding those that are on someone's wishlist. b) It runs for one user every time the user wants to rate a random movie here http://filmaster.com/rate-movies/ - although we limit the results to 10, cache them and run the same query again after rating 10 next moviesc) The real query (the one I showed above is a simplified version) together with the explain analyze is here: http://explain.depesz.com/s/Cfd) And here is same query without the wishlist condition: http://explain.depesz.com/s/BPd
michuk
+1  A: 

Sorry for adding 2nd answer, but stackoverflow doesn't let me format comments properly, and since formatting is essential, I have to post answer.

Couple of options:

  1. CREATE INDEX q ON core_item_in_basket (user_id, item_id) WHERE on_wishlist is null;
  2. same index, but change order of columns in it.
  3. SELECT i.* FROM core_item i WHERE i.id not in (select item_id FROM core_item_in_basket WHERE on_wishlist is null AND user_id = __some_user__); (this query can benefit from index from point #1, but will not benefit from index #2.
  4. SELECT * from core_item where id in (select id from core_item EXCEPT select item_id FROM core_item_in_basket WHERE on_wishlist is null AND user_id = __some_user__);

Let us know the results :)

depesz
Answer number 4 is correct. It makes a lot of sense when you think about it. We don't want to have an outer join really (what for?), just exclude those few items that are on the wishlist.Indexes are no good here as the problem was not the core_item_in_basket table (which is tiny) but the unnecessary join.All checked in and working already in production, check out at http://filmaster.comAnd thanks a lot depesz for saving my butt again :P
michuk
BTW, the new explain analyze is here if you are interested: http://explain.depesz.com/s/sIG
michuk