ansaurus

Question

Left Joins are what I want but they are very slow?

Answer 1

+8 A:

The query should be OK. I would run it through a query analyzer and refine the indexes on the tables.

Tor Haugen 2009-03-10 02:15:37

I guess this is what I get for trusting contributed modules to have correct index. Thanks!

jskulski 2009-05-18 23:30:52

Answer 2

+1 A:

If you define what you are looking for exactly rather than SELECT * it might speed it up a bit... also OR is not the fastest query to be doing, if you can re-write it without the OR it will be faster.

Also... you could try unions instead of left joins maybe?

SELECT s.user_id
   FROM subscribers s 
   LEFT JOIN bio b ON b.user_id = s.user_id 
   LEFT JOIN shirtsizes ON shirtsize.bio_id = bio.bio_id 
WHERE s.season_id = 185181 AND (bio.bio_id IS NULL OR shirtsize.size IS NULL);

would be something like:

(SELECT s.user_id FROM subscribers s WHERE s.season_id = 185181)
UNION
(SELECT b.user_id, b.bio_id FROM bio b WHERE bio.bio_id IS NULL)
UNION
(SELECT shirtsizes.bio_id FROM shirtsizes WHERE shirtsizes.size is NULL)

(to be honest that doesn't look right to me... but then I never use ~~joins or~~ join syntax or unions...)

I would do:

SELECT *
FROM subscribers s, bio b, shirtsizes sh
WHERE s.season_id = 185181
AND shirtsize.bio_id = bio.bio_id 
AND b.user_id = s.user_id 
AND (bio.bio_id IS NULL 
     OR 
     shirtsize.size IS NULL);

SeanJA 2009-03-10 02:23:33

"I never use joins or unions" Of course you do use joins. FROM subscribers s, bio b, shirtsizes shWHERE s.season_id = 185181AND shirtsize.bio_id = bio.bio_id AND b.user_id = s.user_id is a set of joins. It is is using the outdated style of join but it is still a join.

HLGEM 2009-03-10 13:28:48

I guess it would have been more correct to write 'I avoid using the join syntax and I never use unions'

SeanJA 2009-03-10 21:49:44

Answer 3

+7 A:

Joins are one of the most expensive operations that you can perform on an SQL query. While it should be able to automatically optimize your query somewhat, maybe try restructuring it. First of all, I would instead of SELECT *, be sure to specify which columns you need from which relations. This will speed things up quite a bit.

If you only need the user ID for example:

SELECT s.user_id
   FROM subscribers s 
   LEFT JOIN bio b ON b.user_id = subscribers.user_id 
   LEFT JOIN shirtsizes ON shirtsize.bio_id = bio.bio_id 
WHERE s.season_id = 185181 AND (bio.bio_id IS NULL OR shirtsize.size IS NULL);

That will allow the SQL database to restructure your query a little more efficiently on its own.

Brian 2009-03-10 02:34:42

Regular joins are moderately expensive; outer joins are more expensive; the typical correlated sub-query is diabolical.

Jonathan Leffler 2009-03-10 03:02:46

I posted the real query and am only grabbing a few columns, but thanks.

jskulski 2009-03-10 17:41:47

Answer 4

+1 A:

Is bio_id the primary key of bios? Is it really possible for there to be a bios row with b.user_id = subscribers.user_id but with b.bio_id NULL?

Are there shirtsize rows with shirtsize.bio_id NULL? Do those rows ever have shirtsize.size not NULL?

John Saunders 2009-03-10 02:37:43

Answer 5

+4 A:

Obviously I haven't checked this but it seems to be that what you want is to select any subscriber where there there isn't a matching bio or the join between bios and shirtsizes fails. I would consider using NOT EXISTS for this condition. You'll probably want indexes on bio.user_id and shirtsizes.bio_id.

select *
from subscribers
where s.season_id = 185181
      and not exists (select *
                      from bio join shirtsizes on bio.bio_id = shirtsizes.bio_id
                      where bio.user_id = subscribers.user_id)

EDIT:

Based on your update, you may want to create separate keys on each column instead of/in addition to having compound primary keys. It's possible that the joins aren't able to take optimal advantage of the compound primary indexes and an index on the join columns themselves may speed things up.

tvanfosson 2009-03-10 02:42:44

Quite a few people have investigated this and it turns out that SELECT * is normally slightly faster than SELECT 1 / SELECT NULL / etc in an EXISTS query. But yes, using EXISTS can often be faster than join, but not always, it's a weapon to try out in each scenario...

Dems 2009-03-10 10:22:39

Point taken. I've updated the answer.

tvanfosson 2009-03-10 11:41:06

Answer 6

+1 A:

Would it be any quicker to do a difference between the list of subscribers for the relevant season and the list of subscribers for the season with bios and shirt sizes?

SELECT *
   FROM Subscribers
   WHERE season_id = 185181
     AND user_id NOT IN
         (SELECT DISTINCT s.user_id
             FROM subscribers s
             JOIN bios b ON s.user_id = b.user_id
             JOIN shirtsizes z ON b.bio_id = z.bio_id
             WHERE s.season_id = 185181
         )

This avoids outer joins, which are not as fast as inner joins, and may therefore be quicker. On the other hand, it might be creating two large lists with very few differences between them. It is not clear whether the DISTINCT in the sub-query would improve or harm performance. It implies a sort operation (expensive) but paves the way for a merge-join if the MySQL optimizer supports such things.

There might be other notations available - MINUS or DIFFERENCE, for example.

Jonathan Leffler 2009-03-10 03:12:11

There is no reason to use DISTINCT in the subquery.

Bill Karwin 2009-03-11 03:35:48

Answer 7

A:

select * from subscribers where user_id not in (
  select user_id from bio where bio_id not in (
 select bio_id from shirt_sizes
  )
) and season_id=185181

Hafthor 2009-03-10 09:09:56

Answer 8

A:

I presume that your "big table" is subscribers, and that season_id is probably neither selective nor indexed (indexing it is rather meaningless if it's not selective, anyway), which means that you'll have to fully scan subscribers, anyway. Parting, I would join (with an inner join) the two other tables - note that if there is no bio_id in shirt_size it's exactly the same for your query as if there were no bio. First bit:

select uid
from bio
     inner join shirtsizes
             on shirtsizes.bio_id = bio.bio_id

At which point you want to check that shirtsizes is indexed on bio_id. Now you can left outer join this query to subscribers:

select *
from subscribers s
     left outer join (select uid
                      from bio
                      inner join shirtsizes
                              on shirtsizes.bio_id = bio.bio_id) x
                  on x.uid = s.uid
where s.season_id = 185181
  and x.uid is null

which is likely to run reasonably fast if neither bio nor shirtsizes are gigantic ...

2009-03-11 11:04:10

Answer 9

A:

Your query, as it is written now, evaluates all bio's and term_node's if they exist, and then filters them out.

But what you want is just find og_uid's that don't have term_node's (not having a bio also implies not having a term_node)

So you just want to stop evaluating bio's and term_node's as soon as you find the first existing term_node:

SELECT  *
FROM    (
        SELECT  ogu.nid, ogu.is_active, ogu.uid,
                (
                SELECT  1
                FROM    bio b, term_node tn
                WHERE   b.uid = ogu.uid
                        AND tn.nid = b.nid
                LIMIT   1
                ) AS ex
        FROM    og_uid ogu
        WHERE   ogu.nid = 185033
                AND ogu.is_admin = 0
        ) ogu1
WHERE   ex IS NULL

This will evaluate at most one bio and at most one term_node for each og_uid, instead of evaluating all existing thousands and the filtering them out.

Should work much faster.

Quassnoi 2009-03-11 12:45:33

ansaurus

tags:

views:

answers:

Left Joins are what I want but they are very slow?

related questions