views:

198

answers:

4

Hi

I'm looking to optimize my SQL.

My database schema is:

HOMES

  • home_id
  • address
  • city
  • state
  • zip
  • primary_photo_group_id

HOME_PHOTOS

  • photo_id (primary key)
  • home_id (home primary key)
  • photo_group_id (a photo group is the same image, resize from thumbnail to large size)
  • home_photo_type_id (the size of the image be it a thumbnail or a large size)
  • photo_url_dir (the filesystem location where the photo is stored)

Problem

It's very possible that a 'home' does not have a photo associated with the home. In that case, the primary_photo_group_id = 0. Otherwise, primary_photo_group_id` equals the group_id of the photo to use as the primary photo.

Slow SQL (b/c of UNION)

SELECT  homes.home_id,
    address,
    city,
    state,
    zip,
    photo_id,
    photo_url_dir
FROM homes, home_photos
WHERE homes.home_id = home_photos.home_id
AND primary_photo_group_id = home_photo_group_id
AND home_photo_type_id = 2

UNION

SELECT  homes.home_id,
    address,
    city,
    state,
    zip,
    null,
    null
FROM homes
WHERE primary_photo_group_id = 0

What I would like to do

I would like to get rid of the UNION since I'm having to search back over the entire table 2x. How can I get rid of the UNION since I need to check for the case where primary_photo_group_id = 0 and if it's not equal to 0, then query the home_photos table

Here is the pseudo code that needs to happen

SELECT  homes.home_id,
    address,
    city,
    state,
    zip,
    photo_id,  (include only if primary_photo_group_id != 0)
    photo_url_dir  (include only if primary_photo_group_id != 0)
FROM homes,
    home_photos (include only if primary_photo_group_id != 0)
WHERE 
    primary_photo_group_id = 0
ELSE
    homes.home_id = home_photos.home_id
        AND primary_photo_group_id = home_photo_group_id
        AND home_photo_type_id = 2
A: 

Read up about Joins

 Select * from table_a, table_b

is a cross join - which you limit into an inner join with the where clauses. Turn your queries into inner joins, then read up about outer joins.

Edit: I didn't just give you the answer, because I thought it was homework, but I still won't just give you the answer because outer joins are important enough to know about even if you're just writing SQL for a website. you'll be a better person for learning it, and using this as an example.

Tom Ritter
Thanks @Tom Ritter, it appears I need to brush up on my inner and outer join statements :)
+1  A: 
SELECT  homes.home_id,
        address,
        city,
        state,
        zip,
        photo_id,
        photo_url_dir
FROM    homes
LEFT JOIN
        home_photos
ON      home_photos.home_id = homes.home_id
        AND home_photo_group_id = CASE WHEN primary_photo_group_id = 0 THEN NULL ELSE primary_photo_group_id END
        AND home_photo_type_id = 2

Having a composite index on home_photos (home_id, home_photo_group_id, home_photo_type_id) will greatly improve this query.

Note that using CASE is slightly more efficient than left joining on 0, even if there are no records with home_photo_group_id = 0 in home_photos.

When MySQL sees a JOIN on NULL (which can yield nothing by definition), it won't even look into the joined table. When it joins on 0, it still has to check the index and make sure no value exists.

This is not very much of a performance impact, but still can improve your query time by several percents, especially if you have a lot of 0's in homes.

See this entry in my blog for performance detail:

Also note that your tables are not in 2NF.

Your group_id depends on home_id, and including it into home_photos is 2NF violation.

It's not always bad, but it may be harder to manage.

Quassnoi
Should I use this (which include a CASE statement) or @OrbMan, which does not?
@Jimmy8: this one is safer. But if you are absolutely sure you'll never have any records in home_photos with home_photo_group_id = 0, you may get rid of CASE just as well.
Quassnoi
@Jimmy8: and note that using NULL instead of 0 to mark the absence of value is cleaner.
Quassnoi
+1  A: 

If the second query is slower than the first, it's because you have an index on {home_id, primary_photo_group_id} (or perhaps just {home_id}), but not on {primary_photo_group_id} alone. You need an index on this column if you want to improve the performance of looking up on that column.

Welbog
+1 This is what will speed your query, @Jimmy8, not eliminating the UNION - although you may want to do that for simplicity, too. But for performance - the index is what you need.
Carl Manaster
A: 

Maybe you do not know about left outer join? Try:

SELECT  homes.home_id,
    address,
    city,
    state,
    zip,
    photo_id
    photo_url_dir 
FROM homes h
left outer join home_photos hp on h.home_id = hp.home_id
    AND primary_photo_group_id = home_photo_group_id
    AND home_photo_type_id = 2
RedFilter
Should I use this, which does not use a CASE statement, or @Quassnoi?
The @Quassnoi version will give you better performance if you have a lot of primary_photo_group_id = 0. If not that many then this code is simpler and more straight forward.
Paul Morgan