views:

39

answers:

4

I need to get data from 3 different tables. I know the basics about JOINs but when it comes to more complicated queries like getting data from 3 or more tables using JOIN I get a little confused and I just start playing with the queries writing what make sense to me, like the next one:

SELECT movies.imdbID,
  movies.title,
  movies.year,
  movie_ratings.votes,
  movie_ratings.total_value,
  movie_ratings_external.votes,
  movie_ratings_external.total_value
FROM movies, movie_ratings_external
LEFT JOIN movie_ratings ON movie_ratings.imdbID = movie_ratings_external.imdbID
WHERE movies.imdbID = movie_ratings_external.imdbID
ORDER BY movie_ratings.votes DESC, movie_ratings_external.votes DESC
LIMIT 30

This query works. I get the selected fields from the correct tables and ordered the right way, but I think I'm mixing up things (like regular querying two tables and JOINing between two tables) and I'm sure theres a better/more efficient way to accomplish the same purpouse.

Any DB geek available?

+1  A: 

You are doing 2 joins. However you are mixing your syntaxes which reduces readability.

Either do

SELECT movies.imdbID,
  movies.title,
  movies.year,
  movie_ratings.votes,
  movie_ratings.total_value,
  movie_ratings_external.votes,
  movie_ratings_external.total_value
FROM movies
LEFT JOIN movie_ratings_external ON movies.imdbID = movie_ratings_external.imdbID
LEFT JOIN movie_ratings ON movie_ratings.imdbID = movie_ratings_external.imdbID
ORDER BY movie_ratings.votes DESC, movie_ratings_external.votes DESC
LIMIT 30

or like this

SELECT movies.imdbID,
  movies.title,
  movies.year,
  movie_ratings.votes,
  movie_ratings.total_value,
  movie_ratings_external.votes,
  movie_ratings_external.total_value
FROM movies, movie_ratings_external, movie_ratings 
WHERE movies.imdbID = movie_ratings_external.imdbID
  AND movie_ratings.imdbID = movie_ratings_external.imdbID
ORDER BY movie_ratings.votes DESC, movie_ratings_external.votes DESC
LIMIT 30

I'm also not sure why you are left joining. If a rating might not exist for a particular movie you would want to join towards the movie table.

If that is your intention your query should look like so.

SELECT movies.imdbID,
  movies.title,
  movies.year,
  movie_ratings.votes,
  movie_ratings.total_value,
  movie_ratings_external.votes,
  movie_ratings_external.total_value
FROM movies
LEFT OUTER JOIN movie_ratings_external ON movies.imdbID = movie_ratings_external.imdbID
LEFT OUTER JOIN movie_ratings ON movies.imdbID = movie_ratings.imdbID
ORDER BY movie_ratings.votes DESC, movie_ratings_external.votes DESC
LIMIT 30
Biff MaGriff
A: 

The first syntax you're using

FROM tableA, tableB
WHERE tableA.abc = tableB.def

is another syntax for a regular inner join and indeed the only syntax you can use on some DBMSes such as Informix - the two ought to be completely equivalent. I think it's confusing to mix and match the syntaxes, though, so I'd pick one and stick to it. The real test would be to check the mysql query plan before and after switching, but I would be very surprised if it made any difference at all.

Efficiency: as long as you've got indexes on all the right columns - i.e. the imdbID column on all three tables - I can't see anything you can do to improve this. I don't think you need an index on the columns you're ordering on too but that might be worth testing if you're looking for improvements.

Rup
A: 

Nothing wrong with the query. But I prefer something like the following - for the sake of readability:

SELECT  Movie.imdbID 
,       Movie.title 
,       Movie.year 
,       Rating.votes 
,       Rating.total_value 
,       ExternalRating.votes 
,       ExternalRating.total_value 

FROM    movies Movie 
    LEFT JOIN
        movie_ratings_external ExternalRating

 ON     Movie.imdbID = ExternalRating.imdbID 

    LEFT JOIN movie_ratings Rating

ON      Rating.imdbID = ExternalRating.imdbID 

ORDER   BY 
        Rating.votes DESC
,       ExternalRating.votes DESC 
Noel Abrahams
+1  A: 

Modified your query just a tad bit. Everything else was pretty much spot on.

I am not exactly sure what you call a "regular query" but Joins are pretty much the very basic part of a regular query.

In your old query, this part FROM movies, movie_ratings_external was basically a CROSS JOIN. Replaced it with a JOIN (i used a LEFT JOIN assuming that you always wanted to return all the movies but basic intention was that the CROSS JOIN with the filtering in the WHERE is a LOT more inefficient because you are manipulating more ROWS - especially unnecessary ones)

Think of it this way. Every Join statement in your SELECT returns a set of row which are then JOINED with the next table in the JOIN. The idea should be to try an filter rows progressively with each join as far as possible so that you arent getting unnecessary tuples. Thats where the ON statements come in.

Removed

SELECT 
  movies.imdbID,
  movies.title,
  movies.year,
  movie_ratings.votes,
  movie_ratings.total_value,
  movie_ratings_external.votes,
  movie_ratings_external.total_value
FROM movies 
LEFT OUTER JOIN movie_ratings_external
     ON movies.imdbID = movie_ratings_external.imdbID
LEFT JOIN movie_ratings 
     ON movie_ratings.imdbID = movie_ratings_external.imdbID
ORDER BY movie_ratings.votes DESC, movie_ratings_external.votes DESC
LIMIT 30

Hope that helps!!

InSane