+2  A: 

First, let's state the problem. We want all the films with the highest rating in each category. Then, of those, we want the one with the lowest price.

First, get the highest rating per category:

SELECT Films.*
FROM Films
INNER JOIN
(SELECT MAX(Rating) AS Rating, CategoryId
   FROM Films
  GROUP BY CategoryId
) x ON Films.Rating = x.Rating AND Films.CategoryId = x.CategoryId
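
To sanity-check this first step, here is a minimal sketch that runs it with Python's built-in sqlite3 module. It assumes a simplified, hypothetical Films(FilmId, FilmName, Rating, DVDPrice, CategoryId) table with made-up rows, and it uses SQLite rather than SQL Server, so treat it as an illustration, not a test against the real schema. Note that every film tied at the top rating is kept.

```python
import sqlite3

# Hypothetical, simplified Films table; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Films (FilmId INTEGER PRIMARY KEY, FilmName TEXT,
                    Rating INTEGER, DVDPrice REAL, CategoryId INTEGER);
INSERT INTO Films VALUES
  (1, 'A', 5,  9.99, 1),
  (2, 'B', 5,  7.99, 1),
  (3, 'C', 3,  4.99, 1),
  (4, 'D', 4, 12.99, 2),
  (5, 'E', 2,  5.99, 2);
""")

# Step 1: keep every film whose rating equals the maximum rating in its category.
# Ties are kept, so category 1 returns both A and B.
top_rated = conn.execute("""
SELECT Films.FilmName, Films.Rating, Films.DVDPrice, Films.CategoryId
FROM Films
INNER JOIN (SELECT MAX(Rating) AS Rating, CategoryId
              FROM Films
             GROUP BY CategoryId) x
        ON Films.Rating = x.Rating AND Films.CategoryId = x.CategoryId
ORDER BY Films.CategoryId, Films.FilmName
""").fetchall()

print([row[0] for row in top_rated])  # ['A', 'B', 'D']
```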

Now, from that result, get the cheapest price:

  SELECT Films.* FROM Films INNER JOIN
    (SELECT MIN(t.DVDPrice) AS DVDPrice, t.Rating, t.CategoryId FROM
      (SELECT Films.* FROM Films INNER JOIN
        (SELECT MAX(Rating) AS Rating, CategoryId
           FROM Films
          GROUP BY CategoryId
        ) x ON Films.Rating = x.Rating AND Films.CategoryId = x.CategoryId
      ) t
      WHERE t.DVDPrice IS NOT NULL
      GROUP BY t.CategoryId, t.Rating
    ) y ON Films.Rating = y.Rating AND Films.CategoryId = y.CategoryId AND Films.DVDPrice = y.DVDPrice
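
A minimal sketch that runs the full two-step query through Python's sqlite3 module, again on a hypothetical, made-up Films table (SQLite, not SQL Server). Film F shares the top rating in category 2 but has a NULL DVDPrice, so the IS NOT NULL filter leaves D as the cheapest top-rated film there.

```python
import sqlite3

# Hypothetical data; film F is top-rated in category 2 but has no DVD price.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Films (FilmId INTEGER PRIMARY KEY, FilmName TEXT,
                    Rating INTEGER, DVDPrice REAL, CategoryId INTEGER);
INSERT INTO Films VALUES
  (1, 'A', 5,  9.99, 1),
  (2, 'B', 5,  7.99, 1),
  (3, 'C', 3,  4.99, 1),
  (4, 'D', 4, 12.99, 2),
  (5, 'E', 2,  5.99, 2),
  (6, 'F', 4,  NULL, 2);
""")

best = conn.execute("""
SELECT Films.FilmName, Films.Rating, Films.DVDPrice, Films.CategoryId
FROM Films
INNER JOIN
  -- y: cheapest non-NULL price among the top-rated films of each category
  (SELECT MIN(t.DVDPrice) AS DVDPrice, t.Rating, t.CategoryId
     FROM (SELECT Films.*
             FROM Films
             INNER JOIN (SELECT MAX(Rating) AS Rating, CategoryId
                           FROM Films
                          GROUP BY CategoryId) x
                     ON Films.Rating = x.Rating AND Films.CategoryId = x.CategoryId) t
    WHERE t.DVDPrice IS NOT NULL
    GROUP BY t.CategoryId, t.Rating) y
  ON Films.Rating = y.Rating
 AND Films.CategoryId = y.CategoryId
 AND Films.DVDPrice = y.DVDPrice
ORDER BY Films.CategoryId
""").fetchall()

print([row[0] for row in best])  # ['B', 'D']
```

Because y is grouped only by category and rating, MIN picks one price per category; if two top-rated films shared that price, the final join would still return both.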
Russell Steen
Hi - it's hard to check since you've made some typos, so I can't be sure what you actually meant to write, but I don't think your query works.
carewithl
Point out the typos and I'll correct them. I'm blind today, it seems.
Russell Steen
Hi - I've edited my initial post with your query slightly rewritten (see EDIT - replying to Russel...).
carewithl
My apologies for not following up more. Life took over for a couple of days! ;)
Russell Steen
+2  A: 

What you want is:
-----------------
For each category, retrieve a film that meets the following two conditions:
condition 1: rating = max rating in that category
condition 2: price = min price in that category among the films satisfying condition 1

--> In other words, this is equivalent to ordering the films in each category by Rating DESC, then DVDPrice ASC, and taking the first one.
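
The "order, then take the first one" formulation can also be sketched outside SQL. This is a minimal Python illustration with hypothetical in-memory Film records (names and values are made up); it assumes every film has a price, since None can't be compared with floats in the sort key.

```python
from itertools import groupby
from typing import NamedTuple


class Film(NamedTuple):
    name: str
    rating: int
    price: float
    category: int


def best_per_category(films: list[Film]) -> dict[int, Film]:
    """One film per category: highest rating first, then lowest price."""
    # Sort by category, then rating descending (negated), then price ascending.
    ordered = sorted(films, key=lambda f: (f.category, -f.rating, f.price))
    # groupby only groups adjacent items, hence the sort by category first;
    # the first film of each group is the winner.
    return {cat: next(group) for cat, group in groupby(ordered, key=lambda f: f.category)}


# Hypothetical sample films (made-up values).
films = [
    Film("A", 5, 9.99, 1),
    Film("B", 5, 7.99, 1),
    Film("C", 3, 4.99, 1),
    Film("D", 4, 12.99, 2),
    Film("E", 2, 5.99, 2),
]
best = best_per_category(films)
print({cat: film.name for cat, film in best.items()})  # {1: 'B', 2: 'D'}
```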

One solution is:

SELECT FilmName, Rating, DVDPrice, Category
FROM Films FM1 INNER JOIN Category AS C1 ON C1.CategoryId = FM1.CategoryId
WHERE FM1.FilmId = (SELECT TOP 1 FilmId
                      FROM Films AS FM2
                     WHERE FM2.CategoryId = FM1.CategoryId
                  ORDER BY Rating DESC, DVDPrice)

OR:

SELECT FM.FilmName, FM.Rating, FM.DVDPrice, C1.Category
  FROM (SELECT FM0.*, ROW_NUMBER() over (ORDER BY Rating DESC, DVDPrice) rank
          FROM Films FM0) FM 
INNER JOIN Category AS C1 ON C1.CategoryId = FM.CategoryId
INNER JOIN (SELECT FM1.CategoryId, MIN(FM1.rank) rank
              FROM (SELECT CategoryId,
                           ROW_NUMBER() over (ORDER BY Rating DESC,DVDPrice) rank
                  FROM Films) AS FM1
        GROUP BY CategoryId) FM2
 ON FM.CategoryId = FM2.CategoryId
AND FM.rank = FM2.rank
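
A common variant of the ROW_NUMBER() idea, swapped in here and not the query above, numbers the rows within each category using PARTITION BY, so no MIN/self-join is needed. It also avoids depending on two separate ROW_NUMBER() evaluations numbering tied rows the same way, which isn't guaranteed. The sketch runs it in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later) on hypothetical data; the extra FilmId tie-breaker is my assumption, added to make the pick deterministic. In both SQLite and SQL Server, NULL prices sort first in ascending order, so films without a price would need to be filtered out beforehand.

```python
import sqlite3

# Hypothetical data; B and F tie on both rating and price in category 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Films (FilmId INTEGER PRIMARY KEY, FilmName TEXT,
                    Rating INTEGER, DVDPrice REAL, CategoryId INTEGER);
INSERT INTO Films VALUES
  (1, 'A', 5,  9.99, 1),
  (2, 'B', 5,  7.99, 1),
  (3, 'C', 3,  4.99, 1),
  (4, 'D', 4, 12.99, 2),
  (5, 'E', 2,  5.99, 2),
  (6, 'F', 5,  7.99, 1);
""")

# Number the films inside each category (best first) and keep row number 1.
# FilmId breaks exact ties, so B (FilmId 2) wins over F (FilmId 6).
best = conn.execute("""
SELECT FilmName, Rating, DVDPrice, CategoryId
FROM (SELECT f.*,
             ROW_NUMBER() OVER (PARTITION BY CategoryId
                                ORDER BY Rating DESC, DVDPrice, FilmId) AS rn
        FROM Films f) ranked
WHERE rn = 1
ORDER BY CategoryId
""").fetchall()

print([row[0] for row in best])  # ['B', 'D']
```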

With your data, I've done some tests, and the following query seems to work better than the two above:

SELECT FM.*, C1.Category
FROM (SELECT FM1.CategoryId, MAX(FM1.FilmId) FilmId
     FROM Films FM1
     WHERE NOT EXISTS (SELECT NULL 
                                FROM Films AS FM2
                               WHERE FM2.CategoryId = FM1.CategoryId
                                 AND (FM1.Rating < FM2.Rating 
                                      OR (    FM1.Rating = FM2.Rating 
                                          AND FM1.DVDPrice > FM2.DVDPrice)
                                     )
                          )
      GROUP BY FM1.CategoryId) FF
INNER JOIN Films FM on FM.FilmId = FF.FilmId
                   AND FM.CategoryId = FF.CategoryId
INNER JOIN Category AS C1 ON C1.CategoryId = FM.CategoryId
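
A minimal sketch of the NOT EXISTS query above, run in SQLite through Python's sqlite3 module with hypothetical Category and Films rows. Films B and F tie on both rating and price in category 1, so both pass the NOT EXISTS test, and it is MAX(FilmId) that reduces them to a single film (F).

```python
import sqlite3

# Hypothetical categories and films (made-up values).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (CategoryId INTEGER PRIMARY KEY, Category TEXT);
CREATE TABLE Films (FilmId INTEGER PRIMARY KEY, FilmName TEXT,
                    Rating INTEGER, DVDPrice REAL, CategoryId INTEGER);
INSERT INTO Category VALUES (1, 'Drama'), (2, 'Comedy');
INSERT INTO Films VALUES
  (1, 'A', 5,  9.99, 1),
  (2, 'B', 5,  7.99, 1),
  (3, 'C', 3,  4.99, 1),
  (4, 'D', 4, 12.99, 2),
  (5, 'E', 2,  5.99, 2),
  (6, 'F', 5,  7.99, 1);
""")

best = conn.execute("""
SELECT FM.FilmName, FM.Rating, FM.DVDPrice, C1.Category
FROM (SELECT FM1.CategoryId, MAX(FM1.FilmId) AS FilmId
        FROM Films FM1
       -- keep FM1 only if no film in its category beats it
       WHERE NOT EXISTS (SELECT NULL
                           FROM Films AS FM2
                          WHERE FM2.CategoryId = FM1.CategoryId
                            AND (FM1.Rating < FM2.Rating
                                 OR (FM1.Rating = FM2.Rating
                                     AND FM1.DVDPrice > FM2.DVDPrice)))
       GROUP BY FM1.CategoryId) FF
INNER JOIN Films FM ON FM.FilmId = FF.FilmId
INNER JOIN Category AS C1 ON C1.CategoryId = FM.CategoryId
ORDER BY FM.CategoryId
""").fetchall()

print([(row[0], row[3]) for row in best])  # [('F', 'Drama'), ('D', 'Comedy')]
```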
najmeddine
Won't this return just ONE result, not one per category?
Russell Steen
No. It returns a film only if it is the top row of the subquery, and that subquery is limited to a single category (the outer film's), so you get the top film (max rating, min price) for each category.
najmeddine
What procedure did you use to figure out the result? I assume you broke the problem down into simpler steps, but how exactly did you go about it?
carewithl
I added an explanation about how I analysed your problem.
najmeddine
But how do you manage not to get lost when there are several inner queries and you need to figure out which columns a particular inner query should select? (An example is the query posted by Russell Steen. I imagine that query is considered rather simple, but even there one can get confused about which columns a subquery should select - uh!)
carewithl
A subquery should be seen as a table built on the fly to hold aggregated/calculated data used by the outer query. To understand a query with many subqueries, start from the deepest subquery and replace it with its result, then repeat level by level until you reach the main query. To build and use a subquery, first make sure the data calculated there can't be obtained more easily (e.g. just by joining the real tables), then decide what calculated/aggregated data you want, write it, and treat it as a table.
najmeddine
What do you mean by “try replacing it with the result until reaching the main query.” ?
carewithl
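
One way to read "replace it with the result": run the deepest subquery on its own, save what it returns as a real table, and substitute that table's name for the subquery; the outer query should return exactly the same rows. The sketch below does this with Python's sqlite3 module on a hypothetical, made-up Films table; the temp table name MaxRating is also just for illustration.

```python
import sqlite3

# Hypothetical data (made-up values).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Films (FilmId INTEGER PRIMARY KEY, FilmName TEXT,
                    Rating INTEGER, DVDPrice REAL, CategoryId INTEGER);
INSERT INTO Films VALUES
  (1, 'A', 5,  9.99, 1),
  (2, 'B', 5,  7.99, 1),
  (3, 'C', 3,  4.99, 1),
  (4, 'D', 4, 12.99, 2),
  (5, 'E', 2,  5.99, 2);
""")

# The deepest subquery (a fixed string, so formatting it into SQL is safe here).
DEEPEST = "SELECT CategoryId, MAX(Rating) AS Rating FROM Films GROUP BY CategoryId"

# The query as written, with the subquery inline.
nested = conn.execute(f"""
SELECT f.FilmName
FROM Films f
INNER JOIN ({DEEPEST}) x ON f.CategoryId = x.CategoryId AND f.Rating = x.Rating
ORDER BY f.FilmName
""").fetchall()

# Step 1: run the deepest subquery on its own and keep its result as a table.
conn.execute(f"CREATE TEMP TABLE MaxRating AS {DEEPEST}")
max_rating = conn.execute(
    "SELECT CategoryId, Rating FROM MaxRating ORDER BY CategoryId").fetchall()
print(max_rating)  # [(1, 5), (2, 4)]

# Step 2: substitute that table for the subquery; the outer query is unchanged.
flattened = conn.execute("""
SELECT f.FilmName
FROM Films f
INNER JOIN MaxRating x ON f.CategoryId = x.CategoryId AND f.Rating = x.Rating
ORDER BY f.FilmName
""").fetchall()

print(nested == flattened)  # True
```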
+1  A: 

1) Yes, the second query you give looks better. But I give +1 to @Russell Steen's solution because it avoids the use of correlated subqueries.

This is a variation of the greatest-n-per-group problem that I see frequently on SO. Here's another possible solution:

SELECT f.*
FROM Films f
LEFT OUTER JOIN Films p
 ON (f.CategoryId = p.CategoryId AND f.DVDPrice > p.DVDPrice)
LEFT OUTER JOIN Films r
 ON (f.CategoryId = r.CategoryId AND f.DVDPrice = r.DVDPrice AND f.Rating < r.Rating)
WHERE p.CategoryId IS NULL AND r.CategoryId IS NULL;

The explanation is that we try to find a film "p" in the same category with a lower price. When we find none, p.* will be NULL, because that's how outer joins work. When there are no DVDs with a lower price, we've found the one with the lowest price.

We then use the same trick to find a film "r" with a higher rating. This time we restrict the search to films in the same category and with the same price (that is, the lowest price) as film f. Otherwise we'd unintentionally find the film with the highest rating in the category, even if it isn't cheap.
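
A minimal sketch that runs this exclusion-join query in SQLite through Python's sqlite3 module, on hypothetical data. Films C and F tie for the lowest price in category 1, so the join to p alone leaves both, and the join to r then removes C because F has a higher rating at the same price.

```python
import sqlite3

# Hypothetical data; C and F share the lowest price in category 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Films (FilmId INTEGER PRIMARY KEY, FilmName TEXT,
                    Rating INTEGER, DVDPrice REAL, CategoryId INTEGER);
INSERT INTO Films VALUES
  (1, 'A', 5,  9.99, 1),
  (2, 'B', 5,  7.99, 1),
  (3, 'C', 3,  4.99, 1),
  (4, 'D', 4, 12.99, 2),
  (5, 'E', 2,  5.99, 2),
  (6, 'F', 4,  4.99, 1);
""")

# p: a cheaper film in the same category.
# r: an equally cheap film in the same category with a higher rating.
# A film f survives only if neither join finds a match (both come back NULL).
best = conn.execute("""
SELECT f.FilmName, f.Rating, f.DVDPrice, f.CategoryId
FROM Films f
LEFT OUTER JOIN Films p
  ON (f.CategoryId = p.CategoryId AND f.DVDPrice > p.DVDPrice)
LEFT OUTER JOIN Films r
  ON (f.CategoryId = r.CategoryId AND f.DVDPrice = r.DVDPrice AND f.Rating < r.Rating)
WHERE p.CategoryId IS NULL AND r.CategoryId IS NULL
ORDER BY f.CategoryId
""").fetchall()

print([row[0] for row in best])  # ['F', 'E']
```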

You can also reverse the order of the joins: first find the highest rating, then find the lowest price among the films with the highest rating. It depends on which you give greater priority, low price or high rating. Whatever solution you use, you have to make a decision about this priority.
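
To see that the priority really changes the answer, this sketch runs both orders against the same hypothetical data in SQLite (via Python's sqlite3 module): price first, as in the query above, and a reversed version that finds the highest rating first and then the lowest price among those films. The reversed query is my own rewrite of the idea described here, not code from the thread.

```python
import sqlite3

# Hypothetical data (made-up values).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Films (FilmId INTEGER PRIMARY KEY, FilmName TEXT,
                    Rating INTEGER, DVDPrice REAL, CategoryId INTEGER);
INSERT INTO Films VALUES
  (1, 'A', 5,  9.99, 1),
  (2, 'B', 5,  7.99, 1),
  (3, 'C', 3,  4.99, 1),
  (4, 'D', 4, 12.99, 2),
  (5, 'E', 2,  5.99, 2),
  (6, 'F', 4,  4.99, 1);
""")

# Lowest price first, then highest rating among the cheapest films.
PRICE_FIRST = """
SELECT f.FilmName
FROM Films f
LEFT OUTER JOIN Films p
  ON (f.CategoryId = p.CategoryId AND f.DVDPrice > p.DVDPrice)
LEFT OUTER JOIN Films r
  ON (f.CategoryId = r.CategoryId AND f.DVDPrice = r.DVDPrice AND f.Rating < r.Rating)
WHERE p.CategoryId IS NULL AND r.CategoryId IS NULL
ORDER BY f.CategoryId
"""

# Highest rating first, then lowest price among the top-rated films.
RATING_FIRST = """
SELECT f.FilmName
FROM Films f
LEFT OUTER JOIN Films r
  ON (f.CategoryId = r.CategoryId AND f.Rating < r.Rating)
LEFT OUTER JOIN Films p
  ON (f.CategoryId = p.CategoryId AND f.Rating = p.Rating AND f.DVDPrice > p.DVDPrice)
WHERE r.CategoryId IS NULL AND p.CategoryId IS NULL
ORDER BY f.CategoryId
"""

price_first = [row[0] for row in conn.execute(PRICE_FIRST)]
rating_first = [row[0] for row in conn.execute(RATING_FIRST)]
print(price_first)   # ['F', 'E']
print(rating_first)  # ['B', 'D']
```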

2) The other query you tried doesn't work because the condition you use in the subquery doesn't eliminate any of the wrong rows of the FT2 subquery. It's a "Green Eggs and Ham" problem: whether on a train or on a plane, on a boat or on a goat, you've still got green eggs and ham included in the meal.


Update: OK, thanks for the sample data. When you first asked the question, you didn't mention that some films could be ineligible because they aren't available on DVD and have a NULL in the DVDPrice column. Here's an updated query using my technique that returns the correct films, one per category, excluding films that aren't available on DVD, with the lowest price and highest rating:

SELECT f.FilmName, f.Rating, f.DVDPrice, f.CategoryId
FROM Films f
LEFT OUTER JOIN Films p ON (f.CategoryId = p.CategoryId
  AND p.AvailableOnDvd = 'Y' AND f.DVDPrice > p.DVDPrice)
LEFT OUTER JOIN Films r ON (f.CategoryId = r.CategoryId
  AND r.AvailableOnDvd = 'Y' AND f.DVDPrice = r.DVDPrice AND f.Rating < r.Rating)
WHERE f.AvailableOnDvd = 'Y' AND p.CategoryId IS NULL AND r.CategoryId IS NULL
ORDER BY f.CategoryId;

Output:

+-------------------------+--------+----------+------------+
| FilmName                | Rating | DVDPrice | CategoryId |
+-------------------------+--------+----------+------------+
| The Maltese Poodle      |      1 |     2.99 |          1 |
| Third                   |      7 |    10.00 |          2 |
| Nightmare on Oak Street |      2 |     9.99 |          3 |
| Planet of the Japes     |      5 |    12.99 |          4 |
| Soylent Yellow          |      5 |    12.99 |          5 |
| Sense and Insensitivity |      3 |    15.99 |          6 |
+-------------------------+--------+----------+------------+

This differs from your result in category 6 because, in your sample data, Sense and Insensitivity is the only film in that category that is available on DVD. 15 Late Afternoon is not available, even though it has a non-null value for DVDPrice. If I change it to AvailableOnDvd='Y', then 15 Late Afternoon is chosen instead of the other film.


Regarding your question about how I solved this: it's a variation of a common question in SQL, which I have tagged as the "greatest-n-per-group" question. You want the query to return every film f such that no film with a lower DVDPrice exists in the same category. I solve it with an outer join to p; if no matches are found in p, then f must have the lowest price in that category. That's the common solution.

Your added twist in this problem is that you have another attribute to filter on. So, given the film (or films, in the case of ties) with the lowest price, you want the one with the highest rating. The technique is the same: use an outer join to r where the category and price are equal and the rating is higher. When no film with a higher rating is found, f must have the highest rating for its category and price.

I'm going to add the tag greatest-n-per-group to your question so you can follow it and view other SQL questions that are solved with the same technique.

Bill Karwin
Hi - your query doesn't return the correct results.
carewithl
If I may also ask – what procedure do you use to figure out the result? I assume you’ve broken down the problem into simpler steps, but how exactly did you go about it?
carewithl
My query gives the same results as your second example in your original question. It would help me and others answer your question if you provide some examples of the correct results given hypothetical input.
Bill Karwin
Hi - I've updated my initial post with some examples of results produced by mine and by your queries
carewithl
If I may ask just one more question: I don't understand why, if we move f.AvailableOnDvd = 'Y' from the WHERE clause into the FROM clause, we get wrong results, meaning the query also selects rows where f.DVDPrice is NULL. I've edited my initial post with the modified query.
carewithl
By moving the condition into the join to `p`, the condition only applies when matching the rows in `p`. Whereas when doing the join to `r`, rows of `f` are allowed where AvailableOnDvd='N'.
Bill Karwin
I understand that. But I don't understand why moving the f.AvailableOnDvd = 'Y' condition into the join to p causes the query to also select rows where DVDPrice is NULL. As far as I can tell, this particular query should produce the same results regardless of whether the f.AvailableOnDvd = 'Y' condition is in the FROM clause or in the WHERE clause!?
carewithl
Conditions in a JOIN clause restrict the matches for that join only. But you have two joins, and the condition in the first join doesn't apply to the second join. So rows in f with no DVDPrice are permitted to be compared to rows in r, and then they are not excluded by the WHERE clause since you've taken the condition out of that clause.
Bill Karwin
I get it. I forgot that (assuming we put the f.AvailableOnDvd = 'Y' condition into the join to p), when f is left-joined to p, only the rows of f that do find matching rows in p are guaranteed to have f.AvailableOnDvd = 'Y', while rows of f that don't find any match in p are returned even if f.AvailableOnDvd is not 'Y'. Thanks for helping me, mate.
carewithl
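
The difference can be reproduced on a tiny, hypothetical data set in SQLite (via Python's sqlite3 module). Film B is not on DVD and has a NULL price. With the f.AvailableOnDvd test in the WHERE clause, B is filtered out. With the test moved into the join to p, B simply finds no match in p, its NULL price means the r join finds no match either, so both p and r come back NULL and B passes the WHERE clause.

```python
import sqlite3

# Hypothetical data: B is not on DVD and has no price.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Films (FilmId INTEGER PRIMARY KEY, FilmName TEXT, Rating INTEGER,
                    DVDPrice REAL, CategoryId INTEGER, AvailableOnDvd TEXT);
INSERT INTO Films VALUES
  (1, 'A', 5, 9.99, 1, 'Y'),
  (2, 'B', 3, NULL, 1, 'N');
""")

# f.AvailableOnDvd = 'Y' in the WHERE clause: applied to every candidate f.
IN_WHERE = """
SELECT f.FilmName
FROM Films f
LEFT OUTER JOIN Films p ON (f.CategoryId = p.CategoryId
  AND p.AvailableOnDvd = 'Y' AND f.DVDPrice > p.DVDPrice)
LEFT OUTER JOIN Films r ON (f.CategoryId = r.CategoryId
  AND r.AvailableOnDvd = 'Y' AND f.DVDPrice = r.DVDPrice AND f.Rating < r.Rating)
WHERE f.AvailableOnDvd = 'Y' AND p.CategoryId IS NULL AND r.CategoryId IS NULL
ORDER BY f.FilmName
"""

# Same test moved into the join to p: it only limits matches in p,
# so an f row that fails it just gets NULLs for p instead of being dropped.
IN_JOIN_TO_P = """
SELECT f.FilmName
FROM Films f
LEFT OUTER JOIN Films p ON (f.CategoryId = p.CategoryId AND f.AvailableOnDvd = 'Y'
  AND p.AvailableOnDvd = 'Y' AND f.DVDPrice > p.DVDPrice)
LEFT OUTER JOIN Films r ON (f.CategoryId = r.CategoryId
  AND r.AvailableOnDvd = 'Y' AND f.DVDPrice = r.DVDPrice AND f.Rating < r.Rating)
WHERE p.CategoryId IS NULL AND r.CategoryId IS NULL
ORDER BY f.FilmName
"""

in_where = conn.execute(IN_WHERE).fetchall()
in_join_to_p = conn.execute(IN_JOIN_TO_P).fetchall()
print(in_where)      # [('A',)]
print(in_join_to_p)  # [('A',), ('B',)]
```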