tags:

views:

1796

answers:

4

Hi,

I'm pulling a list of items from one table, on the basis of their being included in another table, like this:

select fruit.id, fruit.name from fruit, fruit_rating where fruit_rating.fruit_id=fruit.id group by fruit.name;

That works fine -- it basically produces a list of all the fruit that has been rated by someone. But now, I want to exclude all fruit that has been rated by one specific user, so I tried this:

select fruit.id, fruit.name from fruit, fruit_rating where fruit_rating.fruit_id=fruit.id and fruit_rating.user_id != 10 group by fruit.name;

That's ok, but not quite right. It shows all the fruit that have been rated by people other than 10, but if users 1 and 10 have both rated the same fruit, it still shows that one. Can anyone tell me how to construct a query that shows only the fruit that have NOT Been rated by user 10, regardless of who else has rated them?

Thanks!

Ben

+5  A: 
... WHERE fruit_rating.fruit_id=fruit.id 
      and fruit.id not in 
          (select fruit_rating.fruit_id 
             from fruit_rating 
            where fruit_rating.user_id = 10)
Mr. Brownstone
You beat me to it!
Noah
Yea here on SO speed matters :). But you wrote whole query so I guess it's also useful.
Mr. Brownstone
+3  A: 

I refined your query a bit to make it a little easier to read, and added a subquery to filter out all of the fruit that's been rated by user 10

select f.id, f.name 
from fruit f
inner join fruit_rating fr on
 fr.fruit_id = f.id 
where f.id not in (
    select id
    from fruit_rating
    where [user_id] = 10) 
group by fruit.name;
Noah
+1  A: 

One thing that isn't quite 100% clear to me: do you want all fruit that hasn't been rated by user 10, or just fruit that has been rated by other people but not by user 10? e.g. should fruit that has no ratings be included?

I think you want all fruit (including unrated), in which case Noah and Mr. Brownstone's answers aren't quite what you're after. If you remove the inner join to fruit_rating, and the now-needless group by, theirs will include unrated fruit. An alternative approach, which avoids the subselect, is

select f.id, f.name 
from fruit f
left join fruit_rating fr on
  (f.id = fr.fruit_id)
  and (fr.user_id = 10)
where
  (fr.user_id is null)

That is, do a left join (an optional join, if you like) onto fruit rating ONLY for user 10, and then only return rows where a match WASN'T found.

Cowan
Sorry: I guess my original question was worded a bit loosely. I was actually looking for what NOah and Mr Brownstone suggested -- i.e., I wanted only fruit that had been rated by people other than user 10. Thanks for the response, though, anyway.
Ben
+2  A: 

I read this differently from Cowan, and agree with Noah...

Find all fruit where: - User 10 did not rate it - Atleast one other user did rate it

However, in my experience using NOT IN can be quite slow. So, I generally prefer to filter using LEFT JOIN in the same way as Cowan. Here are a few different options, though I have not had time to test performance on large data sets...

SELECT
   [f].id,
   [f].name
FROM
   fruit           AS [f]
INNER JOIN
   fruit_rating    AS [fr]
      ON [fr].fruit_id = [f].id
GROUP BY
   [f].id,
   [f].name
HAVING
   SUM(CASE WHEN [fr_exclude].user_id = 10 THEN 1 ELSE 0 END) = 0


SELECT
   [f].id,
   [f].name
FROM
   fruit           AS [f]
INNER JOIN
   fruit_rating    AS [fr]
      ON [fr].fruit_id = [f].id
LEFT JOIN
   fruit_rating    AS [fr_exclude]
      ON [fr_exclude].fruit_id = [fr].fruit_id
      AND [fr_exclude].user_id = 10
GROUP BY
   [f].id,
   [f].name
HAVING
   MAX([fr_exclude].user_id) IS NULL


As this only works for one user, I would also consider making a table of "users to exclude" and LEFT JOIN on that instead...

SELECT
   [f].id,
   [f].name
FROM
   fruit           AS [f]
INNER JOIN
   fruit_rating    AS [fr]
      ON [fr].fruit_id = [f].id
LEFT JOIN
   excluded_users  AS [ex]
      AND [ex].user_id = [fr].user_id
GROUP BY
   [f].id,
   [f].name
HAVING
   MAX([ex].user_id) IS NULL


Or something much more long winded, but I suspect is the fastest on larger data sets with appropriate indexes...

SELECT
   [f].id,
   [f].name
FROM
   fruit           [f]
INNER JOIN
(
   SELECT
      fruit_id
   FROM
      fruit_rating
   GROUP BY
      fruit_id
)
   AS [rated]
      ON [rated].fruit_id = [f].id
LEFT JOIN
(
   SELECT
      [fr].fruit_id
   FROM
      fruit_rating    AS [fr]
   INNER JOIN
      excluded_users  AS [ex]
         ON [ex].user_id = [fr].user_id
   GROUP BY
      [fr].fruit_id
)
   AS [excluded]
      ON [rated].fruit_id = [excluded].fruit_id
WHERE
   [excluded].fruit_id IS NULL
GROUP BY
   [f].id,
   [f].name
Dems