I'm not much of a database guru, so I need some help with a query I'm working on. In my photo community project I want to visualize tags richly: besides the tag name and counter (the number of images in the tag), I also want to show a thumbnail of the tag's most popular image (the one with the most karma).

The table setup is as follows:

  • Image table holds basic image metadata; the important field is karma
  • Imagefile table holds multiple entries per image, one for each format
  • Tag table holds tag definitions
  • Tag_map table maps tags to images
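
For anyone who wants to reproduce this, here is a minimal sketch of the four tables using Python's built-in sqlite3 module. Only the table names and the karma, type, name, filename and id/foreign-key columns appear in this thread; the column types, constraints and the composite primary key on tag_map are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: column types and constraints are assumptions,
# only the table and column names come from the thread.
SCHEMA = """
CREATE TABLE image     (id INTEGER PRIMARY KEY,
                        karma INTEGER NOT NULL DEFAULT 0);
CREATE TABLE imagefile (id INTEGER PRIMARY KEY,
                        image_id INTEGER NOT NULL REFERENCES image(id),
                        type TEXT NOT NULL,        -- e.g. 'smallthumb'
                        filename TEXT NOT NULL);
CREATE TABLE tag       (id INTEGER PRIMARY KEY,
                        name TEXT NOT NULL UNIQUE);
CREATE TABLE tag_map   (tag_id INTEGER NOT NULL REFERENCES tag(id),
                        image_id INTEGER NOT NULL REFERENCES image(id),
                        PRIMARY KEY (tag_id, image_id));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created.
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)
```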

In my usual trial and error query authoring I have come this far:

SELECT * FROM

(SELECT tag.name, tag.id, COUNT(tag_map.tag_id) as cnt
FROM tag INNER JOIN tag_map ON (tag.id = tag_map.tag_id)
INNER JOIN image ON tag_map.image_id = image.id
INNER JOIN imagefile on image.id = imagefile.image_id 
WHERE imagefile.type = 'smallthumb'
GROUP BY tag.name
ORDER BY cnt DESC)

as T1 WHERE cnt > 0 ORDER BY cnt DESC

[column clause of inner query snipped for the sake of simplicity]

This query gives me roughly what I need. The outer query makes sure that only tags with at least one image are returned. The inner query returns the tag details, such as its name, count (number of images) and the thumb. In addition, I can sort the inner query as I want (by most images, alphabetically, most recent, etc.).

So far so good. The problem, however, is that this query does not match the most popular image (most karma) of the tag; it seems to always take the most recent one in the tag.

How can I make sure that the most popular image is matched with the tag?

+4  A: 

You are looking for the group by 'having' clause, not nested selects!

SELECT tag.name, tag.id, COUNT(tag_map.tag_id) as cnt
  FROM tag 
 INNER JOIN tag_map 
    ON (tag.id = tag_map.tag_id)
 INNER JOIN image 
    ON tag_map.image_id = image.id
 INNER JOIN imagefile 
    on image.id = imagefile.image_id 
 WHERE imagefile.type = 'smallthumb'
 GROUP BY tag.id, tag.name HAVING COUNT(tag_map.tag_id) > 0
 ORDER BY cnt DESC
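
To see what HAVING does to the groups, here is a small sketch using Python's sqlite3 with made-up rows (the tag names, ids and the threshold of 2 are illustration only, and the imagefile join is left out for brevity). It also suggests that with INNER JOINs a tag without images never forms a group in the first place, so a "> 0" filter has nothing left to remove.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tag_map (tag_id INTEGER, image_id INTEGER);
-- made-up sample data: 'unused' has no images at all
INSERT INTO tag VALUES (1, 'cats'), (2, 'dogs'), (3, 'unused');
INSERT INTO tag_map VALUES (1, 10), (1, 11), (1, 12), (2, 10);
""")

# INNER JOIN: 'unused' has no tag_map rows, so it never appears.
inner_rows = conn.execute("""
    SELECT tag.name, COUNT(tag_map.tag_id) AS cnt
      FROM tag
     INNER JOIN tag_map ON tag.id = tag_map.tag_id
     GROUP BY tag.id, tag.name
     ORDER BY cnt DESC
""").fetchall()

# HAVING filters whole groups after aggregation (here: 2 or more images).
having_rows = conn.execute("""
    SELECT tag.name, COUNT(tag_map.tag_id) AS cnt
      FROM tag
      LEFT JOIN tag_map ON tag.id = tag_map.tag_id
     GROUP BY tag.id, tag.name
    HAVING COUNT(tag_map.tag_id) >= 2
     ORDER BY cnt DESC
""").fetchall()

print(inner_rows)
print(having_rows)
```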
lexu
Thank you. That looks like a much more efficient way of doing things. Mind you, it does not solve the original problem: it still does not match the image with the most karma. Any thoughts on that?
Ferdy
+3  A: 

This should be pretty close:

SELECT
  tag.id, 
  tag.name,
  tag_group.cnt,
  tag_group.max_karma,
  image.id, 
  imagefile.filename
  /* ... */
FROM
  tag
  /* join against a list of max karma values (per tag) */
  INNER JOIN (
    SELECT   MAX(image.karma) AS max_karma, COUNT(*) AS cnt, tag_map.tag_id
    FROM     image
             INNER JOIN tag_map ON tag_map.image_id = image.id
    GROUP BY tag_map.tag_id
  ) AS tag_group ON tag_group.tag_id = tag.id
  /* join against a list of image ids (per max karma value and tag) */
  INNER JOIN (
    SELECT   MAX(image.id) id, tag_map.tag_id, image.karma
    FROM     image
             INNER JOIN tag_map ON tag_map.image_id = image.id
    GROUP BY tag_map.tag_id, image.karma /* collapse >1 imgs with same karma */
  ) AS pop_img ON pop_img.tag_id = tag.id AND pop_img.karma = tag_group.max_karma
  /* join against actual base data (per popular image id) */
  INNER JOIN 
    image ON image.id = pop_img.id
  INNER JOIN
    imagefile ON imagefile.image_id = pop_img.id AND imagefile.type = 'smallthumb'

Basically, this is the ever-recurring "max-per-group" problem: How can I select the record that corresponds to the maximum/minimum value of a group?
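
To make the problem concrete, here is a trimmed-down sketch of the same join pattern, run through Python's sqlite3 on made-up rows (the karma values and ids are invented, and the imagefile join is omitted). Two images in 'cats' share the top karma, so the MAX(image.id) tie-break decides which one is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image   (id INTEGER PRIMARY KEY, karma INTEGER);
CREATE TABLE tag     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tag_map (tag_id INTEGER, image_id INTEGER);
-- made-up sample data; images 2 and 3 tie on karma
INSERT INTO image VALUES (1, 5), (2, 9), (3, 9), (4, 2);
INSERT INTO tag VALUES (1, 'cats'), (2, 'dogs');
INSERT INTO tag_map VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 4);
""")

rows = conn.execute("""
    SELECT tag.name, tag_group.cnt, tag_group.max_karma, pop_img.id
    FROM tag
    /* per tag: highest karma and number of images */
    INNER JOIN (
        SELECT   tag_map.tag_id, MAX(image.karma) AS max_karma, COUNT(*) AS cnt
        FROM     image
                 INNER JOIN tag_map ON tag_map.image_id = image.id
        GROUP BY tag_map.tag_id
    ) AS tag_group ON tag_group.tag_id = tag.id
    /* per tag and karma value: one image id (ties collapse to the highest id) */
    INNER JOIN (
        SELECT   tag_map.tag_id, image.karma, MAX(image.id) AS id
        FROM     image
                 INNER JOIN tag_map ON tag_map.image_id = image.id
        GROUP BY tag_map.tag_id, image.karma
    ) AS pop_img ON pop_img.tag_id = tag.id
                AND pop_img.karma = tag_group.max_karma
    ORDER BY tag_group.cnt DESC
""").fetchall()

# Columns: tag name, image count, max karma, id of the most popular image.
print(rows)
```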

And the general answer is always along the lines of: select your group (tag_id, MAX(image.karma)) and then join your base data against these characteristics. Some DBMSs offer a different approach, for example window functions such as ROW_NUMBER() OVER (PARTITION BY ...). These are part of standard SQL (since SQL:2003), but not every DBMS or version supports them (MySQL, for instance, only since 8.0), so they are less portable and may leave you scratching your head when working with a DBMS that lacks them.
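
For comparison, here is a sketch of the ROW_NUMBER()/PARTITION BY variant, again via Python's sqlite3 on made-up rows. It assumes the underlying SQLite library is version 3.25 or newer (older builds have no window functions), and the secondary ORDER BY on image.id is an assumption added to make ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image   (id INTEGER PRIMARY KEY, karma INTEGER);
CREATE TABLE tag     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tag_map (tag_id INTEGER, image_id INTEGER);
-- made-up sample data; images 2 and 3 tie on karma
INSERT INTO image VALUES (1, 5), (2, 9), (3, 9), (4, 2);
INSERT INTO tag VALUES (1, 'cats'), (2, 'dogs');
INSERT INTO tag_map VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 4);
""")

top_rows = conn.execute("""
    SELECT name, image_id, karma
    FROM (
        SELECT tag.name    AS name,
               image.id    AS image_id,
               image.karma AS karma,
               /* rank the images inside each tag, best karma first */
               ROW_NUMBER() OVER (
                   PARTITION BY tag.id
                   ORDER BY image.karma DESC, image.id DESC
               ) AS rn
        FROM tag
             INNER JOIN tag_map ON tag_map.tag_id = tag.id
             INNER JOIN image   ON image.id = tag_map.image_id
    ) AS ranked
    WHERE rn = 1          /* keep only the top-ranked image per tag */
    ORDER BY name
""").fetchall()

print(top_rows)
```

The window version ranks the rows once per tag and keeps rank 1, which avoids the second grouping subquery but depends on DBMS support.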

Tomalak
Awesome, thanks! Kudos to the rest too. All of your answers combined helped me build quite a cool feature.
Ferdy
@Ferdy: If you are using my suggested query, check that the following indexes exist. In all tables: the primary keys; in `imagefile`: a composite index over `(image_id, type)`; in `image`: one over `(karma)`.
Tomalak
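
A sketch of how those two secondary indexes could be created, shown with Python's sqlite3 so it can be run as-is. The index names are made up, the table definitions are trimmed-down assumptions, and the exact CREATE INDEX options may differ in your DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- assumed, trimmed-down table definitions (primary keys included)
CREATE TABLE image     (id INTEGER PRIMARY KEY, karma INTEGER);
CREATE TABLE imagefile (id INTEGER PRIMARY KEY, image_id INTEGER,
                        type TEXT, filename TEXT);

-- composite index for the imagefile join on (image_id, type)
CREATE INDEX idx_imagefile_image_type ON imagefile (image_id, type);
-- index on karma for the max-per-tag subqueries
CREATE INDEX idx_image_karma ON image (karma);
""")

# List the secondary indexes that now exist.
index_names = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"))
print(index_names)
```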