views:

203

answers:

2

I'm running into my own limits of MySQL query skills, so I hope some SQL guru can help out on this one. The situation is as follow:

I have images that can be tagged. As you might expect this is stored in three tables:

  • Image
  • Tag
  • Tag_map (maps images to tags)

I have a SQL query that calculates the related tags based on a tag id. The query basically checks what other tags were used for images for images using that tag. Example:

Image1 tagged as "Bear"
Image2 tagged as "Bear" and "Canada"

If I throw "Bear" (or its tag id) at the query, it will return "Canada". This works fine. Here's the query:

SELECT tag.name, tag.id, COUNT(tag_map.id) as cnt
FROM tag_map,tag
WHERE tag_map.tag_id = tag.id AND tag.id != '185' AND tag_map.image_id IN

    (SELECT tag_map.image_id FROM tag_map INNER JOIN tag ON tag_map.tag_id = tag.id WHERE tag.id = '185')

GROUP BY tag_map.id LIMIT 0,100

The part I'm stuck with is the count. For each related tag returned, I want to know how many images are in that tag. Currently it always returns 1, even if there are for example 3. I've tried counting different columns all resulting in the same output, so I guess there is a flaw in my thinking.

+1  A: 

Some food for thought

  • I noticed you use id in your tag & image table and tablename_id in your tag_map table. Each his own offcourse, but I found it to be much easier if an id is named the same everywhere. I would rename the id's in tag & image to tag_id & image_id respectively.
  • It seems your id's are character strings. I've taken the liberty to use integers in the examples.

The following example uses SQL Server. It should not be to hard to adjust the SQL Statement to MySQL.

Test data

DECLARE @tag TABLE (id INTEGER, tag VARCHAR(32))
DECLARE @image TABLE (id INTEGER, image VARCHAR(32))
DECLARE @tag_map TABLE (image_id INTEGER, tag_id INTEGER)

INSERT INTO @tag
SELECT 185, 'Bear' 
UNION ALL SELECT 186, 'Canada'

INSERT INTO @image
SELECT 1, 'image1'
UNION ALL SELECT 2, 'image2'

INSERT INTO @tag_map
SELECT 1, 185
UNION ALL SELECT 2, 185
UNION ALL SELECT 2, 186

SQL Statement

SELECT  t.tag
        , t.id
        , cnt = (SELECT COUNT(*) FROM @tag_map WHERE tag_id = t.id)
FROM    @tag_map m
        INNER JOIN @tag t ON t.id = m.tag_id
        INNER JOIN (
          SELECT  m.image_id
          FROM    @tag_map m
          WHERE   m.tag_id = 185
        ) i ON i.image_id = m.image_id
WHERE   t.id <> 185
Lieven
Thank you for the detailed response. I tweaked your query to MySQL syntax and changed the column names to those used in my tables:SELECT t.name, t.id, COUNT(*)FROM tag_map as m INNER JOIN tag as t ON t.id = m.tag_id INNER JOIN ( SELECT m.image_id FROM tag_map as m WHERE m.tag_id = 185 ) as i ON i.image_id = m.image_idWHERE t.id <> 185GROUP BY t.name, t.idUnfortunately, it still returns 1, where it should return 3. I think it is counting the n# of related tags, but I want to count the # of images that have that related tag
Ferdy
@Ferdy, I have changed the query. Can you verify the results?
Lieven
It works now, thank you so much. Given the other answer, I now have two solutions :) I'm going to flip a coin to decide who wins. Thanks both.
Ferdy
oh those damn coins <g>
Lieven
+2  A: 

Your code not working right because you select just images "associated with a choosen tag", but not images, "associated with tags associated with image associated with choosen tag" (I hope, I used correct recursion depth :) ).

You can do this with subselects:

SELECT tag.id, tag.name, COUNT(DISTINCT tag_map.image_id) as cnt
  FROM tag_map, tag
 WHERE tag_map.tag_id = tag.id
   AND tag.id != 185
   AND tag_map.tag_id IN (
     SELECT sub1.tag_id FROM tag_map AS sub1 WHERE sub1.image_id IN (
       SELECT sub2.image_id FROM tag_map AS sub2 WHERE sub2.tag_id = 185
     )
   )
GROUP BY tag.id, tag.name;
dchekmarev
Thank you sir, that works perfectly. You rock, and so does SO.
Ferdy
You shouldn't flip a coin to decide, you should use mysql 'explain' command to check which query works better on your tables.For example, on my tables configuration, Lieven's query is one subquery longer.
dchekmarev
@Ferdy - I don't have mysql running but SQLServer shows a Query Cost of 64% for this solution, 36% for my solution... go figure.
Lieven
(That should have been @dchkemarev).
Lieven
I'd give you both a coin if that was possible guys. Rest assured you both helped me tremendously and that is the point.
Ferdy
@Ferdy - I wasn't trying to convince you you should reverse your choice but the statement implied that this solution performs best. At least in SQL Server, that is not true.
Lieven