
I'm trying to figure out how to order items with matching tags by the number of tags that match.

Let's say you have three MySQL tables:

  • tags(tag_id, title)
  • articles(article_id, some_text)
  • articles_tags(tag_id, article_id)

Now let's say you have four articles where:

article_id = 1 has tags "humor," "funny," and "hilarious."

article_id = 2 has tags "funny," "silly," and "goofy."

article_id = 3 has tags "funny," "silly," and "goofy."

article_id = 4 has the tag "completely serious."

You need to find all articles related to article_id = 2 by at least one matching tag, and return the results in order of the best matches. In other words, article_id = 3 should come first, with article_id = 1 second, and article_id = 4 should not show up at all.
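To make "best match" concrete: an article's score is the number of tags it shares with the target article, i.e. the size of the intersection of the two tag sets. Here is a minimal in-memory Python sketch with the example data hard-coded. The function name `related_articles` is hypothetical, and breaking ties by article_id is an assumption (the question doesn't say how ties should be ordered):

```python
# Tag sets per article, copied from the example in the question.
ARTICLE_TAGS = {
    1: {"humor", "funny", "hilarious"},
    2: {"funny", "silly", "goofy"},
    3: {"funny", "silly", "goofy"},
    4: {"completely serious"},
}


def related_articles(target_id, article_tags):
    """Return (article_id, shared_tag_count) pairs, best match first.

    The target itself and articles sharing no tags with it are left out.
    """
    target_tags = article_tags[target_id]
    scored = []
    for article_id, tags in article_tags.items():
        if article_id == target_id:
            continue
        shared = len(tags & target_tags)  # size of the tag-set intersection
        if shared > 0:
            scored.append((article_id, shared))
    # Highest overlap first; ties broken by article_id (an assumption).
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


print(related_articles(2, ARTICLE_TAGS))  # [(3, 3), (1, 1)]
```

This is exactly the ordering the question asks for; the SQL answers compute the same count with GROUP BY.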

Is this something that's doable in SQL queries alone, or is it better suited to something like Sphinx? If the former, what kind of query should be used, and what indexes should be created for the best performance? If the latter, please do expand.

A:

Try something like this:

select article_id, count(tag_id) as common_tag_count
from articles_tags
where tag_id in (
    select tag_id from articles_tags where article_id = 2
) and article_id != 2
group by article_id
order by common_tag_count desc;

Grouping by article_id gives one row per related article, with the number of tags it shares with article 2.

Or the same thing as a self-join, without the subquery:

SELECT at1.article_id, COUNT(at1.tag_id) AS common_tag_count
FROM articles_tags AS at1
INNER JOIN articles_tags AS at2 ON at1.tag_id = at2.tag_id
WHERE at2.article_id = 2 AND at1.article_id != 2
GROUP BY at1.article_id
ORDER BY common_tag_count DESC;
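A rough way to check that both queries produce the ranking asked for in the question is to run them against the sample data with Python's built-in sqlite3 module. SQLite stands in for MySQL here (assumption). The composite index on (tag_id, article_id), its name, and the use of `?` parameters instead of a hard-coded 2 are additions for this sketch, not part of the answer:

```python
import sqlite3

# Schema from the question. The extra index on (tag_id, article_id) is an
# assumption: it lets "which articles carry tag X" be answered from an index,
# while the primary key covers "which tags does article Y have".
SCHEMA = """
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE articles (article_id INTEGER PRIMARY KEY, some_text TEXT);
CREATE TABLE articles_tags (
    tag_id INTEGER NOT NULL REFERENCES tags(tag_id),
    article_id INTEGER NOT NULL REFERENCES articles(article_id),
    PRIMARY KEY (article_id, tag_id)
);
CREATE INDEX idx_articles_tags_tag ON articles_tags (tag_id, article_id);
"""

TAGS = ["humor", "funny", "hilarious", "silly", "goofy", "completely serious"]
ARTICLE_TAGS = {
    1: ["humor", "funny", "hilarious"],
    2: ["funny", "silly", "goofy"],
    3: ["funny", "silly", "goofy"],
    4: ["completely serious"],
}

SUBQUERY_VERSION = """
SELECT article_id, COUNT(tag_id) AS common_tag_count
FROM articles_tags
WHERE tag_id IN (SELECT tag_id FROM articles_tags WHERE article_id = ?)
  AND article_id != ?
GROUP BY article_id
ORDER BY common_tag_count DESC
"""

SELF_JOIN_VERSION = """
SELECT at1.article_id, COUNT(at1.tag_id) AS common_tag_count
FROM articles_tags AS at1
INNER JOIN articles_tags AS at2 ON at1.tag_id = at2.tag_id
WHERE at2.article_id = ? AND at1.article_id != ?
GROUP BY at1.article_id
ORDER BY common_tag_count DESC
"""


def build_db():
    """Create an in-memory database loaded with the question's example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO tags (tag_id, title) VALUES (?, ?)",
                     list(enumerate(TAGS, start=1)))
    tag_ids = {title: tag_id for tag_id, title in enumerate(TAGS, start=1)}
    for article_id, titles in ARTICLE_TAGS.items():
        conn.execute("INSERT INTO articles (article_id, some_text) VALUES (?, ?)",
                     (article_id, f"article {article_id}"))
        conn.executemany(
            "INSERT INTO articles_tags (tag_id, article_id) VALUES (?, ?)",
            [(tag_ids[t], article_id) for t in titles])
    return conn


conn = build_db()
for query in (SUBQUERY_VERSION, SELF_JOIN_VERSION):
    print(conn.execute(query, (2, 2)).fetchall())  # [(3, 3), (1, 1)] both times
```

Both return article 3 (three shared tags) ahead of article 1 (one shared tag), and article 4 doesn't appear.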
Andrew Cooper
The second syntax is fantastic and worked exactly the way I needed. Thanks so much!
Josh Smith
A:

Something resembling:

SELECT a.*
FROM articles AS a
INNER JOIN articles_tags AS at ON a.article_id = at.article_id
INNER JOIN tags AS t ON at.tag_id = t.tag_id
WHERE t.title IN ('funny', 'goofy', 'silly') AND a.article_id != <article_id>
GROUP BY a.article_id
ORDER BY COUNT(*) DESC

This needs only the usual indexes, assuming articles_tags has a primary key of (article_id, tag_id) and there is an index on tags.title.
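A minimal sketch of this title-based query, again using Python's sqlite3 with SQLite standing in for MySQL (assumption). `related_by_titles` is a hypothetical helper: it builds one `?` placeholder per title rather than pasting the titles into the SQL, and it adds the match count to the output so the ordering is visible:

```python
import sqlite3

# Sample data from the question; the index on tags.title follows the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE INDEX idx_tags_title ON tags (title);
CREATE TABLE articles (article_id INTEGER PRIMARY KEY, some_text TEXT);
CREATE TABLE articles_tags (
    tag_id INTEGER NOT NULL,
    article_id INTEGER NOT NULL,
    PRIMARY KEY (article_id, tag_id)
);
INSERT INTO tags VALUES (1, 'humor'), (2, 'funny'), (3, 'hilarious'),
                        (4, 'silly'), (5, 'goofy'), (6, 'completely serious');
INSERT INTO articles VALUES (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four');
INSERT INTO articles_tags (article_id, tag_id) VALUES
    (1, 1), (1, 2), (1, 3),
    (2, 2), (2, 4), (2, 5),
    (3, 2), (3, 4), (3, 5),
    (4, 6);
""")


def related_by_titles(conn, titles, exclude_id):
    """Articles carrying any of `titles`, most matching tags first."""
    # IN (...) instead of a chain of ORs: AND binds tighter than OR, so in
    # "a OR b OR c AND x" the exclusion would only apply to the last title.
    placeholders = ", ".join("?" for _ in titles)
    sql = f"""
        SELECT a.article_id, a.some_text, COUNT(*) AS matches
        FROM articles AS a
        INNER JOIN articles_tags AS at ON a.article_id = at.article_id
        INNER JOIN tags AS t ON at.tag_id = t.tag_id
        WHERE t.title IN ({placeholders}) AND a.article_id != ?
        GROUP BY a.article_id
        ORDER BY matches DESC
    """
    return conn.execute(sql, [*titles, exclude_id]).fetchall()


print(related_by_titles(conn, ["funny", "goofy", "silly"], 2))
# [(3, 'three', 3), (1, 'one', 1)]
```

In practice the title list would come from looking up the target article's own tags, at which point the query is essentially the self-join from the other answer.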

timdev