views:

37

answers:

1

What I mean saying true "related by tags" list?

Let's imagine that article have 3 tags: A, B, C. True "related by tags" articles for this item will be articles fistly having A, B, C tags, then (A, B), (A, C), (B, C) etc.

table: tags

tag_id
tag_title
tag_nicetitle

table: tags2articles

article_id
tag_id

Using this tables structure is too difficult to calculate true "related by tags".

We can add one more table containing article_id and it's md5(A,B,C). Before hashing we should sort tags by alphabet.

table: article_tags_hashed

id
article_id
md5
count

This table will help us to find articles containing exact set of tags (A,B,C), but it won't help to find articles containing only (A, B), (A, C), (B, C)

What is the best practice?

PS: Sorry for my english, it's pretty bad.

+1  A: 

I don't think you need article_tags_hashed because you can group and count the results when you
query the tags2articles.

Example:

select article_id, count(article_id) as tagcnt from tags2articles
where tag_id in (...)
group by article_id
order by tagcnt desc

The articles which contain the most tags will be placed first.

Nick D
Nice query, but will it work fast with huge tables?
Kirzilla
@Kirzilla, if the `id`'s are keys then yes, it should be fast. You can even set a limit for the results. ie you can *ask* to get at most 15 articles.
Nick D