ansaurus

Question

What's the optimal solution for tag/keyword matching?

Answer 1

A:

Well maybe something like the follwing:

select p.productId, p.name, r.rank
from products p inner join (
/* this inner select should bring in only products that have at least one keyword
=> shared with the requested product, and will count the actual number shared (for ranking)*/
    select related.productId, count(related.productId) as rank
    from
     products_keywords related inner join 
     products_keywords pk ON (pk.productId = @productId  AND related.keywordId = pk.keywordId)
    where related.productId <> @productId
    group by related.productId
) r on p.productId = r.productId
order by r.rank DESC /* added DESC (not in orignal solution, but needed to put higher ranked on top)*/

Now I seriously doubt that's an optimal sql statement, but it should get the job done. I can't verify it though since I just wrote it from scratch with no actual backing tables, or data to test against.

Josh 2009-03-05 15:44:05

if you wanted weighted keywords for above and you choose your weights to be a number between 0 > x >= 1, you can set the rank to (count(related.productId)*(sum(pk.weight)/sum(pk.weight) + 1)) as rankor [C * (w/w+1) = rank] where C is the count and w is the summed weight.

Josh 2009-03-05 17:44:07

Answer 2

+3 A:

My first shot would be something like:

SELECT
    P.product_id,
    COUNT(*)
FROM
    Product_Keywords PK1
INNER JOIN Product_Keywords PK2 ON
    PK2.keyword_id = PK1.keyword_id
INNER JOIN Products P ON
    P.product_id = PK.product_id
WHERE
    PK1.product_id = @product_id
GROUP BY
    P.product_id
ORDER BY
    COUNT(*) DESC

The join of Product_Keywords to Product_Keywords (PK2 to PK1) might be rough, so I can't speak to performance. This is where I would start though and then look at optimization.

One thing to consider, as a follow-up to Assaf's comment, is that you could add a "weight" to the Product_Keywords and SUM(PK1.weight) + SUM(PK2.weight) for ranking. Just a thought.

EDIT: To elaborate on the weighting... you may decide that you want to allow keywords to be weighted. The actual method used to determine the weighting would be a business decision though, so I can't really give you too much guidance there.

As an example though, this question is about "programming", "keyword matching", and "SQL". Programming is pretty generic, so if two questions had that in common it still might not mean that they are that related so maybe you only weight it as 1. SQL is a little more specific, so that you might weight as a 5. Keyword matching is both the main focus of the question AND it's pretty specific, so you might weight that with a 10.

This is just an example of course and as I said, the exact determination of the weights as well as how you score it are dependent on the specific business. You might decide that matching the number of keywords is more important than the weights so maybe the weighting is only used as a tie-breaker, etc. HTH.

Tom H. 2009-03-05 15:56:19

Can you please elaborate a little bit about the weight concept? How should I weight keywords?

muerte 2009-03-05 16:48:01

ansaurus

tags:

views:

answers:

What's the optimal solution for tag/keyword matching?

related questions