views:

618

answers:

7

I've never done searching from MYSQL before, but I need to implement a search. I have three tables, 'articles', 'articles_tags', and 'tags'.

'articles' holds the first thing I would like to search on, the 'title' field.

'articles_tags' is a pivot table which relates 'articles' and 'tags' together. 'articles_tags' has two fields: 'articles_id' and 'tag_id'.

'tags' holds the second thing I would like to search on, the 'name' field.

My problem is, I need a way to search the 'title' field, and each of the tags that relate to that article ('tags.name') and return a relevancy (or sort by relevancy) for article.

What would be a good way to implement this? I'm pretty sure it can't be done from just one query so two queries, and then 'mixing' the relevancies together, would be ok.

Thanks.

Edit: Forgot to say, if I could give more weighting to matching a tag than matching a word in the title, that would be awesome. I'm not really asking for anyone to write the thing, but give me some direction. I'm a bit of a newbie in both PHP and MySQL.

A: 

Funny it is the 3rd question about pretty much the same problem I see in 2 days, check out these two posts: 1, 2

dimus
I looked at those two but fail to see how they relate to my problem.
A: 

This quick demo query is far from optimized but should be a good starting point

SELECT * FROM
(SELECT a.id, a.title, 
MATCH (a.title) AGAINST ('$s_search_term') AS title_score,
SUM(MATCH (t.name) AGAINST ('$s_search_term')) AS tag_score
FROM articles AS a
LEFT JOIN articles_tags AS at
ON a.id = at.article_id
LEFT JOIN tags AS t
ON t.id = at.tag_id
WHERE MATCH (a.title) AGAINST ('$s_search_term') 
OR MATCH (t.name) AGAINST ('$s_search_term')
GROUP BY a.id) AS table1
ORDER BY 2*tag_score + title_score DESC

You may want to normalize tag_score by dividing it by COUNT(t.id). Sorry but it's easier to give the query than to explain how to make it.

james.c.funk
A: 

Starting from the answer given by @james.c.funk but making some changes.

SELECT a.id, a.title, 
  MATCH (a.title) AGAINST (?) AS relevance
FROM articles AS a
LEFT OUTER JOIN (articles_tags AS at
  JOIN tags AS t ON (t.id = at.tag_id AND t.name = ?))
  ON (a.id = at.article_id)
WHERE MATCH (a.title) AGAINST (? IN BOOLEAN MODE) 
ORDER BY IF(t.name IS NOT NULL, 1.0, relevance) DESC;

I assume you want tag matches to match against the full string, instead of using a fulltext search.

Also using one left outer join instead of two, because if a join to articles_tags is satisfied, then surely there is a tag. Put the tag name comparison inside the join condition instead of in the WHERE clause.

The boolean mode makes MATCH() returns 1.0 on a match, which makes it useless as a measure of relevance. So do an extra comparison in the select-list to calculate the relevance. This value is between 0.0 and 1.0. Now we can make a tag match sort higher by treating it as having relevance of 1.0.

Bill Karwin
A: 

You might want to look into sphinx, http://www.sphinxsearch.com/

devians
A: 

Here is how I have done this in the past. It looks slow, but I think you will find it is not.

I added a little complexity to show what else can easily be done. In this example an article will get 1 point for a partial title match, 2 points for a partial tag match, 3 points for an exact tag match, and 4 points for an exact title match. It then adds those up and sorts by the score.

SELECT
  a.*,
  SUM(
    CASE WHEN a.title LIKE '%keyword%' THEN 1 ELSE 0 END
    +
    CASE WHEN t.name LIKE '%keyword%' THEN 2 ELSE 0 END
    +
    CASE WHEN t.name = 'keyword' THEN 3 ELSE 0 END
    +
    CASE WHEN a.title = 'keyword' THEN 4 ELSE END
  ) AS score
FROM article a, articles_tags at, tags t
WHERE a.id = at.article_id
AND at.tag_id=t.id
AND (a.title LIKE '%keyword%' OR t.name LIKE '%keyword%')
GROUP BY a.id
ORDER BY score;

NOTES: This will not return articles without tags. I used simple joins to reduce the noise in the query and highlight just what is doing the scoring. To include articles without tags just make the joins left joins.

Jackson Miller
+2  A: 

Is it worth at this time, recommending that you look at offloading the job of search to something that is actually written just for that purpose?

In our products, we use MySQL to store data, but index all of our data with Lucene (via Solr - but that's irrelevant).

It's worth giving it a look into, because it's relatively simple to set up, it's very powerful and it's a lot easier than trying to manipulate the database into doing what you want.

Sorry this isn't a direct answer to the question, I just feel that this kind of thing is always worth mentioning in this scenario :)

Rob Ashton
A: 

Your example is a great one but is only acceptable for one keyword searches, what if i want to search 'php programming' and i have a tag named 'programming' and an article with title 'php'.

In fact I'll explain it on my case: I have a company type (like plumber) which is a tag and a Company information (name, town, zip) which is a database entry. So how would I solve that if someone was searching for 'plumbers 1234' (1234 is zip which is part of the company entry while 'plubmers' is a tag name).

Thank you

Tomaz Zaman