I'm using fulltext search for a site and I'm seeing some odd effects (maybe they're normal; this is my first attempt at using it). Say the search query is 'the first animal' and I have a row with title: 'the first animal' and keywords: 'animal, amoebe, sea'. The title and keywords fields are indexed together as FULLTEXT(title, keywords).

I rewrite my query to '+the* +first* +animal*' to get the most relevant results, and I match it in Boolean mode like so:

select content.*, match(title, keywords) against('+animal* +first* +the*' IN BOOLEAN MODE) AS match_strength from content HAVING match_strength> 0.005 ORDER BY match_strength DESC LIMIT 0,16

Nicely enough, this returns no results.

After some playing around, I noticed that fulltext seems to ignore 'first' and 'the', because this search:

select content.*, match(title, keywords) against('+first*' IN BOOLEAN MODE) AS match_strength from content HAVING match_strength> 0.005 ORDER BY match_strength DESC LIMIT 0,16

delivers nothing. Same with +the*, except if 'the' is part of a word like 'theanimal' in title or keywords.

Of course, this really sucks, because searching for 'the* first* animal' or 'the first animal' is not an option: it returns too many results if you DID pick proper words that fulltext does not ignore.

Now I'm trying to figure out 3 things:

1) What's the list of words fulltext does not consider relevant, and how can I influence it? I know first, the, an, a and a lot more are words it ignores. If I have a complete list, I can filter them out of my query before querying, thus avoiding an empty result set on trivial (and correct) queries while still delivering the most relevant results.

2) How can I give matches in the title field preference over matches in the keywords field? If the words match in the title, the result is far more relevant than if they match in keywords.

3) How can I, preferably in one query, combine the results of different match queries in a fixed order (if at all possible)? For instance:

I create different queries:

'+the +first +animal'
'+the* +first* +animal*' 
'+animal +first'
'+animal* +first*'
etc

and I want their result sets to be ordered as the queries above, so results (if any) from '+the +first +animal' go first, then results (if any) from '+the* +first* +animal*' etc.

Hopefully this is possible, but beyond completely trivial stuff, I couldn't find anything on Google.
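To make question 3 concrete, one shape this might take is a single query with a CASE expression that gives each row the index of the first query it matches, then sorts on that index. Below is a minimal Python sketch that only builds the SQL text and the parameter list. The names build_tiered_query and tier are made up for illustration, the %s placeholders assume a driver with format-style parameters, and none of this has been run against a real server.

```python
# Condition reused for every tier; compared with "> 0" so a small
# floating-point relevance value still counts as a match.
MATCH_COND = "MATCH(title, keywords) AGAINST(%s IN BOOLEAN MODE) > 0"


def build_tiered_query(boolean_queries, limit=16):
    """Return (sql, params) ranking rows by the first query they match."""
    if not boolean_queries:
        raise ValueError("need at least one boolean query")
    # One CASE branch per query; the branch index becomes the tier.
    whens = " ".join(
        "WHEN {} THEN {}".format(MATCH_COND, tier)
        for tier in range(len(boolean_queries))
    )
    # A row qualifies if any of the queries matches it.
    where = " OR ".join([MATCH_COND] * len(boolean_queries))
    sql = (
        "SELECT content.*, CASE {} END AS tier "
        "FROM content WHERE {} "
        "ORDER BY tier LIMIT {}"
    ).format(whens, where, int(limit))
    # Each query is bound twice: once in the CASE, once in the WHERE.
    params = list(boolean_queries) * 2
    return sql, params


sql, params = build_tiered_query([
    "+the +first +animal",
    "+the* +first* +animal*",
    "+animal +first",
    "+animal* +first*",
])
```

The pair would then go to something like cursor.execute(sql, params). Rows matching the first query get tier 0, rows matching only the second get tier 1, and so on. Whether the fulltext index is still used efficiently with the OR and the "> 0" comparisons is something I would have to check.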

Edit: 1) I found the stopword list (sorry, stupid of me): http://dev.mysql.com/doc/refman/5.0/en/fulltext-stopwords.html. So that's solved.
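The filtering step from question 1 could then look roughly like this in Python. It is a sketch under assumptions: STOPWORDS holds only the four words named above rather than the full list from that page, MIN_WORD_LEN assumes the MyISAM default for ft_min_word_len (4), and build_boolean_query is a made-up helper name.

```python
import re

# Illustrative subset only: the four words named in the question.
# The full list is on the stopwords page linked above.
STOPWORDS = {"the", "first", "an", "a"}

# Assumption: words shorter than this are not indexed (MyISAM's
# ft_min_word_len defaults to 4); adjust to the server's setting.
MIN_WORD_LEN = 4


def build_boolean_query(user_query, prefix=True):
    """Build a '+word*' boolean query, skipping words fulltext ignores."""
    words = re.findall(r"\w+", user_query.lower())
    kept = [w for w in words
            if w not in STOPWORDS and len(w) >= MIN_WORD_LEN]
    suffix = "*" if prefix else ""
    return " ".join("+" + w + suffix for w in kept)


print(build_boolean_query("the first animal"))         # +animal*
print(build_boolean_query("first sea animal", False))  # +animal
```

With this, 'the first animal' becomes '+animal*' instead of a query that requires words fulltext never indexed. An empty return value means every word was filtered out, which the caller would need to handle separately.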