I am running a simple MySQL full-text query that searches for users on my site by their "display name". An example query is below; in this example we are searching for 'lancaster toy store':

SELECT MATCH(`display_name`) AGAINST ('lancaster toy store') as `rel`
WHERE MATCH(`display_name`) AGAINST ('lancaster toy store')
ORDER BY `rel` DESC

It works well in that it returns a good number of results, but a typical result list looks like this:

  1. charlotte toy store
  2. toy store on broadway
  3. arizona toy stores
  4. toy store of lancaster
  5. east coast toys

As you can see, my problem is that people are searching for 'lancaster toy store', and the obvious best result is coming up near the middle or bottom.

I am using the porter-stemmer technique, as well.

Any ideas how to get more accurate results?

UPDATE

Here's the real query (the actual search term is 'lancaster restore'):

SELECT `id`,
       MATCH (`display_name`) AGAINST ('lancast* restor*' IN BOOLEAN MODE)
           AS `RELEVANCY`
FROM `users`
WHERE `status` = 'active'
&& MATCH (`display_name`) AGAINST ('lancast* restor*' IN BOOLEAN MODE)
ORDER BY `RELEVANCY` DESC
LIMIT 25

and here are the results:

  1. Habitat for Humanity of Orange County - ReStores
  2. ReStore 15 Fourth Street Dover NH
  3. Morris Habitat for Humanity ReStore
  4. Habitat ReStore Lima Ohio
  5. Habitat for Humanity Charlotte ReStore
  6. ReStore Montgomery County
  7. Dayton Ohio Habitat for Humanity ReStore
  8. ReStore
  9. Lancaster Area Habitat for Humanity ReStore
A: 

I don't know what the porter-stemmer technique is, but using your sample data and query with a standard MySQL fulltext index, the only result that should be returned is #4:

4. toy store of lancaster

I noticed your sample query is missing a FROM clause, so I assume that is not the exact query you are running. Is it missing anything else? Perhaps you are using BOOLEAN MODE in your query? If you are using BOOLEAN MODE, that would explain the extra results, but result #4 should be at the top of the list since it has all 3 of the words.

Can you provide your exact query?

Ike Walker
Sorry, yeah, it's not the exact query. I have a full class building the query dynamically, so I had to just make an example up... I'll try and put the exact query in.
johnnietheblack
Oh, and porter-stemmer is an algorithm that trims the "unnecessary" parts off the words, so that if you search for "toys", it doesn't eliminate results for "toy", because it trims off the "s". Another example: it trims "johnnie" to "john", so that I wouldn't get left out ;) You'll see it in the update above.
johnnietheblack
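To make the stemming step concrete, here is a minimal Python sketch of how a search phrase might be turned into the `'lancast* restor*'` string seen in the updated query. It is an assumption-heavy illustration: `crude_stem`, `to_boolean_query`, and the `SUFFIXES` list are hypothetical names, and the suffix stripping is a toy stand-in, not the real Porter algorithm or the asker's actual class.

```python
# Toy suffix stripper for illustration only. This is NOT the Porter
# stemmer; the suffix list below is a hand-picked assumption.
SUFFIXES = ("er", "es", "s", "e")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_boolean_query(search: str) -> str:
    """Stem each word and append '*' so BOOLEAN MODE treats it as a prefix."""
    return " ".join(crude_stem(word) + "*" for word in search.split())


boolean_query = to_boolean_query("lancaster restore")
print(boolean_query)
```

The resulting string would then be passed to `AGAINST (... IN BOOLEAN MODE)`, ideally as a bound parameter rather than by string concatenation.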
Thanks for the full query. I see you are using boolean mode as I expected, so it makes sense that your search returns results with 1 or more of the search words. But you are ordering by relevance, so results with 2 matching words *should* be above those with only 1 matching word. Is that not happening?
Ike Walker
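The expectation in the comment above can be sketched in Python by counting how many of the two prefix terms each posted result matches. This is only a rough model of "more matching words ranks higher", not MySQL's actual relevance calculation; `matched_terms` is a hypothetical helper, and the names are copied from the results in the question.

```python
import re

# Display names copied from the results posted in the question, in order.
results = [
    "Habitat for Humanity of Orange County - ReStores",
    "ReStore 15 Fourth Street Dover NH",
    "Morris Habitat for Humanity ReStore",
    "Habitat ReStore Lima Ohio",
    "Habitat for Humanity Charlotte ReStore",
    "ReStore Montgomery County",
    "Dayton Ohio Habitat for Humanity ReStore",
    "ReStore",
    "Lancaster Area Habitat for Humanity ReStore",
]

# The two prefix terms from AGAINST ('lancast* restor*' IN BOOLEAN MODE).
prefixes = ["lancast", "restor"]


def matched_terms(name: str, prefixes: list[str]) -> int:
    """Count how many prefixes match at least one word of the name."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return sum(any(w.startswith(p) for w in words) for p in prefixes)


# Stable sort: rows with equal counts keep their original order.
ranked = sorted(results, key=lambda n: matched_terms(n, prefixes), reverse=True)

for name in ranked:
    print(matched_terms(name, prefixes), name)
```

Under this simple count, the Lancaster row is the only one matching both terms, which is why it would be expected at the top rather than in ninth place.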
Correct, that is not happening. In this case, the following appear before the correct result:
johnnietheblack
results posted in question above :)
johnnietheblack