views:

38

answers:

2

In my database, I have a keywords field that stores a comma-delimited list of keywords.

For example, a Shrek doll might have the following keywords:

ogre, green, plush, hero, boys' toys

A "Beanie Baby" doll ( that happens to be an ogre ) might have:

beanie baby, kids toys, beanbag toys, soft, infant, ogre

(That's a completely contrived example.)

What I'd like to do is if the consumer searches for "ogre" I'd like the "Shrek" doll to come up higher in the search results.

My content administrator feels that if the keyword is earlier in the list, it should get a higher ranking. ( This makes sense to me and it makes it easy for me to let them control the search result relevance ).

Here's a simplified query:

SELECT
p.ProductID         AS ContentID
, p.ProductName     AS Title
, p.ProductCode     AS Subtitle
, 100               AS Rank
, p.ProductKeywords AS Keywords
FROM Products AS p
WHERE FREETEXT( p.ProductKeywords, @SearchPredicate )

I'm thinking something along the lines of replacing the RANK with:

, 200 - INDEXOF(@SearchTerm)            AS Rank

This "should" rank the keyword results by their relevance

I know INDEXOF isn't a SQL command... but it's something LIKE that I would like to accomplish.

Am I approaching this the right way?

Is it possible to do something like this?

Does this make sense?

+1  A: 

Can I suggest another way?

What about have a linked table ProductKeywords:

ID_ProductKeyword(pk)
ProductID(int)
KeywordID(int)
Weight(int)

This expresses the above relationship: relationship between keyword and product, as well as how important the particular keyword is for that particular product (higher weight corresponds to higher index).

A side benefit is that you can update the weights dynamically based on how frequently the keyword gives the right product for the users. Or, you can add keyword associations easily if you find that you missed one (did they search for myers before remembering that Shrek is an ogre?)

My two cents.

John at CashCommons
Yeah, I was thinking of refactoring the keywords into its own table... but I'd prefer not to mess with the data structure as it's used in production code right now. I also want to make it easy as possible for my users to update without forcing them to think about weights and ranks. Though I could make this transparent to content admins in the GUI.
Atømix
+1  A: 

Based on what you have and your need not to modify existing structures, this, well, shows how lame SQL Server is at string manipulation, but it would work. Walking through the logic:

DECLARE
  @ProductKeywords varchar(100)
 ,@SearchPredicate varchar(10)

SET @ProductKeywords = 'The,quick,brown,fox,jumps,over'
SET @SearchPredicate= 'fox'

--  Where in the string your search value is
print charindex(@SearchPredicate, @ProductKeywords)

--  The string up through but not including your search string
print left(@ProductKeywords, charindex(@SearchPredicate, @ProductKeywords))

--  Remove the commas (your delimiter) from the above
print replace(left(@ProductKeywords, charindex(@SearchPredicate, @ProductKeywords)), ',', '')

--  This is how many characters are left
print len(replace(left(@ProductKeywords, charindex(@SearchPredicate, @ProductKeywords)), ',', ''))

--  This is how many delimiters you removed,
--  = the number of words (minus one) from the "first" the found word was,
--  = a weighting factor you can use
print charindex(@SearchPredicate, @ProductKeywords) - len(replace(left(@ProductKeywords, charindex(@SearchPredicate, @ProductKeywords)), ',', ''))

Replace @ProductKeyword with p.ProductKeywords and it should do the trick. (Please note that I have zero experience with the full-text query engine. It may or may not have an impact on this code.)

Philip Kelley
Thanks! I'm going to try this! I follow the logic and there are a few things I didn't think of in there.
Atømix