SQL Server Weighted Full Text Search

views:

651

answers:

+2 Q:

SQL Server Weighted Full Text Search

Currently I have a table that I search upon 4 fields, FirstName, LastName, MiddleName, And AKA's. I currently have a CONTAINSTABLE search for the rows and it works. Not well but it works. Now I want to make the First Name weighted higher and middle name lower.

I found the command ISABOUT but that seems pretty worthless if I have to do it by word not column (hopefully I understood this wrong). This is not an option if its by word because I do not know how many words the user will enter.

I found the thread here that talks about this same solution however I was unable to get the accepted solution to work. Maybe I have done something wrong but regardless I cannot get it to work, and its logic seems really... odd. There has to be an easier way.

On a sidenote why does the normal search here at SO work so much worse then the search when adding a new question? Try searching for "MSSQL Weighted Full Text Search" in the normal search then ask a new question with that as the title.

+1 A:

The key to manipulating the rankings is to use a union. For each column you use a separate select statement. In that statement, add an identifier that shows from which column each row was pulled then. Insert the results into a table variable, then you can manipulate the ranking by sorting on the identifier or multiplying the rank by some value based on the identifier.

The key is to give the appearance of modifying the ranking, not to actually change sql server's ranking.

Example using a table variable:

DECLARE @Results TABLE (PersonId Int, Rank Int, Source Int)

For table People with Columns PersonId Int PK Identity, FirstName VarChar(100), MiddleName VarChar(100), LastName VarChar(100), AlsoKnown VarChar(100) with each column added to a full text catalog, you could use the query:

INSERT INTO @Results (PersonId, Rank, Source)

SELECT PersonId, Rank, 1 FROM ContainsTable(People, FirstName, @SearchValue) CT INNER JOIN People P ON CT.Key = P.PersonId

UNION SELECT PersonId, Rank, 2 FROM ContainsTable(People, MiddleName, @SearchValue) CT INNER JOIN People P ON CT.Key = P.PersonId

UNION SELECT PersonId, Rank, 3 FROM ContainsTable(People, LastName, @SearchValue) CT INNER JOIN People P ON CT.Key = P.PersonId

UNION SELECT PersonId, Rank, 4 FROM ContainsTable(People, AlsoKnown, @SearchValue) CT INNER JOIN People P ON CT.Key = P.PersonId

/* Now that the results from above are in the @Results table, you can manipulate the rankings in one of several ways, the simplest is to pull the results ordered first by Source then by Rank. Of course you would probably join to the People table to pull the name fields. */

SELECT PersonId FROM @Results ORDER BY Source, Rank DESC

/* A more complex manipulation would use a statement to multiply the ranking by a value above 1 (to increase rank) or less than 1 (to lower rank), then return results based on the new rank. This provides more fine tuning, since I could make first name 10% higher and middle name 15% lower and leave last name and also known the original value. */

SELECT PersonId, CASE Source WHEN 1 THEN Rank * 1.1 WHEN 2 THEN Rank * .9 ELSE Rank END AS NewRank FROM @Results ORDER BY NewRank DESC

The one downside is you'll notice I didn't use UNION ALL, so if a word appears in more than one column, the rank won't reflect that. If that's an issue you could use UNION ALL and then remove duplicate person id's by adding all or part of the duplicate record's rank to the rank of another record with the same person id.

Dave_H 2008-11-26 21:15:30

Ranks are useless across indexes, you can't merge them and expect the result to mean anything. The rank numbers of each index are apple/orange/grape/watermelon/pair comparisions that have no realitive meaning WRT contents of other indexes.

Sure you can try and link/weight/order ranks between indexes to try and fudge a meaningful result but at the end of the day that result is still jyberish however possibly still good enough to provide a workable solution depending on the specifics of your situation.

In my view the best solution is to put all data you intend to be searchable in a single FTS index/column and use that columns rank to order your output.. Even if you have to duplicate field contents to accomplish the result.

Einstein 2008-11-30 05:32:36

I assume the data returned is joined to other tables within your schema? I would develop your own RANK based on columns from associated data to the full text index. This also provides a guaranteed level of accuracy in the RANK value.

Coolcoder 2008-12-04 09:52:09

ansaurus

tags:

views:

answers:

SQL Server Weighted Full Text Search

related questions