views:

1778

answers:

4

I'm having some problems with the ranking used by fulltext search in SQL Server.

Suppose a user searches for two words, "foo bar". We assume the user intends to do an OR search, so we pass "foo OR bar" to our CONTAINSTABLE call. What we're finding is that an row that contains "foo" 10 times but does not contain "bar" will have a much higher rank then an row that that has both "foo" and "bar". We would want the row that has both terms to be preferred over a row that only has one term multiple times. Any advice on how to achieve this?

I have found documentation related to a RANKMETHOD modifier, but that seems to be for SQL Enterprise search only, and not available on regular SQL Server 2005 installs. We could also switch to Lucene.Net, but I would hope to verify it has the better ranking available. We might also do multiple searches and merge the results, but that seems undesireable as the number of words searched for increases.

+1  A: 

I did a short discussion on Inverted Indexes and Cosine Ranking (the core algorithms and data structures of full-text retrieval systems) Here.

On a pure cosine ranking, foo (1), bar (1) should be closer to the vector occupied by foo, bar than foo (10), bar (0). One can fiddle the cosine value post-hoc, which is essentially what Pagerank does. If the full-text search is fiddling the cosine ranking after calculating the dot product rather than weighting the axes, this would explain the disproportionate significance of foo.

ConcernedOfTunbridgeWells
+1  A: 

Just to followup on this.. The reason the ranking was low for rows containing strings A and B when matching against string "A or B" is that A was in a different column than B. If I create a view on the table that merges those columns into a single column, the ranking for "A or B" is more what I expected.

Frank Schwieterman
+1  A: 

RANKMETHOD is not available to any edition of SQL Server. If you are building the keywords to pass into the CONTAINSTABLE search and you know something about which words were more "important" you can rank the words with ISABOUT and WEIGHT.

For instance if foo has a higher rank than bar, you could do something like this:

SELECT * FROM CONTAINSTABLE (YourFullTextTable, *, 'ISABOUT("foo" WEIGHT (0.5), "bar" WEIGHT(0.3))')

Coolcoder
+1  A: 

Try rewritting the query SELECT * FROM CONTAINSTABLE (YourFullTextTable, *, '"foo bar" or "foo" ~ "bar" or "foo or bar"') . This query in essence must rank any records with exact "foo bar" > "foo" near "bar" > foo or bar

TestCheck