views:

25

answers:

1

So far, I'm not getting meaningful results from my full-text queries so I decided to give a simple example of what I am trying to do and the results I expect.

I've made the the following test table (tblCars) with full-text enabled for the column [Car] and primary key [CarID].

CarID Car
----- -----------------
9     BMW 330Ci 2009
14    AUDI A4 2010
16    AUDI A3 2.0T 2009

I want to run a ranked search for the terms 'audi OR bmw', I expect to get equal rankings on all search results.

SELECT tblCars.*, [RANK] AS Ranked FROM viewCarSearch 
    INNER JOIN 
    (SELECT [KEY] AS CarID, [RANK] AS Ranked FROM CONTAINSTABLE 
    (tblCars, Car, @SearchOr))
    tblSearch ON tblCars.CarID = tblSearch.CarID 

Instead I get this:

CarID Car                Ranked
----- ------------------ -------
9     BMW 330Ci 2009     48
14    AUDI A4 2010       32
16    AUDI A3 2.0T 2009  32


In fact, no matter what combination of ORs I do, the BMW is always ranked higher or equal to the AUDI even if it seems totally illogical. I've tried using some ANDs in my search term and still it gives strange results, with the BMW always showing more favorably than expected.

Can anyone point to where I'm going wrong... I'm thinking my expectations must be all wrong, but I can't imagine how I'm gonna get well ranked results for a large table.

+1  A: 

Obviously, Microsoft believes that BMW is a superior car to the Audi. :-)

OK, seriously, there are many factors that go into calculating the RANK returned, which is a unitless number between 1 and 1000. Fulltext primarily uses the Jaccard Index for calculating ranks. Other factors taken into consideration include document length (other factors being equal, shorter documents will rank higher than longer documents) and the number of occurrences of the search word/phrase in the document.

My best guess at explaining your results, and I stress that it's only an educated guess, is that:

  • CarIDs 14 and 16 have very similar text, the first 10 characters vary only in two characters (4 vs. 3 at position 7, 0 vs. . at position 10), so they will be ranked close together. In fact, they come out equal in your example.
  • CarID 9's text is shorter than CarID 16's, so it will merit a higher ranking.
Joe Stefanelli
... and CarID 14's text is shorter than CarID 9's text, hence by same logic I'd expect CarID 14 to trump it... maybe I just need more data to really see better results. Thanks for your feedback.
3advance