I'm doing a typical full text search using containstable using 'ISABOUT(term1,term2,term3)' and although it supports term weighting that's not what I need. I need the ability to boost the relevancy of terms contained in certain portions of text. For example, it is customary for metatags or page title to be weighted differently than body text when searching web pages. Although I'm not dealing with web pages I do seek the same functionality. In Lucene it's called Document Field Level Boosting. How would one natively do this in Sql Server Full Text Search?
views:
253answers:
2
A:
This is just a thought -- is it possible to isolate the part you need boosting and then add the two together? I haven't had time to put it together properly, but let's say you have a 'document' column and a computed 'header' column, you could do something like this;
with compoundResults([KEY], [RANK]) as
(
select
a.[key],
a.[rank] *0.7 + b.[rank] * 0.3
from FREETEXTTABLE(dbo.Docs, document, @term) a
inner join FREETEXTTABLE(dbo.Docs, header, @term) b
on a.[Key] = b.[Key]
)
select * from dbo.Docs c
LEFT OUTER JOIN compoundResults d
ON c.TermId = d.[KEY]
So this example uses freetexttable and not containstable, but the thing to note is that there is a CTE which selects a weighted rank, taking seven tenths from the document body and three tenths from the header.
Steve Cooper
2009-12-09 01:37:22
YES, that's pretty cool, I wrote something very similar to this to see if it would be a workaround even though it might not be as fast. But I still wasn't getting sensible results because rank number can't be compared to anything outside a given FTS query. e.g. a.Rank might range from 100-20 and b.Rank might happen to rank 800-150, so .7 of 100 is still less significant than .3 of 800, they are just apples and oranges. If the ranks were always normalized between 1 and 0 I could just take the geometric mean. I could try to normalize them myself but it just seems like a lot of "duct tape."
Snives
2009-12-09 09:10:00
Dammeeet! I thought that was such a good idea, too ;) I'm all out of ideas, then.
Steve Cooper
2009-12-09 09:40:26
A:
The native functionality you're looking for doesn't exist in SQL Server FTS.
What does your data look like? Would it work to do extend the keyword patterns in some way, so that they match the corresponding parts of the document? Something like:
ISABOUT("title ~ keyword ~ title" weight 0.8, "keyword" 0.2)
RickNZ
2009-12-16 07:43:21