views:

196

answers:

1

I'm using MSSQL's full text indexing on a handful of tables in my CMS and am unfortunately stuck with SQL Server 2000. I'm querying the index using freetexttable joins and am having pretty good results, but we have some unique terms that are likely search queries that do not appear to be stemming. For example a query using the term "smartbar" returns a couple of rows, but one using "smartbars" returns a different, only partially overlapping, result set.

If I understand the stemming concept correctly I would assume that the first query would contain all of the rows in the second query, but it does not.

Looking at MSDN's Books Online, this page mentions stemming in the freetext_string parameter comments, but this page does not. From that I would assume that SQL Server 2000 doesn't stem the words in the freetexttable query; is this a correct assumption? Is there any way to encourage or enable SQL Server 2000 to stem? As a fall-back I'll probably use the thesaurus files for important queries for our unique terms, but would rather let SQL Server do most of the work.

As a follow up, if anyone has links to good resources regarding SQL Server 2000 full-text querying I would greatly appreciate them. The MSDN information is useful, but I would love some more in-depth info and haven't been able to find much.

+1  A: 

I guess what you're looking for is a little bit different. You want to search for all forms of the word. It has some relation to stemming but in MS SQL Server Full-Text Search you get the result a little bit different. You need to explicitly instruct server to search for all forms of the word and not only for the word itself.

Here's how to do this:

SELECT <select list> 
FROM [YourTable] 
WHERE CONTAINS(*, 'FORMSOF (INFLECTIONAL, smartbar)')

There's an article in MSDN on this subject. According to other resources this feature was available in SQL 2k as well (but I don't have an instance at hand to verify).

AlexS
excellent, thanks for that. I'll check it out and see if it helps find the additional forms.
cori
I ended up not being able to test this on our data - the users had cleaned up the data before I could try it, but based on this [http://msdn.microsoft.com/en-us/library/ms142509(SQL.90).aspx] I believe this woudn't have fixed it, because "smartbar" isn't a recognized English word, and stemming (and I think by extension word forms) works against a known dictionary. I think we'll be stuck with using the thesaurus, which isn't really all that unreasonable, since we're looking at proprietary terms.
cori
If you want that level of flexibility you should probably take a look at Apache Solr (a search server based on Apache Lucene full-text index).
AlexS