Hello,
I'm currently implementing a full-text search engine for one of our sites, using the features the CMS itself offers. It lets me execute an SQL query without any middle-tier programming.
However, this also means I can't use code to filter or clean up the search query. That works fine until a user enters 'noise words'.
My current query takes the user-entered value and performs some operations on it, such as an ltrim / rtrim, before sending it to the CONTAINS function. It also replaces each space with ' AND ', so each word is used as a separate clause.
This is where the problem begins, though. Apparently, SQL Server complains when one of the clauses consists of nothing but a noise word. So, when a user enters 'Wat niemand had verwacht' as a query (Dutch for 'What nobody had expected'), SQL Server complains that a clause contains only ignored words. It works with the first half ('wat niemand'), but fails as soon as 'had' is entered. After editing, the CONTAINS function receives 'Wat AND niemand AND had AND verwacht', and chokes on it.
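To make this concrete, here is a minimal T-SQL sketch of the transformation described above. The table `Articles` and its columns `Title` and `Body` are hypothetical placeholders, not the real schema, and the sketch assumes a full-text index exists on `Body`.

```sql
-- Hypothetical input; in practice this is the value the user typed.
DECLARE @input nvarchar(200);
SET @input = N'  Wat niemand had verwacht  ';

-- Trim both ends, then turn every space into an AND operator.
DECLARE @search nvarchar(1000);
SET @search = REPLACE(LTRIM(RTRIM(@input)), N' ', N' AND ');

-- @search now holds: Wat AND niemand AND had AND verwacht
SELECT @search AS SearchCondition;

-- Hypothetical table and columns; assumes a full-text index on Body.
-- This is the call that reportedly fails, because 'had' ends up
-- as a clause consisting of a noise word only.
SELECT Title
FROM Articles
WHERE CONTAINS(Body, @search);
```

Note that the plain REPLACE also turns two consecutive spaces into an empty clause between two ANDs, so the input is assumed to be separated by single spaces.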
So, concretely: does anyone know how I can get SQL Server to filter out noise words itself? I could, of course, include a massive list of replaces, one for each noise word, but I doubt that would be efficient. I could also filter out words of three letters or fewer, but that won't help with longer noise words (supposedly, 'amfitheatersgewijze' is also a noise word). Clearing the noise word configuration file and rebuilding the search index is a measure I'd only consider as a last resort.