ansaurus

Question

full-text search sql server 2005

Answer 1

A:

Have you tried CONTAINS? FREETEXT functions more like "LIKE".

Matt 2010-09-02 13:37:11

I tried CONTAINS and it produced the same result.

csetzkorn 2010-09-02 13:38:17

Answer 2

A:

FREETEXT is less precise than CONTAINS:

http://www.mssqltips.com/tip.asp?tip=1382

CONTAINS explained:

http://msdn.microsoft.com/en-us/library/ms187787.aspx

DmitryK 2010-09-02 13:45:00

I tried CONTAINS and it produced the same result. I have edited my original question.

csetzkorn 2010-09-02 13:46:33

Answer 3

A:

I don't have an FTS-enabled DB set up to test on, but have you tried something like contains(data,'world AND NOT "-world"')? You may have to look into modifying the word-breaking.

joelt 2010-09-03 13:03:16

I guess this would work, but I could not do this for every possibility, if you know what I mean. At the end of the day I would like to index some text against a few thousand EXACT words. So far full-text search seems pretty useless even for such simple tasks ...

csetzkorn 2010-09-03 13:11:40

I don't know what I was thinking. Adding the NOT will probably have no effect, and you almost certainly need to look into modifying the word breaking. SQL Server treats "j-world" as two different words, "j" and "world", so you're getting an exact match on "world". MS link: http://support.microsoft.com/kb/200043 and someone else trying to do roughly the same thing: http://stackoverflow.com/questions/1542708/how-to-change-word-break-characters-in-sql-server-full-text-indexing

joelt 2010-09-03 13:29:36

Answer 4

+1 A:

I do not quite understand why you want FTS. If you want an exact match, this can be done simply by using LIKE:

SELECT * FROM test
WHERE data LIKE '% world%'
-- results in
-- Hello world!

SELECT * FROM test
WHERE data LIKE '%j-world%'
-- results in
-- Hello j-world!

If you want to play with FTS, create and apply your own (custom) full-text stoplist.

I do not have SQL Server 2005, but I checked that this works in 2008.
The docs say that it is possible for compatibility level 100 only (i.e. in SQL Server 2008).
Still, try it in 2005.

In SSMS Databases\YourDatabaseName\Storage\Full Text Stoplist --> right-click and choose "New Full-text StopList...". I named it vgvStoplist and made sure that the default "Create an empty stoplist" radio button was checked.

In SSMS right-click table dbo.test ---> Full-text index --> Properties ---> Select a page: General, Full-text Index Stoplist --> enter name of created empty list (I entered vgvStoplist)

Now, the query

select * from test where contains (data, '"j-world"')

returns only 'Hello j-world' (without 'Hello world')

This also can be done through TSQL. Follow msdn
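
As a rough sketch, the SSMS steps above might look like this in T-SQL. It assumes SQL Server 2008 (compatibility level 100), the test table from the question in the dbo schema, and a full-text index that already exists on its data column; the stoplist name is the one used above.

```sql
-- Create an empty full-text stoplist
-- (the T-SQL counterpart of "Create an empty stoplist" in SSMS).
CREATE FULLTEXT STOPLIST vgvStoplist;
GO

-- Associate the stoplist with the existing full-text index on dbo.test.
-- The index has to be repopulated before queries reflect the change.
ALTER FULLTEXT INDEX ON dbo.test SET STOPLIST vgvStoplist;
GO

-- The phrase query from above.
SELECT * FROM dbo.test WHERE CONTAINS(data, '"j-world"');
```

These statements are not available in SQL Server 2005, which uses noise-word files instead of stoplists.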

==== Update:
Well, your question showed that the notion of noise is subjective.

It worked because 'j' is a system stopword (you can confirm this by searching the scripted system stoplist (*) for the 3-character string 'j'; see also (**)) and '-' is, apparently, a word breaker.
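
One way to check both points directly, sketched here under the assumption of SQL Server 2008 or later, is to query the system stopword view and to ask the word breaker how it parses the phrase. The LCID 1033 (US English) is an assumption; substitute the language of your full-text index. sys.dm_fts_parser requires sysadmin membership.

```sql
-- Is 'j' in the system stoplist, and for which languages?
SELECT stopword, language_id
FROM sys.fulltext_system_stopwords
WHERE stopword = 'j';

-- How is "j-world" tokenized when the system stoplist (id 0) is applied?
-- special_term reports whether each token is an exact match or a noise word.
SELECT display_term, special_term, occurrence
FROM sys.dm_fts_parser('"j-world"', 1033, 0, 0);

-- The same phrase with no stoplist at all (NULL), for comparison.
SELECT display_term, special_term, occurrence
FROM sys.dm_fts_parser('"j-world"', 1033, NULL, 0);
```

Comparing the two parser results should show whether 'j' is dropped as a noise word and whether the hyphen splits the term.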

I did not propose that you use an empty stopword list. I just illustrated the "how to" with a minimum of effort on my side.
Working out the techniques that suit you is up to you. I am not enough of an expert in this domain to give advice. I just answered you on the principle of common sense.

Create your own Full Text StopList, fill it with your content.
You might want to reuse system stoplist content.
For this, you may want to create

(*) separate script of system stoplist
by creating one more Full Text StopList marking it with "Create from the system stoplist" then script it (to "File..." or to "New Query Editor Window"),

then create your own script by editing a copy of (*) using find-and-replace and/or copy&paste from (*).

(**) Here is an excerpt from scripted copy, named by me as vgv_sys_copy, of system FT StopList :

ALTER FULLTEXT STOPLIST [vgv_sys_copy] ADD 'j' LANGUAGE 'French';
ALTER FULLTEXT STOPLIST [vgv_sys_copy] ADD 'j' LANGUAGE 'Italian';
ALTER FULLTEXT STOPLIST [vgv_sys_copy] ADD 'j' LANGUAGE 'Japanese';
ALTER FULLTEXT STOPLIST [vgv_sys_copy] ADD 'j' LANGUAGE 'Dutch';
ALTER FULLTEXT STOPLIST [vgv_sys_copy] ADD 'j' LANGUAGE 'Russian';
ALTER FULLTEXT STOPLIST [vgv_sys_copy] ADD 'j' LANGUAGE 'Swedish';
ALTER FULLTEXT STOPLIST [vgv_sys_copy] ADD 'j' LANGUAGE 'Simplified Chinese';
ALTER FULLTEXT STOPLIST [vgv_sys_copy] ADD 'j' LANGUAGE 'British English';
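
As an alternative to scripting and editing a copy by hand, a stoplist seeded from the system stoplist can be created and trimmed in T-SQL. This is a minimal sketch assuming SQL Server 2008 or later; the name myStoplist is hypothetical, and the language should match the one used by your full-text index.

```sql
-- Start from a copy of the system stoplist.
CREATE FULLTEXT STOPLIST myStoplist FROM SYSTEM STOPLIST;
GO

-- Remove 'j' for one language so that it gets indexed like any other word.
ALTER FULLTEXT STOPLIST myStoplist DROP 'j' LANGUAGE 'British English';
GO

-- Use the trimmed stoplist for the full-text index on dbo.test.
ALTER FULLTEXT INDEX ON dbo.test SET STOPLIST myStoplist;
GO
```

This keeps the rest of the system stopwords out of the index while still letting 'j' be indexed; it does not change how the word breaker treats the hyphen.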

Update2
I posted a sub-question: "Performace gains of searching with FTS over it with LIKE on indexed colum(s)?"

I also noticed that my answer was based on features not available in SQL Server 2005.
In 2005 there should be a noise-word file, MSSQL\FTData\noiseENG.txt, and I liked the answers to "Noise Words in Sql Server 2005 Full Text Search".

I would have removed 'j'. As a matter of fact, if I were you, I would have created noiseENG.txt from scratch. But it is your decision, depending on your context and on many factors unknown to me.

I believe you should post it as a separate question. I have already been banned multiple times on StackExchange sites (and still am on SF) for discussions. This is not a forum or discussion board, cf. the FAQ.

vgv8 2010-09-29 17:28:17

So did you downvote all other answers because of this? In reality your answer should have been posted here in the "original" thread and the other post should be flagged as a duplicate.

Matt 2010-09-29 17:57:25

I've closed the duplicate, so please edit this with the text of your real answer.

Bill the Lizard 2010-09-29 18:26:56

Thanks that sounds interesting. I'll have a look into this. I did not consider the 'LIKE solution' because of efficiency. I have to deal with millions of texts and millions of keywords. Thanks.

csetzkorn 2010-09-30 07:31:34

To be honest I do not understand why this (the stoplist solution) should work. Firstly, you created an empty stoplist, which is bad for efficiency as stop/noise words are now included during indexing, so I would not consider this (IMHO). Secondly, is the j-world issue not related to word breaking? If so, why does an empty stoplist solve it? Please explain - thanks!

csetzkorn 2010-09-30 07:58:17

See my Update in my answer

vgv8 2010-09-30 12:03:56

Ok thanks, this kind of makes sense. Not sure about the 'common sense' remark: (a) j is not a word, so common sense tells me not to look for it in the stop/noise word list, but I just looked and it is in there; (b) the word breaker is also important, however common sense tells me not to break hyphenated words (j-world is just an example). So what's the solution - remove j from the stopword list? I still think the word breaker would mess things up. Please note that I want to use FTS as LIKE is highly inefficient.

csetzkorn 2010-09-30 12:50:27

See my Update2 in my answer

vgv8 2010-09-30 16:21:15
