I'm writing a search engine in C#, retrieving rows from a SQL database. I'd like the search to also include similar words - for example, if a user searches for "investing", the search will also return matches for "investment", or if the user searches for "financial", the search will also return matches for "finance".

How can I retrieve similar words such as these from a search keyword?

+2  A: 

What you're looking for is stemming. You may want to look at what's available in Lucene.net, although SQL Server may also support this natively through full-text indexing. Indeed, it looks like it does, given this article.

Jon Skeet
+2  A: 

What you're trying to accomplish is known as "Stemming". Read the Wikipedia article for more info:

http://en.wikipedia.org/wiki/Stemming

BFree
+1  A: 

If you're using SQL Server you can take advantage of the FREETEXT search, which supports stemming:

Select * from SomeTable
where FREETEXT(*,'invest')

The above searches all full-text indexed columns for forms of the word invest. It's roughly equivalent to:

Select * from SomeTable
where CONTAINS(*,'"invest" or "invests" or "investor"
                  or "investing" or "invested" or "investor''s" ... ')

Here's an MSDN article with more examples and documentation.
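
If you want stemming without FREETEXT's looser matching, CONTAINS can also be asked for inflectional forms explicitly via FORMSOF. The following is only a sketch: it assumes SomeTable already has a full-text index, and @term and @condition are hypothetical names standing in for the keyword passed from the C# code.

```sql
-- Sketch only: assumes SomeTable has a full-text index.
-- @term is a hypothetical parameter holding the user's keyword.
DECLARE @term nvarchar(100) = N'investing';

-- Strip double quotes so the keyword cannot break out of the quoted term.
DECLARE @condition nvarchar(200) =
    N'FORMSOF(INFLECTIONAL, "' + REPLACE(@term, N'"', N'') + N'")';

-- Matches rows containing inflectional forms of the keyword
-- in any full-text indexed column.
SELECT *
FROM SomeTable
WHERE CONTAINS(*, @condition);
```

Building the condition in a variable keeps the keyword out of the query text itself, so it can be supplied as a normal parameter from the application.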

LBushkin
A: 

Additionally, Soundex searching can help find matches with similar phonetics. SQL Server supports this through its SOUNDEX() function. .NET doesn't appear to have it built in, but CodeProject has several implementations.
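
As a rough illustration of the SQL Server side, SOUNDEX() returns a four-character phonetic code, and the related DIFFERENCE() function compares the codes of two strings and returns a score from 0 (least similar) to 4 (most similar). In this sketch SomeColumn and @term are hypothetical names, and the threshold of 3 is an arbitrary choice.

```sql
-- Compare the phonetic codes of two spellings directly.
SELECT SOUNDEX('Smith')               AS Code1,
       SOUNDEX('Smythe')              AS Code2,
       DIFFERENCE('Smith', 'Smythe')  AS Similarity;

-- Sketch of a phonetic search; SomeColumn and @term are hypothetical.
DECLARE @term varchar(100) = 'Smythe';

SELECT *
FROM SomeTable
WHERE DIFFERENCE(SomeColumn, @term) >= 3;  -- 3 is an arbitrary cutoff
```

Note that this matches words that sound alike rather than words that share a stem, so it complements stemming rather than replacing it.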

spoulson