MS Sql Full-text search vs. LIKE expression

views:

377

answers:

+2 Q:

MS Sql Full-text search vs. LIKE expression

Hi.
I'm currently looking for a way to search a big database (500MB - 10GB or more on 10 tables) with a lot of different fields(nvarchars and bigints). Many of the fields, that should be searched are not in the same table.

An example: A search for '5124 Peter' should return all items, that ...

have an ID with 5124 in it,
have 'Peter' in the title or description
have item type id with 5124 in it
created by a user named 'peter' or a user whose id has 5124 in it
created by a user with '5124' or 'peter' in his street address.

How should i do the search? I read that the full-text search of MS-Sql is a lot more performant than a query with the LIKE keyword and i think the syntax is more clear, but i think it cant search on bigint(id) values and i read it has performance problems with indexing and therefore slows down inserts to the DB. In my project there will be more inserting than reading, so this could be a matter.

Thanks in advance, Marks

+2 A:

You could try a standalone search engine, such as Sphinx Search:

http://www.sphinxsearch.com/index.html

or Apache Solr:

http://lucene.apache.org/solr/

Mike 2010-06-14 12:09:22

I have to look at these when i have more time, but i would like to know a solution if staying with only MS SQL Server.

Marks 2010-06-14 12:14:42

You could use full-text search to search non integer (ID) columns. Then extract any numbers from the search query, and use standard SQL select queries to test the integer (ID) columns. There are a few techniques to help mitigate indexing performance issues. See http://stackoverflow.com/questions/933351/ms-sql-server-2000-slow-full-text-indexing.

Mike 2010-06-14 12:34:55

I don't think you're going to get the performance you need out of MS SQL; you're going to need to construct very complex queries to cover all the data/tables that you're going to be searching, and you have the added encumbrance of writing data to the database at the same time as you are querying it.

I would suggest you look at either Apache Solr (http://lucene.apache.org/solr/) or Lucene (http://lucene.apache.org). Solr is built on top of Lucene, both can be used to create an inverted file index, basically like the index in the back of book (term 1 appears in documents 1, 3, 7, etc.) Solr is a search-engine-in-a-box, and has several mechanisms that will let you tell it how and where to index data. Lucene is more lower-level, and will let you set up your indexing and searching architecture with more flexibility.

The good thing about Solr is that it's available as a web service, so if you're not familiar with Java, you can find a Solr client in the language of your choice, and write indexing and searching code in whatever language suits you. Here's a link to a list of client libraries for Solr, including some in C# http://wiki.apache.org/solr/IntegratingSolr That's where I'd start.

greggersh 2010-06-14 12:35:54

ansaurus

tags:

views:

answers:

MS Sql Full-text search vs. LIKE expression

related questions