views:

377

answers:

2

Hi.
I'm currently looking for a way to search a big database (500MB - 10GB or more on 10 tables) with a lot of different fields(nvarchars and bigints). Many of the fields, that should be searched are not in the same table.

An example: A search for '5124 Peter' should return all items, that ...

  • have an ID with 5124 in it,
  • have 'Peter' in the title or description
  • have item type id with 5124 in it
  • created by a user named 'peter' or a user whose id has 5124 in it
  • created by a user with '5124' or 'peter' in his street address.

How should i do the search? I read that the full-text search of MS-Sql is a lot more performant than a query with the LIKE keyword and i think the syntax is more clear, but i think it cant search on bigint(id) values and i read it has performance problems with indexing and therefore slows down inserts to the DB. In my project there will be more inserting than reading, so this could be a matter.

Thanks in advance, Marks

+2  A: 

You could try a standalone search engine, such as Sphinx Search:

http://www.sphinxsearch.com/index.html

or Apache Solr:

http://lucene.apache.org/solr/

Mike
I have to look at these when i have more time, but i would like to know a solution if staying with only MS SQL Server.
Marks
You could use full-text search to search non integer (ID) columns. Then extract any numbers from the search query, and use standard SQL select queries to test the integer (ID) columns. There are a few techniques to help mitigate indexing performance issues. See http://stackoverflow.com/questions/933351/ms-sql-server-2000-slow-full-text-indexing.
Mike
A: 

I don't think you're going to get the performance you need out of MS SQL; you're going to need to construct very complex queries to cover all the data/tables that you're going to be searching, and you have the added encumbrance of writing data to the database at the same time as you are querying it.

I would suggest you look at either Apache Solr (http://lucene.apache.org/solr/) or Lucene (http://lucene.apache.org). Solr is built on top of Lucene, both can be used to create an inverted file index, basically like the index in the back of book (term 1 appears in documents 1, 3, 7, etc.) Solr is a search-engine-in-a-box, and has several mechanisms that will let you tell it how and where to index data. Lucene is more lower-level, and will let you set up your indexing and searching architecture with more flexibility.

The good thing about Solr is that it's available as a web service, so if you're not familiar with Java, you can find a Solr client in the language of your choice, and write indexing and searching code in whatever language suits you. Here's a link to a list of client libraries for Solr, including some in C# http://wiki.apache.org/solr/IntegratingSolr That's where I'd start.

greggersh