tags:

views:

325

answers:

4

I want to search a sentence (word combination of) in some table or view of DB. I dont want to use Fultext search property of DB. Is there any alternative efficient way?

+2  A: 

None as efficient as Fulltext search.

Basically it boils down to where with like derivatives and since indexes are tossed away in most of the scenarios , it becomes a very expensive query.

Learning
+1  A: 

If you are using JAVA have at look at Lucene

If you are using .net, you can have a look at Lucene.net, it will minimize the calls to the database for the search queries.

Following from http://incubator.apache.org/lucene.net/

Lucene.Net is a source code, class-per-class, API-per-API and algorithmatic port of the Java Lucene search engine to the C# and .NET platform utilizing Microsoft .NET Framework.

Lucene.Net sticks to the APIs and classes used in the original Java implementation of Lucene. The API names as well as class names are preserved with the intention of giving Lucene.Net the look and feel of the C# language and the .NET Framework. For example, the method Hits.length() in the Java implementation now reads Hits.Length() in the C# port.

In addition to the APIs and classes port to C#, the algorithm of Java Lucene is ported to C# Lucene. This means an index created with Java Lucene is back-and-forth compatible with the C# Lucene; both at reading, writing and updating. In fact a Lucene index can be concurrently searched and updated using Java Lucene and C# Lucene processes.

renegadeMind
You dont have to be limited to the JVM to take advantage of Lucene - you could use Solr which is an HTTP wrapper around Lucene. I have had tremendous success with Solr: http://lucene.apache.org/solr/
Cody Caughlan
That's great, didn't know 'bout that, thanks cody!
renegadeMind
+2  A: 

Without the use of an index, a database has to perform a "full table scan". This is rather like you looking through a book one page at a time to find what you need.

That being said, computers are a lot faster than humans. It really depends on how much load your system has. Using MySQL we successfully implemented a search system on a table of lead information. The nature of the problem was one that could not be solved by normal indexes (including full text). So we designed it to be powered using a full table scan.

That involved creating tables as narrow as possible with the search data, and joining them to a larger table with related, but non-search data.

At the time (4 years ago), 100,000 records could be scanned in .06 seconds. 1,000,000 records took about .6 seconds. The system is still in heavy production use with millions of records.

If your data needs exceed 6 digits of records, you may want to re-evaluate using a full text index, or do some research on inverted indexes.

Please comment if you would like any more info.


Edit: The search tables were kept as narrow as possible. Ideally 50-100 bytes per record. ENUMS and TINYINT are great space savers if you can use them to "map" to string values another way.

The search queries were generated using a PHP class. They were simply:

-- DataTable is the big table that holds all of the data
-- SearchTable is the narrow table that holds the bits of searchable data

SELECT 
  MainTable.ID, 
  MainTable.Name, 
  MainTable.Whatever 
FROM 
  MainTable, SearchTable 
WHERE 
  MainTable.ID = SearchTable.ID 
  AND SearchTable.State IN ('PA', 'DE')
  AND SearchTable.Age < 40
  AND SearchTable.Status = 3

Essentially, the two tables were joined on a primary key (fast), and the filtering was done by full table scan on the SearchTable (pretty fast). We were using MySQL.

We found that by having the record format == "FIXED" in the MyISAM tables, we could increase performace by 3x. This meant no blobs, no varchars, etc...

Let me know if this helps.

gahooa
Yes please explain a bit more for me. Wasn't clear on Narrow tables or how you actually implemented the search.
Robert
A: 

You could break up the text into individual words, stick them in a separate table, and use that to find PK IDs that have all the words in your search sentence [i.e. but not necessarily in the right order], and then search just those rows for the sentence. Should avoid having to do a table scan every time.

Please ask if you need me to explain further

Kristen