I am trying to find out how Lucene search works so fast. I can't find any useful docs on the web. If you have anything to read (short of the Lucene source code), let me know.

A text search query using MySQL 5 full-text search with an index takes about 18 minutes in my case. A Lucene search for the same query takes less than a second.

A: 

In a word: indexing.

Lucene creates an index of your documents that allows it to search much more quickly.

It's the same as the difference between a list, where lookup is O(N), and a hash table, where lookup is O(1) on average. The list has to walk through the entire collection to find what you want. The hash table has an index that lets it figure out exactly where the desired item is and simply fetch it.
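To make the comparison concrete, here is a minimal Python sketch of the two lookup styles. It is only an illustration of the list-versus-hash-table idea, not how Lucene or MySQL are implemented; the `records` data and both function names are made up for the example.

```python
# Hypothetical collection of (key, value) records; all names are illustrative.
records = [(f"doc{i}", i * i) for i in range(100_000)]


def find_in_list(items, key):
    """Linear scan: checks each record in turn, O(N) in the worst case."""
    for k, v in items:
        if k == key:
            return v
    return None


# Building the hash table costs O(N) once, up front.
index = dict(records)


def find_in_hash(table, key):
    """Hash lookup: O(1) on average, regardless of collection size."""
    return table.get(key)
```

Both functions return the same value for the same key. The difference is that `find_in_list` may touch every record, while `find_in_hash` pays the indexing cost once and then jumps straight to the entry.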

duffymo
Yes, I understand the indexing part, but Lucene index searches are still a lot faster than MySQL index searches. How does that happen?
Midhat
+6  A: 

Lucene is an inverted full-text index. This means that it takes all the documents, splits them into words, and then builds an index entry for each word that records which documents contain it. Because each lookup is an exact, unordered string match, it can be extremely fast. Hypothetically, an SQL unordered index on a varchar field could be just as fast, and in fact I think you'll find the big databases can do a simple string-equality query very quickly in that case.
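As a rough illustration of the idea, here is a toy inverted index in Python. It is a sketch under simplifying assumptions (whitespace tokenization, in-memory sets, AND-only queries) and is not Lucene's actual data structure or API; `tokenize`, `build_inverted_index`, `search`, and the sample `docs` are all hypothetical names for this example.

```python
from collections import defaultdict


def tokenize(text):
    """Very naive analyzer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # One exact-match lookup per term, then intersect the posting sets.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample documents.
docs = {
    1: "Lucene is a full-text search library",
    2: "MySQL supports full-text indexes",
    3: "A hash table gives fast lookup",
}
index = build_inverted_index(docs)
```

With this structure, a query never scans document text. It looks up each term by exact match and intersects the resulting sets of ids, so the cost depends on the number of query terms and matching documents rather than on the total size of the corpus.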

Lucene does not have to optimize for transaction processing. When you add a document, it need not ensure that queries see it instantly. And it need not optimize for updates to existing documents.

However, at the end of the day, if you really want to know, you need to read the source. Both things you reference are open source, after all.

bmargulies