Lucene Indexing

Hi,

I have just started learning Lucene and would like to use it to index a table in an existing database. My plan so far is:

1. Create a 'Field' for every column in the table.
2. Store all the Fields.
3. 'ANALYZE' all the Fields except the one holding the primary key.
4. Store each row in the table as a Lucene Document.

While most of the columns in this table are small, one is huge. That column also contains the bulk of the data on which searches will be performed.

I am aware that Lucene provides an option not to store a Field. Which of the following would be the better way to go about this?

Store the Field regardless of its size and, when a search produces a hit, fetch the Field from the Document, OR
Don't store the Field and, when a search produces a hit, query the database to get the relevant information out?

While writing my question I also realize there may not be a one-size-fits-all answer.

Thanks in advance for your response.

+1 for Pascal's response. You could also tokenize the large field and *not store* it. This way you can still search on the field, get the identifier of the matching document/record, and retrieve the full record from the database.
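
To make the "index but don't store" idea concrete, here is a minimal sketch in plain Java. It is a toy model and deliberately does not use the Lucene API: the class name, the in-memory map standing in for the database, and the crude tokenizer are all assumptions made up for illustration. In Lucene of this era, the same choice would, as far as I know, be expressed by creating the large Field with Field.Store.NO and Field.Index.ANALYZED, and storing only the primary-key Field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model, NOT the Lucene API: the index keeps only token -> primary keys,
// while the full text of the large column lives in the "database".
class NotStoredFieldSketch {
    // Hypothetical stand-in for the database table: primary key -> large text column.
    private final Map<Integer, String> database = new HashMap<>();
    // Hypothetical stand-in for the index: token -> primary keys of matching rows.
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Crude "analyzer": lowercase the text and split on non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // The text is tokenized and indexed, but only the primary key is kept in the index.
    void addRow(int primaryKey, String largeText) {
        database.put(primaryKey, largeText);
        for (String token : tokenize(largeText)) {
            index.computeIfAbsent(token, k -> new TreeSet<>()).add(primaryKey);
        }
    }

    // A search yields primary keys only; the index cannot hand back the original text.
    Set<Integer> search(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    // Second step after a hit: fetch the full record from the database by key.
    String fetchFromDatabase(int primaryKey) {
        return database.get(primaryKey);
    }
}
```

The trade-off shows up in the two-step lookup: the index stays small because it never holds the large text, but every displayed hit costs one extra database query.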

Mikos 2010-07-14 00:20:11

Thanks for your responses. If I opt not to store any Field, would I also be unable to use Highlighting (Lucene contrib module) to highlight search hits?

cer_albastru 2010-07-14 16:27:07

It could be done without storing the text, but it is not the easy way. See http://www.lucidimagination.com/search/document/5ea8054ed8348e6f/highlight_arbitrary_text#60f592f5ff0de0c5
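
As a rough illustration of what highlighting arbitrary text means, here is a toy sketch in plain Java. It is not the Lucene or Solr highlighter: the class and method names are hypothetical, and it assumes the text has already been fetched from somewhere outside the index (for example, the database) and that the query is a single term.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model, NOT the Lucene/Solr highlighter: wraps whole-word,
// case-insensitive matches of one term in <b>...</b> tags.
class ArbitraryTextHighlightSketch {
    static String highlight(String text, String term) {
        // Pattern.quote keeps any regex metacharacters in the term literal.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        // $0 refers to the matched text, so the original casing is preserved.
        return m.replaceAll("<b>$0</b>");
    }
}
```

The point is only that the highlighting step needs the original text from some source; whether that source is a stored Field or the database is a separate decision.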

Pascal Dimassimo 2010-07-14 16:37:02

Oops, in my previous comment, I was referring to Solr. With plain Lucene, yes, I think you need to have the field stored.

Pascal Dimassimo 2010-07-14 16:43:49
