+1  Q: 

Lucene Indexing

Hi,

I have just started learning Lucene and would like to use it to index a table in an existing database. My plan so far is:

  1. Create a 'Field' for every column in the table.
  2. Store all the Fields.
  3. 'ANALYZE' all the Fields except the one holding the primary key.
  4. Store each row of the table as a Lucene Document.

While most of the columns in this table are small, one is huge. That column also contains the bulk of the data on which searches will be performed.

I am aware that Lucene provides an option not to store a Field. However, which of the following would be the better way to go about this?

  1. Store the Field regardless of its size and, if a search produces a hit, fetch the Field from the Document, OR
  2. Don't store the Field and, if a search produces a hit, query the database to get the relevant information out.

While writing my question I also realized there may not be a one-size-fits-all answer.

Thanks in advance for your response.

+1  A: 

Your system will certainly be more responsive if you store everything in Lucene. Stored fields do not affect query time; they only make the index bigger. And probably not that much bigger, if only a small portion of the rows have a lot of data. So if index size is not an issue for your system, I would go with that.

Pascal Dimassimo
+1 for Pascal's response. You could also tokenize the large field and *not store* it. This way you can search on the field, get the matching document/record identifier, and retrieve the record from the database.
Mikos
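To make the trade-off in this thread concrete, here is a minimal sketch of the two options: keep the text in the index, or index the tokens only and fetch the row from the database by its key. This is a toy model written with plain Java collections, not the Lucene API. `ToyIndex`, `StoredVsUnstoredDemo`, and the `database` map are all hypothetical names introduced for illustration, and the map merely stands in for the real table.

```java
import java.util.*;

/** Toy stand-in for a search index. This is NOT the Lucene API. */
class ToyIndex {
    // token -> primary keys of the rows containing it (the "analyzed/indexed" part)
    private final Map<String, Set<String>> postings = new HashMap<>();
    // primary key -> original text, kept only when the caller asks to store it
    private final Map<String, String> stored = new HashMap<>();

    /** Tokenizes the text and records the row key; keeps the text only if store is true. */
    void add(String id, String text, boolean store) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        if (store) {
            stored.put(id, text);
        }
    }

    /** Returns the keys of all rows containing the term, in sorted order. */
    List<String> search(String term) {
        return new ArrayList<>(postings.getOrDefault(term.toLowerCase(), Collections.emptySet()));
    }

    /** Returns the stored text for a row, or empty if the text was indexed but not stored. */
    Optional<String> storedText(String id) {
        return Optional.ofNullable(stored.get(id));
    }
}

class StoredVsUnstoredDemo {
    public static void main(String[] args) {
        // Hypothetical table: primary key -> the one huge text column.
        Map<String, String> database = new HashMap<>();
        database.put("1", "Lucene is a search library");
        database.put("2", "A relational database stores rows");

        // Option 2 from the question: index the text but do not store it.
        ToyIndex index = new ToyIndex();
        for (Map.Entry<String, String> row : database.entrySet()) {
            index.add(row.getKey(), row.getValue(), false);
        }

        // On a hit, fall back to the database because nothing was stored in the index.
        for (String id : index.search("lucene")) {
            String text = index.storedText(id).orElseGet(() -> database.get(id));
            System.out.println(id + ": " + text);
        }
    }
}
```

Passing `true` as the last argument of `add` corresponds to option 1: the hit can then be served from the index alone, at the cost of a larger index. In both cases the search itself only touches the postings, which is the point made in the answer above about stored fields not affecting query time.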
Thanks for your responses. If I opt not to store a Field, would I also be unable to use Highlighting (the Lucene contrib module) to highlight search hits?
cer_albastru
It can be done without storing the text, but it is not the easy way. See http://www.lucidimagination.com/search/document/5ea8054ed8348e6f/highlight_arbitrary_text#60f592f5ff0de0c5
Pascal Dimassimo
Oops, in my previous comment I was referring to Solr. With plain Lucene, yes, I think you need to have the field stored.
Pascal Dimassimo