I am indexing rows of data from a database in Lucene.Net. Each row corresponds to one Document.

I want to store each document's DocId back in my database, so that I can use the DocId from the search results to retrieve rows quickly.

Currently I first retrieve the PK from the result docs, which I think is slower than retrieving rows directly from the database using the DocId.

How can I find the DocId when adding a document to Lucene?

+2  A: 

Relying on Lucene's internal DocId is bad policy; even Lucene itself tries to avoid depending on it. I suggest you create your own DocId. In a database I would use an auto-increment field. If your application does not use a relational database, you can create this type of field programmatically. Beyond that, I suggest you read Search Engine versus DBMS. I believe that only fields that may be searched should be stored in Lucene; the rest of the row belongs in a database, so the sequence of events is:

  1. Using Lucene, search for some text and get a DocId.
  2. Use the DocId to retrieve the full row from the database.
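
The two steps above can be sketched in plain Java. This is only an illustration under stated assumptions: `TwoStepLookup`, `index`, and `rowsById` are hypothetical names, the maps are in-memory stand-ins for the Lucene index and the database table, and no Lucene API is used. The point is that the id handed back by the search is the application's own id, not Lucene's internal one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TwoStepLookup {
    // Hypothetical stand-in for the database table, keyed by an application-assigned id
    // (for example an auto-increment column).
    static final Map<Long, String> rowsById = new HashMap<>();

    // Hypothetical stand-in for the search index: term -> ids of the rows containing it.
    static final Map<String, Set<Long>> index = new HashMap<>();

    // "Index" a row: store it under its own id and register each of its terms.
    static void add(long id, String text) {
        rowsById.put(id, text);
        for (String term : text.toLowerCase().split("\\s+")) {
            index.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(id);
        }
    }

    // Step 1: search for a term and collect the application ids.
    // Step 2: use each id to fetch the full row from the "database".
    static List<String> search(String term) {
        Set<Long> ids = index.getOrDefault(term.toLowerCase(), Collections.emptySet());
        List<String> rows = new ArrayList<>();
        for (long id : ids) {
            rows.add(rowsById.get(id));
        }
        return rows;
    }
}
```

Because the id is owned by the application, it stays valid no matter how the index is rebuilt or merged.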
Yuval F
+1  A: 

As Yuval stated, leaking internal Lucene implementation details is bad, especially since Lucene doc ids change when the index is mutated.

If looking up the primary key using doc.get("pk") is too slow for you, use a FieldCache to cache all the PKs in memory. Then the lookups will be plenty fast.
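
A rough sketch of the idea behind such a cache, in plain Java with no Lucene dependency: an array indexed by doc id that holds each document's PK, filled once per opened index and then read with a simple array access. `PkCache` and its methods are hypothetical names, and the list passed to the constructor stands in for whatever your Lucene version gives you. In the Java Lucene 2.x/3.x line, for instance, `FieldCache.DEFAULT.getStrings(reader, "pk")` returns such a per-reader array for an indexed, untokenized field; treat that call as something to check against your own version rather than as a given.

```java
import java.util.List;

public class PkCache {
    // pkByDocId[i] holds the PK of the document whose doc id is i.
    private final String[] pkByDocId;

    // Assumption: storedPks.get(i) is the "pk" value of the document with doc id i,
    // read once from the index when the reader is opened.
    public PkCache(List<String> storedPks) {
        this.pkByDocId = storedPks.toArray(new String[0]);
    }

    // Constant-time lookup: no stored document has to be loaded.
    public String pk(int docId) {
        return pkByDocId[docId];
    }

    public int size() {
        return pkByDocId.length;
    }
}
```

Such a cache is only valid for the reader it was built from. Since doc ids change when the index is mutated, it has to be rebuilt whenever the index is reopened after changes.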

bajafresh4life
Any sample code snippet to use FieldCache?
Rohit
I agree that relying on doc id is almost always poor design. However, I have a particular use case in which I have a read-only index and need to do some processing outside of what's possible with a search query so I need to store the doc id of certain documents for later reference. Can you please elaborate on using FieldCache to do so?
Lyle