A: 

Unstored fields are just that - not stored. Their contents cannot be retrieved from the index.

In order to do what you have said, you have a few options:

  • make each field stored so that you can make a new document from an existing one
  • if your unstored field is large (e.g. the contents of a text file), store a pointer to the original contents in the index (e.g. its file path). When creating a new document, read this pointer from the existing document, fetch the field contents from the original source (e.g. the text file), and then add them unstored to your new document
  • if you are not altering the unstored field, you can retrieve the existing document, update its other fields and then put it back into the index. This might only be possible in later versions of Lucene though (v2.2 upwards). EDIT: having tried this option, it does not work - see my comment below.

Ultimately, if you need to get the value of the unstored field, you will have to make it stored.
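The second option (store a pointer, re-read the source) can be sketched roughly as below. This is a plain-Java toy model, not the Lucene API: the class `PointerRebuild`, the field names `path`, `rank` and `contents`, and the `loadSource` callback are all hypothetical names chosen for illustration. The maps stand in for the stored fields you get back from the index.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy model of the "store a pointer" option. Not the Lucene API. */
public class PointerRebuild {

    /**
     * Builds the field map for a replacement document.
     * storedFields stands in for what the index returns for the old document;
     * "path" is a hypothetical name for the stored pointer field.
     */
    static Map<String, String> rebuild(Map<String, String> storedFields,
                                       Map<String, String> updates,
                                       Function<String, String> loadSource) {
        Map<String, String> fresh = new HashMap<>(storedFields);
        fresh.putAll(updates); // apply the changes to the stored fields
        // The unstored content is absent from storedFields, so fetch it
        // again from the original source via the stored pointer.
        String contents = loadSource.apply(storedFields.get("path"));
        // In a real index this field would be added as indexed but not stored.
        fresh.put("contents", contents);
        return fresh;
    }

    /** Reads a whole file as text, converting the checked exception. */
    static String readFile(String path) {
        try {
            return Files.readString(Path.of(path));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("doc", ".txt");
        Files.writeString(tmp, "original body text");

        Map<String, String> stored = new HashMap<>();
        stored.put("path", tmp.toString());
        stored.put("rank", "0");

        Map<String, String> rebuilt =
                rebuild(stored, Map.of("rank", "7"), PointerRebuild::readFile);
        System.out.println(rebuilt);
        Files.delete(tmp);
    }
}
```

The rebuilt map is what you would turn into the new document, after deleting the old one. The cost is that the original source has to stay available and unchanged for as long as you may need to re-index.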

adrianbanks
I'm curious about the last option you mentioned. Indeed, I'm not going to update the content; all I need is to run some analysis and, based on it, update other fields. But I'm not sure I understand how to do it. From what you're saying, it's enough to find the document in the index and update only the stored fields I'm interested in, but what do I do next? Just add the updated document as-is to the index? If so, how will it be connected to the previously unstored fields?
Artem Barger
Apologies. Having just tried the third option, it appears that unstored fields are removed from the document when you retrieve it from the `IndexReader`. Doing as I said will re-add the document with the unstored field missing, so this approach is not really a valid option. It looks like you will have to store the value of the field to do what you want to do. [Updated answer to reflect this]
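To make the failure concrete, here is a toy model in plain Java (not the Lucene API; `ToyIndex` and its methods are hypothetical, and terms are not scoped per field, which real Lucene does). It keeps the searchable terms and the stored values in separate structures, which is roughly why a retrieved document cannot carry its unstored fields back into the index.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy index that separates "searchable" from "retrievable". Not Lucene. */
public class ToyIndex {
    // term -> ids of documents containing it (what makes a field searchable)
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // doc id -> stored field values (all that a retrieval can return)
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();

    /** Indexes every field, but keeps only storedFields for retrieval. */
    void add(int id, Map<String, String> storedFields,
             Map<String, String> unstoredFields) {
        Map<String, String> all = new HashMap<>(storedFields);
        all.putAll(unstoredFields);
        for (String value : all.values()) {
            for (String term : value.toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(term, t -> new HashSet<>()).add(id);
            }
        }
        stored.put(id, new HashMap<>(storedFields));
    }

    /** Removes the document from both structures. */
    void delete(int id) {
        stored.remove(id);
        for (Set<Integer> ids : postings.values()) {
            ids.remove(id);
        }
    }

    /** Returns only the stored fields; unstored values are simply not there. */
    Map<String, String> retrieve(int id) {
        return new HashMap<>(stored.get(id));
    }

    /** True if the given term is searchable for the given document. */
    boolean matches(String term, int id) {
        return postings.getOrDefault(term.toLowerCase(), Set.of()).contains(id);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(1, Map.of("title", "Lucene notes"),
                     Map.of("body", "link analysis"));
        System.out.println(index.matches("analysis", 1)); // searchable

        Map<String, String> doc = index.retrieve(1);      // no "body" key
        index.delete(1);
        index.add(1, doc, Map.of());                      // re-add as retrieved
        System.out.println(index.matches("analysis", 1)); // body terms are gone
    }
}
```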
adrianbanks
+1  A: 

You can use Luke for an easy way to view the index. EDIT: I think I understand the problem now. Here is Andrzej Bialecki's proposed solution, which says: Create an index containing documents with just the new/modified fields. Each document in the original index will have a conjugate document with the calculated fields. Use a ParallelReader to search pairs of documents having the original and calculated fields.
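The idea behind the parallel-index approach can be illustrated with a small plain-Java model. This is not the `ParallelReader` API itself; `ParallelView` and the field names are hypothetical. The point it shows is that the two indexes are matched purely by document number, so both must contain the same number of documents, added in the same order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of two indexes read side by side. Not the Lucene API. */
public class ParallelView {

    /**
     * Returns the combined fields of document docId, taking the original
     * fields from one index and the calculated fields from the other.
     */
    static Map<String, String> combined(List<Map<String, String>> original,
                                        List<Map<String, String>> calculated,
                                        int docId) {
        // The pairing is positional, so the two indexes must line up exactly.
        if (original.size() != calculated.size()) {
            throw new IllegalArgumentException("indexes are not aligned");
        }
        Map<String, String> merged = new HashMap<>(original.get(docId));
        merged.putAll(calculated.get(docId));
        return merged;
    }

    public static void main(String[] args) {
        // Index A: built once from the data set and never rewritten.
        List<Map<String, String>> indexA = List.of(
                Map.of("url", "a.html"),
                Map.of("url", "b.html"));
        // Index B: one conjugate document per document in A,
        // holding only the fields computed by the analysis.
        List<Map<String, String>> indexB = List.of(
                Map.of("linkScore", "0.8"),
                Map.of("linkScore", "0.2"));

        System.out.println(combined(indexA, indexB, 1));
    }
}
```

Because the original index is never rewritten, its unstored fields stay searchable; only the small index of calculated fields has to be rebuilt when the analysis changes.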

Yuval F
Well, in my case I just can't do that, since I want to update the index after I've finished indexing the data set. I need to do some link analysis, so I have to index once and then, after the analysis, update the documents in the index.
Artem Barger
+1 for Luke. It's an eye-opener.
cherouvim
@Artem: Please explain why you cannot do this. Why not do the following: 1. Index your data set, putting the result in index A. 2. Go over A, doing your link analysis, and storing the results in index B. B will contain either fields you copied verbatim from A, or analysis results, which I consider to be new fields. For every document in A, you will have a mirror document in B. 3. Close indexes A and B. 4. Copy A to a backup. 5. Use index B for all your retrieval needs. If I am missing something, please tell me.
Yuval F
@Yuval F: That won't work because any fields that are unstored in index A cannot be retrieved to be copied into index B, so they will end up missing from index B. Luke (v0.9.2) does have the ability to reconstruct an unstored field, thereby getting its value, but this is a brute force approach using the index statistics and may actually get a different value from the original indexed value. The only way to copy a field from one index to another is to make it a stored field in the original index.
adrianbanks
@adrianbanks: Thanks. See my edit, which suggests another option.
Yuval F