I have some documents stored in a Lucene index, each with a docId field. I want to get all the docIds stored in the index. There is one complication: the index holds about 300,000 documents, so I would prefer to get the docIds in chunks of 500. Is that possible?
+1
A:
Document numbers (or ids) are consecutive integers from 0 to IndexReader.maxDoc()-1. These numbers are not persistent and are valid only for the currently open IndexReader. You can check whether a document has been deleted with the IndexReader.isDeleted(int documentNumber) method.
Yaroslav
2010-02-22 19:09:38
+2
A:
IndexReader reader = // create IndexReader
for (int i = 0; i < reader.maxDoc(); i++) {
    // skip document numbers whose documents have been deleted
    if (reader.isDeleted(i))
        continue;
    Document doc = reader.document(i);
    String docId = doc.get("docId");
    // do something with docId here...
}
bajafresh4life
2010-02-23 21:15:28
What happens if the if (reader.isDeleted(i)) check is missing?
Jenea
2010-02-24 16:16:36
Without the isDeleted() check, you would output ids for documents that had previously been deleted.
bajafresh4life
2010-02-25 03:34:51
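For the chunks-of-500 part of the question, the same walk over document numbers can collect ids into fixed-size batches as it goes. This is only a minimal sketch and deliberately has no Lucene dependency: the class DocIdChunker, the method forEachChunk, and the isDeleted / docIdOf callbacks are hypothetical names standing in for reader.isDeleted(i) and reader.document(i).get("docId").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of Lucene.
final class DocIdChunker {

    /**
     * Walks document numbers 0..maxDoc-1, skips the ones reported as deleted,
     * and passes the collected docId values to sink in lists of at most chunkSize.
     */
    static void forEachChunk(int maxDoc,
                             IntPredicate isDeleted,      // stand-in for reader.isDeleted(i)
                             IntFunction<String> docIdOf, // stand-in for reader.document(i).get("docId")
                             int chunkSize,
                             Consumer<List<String>> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunk = new ArrayList<>(chunkSize);
        for (int i = 0; i < maxDoc; i++) {
            if (isDeleted.test(i)) {
                continue; // deleted documents contribute nothing
            }
            chunk.add(docIdOf.apply(i));
            if (chunk.size() == chunkSize) {
                sink.accept(chunk);                 // hand over a full chunk
                chunk = new ArrayList<>(chunkSize); // start a fresh one
            }
        }
        if (!chunk.isEmpty()) {
            sink.accept(chunk); // final, possibly shorter chunk
        }
    }

    public static void main(String[] args) {
        // Made-up data: 1,200 document numbers, every 100th one treated as deleted.
        forEachChunk(1200, i -> i % 100 == 0, i -> "doc-" + i, 500,
                chunk -> System.out.println(chunk.size() + " ids, first = " + chunk.get(0)));
    }
}
```

With a real IndexReader, the two callbacks would wrap reader.isDeleted(i) and reader.document(i).get("docId"). Since document(i) declares a checked IOException, the second lambda would need to catch and rethrow it. Only one chunk is held in memory at a time, so the 300,000 ids never have to be materialized together.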