You may want to consider looking through all the documents that match categories using a TermDocs iterator.
This example code goes through each "Category" term, and then counts the number of documents that match that term.
public static void countDocumentsInCategories(IndexReader reader) throws IOException {
TermEnum terms = null;
TermDocs td = null;
try {
terms = reader.terms(new Term("Category", ""));
td = reader.termDocs();
do {
Term currentTerm = terms.term();
if (!currentTerm.field().equals("Category")) {
break;
}
int numDocs = 0;
td.seek(terms);
while (td.next()) {
numDocs++;
}
System.out.println(currentTerm.field() + " : " + currentTerm.text() + " --> " + numDocs);
} while (terms.next());
} finally {
if (td != null) td.close();
if (terms != null) terms.close();
}
}
This code should run reasonably fast even for large indexes.
Here is some code that tests that method:
public static void main(String[] args) throws Exception {
RAMDirectory store = new RAMDirectory();
IndexWriter w = new IndexWriter(store, new StandardAnalyzer());
addDocument(w, 1, "Apple", "fruit", "computer");
addDocument(w, 2, "Orange", "fruit", "colour");
addDocument(w, 3, "Dell", "computer");
addDocument(w, 4, "Cumquat", "fruit");
w.close();
IndexReader r = IndexReader.open(store);
countDocumentsInCategories(r);
r.close();
}
private static void addDocument(IndexWriter w, int id, String name, String... categories) throws IOException {
Document d = new Document();
d.add(new Field("ID", String.valueOf(id), Field.Store.YES, Field.Index.UN_TOKENIZED));
d.add(new Field("Name", name, Field.Store.NO, Field.Index.UN_TOKENIZED));
for (String category : categories) {
d.add(new Field("Category", category, Field.Store.NO, Field.Index.UN_TOKENIZED));
}
w.addDocument(d);
}