When searching a bunch of documents, I can easily find the number of documents which match my search criteria:

Hits hits = Searcher.Search(query);
int DocumentCount = hits.Length();

How do I determine the total number of occurrences of the search term within those documents? For example, say I search for "congress" and get 2 documents back. If "congress" occurs 2 times in document #1 and 3 times in document #2, the result I'm looking for is 5.

+3  A: 

This is Lucene Java, but should work for Lucene.NET:

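    List<Integer> docIds = // doc ids for documents that matched the query,
                           // sorted in ascending order

    int totalFreq = 0;
    TermDocs termDocs = reader.termDocs();
    termDocs.seek(new Term("my_field", "congress"));
    for (int id : docIds) {
        // every matched doc is assumed to contain the term, so skipTo() lands on id;
        // the doc() check guards against counting a different document
        if (termDocs.skipTo(id) && termDocs.doc() == id) {
            totalFreq += termDocs.freq();
        }
    }
    termDocs.close();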
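
To make the arithmetic concrete, here is a minimal sketch of the same walk without any Lucene dependency. The class `PostingListSketch` and its `TreeMap` posting list are hypothetical stand-ins, not Lucene types; the lookup is only similar in spirit to `TermDocs.skipTo`. A document is counted only when the posting list has an entry for exactly that id.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for one term's posting list: doc id -> occurrences in that doc.
class PostingListSketch {

    // Sums the term's frequency over the matched doc ids (ascending order).
    static int totalFrequency(TreeMap<Integer, Integer> postings, List<Integer> matchedDocIds) {
        int total = 0;
        for (int id : matchedDocIds) {
            // Find the first posting whose doc id is >= id.
            Map.Entry<Integer, Integer> entry = postings.ceilingEntry(id);
            if (entry == null) {
                break; // no later document contains the term
            }
            if (entry.getKey() == id) {
                total += entry.getValue(); // exact match: add this doc's frequency
            }
        }
        return total;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Integer> congress = new TreeMap<>();
        congress.put(1, 2); // "congress" occurs 2 times in document 1
        congress.put(2, 3); // and 3 times in document 2
        congress.put(7, 4); // document 7 is not among the matched ids below
        System.out.println(totalFrequency(congress, Arrays.asList(1, 2))); // prints 5
    }
}
```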
bajafresh4life
@bajafresh4life: What about if the phrase was two words like "apple tree"?
Keltex
Do you want the number of times the phrase appears in each doc, or the count for each individual word?
bajafresh4life
A: 

This is Lucene Java also. If your query/search criteria can be written as a SpanQuery, then you can do something like this:

IndexReader indexReader = // define your index reader here
SpanQuery spanQuery = // define your span query here
Spans spans = spanQuery.getSpans(indexReader);
int occurrenceCount = 0;
while (spans.next()) {
    occurrenceCount++;
}
// now occurrenceCount contains the total number of occurrences of the word/phrase/etc across all documents in the index
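
For a two-word phrase such as "apple tree", each span corresponds to one place where the words occur next to each other, in order. The following is a rough, Lucene-free model of that counting, assuming the document has already been split into tokens; `PhraseCountSketch` and `countPhrase` are hypothetical names for illustration only. In Lucene itself, a `SpanNearQuery` over two `SpanTermQuery` clauses with slop 0 and `inOrder` set to true would probably be the closest match.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of phrase counting over a tokenized document.
class PhraseCountSketch {

    // Counts the positions at which the phrase's words appear consecutively, in order.
    static int countPhrase(List<String> tokens, List<String> phrase) {
        if (phrase.isEmpty()) {
            return 0; // an empty phrase has no occurrences
        }
        int count = 0;
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            if (tokens.subList(start, start + phrase.size()).equals(phrase)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList(
                "the", "apple", "tree", "near", "the", "old", "apple", "tree");
        System.out.println(countPhrase(doc, Arrays.asList("apple", "tree"))); // prints 2
    }
}
```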
Kai Chan
A: 

Hey there,

I got the number of occurrences of one word in the documents using:

List docIds = // doc ids for documents that matched the query,
// sorted in ascending order

int totalFreq = 0; 
TermDocs termDocs = reader.termDocs(); 
termDocs.seek(new Term("my_field", "congress")); 
for (int id : docIds) { 
    termDocs.skipTo(id); 
    totalFreq += termDocs.freq(); 
} 

That works great! When my search text is "female", it gets the number of occurrences of "female" in each document, which is fine.

But when my search text is "female patients", it doesn't find the occurrences of "female" and "patients" in the documents.

Can anyone please help me? If I have a long search string, how can I get the number of occurrences of each of its words (excluding stop words) in a document?

Many thanks, Archi
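
One likely reason is that a `Term` holds a single indexed token, so "female patients" as one term matches nothing. A possible approach is to split the search string, drop stop words, and count each remaining word separately; with Lucene that would mean one `seek` per word. The sketch below models only the splitting and counting in plain Java: `QueryTermCountSketch`, `countQueryTerms`, and the stop-word list are hypothetical, and the document is assumed to be already tokenized and lowercased.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: per-word occurrence counts for a multi-word search string.
class QueryTermCountSketch {

    // Illustrative stop-word list; a real analyzer would supply its own.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    // Returns each non-stop-word of the query with its number of occurrences in the document.
    static Map<String, Integer> countQueryTerms(String query, List<String> docTokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : query.toLowerCase().split("\\s+")) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue; // skip blanks and stop words
            }
            counts.put(word, Collections.frequency(docTokens, word));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList(
                "female", "patients", "and", "male", "patients", "female", "nurse");
        System.out.println(countQueryTerms("the female patients", doc));
        // prints {female=2, patients=2}
    }
}
```

Summing the values of the returned map gives a single total for the whole search string, if that is the figure needed.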