I have an application that uses Lucene for searching. The search space is in the thousands of entries. Searching against these, I get only a few results, around 20 (which is fine and expected).

However, when I reduce my search space to just those 20 entries (i.e. I index only those 20 entries and disregard everything else, to make development easier), I get the same 20 results but in a different order (and with different scores).

I tried disabling the norm factors via Field#setOmitNorms(true), but I still get different results.

What could be causing the difference in the scoring?

Thanks

+2  A: 

Scoring depends on all the documents in the index:

In general, the idea behind the Vector Space Model (VSM) is the more times a query term appears in a document relative to the number of times the term appears in all the documents in the collection, the more relevant that document is to the query.

Source: Apache Lucene - Scoring

sfussenegger
I'm not sure I understand that. Say I search for a person named 'Mark' against all the persons in my search space and get 'Mark Anthony', 'Markos', and 'Mark', in that order. If I limit the search space to those 3 only (by indexing just those 3), I get 'Mark', 'Mark Anthony', and 'Markos'. Why does the ordering change when the relevant documents are the same and only the 'noise' documents differ?
Franz See
Sorry, I'm not an expert either. Did you have a look at http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html ?
sfussenegger
+4  A: 

Please see the scoring documentation in Lucene's Similarity API. My bet is on the difference in idf between the two cases (both numDocs and docFreq are different). In order to know for sure, use the explain() function to debug the scores.
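To illustrate why the index size alone can reorder hits, here is a minimal sketch. It assumes the classic Lucene idf formula, 1 + ln(numDocs / (docFreq + 1)), and a deliberately simplified per-term score of sqrt(tf) * idf^2 that leaves out norms, coord and queryNorm. The class name, the index sizes (5000 vs. 20) and the term statistics are hypothetical, chosen only for the demonstration; explain() on your real index is still the way to see the actual factors.

```java
public class IdfDemo {
    // Classic Lucene idf (assumed here): 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Simplified single-term score: sqrt(tf) * idf^2.
    // Norms, coord and queryNorm are intentionally ignored in this sketch.
    static double score(int tf, int docFreq, int numDocs) {
        double w = idf(docFreq, numDocs);
        return Math.sqrt(tf) * w * w;
    }

    public static void main(String[] args) {
        // Hypothetical statistics: term "mark" occurs in 15 documents,
        // term "markos" in 5 documents, in both the full and the reduced index.
        // Document X matches "mark" with tf = 4; document Y matches "markos" with tf = 1.
        int[] indexSizes = {5000, 20};
        for (int numDocs : indexSizes) {
            double x = score(4, 15, numDocs);
            double y = score(1, 5, numDocs);
            System.out.println("numDocs=" + numDocs
                    + " scoreX=" + x + " scoreY=" + y
                    + " first=" + (x > y ? "X" : "Y"));
        }
    }
}
```

With these made-up numbers, X ranks first in the 5000-document index and Y ranks first in the 20-document index, even though docFreq and tf are unchanged. Only numDocs differs: in the large index both idf values are large and close to each other, so the higher tf wins, while in the small index the gap between the two idf values dominates.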

Edit: A code fragment for getting explanations:

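// Run the search, then ask Lucene to explain how each hit was scored.
TopDocs hits = searcher.search(query, searchFilter, max);
ScoreDoc[] scoreDocs = hits.scoreDocs;
for (ScoreDoc scoreDoc : scoreDocs) {
  // scoreDoc.doc is the internal document id that explain() expects.
  String explanation = searcher.explain(query, scoreDoc.doc).toString();
  Log.debug(explanation);
}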
Yuval F
Pardon, but where can I get the int (second parameter) of explain()?
Franz See
Please see my edit for an example.
Yuval F
I haven't had much time to get back to my problem, but this suggestion seems to point in the right direction. Thanks.
Franz See