In Lucene, suppose you have multiple indexes, each covering only one partition. Why does the same search on different indexes return results with different scores? The results from the different servers match exactly. For example, if I searched for:

  • Name - John Smith
  • DOB - 11/11/1934

Partition 0 would return a score of 0.345, while Partition 1 would return a score of 0.337.

Both matched exactly on the name and dob.

+8  A: 

The scoring includes the Inverse Document Frequency (IDF). Suppose the term "John Smith" occurs 100 times in partition 0 and only once in partition 1. The score for a search on John Smith would then be higher in partition 1, because the term is scarcer there.

To get around this you would either have to build a single index over all partitions, or you would need to override the IDF.
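To make the effect concrete, here is a minimal sketch in plain Java (no Lucene dependency). It assumes the classic IDF formula used by Lucene's default similarity in the 2.x line, idf = 1 + ln(numDocs / (docFreq + 1)). The class name IdfSketch and the partition size of 1000 documents are made up for illustration; only the 100 vs. 1 document frequencies come from the example above.

```java
public class IdfSketch {

    // Classic IDF as assumed here: 1 + ln(numDocs / (docFreq + 1)).
    // docFreq = number of documents containing the term,
    // numDocs = total number of documents in that index.
    public static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 1000;        // hypothetical size of each partition

        int docFreqPartition0 = 100; // term is common in partition 0
        int docFreqPartition1 = 1;   // term is rare in partition 1

        System.out.printf("partition 0 idf: %.3f%n", idf(docFreqPartition0, numDocs));
        System.out.printf("partition 1 idf: %.3f%n", idf(docFreqPartition1, numDocs));
    }
}
```

The rarer the term is within an index, the larger its IDF, so an otherwise identical match scores higher in the partition where the term is scarce.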

Stephen Hendry
Or you could construct a MultiSearcher over all the indexes.
Shashikant Kore
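As a rough illustration of why searching across all indexes together evens out the scores: if document frequencies and document counts are summed over every partition before the IDF is computed, each partition is scored against the same global statistics. The sketch below is plain Java, not Lucene code; the class GlobalIdfSketch, its methods, and the partition sizes are hypothetical, and it assumes the classic idf = 1 + ln(numDocs / (docFreq + 1)) formula.

```java
public class GlobalIdfSketch {

    // Classic IDF as assumed here: 1 + ln(numDocs / (docFreq + 1)).
    public static double idf(long docFreq, long numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Sum the per-partition statistics, then compute one IDF shared by all partitions.
    public static double globalIdf(int[] docFreqs, int[] numDocs) {
        long totalDocFreq = 0;
        long totalDocs = 0;
        for (int i = 0; i < docFreqs.length; i++) {
            totalDocFreq += docFreqs[i];
            totalDocs += numDocs[i];
        }
        return idf(totalDocFreq, totalDocs);
    }

    public static void main(String[] args) {
        int[] docFreqs = {100, 1};    // term frequency per partition (from the answer above)
        int[] numDocs = {1000, 1000}; // hypothetical partition sizes

        for (int i = 0; i < docFreqs.length; i++) {
            System.out.printf("partition %d local idf: %.3f%n", i, idf(docFreqs[i], numDocs[i]));
        }
        System.out.printf("shared global idf: %.3f%n", globalIdf(docFreqs, numDocs));
    }
}
```

With a shared IDF, the same exact match gets the same IDF contribution no matter which partition holds the document.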
+3  A: 

Because the score is computed per index, if I am not completely mistaken. If your indexes differ (more, fewer, or different documents were indexed), the scores will differ:

http://lucene.apache.org/java/2_4_0/scoring.html

(Warning: Contains Math :-))

Michael Stum
updated link: http://lucene.apache.org/java/2_4_0/scoring.html
Gene T
Thanks, I've updated it in the answer.
Michael Stum
+3  A: 

You may also be interested in the output of the explain() method, which will give you an idea of why results are scored the way they are:

http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Searcher.html#explain(org.apache.lucene.search.Query,%20int)

and the Explanation object:

http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Explanation.html

(Ick, scary URLs.)

Joe Shaw