Hi,

I have a problem with score calculation for a PrefixQuery. To change the score of each document, I call setBoost on the document when adding it to the index. I then create a PrefixQuery to search, but the results do not change according to the boost. It seems that setBoost has no effect at all for a PrefixQuery. Please see my code below:

 @Test
 public void testNormsDocBoost() throws Exception {
    Directory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_CURRENT), true,
            IndexWriter.MaxFieldLength.LIMITED);
    Document doc1 = new Document();
    Field f1 = new Field("contents", "common1", Field.Store.YES, Field.Index.ANALYZED);
    doc1.add(f1);
    doc1.setBoost(100);
    writer.addDocument(doc1);
    Document doc2 = new Document();
    Field f2 = new Field("contents", "common2", Field.Store.YES, Field.Index.ANALYZED);
    doc2.add(f2);
    doc2.setBoost(200);
    writer.addDocument(doc2);
    Document doc3 = new Document();
    Field f3 = new Field("contents", "common3", Field.Store.YES, Field.Index.ANALYZED);
    doc3.add(f3);
    doc3.setBoost(300);
    writer.addDocument(doc3);
    writer.close();

    IndexReader reader = IndexReader.open(dir);
    IndexSearcher searcher = new IndexSearcher(reader);

    TopDocs docs = searcher.search(new PrefixQuery(new Term("contents", "common")), 10);
    for (ScoreDoc doc : docs.scoreDocs) {
        System.out.println("docid : " + doc.doc + " score : " + doc.score + " "
                + searcher.doc(doc.doc).get("contents"));
    }
} 

The output is:

 docid : 0 score : 1.0 common1
 docid : 1 score : 1.0 common2
 docid : 2 score : 1.0 common3
A: 

This is the expected behavior. Here is the explanation from Lucene's creator, Doug Cutting:

A PrefixQuery is equivalent to a query containing all the terms matching the prefix, and hence usually contains a lot of terms. With such a big query, matching documents are likely to contain fewer of the query terms and the match is thus weaker.

Read the original post that the quote is taken from.

With Lucene, it is generally better to use the score only as a relative measure of relevancy in a set of documents. The absolute value of the score will change depending on so many factors that it should not be used as is.

UPDATE
The explanation from Cutting refers to an older version of Lucene. Thus the answer from bajafresh4life is the correct one.

Pascal Dimassimo
+3  A: 

By default, PrefixQuery rewrites the query to use ConstantScoreQuery, which gives every single matching document a score of 1.0. I think this is to make PrefixQuery faster. So your boosts are getting ignored.
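To make the difference concrete, here is a toy model in plain Java. It is not Lucene code and does not reproduce Lucene's real scoring formula (real Lucene folds the document boost into a lossy, byte-encoded field norm together with other factors). The class and method names (RewriteScoringSketch, constantScore, boostedScore, sampleIndex) are made up for illustration; the sketch only shows why a constant-score rewrite cannot reflect index-time boosts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: NOT Lucene code, and not Lucene's real scoring formula.
class RewriteScoringSketch {

    // Constant-score style: every document whose term starts with the prefix
    // gets 1.0, so the index-time boost never enters the score.
    static Map<String, Float> constantScore(Map<String, Float> boostByTerm, String prefix) {
        Map<String, Float> scores = new LinkedHashMap<String, Float>();
        for (Map.Entry<String, Float> e : boostByTerm.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                scores.put(e.getKey(), 1.0f);
            }
        }
        return scores;
    }

    // Scoring style (simplified assumption): a base score of 1.0 is
    // multiplied by the boost stored with the document.
    static Map<String, Float> boostedScore(Map<String, Float> boostByTerm, String prefix) {
        Map<String, Float> scores = new LinkedHashMap<String, Float>();
        for (Map.Entry<String, Float> e : boostByTerm.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                scores.put(e.getKey(), 1.0f * e.getValue());
            }
        }
        return scores;
    }

    // Same terms and boosts as in the question, plus one non-matching term.
    static Map<String, Float> sampleIndex() {
        Map<String, Float> index = new LinkedHashMap<String, Float>();
        index.put("common1", 100f);
        index.put("common2", 200f);
        index.put("common3", 300f);
        index.put("other", 400f);
        return index;
    }

    public static void main(String[] args) {
        Map<String, Float> index = sampleIndex();
        // prints: constant: {common1=1.0, common2=1.0, common3=1.0}
        System.out.println("constant: " + constantScore(index, "common"));
        // prints: boosted:  {common1=100.0, common2=200.0, common3=300.0}
        System.out.println("boosted:  " + boostedScore(index, "common"));
    }
}
```

The constant-score branch matches the output in the question: all three hits score 1.0 no matter what boost was set.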

If you want the boosts to take effect in your PrefixQuery, you need to call setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE) on your prefix query instance before searching. See http://lucene.apache.org/java/2_9_1/api/all/index.html .

For debugging, you can use searcher.explain().
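Applied to the test from the question, the change might look like the sketch below. It assumes the Lucene 2.9/3.0-era API used in the question is on the classpath; the class name PrefixBoostDemo and the addDoc helper are made up for illustration. I have not listed the resulting scores here, since the exact values depend on how Lucene encodes the norms.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Sketch only: assumes Lucene 2.9/3.0, the API generation used in the question.
class PrefixBoostDemo {

    // Hypothetical helper: index one document with the given index-time boost.
    private static void addDoc(IndexWriter writer, String text, float boost) throws Exception {
        Document doc = new Document();
        doc.add(new Field("contents", text, Field.Store.YES, Field.Index.ANALYZED));
        doc.setBoost(boost);
        writer.addDocument(doc);
    }

    public static void main(String[] args) throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_CURRENT),
                true, IndexWriter.MaxFieldLength.LIMITED);
        addDoc(writer, "common1", 100f);
        addDoc(writer, "common2", 200f);
        addDoc(writer, "common3", 300f);
        writer.close();

        IndexReader reader = IndexReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);

        PrefixQuery query = new PrefixQuery(new Term("contents", "common"));
        // Rewrite to a scoring BooleanQuery instead of the default constant-score rewrite,
        // so that the norms (which include the document boost) contribute to the score.
        query.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);

        TopDocs docs = searcher.search(query, 10);
        for (ScoreDoc hit : docs.scoreDocs) {
            System.out.println("docid : " + hit.doc + " score : " + hit.score + " "
                    + searcher.doc(hit.doc).get("contents"));
            // explain() shows which factors make up the score of this hit.
            System.out.println(searcher.explain(query, hit.doc));
        }

        searcher.close();
        reader.close();
    }
}
```

The explain() output is the quickest way to check whether the norm, and with it the boost, is actually part of each hit's score.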

bajafresh4life
A: 

Hi bajafresh4life,

Your suggestion works! Thank you very much! I spent quite a lot of time on this issue, and you have been a big help!

Keven