Hi all,

Our team just upgraded Lucene from 2.3 to 3.0, and we are confused about setBoost and getBoost on Document. We want to set a boost for each document when adding it to the index, so that the documents in the search response are ordered according to the boosts we set. But the order does not seem to change at all, and the boost of each document in the search response is still 1.0. Could someone give me a hint? Our code follows:

    String[] a = new String[] { "schindler", "spielberg", "shawshank", "solace", "sorcerer", "stone", "soap",
                "salesman", "save" };
    List<String> strings = Arrays.asList(a);
    AutoCompleteIndex index = new Index();
    IndexWriter writer = new IndexWriter(index.getDirectory(), AnalyzerFactory.createAnalyzer("en_US"), true,
                MaxFieldLength.LIMITED);
    float i = 1f;
    for (String string : strings) {
        Document doc = new Document();
        Field f = new Field(AutoCompleteIndexFactory.QUERYTEXTFIELD, string, Field.Store.YES,
                Field.Index.NOT_ANALYZED);
        doc.setBoost(i);
        doc.add(f);
        writer.addDocument(doc);
        i += 2f;
    }

    writer.close();
    IndexReader reader2 = IndexReader.open(index.getDirectory());
    for (int j = 0; j < reader2.maxDoc(); j++) {
        if (reader2.isDeleted(j)) {
            continue;
        }

        Document doc = reader2.document(j);
        Field f = doc.getField(AutoCompleteIndexFactory.QUERYTEXTFIELD);
        System.out.println(f.stringValue() + ":" + f.getBoost() + ", docBoost:" + doc.getBoost());
        doc.setBoost(j);

    }
A: 

The document boost is meant to take effect when you search, not when you iterate sequentially over the documents in the index, as in your code sample. Try the following experiment:

  1. Index just two documents: the first with id 1, text "schindler" and boost 3.0; the second with id 2, text "schindler" and boost 1.0.
  2. Open an IndexSearcher.
  3. Search for "schindler" and see the order of documents according to their ids. The first id should be 1, because of the higher boost.

The meaning of document boost is: when all other scoring factors are equal, the document with the higher boost gets a higher score. Please see Lucene's scoring documentation and the explain() function.
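As a rough illustration of the experiment, here is a toy model in plain Java. It does not use Lucene and it is not Lucene's scoring formula; the class `BoostRankingSketch`, the `Doc` type and the `score` function are hypothetical names made up for this sketch. It only assumes that a match score is multiplied by the document boost, so two documents with identical text differ only by their boost.

```java
import java.util.ArrayList;
import java.util.List;

final class BoostRankingSketch {

    /** Hypothetical stand-in for an indexed document; not a Lucene class. */
    static final class Doc {
        final int id;
        final String text;
        final float boost;

        Doc(int id, String text, float boost) {
            this.id = id;
            this.text = text;
            this.boost = boost;
        }
    }

    /** Toy score (an assumption, not Lucene's formula): 1 for an exact match, 0 otherwise, times the boost. */
    static float score(Doc doc, String term) {
        float base = doc.text.equals(term) ? 1.0f : 0.0f;
        return base * doc.boost;
    }

    /** Returns the documents that match the term, ordered by descending toy score. */
    static List<Doc> rank(List<Doc> docs, String term) {
        List<Doc> hits = new ArrayList<Doc>();
        for (Doc d : docs) {
            if (score(d, term) > 0f) {
                hits.add(d);
            }
        }
        hits.sort((x, y) -> Float.compare(score(y, term), score(x, term)));
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<Doc>();
        docs.add(new Doc(2, "schindler", 1.0f)); // default boost
        docs.add(new Doc(1, "schindler", 3.0f)); // same text, higher boost
        for (Doc d : rank(docs, "schindler")) {
            System.out.println("id=" + d.id + " score=" + score(d, "schindler"));
        }
    }
}
```

In this sketch the document with id 1 is listed first, because its text is identical to that of id 2 and only the boost differs. In a real index, compare the explain() output of the two hits to see which factor carries the boost.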

Yuval F
A: 

Hi Daniel,

Thank you for your answer. I have updated the code according to your suggestion, but it still doesn't seem to work. The order of the results is not changed by the boost, and the score of every search result is the same (1.0). Please check my code below:

    public void testScore() throws Exception {
    String[] a = new String[] { "schindler", "spielberg", "shawshank", "solace", "sorcerer", "stone", "soap",
                "salesman", "save" };
    List<String> strings = Arrays.asList(a);
    AutoCompleteIndex index = new Index();
    IndexWriter writer = new IndexWriter(index.getDirectory(), AnalyzerFactory.createAnalyzer("en_US"), true,
                MaxFieldLength.LIMITED);

    float i = 1f;
    for (String string : strings) {
        Document doc = new Document();
        doc.add(new Field(AutoCompleteIndexFactory.QUERYTEXTFIELD, string, Field.Store.YES,
                Field.Index.NOT_ANALYZED));
        doc.setBoost(i);
        //            System.out.println(doc.getBoost());
        i += 2f;
        writer.addDocument(doc);
    }

    writer.close();


    BooleanQuery
            .setMaxClauseCount(BooleanQuery.getMaxClauseCount() < getMaxQueryTextEntry() ? getMaxQueryTextEntry()
                    : BooleanQuery.getMaxClauseCount());
    Term searchTerm = new Term(AutoCompleteIndexFactory.QUERYTEXTFIELD, "s");
    PrefixQuery query = new PrefixQuery(searchTerm);
    IndexSearcher searcher = new IndexSearcher(index.getDirectory());

    TopDocs docs = searcher.search(query, 10);
    ScoreDoc[] hits = docs.scoreDocs;
    for (ScoreDoc hit2 : hits) {
        String hit = searcher.doc(hit2.doc).get(AutoCompleteIndexFactory.QUERYTEXTFIELD);
        System.out.println(hit + " score:" + hit2.score);
        System.out.println(searcher.explain(query, hit2.doc));

    }

    }

And the output is:

    Jun 17, 2010 4:12:18 PM INFO:

    schindler score:1.0
    1.0 = (MATCH) ConstantScoreQuery(querytexts:s*), product of:
      1.0 = boost
      1.0 = queryNorm

    spielberg score:1.0
    1.0 = (MATCH) ConstantScoreQuery(querytexts:s*), product of:
      1.0 = boost
      1.0 = queryNorm

    shawshank score:1.0
    1.0 = (MATCH) ConstantScoreQuery(querytexts:s*), product of:
      1.0 = boost
      1.0 = queryNorm

    solace score:1.0
    1.0 = (MATCH) ConstantScoreQuery(querytexts:s*), product of:
      1.0 = boost
      1.0 = queryNorm

    sorcerer score:1.0
    1.0 = (MATCH) ConstantScoreQuery(querytexts:s*), product of:
      1.0 = boost
      1.0 = queryNorm

    stone score:1.0
    1.0 = (MATCH) ConstantScoreQuery(querytexts:s*), product of:
      1.0 = boost
      1.0 = queryNorm

    soap score:1.0
    1.0 = (MATCH) ConstantScoreQuery(querytexts:s*), product of:
      1.0 = boost
      1.0 = queryNorm

    salesman score:1.0
    1.0 = (MATCH) ConstantScoreQuery(querytexts:s*), product of:
      1.0 = boost
      1.0 = queryNorm

    save score:1.0
    1.0 = (MATCH) ConstantScoreQuery(querytexts:s*), product of:
      1.0 = boost
      1.0 = queryNorm

Keven