I have a question about Lucene scoring. I have two documents in the index: one contains "my name" and the other contains "my first name". When I search for "my name", the second document is listed above the first. What I want is for a document that contains the exact phrase I typed to be listed first, followed by the others. Can anyone help me do this? Thanks.
A second attempt at an answer: Lucene's default behavior should be what you ask for. The critical factor here is the lengthNorm() part of the score, which generally scores longer documents lower than shorter ones. See Lucene's Similarity API for the context. If, say, the lengthNorm came out identical for the two hits, they would be sorted arbitrarily.
The explain() function will help you see why the documents were scored the way they were, rather than in the order the default behavior would suggest.
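To make the lengthNorm point concrete, here is a minimal sketch, assuming the classic 1/sqrt(numTerms) formula used by Lucene's default similarity. The class and method names are made up for illustration, and the sketch ignores boosts and the lossy one-byte encoding Lucene applies to stored norms, so real fieldNorm values can differ.

```java
// Illustrative only: LengthNormSketch is a hypothetical name, not a Lucene class.
final class LengthNormSketch {

    // Classic length normalization: the more terms in a field, the smaller the factor.
    // Assumption: no index-time boost and no byte encoding of the result.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        // "my name" has two terms, "my first name" has three.
        System.out.println("my name       -> " + lengthNorm(2));
        System.out.println("my first name -> " + lengthNorm(3));
    }
}
```

Under these assumptions the two-term document gets the larger factor, which is why the shorter document would normally be expected to rank first.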
I assume you are using a BooleanQuery. If you post the exact way your query is formulated, I may be able to say more. See also the Query Parser Syntax. I hope this is nearer to the mark.
If you use lucli from the command line (download the latest Lucene source and it's in the contrib directory), you can use the "explain" command to get Lucene to explain why it has scored a document the way it did.
It'll come out with something like this:
---------------- 2 score:0.6089077---------------------
(blah blah your document)
Explanation:4.260467 = (MATCH) sum of:
  0.59024054 = (MATCH) weight(description:warwick in 276780), product of:
    0.05595057 = queryWeight(description:warwick), product of:
      5.2746606 = idf(docFreq=13531, numDocs=843621)
      0.010607426 = queryNorm
    10.549321 = (MATCH) fieldWeight(description:warwick in 276780), product of:
      1.0 = tf(termFreq(description:warwick)=1)
      5.2746606 = idf(docFreq=13531, numDocs=843621)
      2.0 = fieldNorm(field=description, doc=276780)
  0.832554 = (MATCH) weight(keywords:warwick in 276780), product of:
    0.066450186 = queryWeight(keywords:warwick), product of:
      6.264497 = idf(docFreq=5028, numDocs=843621)
      0.010607426 = queryNorm
    12.528994 = (MATCH) fieldWeight(keywords:warwick in 276780), product of:
      1.0 = tf(termFreq(keywords:warwick)=1)
      6.264497 = idf(docFreq=5028, numDocs=843621)
      2.0 = fieldNorm(field=keywords, doc=276780)
  0.19180772 = (MATCH) weight(url:warwick in 276780), product of:
    0.048220757 = queryWeight(url:warwick), product of:
      4.5459433 = idf(docFreq=28043, numDocs=843621)
      0.010607426 = queryNorm
    3.9777002 = (MATCH) fieldWeight(url:warwick in 276780), product of:
      1.0 = tf(termFreq(url:warwick)=1)
      4.5459433 = idf(docFreq=28043, numDocs=843621)
      0.875 = fieldNorm(field=url, doc=276780)
  0.023709858 = (MATCH) weight(content:warwick in 276780), product of:
    0.03373665 = queryWeight(content:warwick), product of:
      3.1804748 = idf(docFreq=109863, numDocs=843621)
      0.010607426 = queryNorm
    0.7027923 = (MATCH) fieldWeight(content:warwick in 276780), product of:
      1.4142135 = tf(termFreq(content:warwick)=2)
      3.1804748 = idf(docFreq=109863, numDocs=843621)
      0.15625 = fieldNorm(field=content, doc=276780)
  0.46163678 = (MATCH) weight(siteDescription:warwick in 276780), product of:
    0.0494812 = queryWeight(siteDescription:warwick), product of:
      4.6647696 = idf(docFreq=24901, numDocs=843621)
      0.010607426 = queryNorm
    9.329539 = (MATCH) fieldWeight(siteDescription:warwick in 276780), product of:
      1.0 = tf(termFreq(siteDescription:warwick)=1)
      4.6647696 = idf(docFreq=24901, numDocs=843621)
      2.0 = fieldNorm(field=siteDescription, doc=276780)
  0.96127754 = (MATCH) weight(siteUrl:warwick in 276780), product of:
    0.10097861 = queryWeight(siteUrl:warwick), product of:
      9.519615 = idf(docFreq=193, numDocs=843621)
      0.010607426 = queryNorm
    9.519615 = (MATCH) fieldWeight(siteUrl:warwick in 276780), product of:
      1.0 = tf(termFreq(siteUrl:warwick)=1)
      9.519615 = idf(docFreq=193, numDocs=843621)
      1.0 = fieldNorm(field=siteUrl, doc=276780)
  0.62917286 = (MATCH) weight(title:warwick in 276780), product of:
    0.05776636 = queryWeight(title:warwick), product of:
      5.4458413 = idf(docFreq=11402, numDocs=843621)
      0.010607426 = queryNorm
    10.891683 = (MATCH) fieldWeight(title:warwick in 276780), product of:
      1.0 = tf(termFreq(title:warwick)=1)
      5.4458413 = idf(docFreq=11402, numDocs=843621)
      2.0 = fieldNorm(field=title, doc=276780)
  0.57006776 = (MATCH) weight(second_title:warwick in 276780), product of:
    0.05498614 = queryWeight(second_title:warwick), product of:
      5.18374 = idf(docFreq=14819, numDocs=843621)
      0.010607426 = queryNorm
    10.36748 = (MATCH) fieldWeight(second_title:warwick in 276780), product of:
      1.0 = tf(termFreq(second_title:warwick)=1)
      5.18374 = idf(docFreq=14819, numDocs=843621)
      2.0 = fieldNorm(field=second_title, doc=276780)
(Sorry, I only had a big index to take an example from, not a simple one!)
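The arithmetic in the output above can be checked by hand. The following is a rough sketch that recomputes single clauses from the numbers printed there, using only the relations the output itself states (each weight is queryWeight times fieldWeight, queryWeight is idf times queryNorm, fieldWeight is tf times idf times fieldNorm) plus the assumption that tf is the square root of termFreq, which matches the tf of 1.4142135 shown for termFreq=2. The class and method names are invented for this example and are not part of Lucene.

```java
// Illustrative only: ExplainArithmetic is a hypothetical helper, not a Lucene class.
final class ExplainArithmetic {

    // queryWeight = idf * queryNorm
    static double queryWeight(double idf, double queryNorm) {
        return idf * queryNorm;
    }

    // fieldWeight = tf * idf * fieldNorm, with tf assumed to be sqrt(termFreq)
    static double fieldWeight(int termFreq, double idf, double fieldNorm) {
        return Math.sqrt(termFreq) * idf * fieldNorm;
    }

    // weight of one clause = queryWeight * fieldWeight
    static double clauseWeight(int termFreq, double idf, double queryNorm, double fieldNorm) {
        return queryWeight(idf, queryNorm) * fieldWeight(termFreq, idf, fieldNorm);
    }

    public static void main(String[] args) {
        double queryNorm = 0.010607426; // copied from the explain output

        // description:warwick (termFreq=1, idf=5.2746606, fieldNorm=2.0)
        System.out.println("description: " + clauseWeight(1, 5.2746606, queryNorm, 2.0));

        // content:warwick (termFreq=2, idf=3.1804748, fieldNorm=0.15625)
        System.out.println("content:     " + clauseWeight(2, 3.1804748, queryNorm, 0.15625));
    }
}
```

Lucene computes these values in single precision, so the doubles printed here should agree with the explain output only to several decimal places. The comparison between the two clauses shows how much the fieldNorm matters: the content field has a higher term frequency but a far smaller fieldNorm, so it contributes much less to the sum.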
I would change the query as follows:
(my AND name) OR "my name"
Here, the additional phrase query adds to the score whenever there is a phrase match. If a document's content is "my first name", the phrase query contributes no additional score. The document with the content "my name", however, gets the additional score and shows up at the top.
Here, I am assuming length normalization is ignored.
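The effect of the extra phrase clause can be illustrated with a toy model. This is not Lucene's scoring: it is a simplified sketch that gives one point when all query terms are present (the AND clause) and one more point when they appear as a contiguous phrase, with length normalization ignored as assumed above. All names in it are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative toy model only; Lucene's real scores are not unit values.
final class PhraseBoostSketch {

    // Split on whitespace and lowercase, a stand-in for a real analyzer.
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().toLowerCase().split("\\s+"));
    }

    // Models: (term1 AND term2 ...) OR "term1 term2 ..."
    static int score(String docText, String queryText) {
        List<String> doc = tokens(docText);
        List<String> query = tokens(queryText);
        int score = 0;
        if (doc.containsAll(query)) {
            score += 1; // the AND clause matched
        }
        if (Collections.indexOfSubList(doc, query) >= 0) {
            score += 1; // the phrase clause matched as a contiguous run
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println("my name       -> " + score("my name", "my name"));
        System.out.println("my first name -> " + score("my first name", "my name"));
    }
}
```

In this model both documents satisfy the AND clause, but only the exact-phrase document also collects the phrase point, so it sorts first.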