I have a question about Lucene scoring. My index holds two documents: one contains "my name" and the other contains "my first name". When I search for the keywords "my name", the second document is listed above the first. What I want is for a document that contains the exact phrase I typed to be listed first, followed by the others. Can anyone tell me how to do this? Thanks.

+1  A: 

A second attempt at an answer: Lucene's default behavior should already be what you ask for. The critical factor here is the lengthNorm() part of the score, which tends to score longer documents lower than shorter ones. See Lucene's Similarity API for the context. If, say, the lengthNorm came out identical for the two hits, their scores would tie and they would be sorted arbitrarily.
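To make the lengthNorm() point concrete, here is a minimal Python sketch of the formula used by Lucene's classic default similarity, 1/sqrt(number of terms in the field). It is a toy model, not Lucene code: the function name length_norm is made up for illustration, and the sketch ignores the lossy compact encoding that Lucene versions of that era apply when storing norms, which is one way two fields of different length can end up with the same stored value.

```python
import math

def length_norm(num_terms: int) -> float:
    """Toy version of the classic default lengthNorm: 1 / sqrt(number of terms)."""
    return 1.0 / math.sqrt(num_terms)

# The two documents from the question, tokenized on whitespace.
docs = {"doc1": "my name", "doc2": "my first name"}
norms = {doc_id: length_norm(len(text.split())) for doc_id, text in docs.items()}

for doc_id, norm in norms.items():
    print(doc_id, round(norm, 4))
```

Before any encoding, the two-term document gets the larger norm, so with everything else equal it should rank above the three-term one.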

The explain() function will show you why the documents were scored the way they were rather than the way the default behavior would suggest.

I assume you are using a BooleanQuery. If you post the exact way your query is formulated, I may be able to say more. See also the Query Parser Syntax. I hope this is nearer to the mark.

Yuval F
This will cause the second document to be the *only* document matched. The poster asked only that it get a higher *score* than the other.
Avi
Thanks for your reply, but what I would like to do is something different. I want to search for all documents that contain the two words "my" and "name". Because the keyword I entered is "my name", I want the results that contain the whole phrase "my name" at the top of the list and the results that contain "my first name" at the bottom.
Truong Do
I have edited my answer to reflect this. Please reread.
Yuval F
A: 

If you use lucli from the command line (download the latest Lucene source; it's in the contrib directory), you can use the "explain" command to get Lucene to explain why it scored a document the way it did.

It'll come out with something like this:

---------------- 2 score:0.6089077---------------------

(blah blah your document)

Explanation:4.260467 = (MATCH) sum of:                                                                                                                                                                                                       
  0.59024054 = (MATCH) weight(description:warwick in 276780), product of:                                                                                                                                                                    
    0.05595057 = queryWeight(description:warwick), product of:                                                                                                                                                                               
      5.2746606 = idf(docFreq=13531, numDocs=843621)                                                                                                                                                                                         
      0.010607426 = queryNorm                                                                                                                                                                                                                
    10.549321 = (MATCH) fieldWeight(description:warwick in 276780), product of:                                                                                                                                                              
      1.0 = tf(termFreq(description:warwick)=1)                                                                                                                                                                                              
      5.2746606 = idf(docFreq=13531, numDocs=843621)                                                                                                                                                                                         
      2.0 = fieldNorm(field=description, doc=276780)                                                                                                                                                                                         
  0.832554 = (MATCH) weight(keywords:warwick in 276780), product of:                                                                                                                                                                         
    0.066450186 = queryWeight(keywords:warwick), product of:                                                                                                                                                                                 
      6.264497 = idf(docFreq=5028, numDocs=843621)                                                                                                                                                                                           
      0.010607426 = queryNorm                                                                                                                                                                                                                
    12.528994 = (MATCH) fieldWeight(keywords:warwick in 276780), product of:                                                                                                                                                                 
      1.0 = tf(termFreq(keywords:warwick)=1)                                                                                                                                                                                                 
      6.264497 = idf(docFreq=5028, numDocs=843621)                                                                                                                                                                                           
      2.0 = fieldNorm(field=keywords, doc=276780)                                                                                                                                                                                            
  0.19180772 = (MATCH) weight(url:warwick in 276780), product of:                                                                                                                                                                            
    0.048220757 = queryWeight(url:warwick), product of:                                                                                                                                                                                      
      4.5459433 = idf(docFreq=28043, numDocs=843621)                                                                                                                                                                                         
      0.010607426 = queryNorm                                                                                                                                                                                                                
    3.9777002 = (MATCH) fieldWeight(url:warwick in 276780), product of:                                                                                                                                                                      
      1.0 = tf(termFreq(url:warwick)=1)                                                                                                                                                                                                      
      4.5459433 = idf(docFreq=28043, numDocs=843621)                                                                                                                                                                                         
      0.875 = fieldNorm(field=url, doc=276780)                                                                                                                                                                                               
  0.023709858 = (MATCH) weight(content:warwick in 276780), product of:                                                                                                                                                                       
    0.03373665 = queryWeight(content:warwick), product of:                                                                                                                                                                                   
      3.1804748 = idf(docFreq=109863, numDocs=843621)                                                                                                                                                                                        
      0.010607426 = queryNorm                                                                                                                                                                                                                
    0.7027923 = (MATCH) fieldWeight(content:warwick in 276780), product of:                                                                                                                                                                  
      1.4142135 = tf(termFreq(content:warwick)=2)                                                                                                                                                                                            
      3.1804748 = idf(docFreq=109863, numDocs=843621)                                                                                                                                                                                        
      0.15625 = fieldNorm(field=content, doc=276780)                                                                                                                                                                                         
  0.46163678 = (MATCH) weight(siteDescription:warwick in 276780), product of:                                                                                                                                                                
    0.0494812 = queryWeight(siteDescription:warwick), product of:                                                                                                                                                                            
      4.6647696 = idf(docFreq=24901, numDocs=843621)                                                                                                                                                                                         
      0.010607426 = queryNorm                                                                                                                                                                                                                
    9.329539 = (MATCH) fieldWeight(siteDescription:warwick in 276780), product of:                                                                                                                                                           
      1.0 = tf(termFreq(siteDescription:warwick)=1)                                                                                                                                                                                          
      4.6647696 = idf(docFreq=24901, numDocs=843621)                                                                                                                                                                                         
      2.0 = fieldNorm(field=siteDescription, doc=276780)                                                                                                                                                                                     
  0.96127754 = (MATCH) weight(siteUrl:warwick in 276780), product of:                                                                                                                                                                        
    0.10097861 = queryWeight(siteUrl:warwick), product of:                                                                                                                                                                                   
      9.519615 = idf(docFreq=193, numDocs=843621)                                                                                                                                                                                            
      0.010607426 = queryNorm                                                                                                                                                                                                                
    9.519615 = (MATCH) fieldWeight(siteUrl:warwick in 276780), product of:                                                                                                                                                                   
      1.0 = tf(termFreq(siteUrl:warwick)=1)                                                                                                                                                                                                  
      9.519615 = idf(docFreq=193, numDocs=843621)                                                                                                                                                                                            
      1.0 = fieldNorm(field=siteUrl, doc=276780)                                                                                                                                                                                             
  0.62917286 = (MATCH) weight(title:warwick in 276780), product of:                                                                                                                                                                          
    0.05776636 = queryWeight(title:warwick), product of:                                                                                                                                                                                     
      5.4458413 = idf(docFreq=11402, numDocs=843621)                                                                                                                                                                                         
      0.010607426 = queryNorm                                                                                                                                                                                                                
    10.891683 = (MATCH) fieldWeight(title:warwick in 276780), product of:                                                                                                                                                                    
      1.0 = tf(termFreq(title:warwick)=1)                                                                                                                                                                                                    
      5.4458413 = idf(docFreq=11402, numDocs=843621)                                                                                                                                                                                         
      2.0 = fieldNorm(field=title, doc=276780)                                                                                                                                                                                               
  0.57006776 = (MATCH) weight(second_title:warwick in 276780), product of:                                                                                                                                                                   
    0.05498614 = queryWeight(second_title:warwick), product of:                                                                                                                                                                              
      5.18374 = idf(docFreq=14819, numDocs=843621)                                                                                                                                                                                           
      0.010607426 = queryNorm                                                                                                                                                                                                                
    10.36748 = (MATCH) fieldWeight(second_title:warwick in 276780), product of:                                                                                                                                                              
      1.0 = tf(termFreq(second_title:warwick)=1)                                                                                                                                                                                             
      5.18374 = idf(docFreq=14819, numDocs=843621)                                                                                                                                                                                           
      2.0 = fieldNorm(field=second_title, doc=276780)

(Sorry, I only had a big index to get an example off, not a simple one!)
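The arithmetic in that output can be checked by hand. The short Python sketch below recomputes the first clause (description:warwick) and the overall total from the numbers printed above. It only multiplies and adds values copied from the listing and does not call Lucene; the variable names are mine.

```python
import math

# Values copied from the description:warwick clause of the explain output.
idf = 5.2746606
query_norm = 0.010607426
term_freq = 1
tf = math.sqrt(term_freq)   # consistent with the listing: 1.0 for 1, 1.4142135 for 2
field_norm = 2.0

query_weight = idf * query_norm        # listed as 0.05595057
field_weight = tf * idf * field_norm   # listed as 10.549321
clause_score = query_weight * field_weight

# The top-level value is the sum of the eight per-field clause scores.
clause_scores = [0.59024054, 0.832554, 0.19180772, 0.023709858,
                 0.46163678, 0.96127754, 0.62917286, 0.57006776]
total = sum(clause_scores)

print(round(clause_score, 5), round(total, 5))
```

Reading the tree this way shows which factor dominates: here fieldNorm ranges from 0.15625 (content) to 2.0 (title, keywords), so the same single term match contributes very differently per field.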

Mat Mannion
A: 

I would change the query as follows.

(my AND name) OR "my name"

Here, the additional phrase query adds to the score whenever there is a phrase match. If a document has "my first name" as its content, the phrase query contributes no additional score. A document whose content is "my name", however, gets the additional score and shows up at the top.

Here, I am assuming length normalization is ignored.
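The effect of the extra phrase clause can be illustrated with a toy Python model. This is not Lucene's scoring formula: each matching clause simply contributes one point, length normalization is ignored as assumed above, and the helper names (has_all_terms, has_phrase, toy_score) are hypothetical.

```python
def has_all_terms(tokens, terms):
    """True if every query term occurs somewhere in the document."""
    return all(term in tokens for term in terms)

def has_phrase(tokens, phrase):
    """True if the phrase tokens occur consecutively in the document."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def toy_score(text, query_terms):
    """Toy score for (my AND name) OR "my name": one point per matching clause."""
    tokens = text.lower().split()
    score = 0
    if has_all_terms(tokens, query_terms):   # the (my AND name) clause
        score += 1
    if has_phrase(tokens, query_terms):      # the "my name" phrase clause
        score += 1
    return score

docs = ["my first name", "my name"]
query = ["my", "name"]
ranked = sorted(docs, key=lambda d: toy_score(d, query), reverse=True)
print(ranked)
```

Both documents still match, but only the exact phrase earns the second point, so it sorts first.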

Shashikant Kore