Hi there,
I am trying to change how term searches are scored against the documents in my Lucene index. Currently a document's score depends on the number of times the term appears in it. What I would like is to score on whether the term exists at all, rather than on how many times it occurs, so a document containing the term once scores the same as a document containing it 100 times.
I've tried extending Zend_Search_Lucene_Search_Similarity with my own class, but to be honest I'm not sure it is working correctly, as the scores are still quite low.
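class MySimilarity extends Zend_Search_Lucene_Search_Similarity
{
    // Term frequency factor: return a constant so that the number of
    // occurrences of the term in a document does not affect the score.
    public function tf($freq)
    {
        return 1.0;
    }

    // Field length normalisation: fields with more terms get a lower factor.
    public function lengthNorm($fieldName, $numTerms)
    {
        return 1.0 / sqrt($numTerms);
    }

    public function queryNorm($sumOfSquaredWeights)
    {
        return 1.0 / sqrt($sumOfSquaredWeights);
    }

    public function sloppyFreq($distance)
    {
        return 1.0;
    }

    // Inverse document frequency: rarer terms get a higher factor.
    public function idfFreq($docFreq, $numDocs)
    {
        return log($numDocs / (float) ($docFreq + 1)) + 1.0;
    }

    // Fraction of the query terms that the document matches.
    public function coord($overlap, $maxOverlap)
    {
        return $overlap / (float) $maxOverlap;
    }
}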
This is built from examples I found while searching Google. The only real change I've made is to the tf() function.
I would be really grateful for any help with this, as at the moment it's really messing up my searches.
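To make the behaviour I'm after concrete, here is a small standalone sketch that doesn't use any Zend classes. The function names defaultTf() and binaryTf() are made up for illustration, and I'm assuming the default similarity computes tf as sqrt($freq), which is what the examples I found seem to do.

```php
<?php
// Standalone illustration only: these helpers are hypothetical and are
// not part of Zend_Search_Lucene.

// Term frequency factor as I understand the default similarity computes it
// (assumption: square root of the number of occurrences).
function defaultTf($freq)
{
    return sqrt($freq);
}

// The factor I want: the same value however many times the term occurs,
// mirroring the tf() override in MySimilarity.
function binaryTf($freq)
{
    return 1.0;
}

// Compare a document containing the term once with one containing it 100 times.
foreach (array(1, 100) as $freq) {
    printf("freq=%d default=%.1f binary=%.1f\n", $freq, defaultTf($freq), binaryTf($freq));
}
```

With the default formula the second document gets a factor ten times larger than the first; with the constant version both get 1.0, which is the result I'm hoping to see in the real search scores.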
Thanks,
Grant