I want to add another scoring factor to Lucene's similarity equation. The problem is that I can't simply override the Similarity class, because it is unaware of the document and the terms it is computing scores for.

For example, in a document with the text below:

The cat is in the top of the tree, and he is going to stay there.

I have my own algorithm that assigns each term in this document a score reflecting how important that term is to the document as a whole. Possible scores for these words are:

cat: 0.789212
tree: 0.633423
top: 0.412315
stay: 0.123912
there: 0.0999842
going: 0.00988412
...

The score for each word differs from document to document. For example, in another document, cat could have a score of 0.0023912.

I want to add this score to Lucene's scoring, but I'm somewhat lost on how to do that.

Any tips?

A:

Use Lucene's Payload feature:

From: http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/

  1. Add a Payload to one or more Tokens during indexing.
  2. Override the Similarity class to handle scoring payloads
  3. Use a Payload aware Query during your search
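The three steps above can be illustrated without Lucene itself. What follows is a minimal stdlib-only Java simulation, not Lucene code: the class and method names (`PayloadScoringSketch`, `tokenize`, `scorePayload`, `scoreTerm`) and the `term|score` annotated input format are hypothetical. It stores each term's per-document importance score as a 4-byte float payload on the token at indexing time, turns the payload back into a factor at scoring time (tokens without a payload get 1.0, which is also what Lucene's default `Similarity.scorePayload` returns), and averages that factor over a term's occurrences before multiplying it with a base score, roughly what a payload-aware term query with an averaging function does. In real Lucene 2.9-era code the corresponding pieces would be something like `DelimitedPayloadTokenFilter`, an overridden `Similarity.scorePayload`, and `PayloadTermQuery`; check the exact signatures for your Lucene version.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Stdlib-only simulation of the payload approach; NOT the Lucene API.
// All names here are hypothetical and chosen for illustration.
class PayloadScoringSketch {

    // Step 1 result: an indexed token carrying optional payload bytes.
    static final class Token {
        final String term;
        final byte[] payload; // null when the token has no payload

        Token(String term, byte[] payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    // Encode a float as 4 big-endian bytes (ByteBuffer's default byte order).
    static byte[] encodeFloat(float f) {
        return ByteBuffer.allocate(4).putFloat(f).array();
    }

    static float decodeFloat(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getFloat();
    }

    // Step 1: split on whitespace; "cat|0.78" becomes term "cat" with a float payload.
    // Tokens without a '|' delimiter are indexed with no payload.
    static List<Token> tokenize(String annotated) {
        List<Token> tokens = new ArrayList<>();
        for (String raw : annotated.trim().split("\\s+")) {
            int bar = raw.indexOf('|');
            if (bar < 0) {
                tokens.add(new Token(raw, null));
            } else {
                float score = Float.parseFloat(raw.substring(bar + 1));
                tokens.add(new Token(raw.substring(0, bar), encodeFloat(score)));
            }
        }
        return tokens;
    }

    // Step 2: analogue of overriding Similarity.scorePayload.
    // A missing payload contributes a neutral factor of 1.0.
    static float scorePayload(byte[] payload) {
        return payload == null ? 1.0f : decodeFloat(payload);
    }

    // Step 3: payload-aware term query. Averages the payload factor over every
    // occurrence of the term in this document and multiplies it with baseScore.
    static float scoreTerm(List<Token> doc, String term, float baseScore) {
        float sum = 0f;
        int matches = 0;
        for (Token t : doc) {
            if (t.term.equals(term)) {
                sum += scorePayload(t.payload);
                matches++;
            }
        }
        return matches == 0 ? 0f : baseScore * (sum / matches);
    }

    public static void main(String[] args) {
        // The same term carries a different importance score in each document.
        List<Token> doc1 = tokenize("the cat|0.789212 is in the top|0.412315 of the tree|0.633423"
                + " and he is going|0.00988412 to stay|0.123912 there|0.0999842");
        List<Token> doc2 = tokenize("a cat|0.0023912 sat on a mat");
        System.out.println("doc1 cat: " + scoreTerm(doc1, "cat", 1.0f));
        System.out.println("doc2 cat: " + scoreTerm(doc2, "cat", 1.0f));
    }
}
```

Running `main` scores `cat` higher in the first document than in the second. That per-document behavior is why payloads fit this problem: the score is stored with each term occurrence in each document rather than once per term for the whole index.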
bajafresh4life