I am trying to combine the Lucene score with PageRank. I tried to modify DefaultSimilarity to add the PageRank values I already have (in an array keyed by the corresponding URL), but I do not know how to get the document field that stores the URL of the document; TermDocs only returns the doc ID. My other idea is to modify TopScoreDocCollector, which has a method collect(int docid), but it is also given only a doc ID, and I still do not know how to get the stored field. Does anyone know how to get a stored field of a document from its document ID, or how to combine Lucene with PageRank? Thank you very much.
A:
To get the value of a stored field in Lucene by the internal Lucene ID, use IndexReader.document(int n). If you have your own UIDs indexed, you'll need to search by that term, get the Lucene ID, and then call IndexReader.document(int n).
Are you trying to calculate PageRank on the fly? If you are, that seems crazy to me. Usually PageRank is computed in a batch process, and the static PageRank score assigned to each document is added as a boost at indexing time.
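As a rough illustration of that batch approach, here is a minimal plain-Java sketch with no Lucene dependency. It runs a fixed number of power-iteration steps over a small link graph and turns each rank into a per-URL boost. The class name StaticRankBoost, the method names, the example URLs, and the formula boost = 1 + ln(1 + n * rank) are all assumptions made for illustration, not anything Lucene defines; you would tune the mapping for your own collection.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: computes static PageRank offline and maps it to boosts.
class StaticRankBoost {

    // outLinks[i] lists the node indices that page i links to.
    static double[] pageRank(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by pages with no out-links
            for (int i = 0; i < n; i++) {
                if (outLinks[i].length == 0) {
                    dangling += rank[i];
                    continue;
                }
                double share = rank[i] / outLinks[i].length;
                for (int j : outLinks[i]) {
                    next[j] += share;
                }
            }
            for (int i = 0; i < n; i++) {
                // dangling mass is spread evenly so the ranks keep summing to 1
                next[i] = (1.0 - damping) / n + damping * (next[i] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    // Assumed mapping: an average page (rank == 1/n) gets boost 1 + ln(2).
    static Map<String, Float> toBoosts(String[] urls, double[] rank) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        int n = rank.length;
        for (int i = 0; i < n; i++) {
            boosts.put(urls[i], (float) (1.0 + Math.log1p(n * rank[i])));
        }
        return boosts;
    }

    public static void main(String[] args) {
        String[] urls = {"http://a.example", "http://b.example", "http://c.example"};
        int[][] links = {{1, 2}, {2}, {0}}; // a -> b, a -> c, b -> c, c -> a
        double[] rank = pageRank(links, 0.85, 50);
        System.out.println(toBoosts(urls, rank));
    }
}
```

The resulting map would be consulted once per document while building the index. In the Lucene releases current at the time of writing, I believe the value would be passed to Document.setBoost(float) before IndexWriter.addDocument, but check that against the version you are running.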
bajafresh4life
2010-07-22 13:47:33
A:
You probably want to use Nutch instead of plain Lucene if you need PageRank. See http://wiki.apache.org/nutch/NewScoring for more.
Xodarap
2010-07-26 16:50:32