I have a long-standing problem: I don't quite understand how to implement a decent Lucene sort or ranking. Say I have a list of cities and their populations. If someone searches for "new" or "london", I want the list of prefix matches ordered by population. I have that working with a prefix search and a reversed sort on a population field, e.g. New Mexico, New York; or London, Londonderry.

However I also always want the exact matching name to be at the top. So in the case of "London" the list should show "London, London, Londonderry" where the first London is in the UK and the second London is in Connecticut, even if Londonderry has a higher population than London CT.

Does anyone have a single query solution?

A: 
dlamblin
+2  A: 

dlamblin, let me see if I understand this correctly: you want to run a prefix-based query, sort the results by population, and maybe combine that sort order with a preference for exact matches. I suggest you separate the search from the sort and use a CustomSorter for the sorting. Here's a blog entry describing a custom sorter. The classic Lucene book describes this well.

Yuval F
Thank you for your blog post explaining how to implement a sort comparator that conveniently does not require defining 2 classes. However because the sort comparator can only work on two documents without knowing the search term it cannot rank the results as I've described them in my question. How would the sort comparator know that the name field "london" exactly matches the search term "london" if it cannot access the search term?
dlamblin
I think you can do the following: the class implementing the ScoreDocComparator interface (AZ09Comparator in the blog example) gets a "search term" member, which you set when running the query. The comparing method (compare() in the blog example) can read this member whenever it is called, and rank a document with an exact match higher than one without an exact match.
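To make the idea concrete, here is a minimal sketch in plain Java. It deliberately does not use Lucene's ScoreDocComparator interface; it only shows a comparator that carries the search term and puts exact name matches first, then falls back to descending population. The names ExactMatchFirst, City, TermAwareComparator and rank are hypothetical, and the population figures are placeholders rather than real data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ExactMatchFirst {

    /** Hypothetical stand-in for an indexed document with a name and a population field. */
    static final class City {
        final String name;
        final long population;

        City(String name, long population) {
            this.name = name;
            this.population = population;
        }
    }

    /** Comparator that knows the search term, set once per query. */
    static final class TermAwareComparator implements Comparator<City> {
        private final String searchTerm;

        TermAwareComparator(String searchTerm) {
            this.searchTerm = searchTerm;
        }

        @Override
        public int compare(City a, City b) {
            boolean aExact = a.name.equalsIgnoreCase(searchTerm);
            boolean bExact = b.name.equalsIgnoreCase(searchTerm);
            // An exact match always sorts ahead of a non-exact match.
            if (aExact != bExact) {
                return aExact ? -1 : 1;
            }
            // Otherwise order by population, largest first.
            return Long.compare(b.population, a.population);
        }
    }

    /** Returns a sorted copy of the prefix matches; the input list is left untouched. */
    static List<City> rank(String searchTerm, List<City> prefixMatches) {
        List<City> sorted = new ArrayList<>(prefixMatches);
        sorted.sort(new TermAwareComparator(searchTerm));
        return sorted;
    }

    public static void main(String[] args) {
        List<City> hits = new ArrayList<>();
        hits.add(new City("Londonderry", 100_000L)); // placeholder population
        hits.add(new City("London", 30_000L));       // placeholder population
        hits.add(new City("London", 9_000_000L));    // placeholder population
        for (City c : rank("london", hits)) {
            System.out.println(c.name + " " + c.population);
        }
    }
}
```

In a Lucene version of this, the same two-step comparison would live in the compare() method of the custom comparator, with the search term passed in when the Sort is built for the query.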
Yuval F
Dang, that's what I get for not thinking it through (though it's been a while since I was in front of that code). Now this makes a lot more sense and is helpful.
dlamblin
+1  A: 

The API documentation for SortComparator says:

There is a distinct Comparable for each unique term in the field - if some documents have the same term in the field, the cache array will have entries which reference the same Comparable

You can apply a FieldSortedHitQueue to the SortComparator. It has a Comparator field, for which the API says:

Stores a comparator corresponding to each field being sorted by.

Thus the term can be sorted accordingly.

Narayan