I've been working with NHibernate, NHibernate.Search and Lucene.Net to improve the search engine used on the website I develop.

Basically, I use it to search the contents of corporations' specification documents. This is not to be confused with Lucene's notion of documents: in my case, a specification document (which I'll hereafter call a "specdoc") can contain many pages, and the contents of these pages are what is actually indexed (so the pages themselves are what map to Lucene's concept of documents). The pages belong to a specdoc, which in turn belongs to a corporation (so a corporation can have many specdocs). I'm using the NHibernate.Search "IndexEmbedded" and "ContainedIn" attributes to associate the pages with their specdoc and the specdocs with their corporations, so I can query for terms in specdoc pages and have Lucene/NH.Search return either the pages themselves, the specdocs, or the corporations that match the query on the pages. I can query this way and get ranked results, and so present results (that is, corporations, specdocs or pages) by relevance, which is great.

But now I need something more. Specifically, in the case where I query terms and have NH.Search return the corporations that match, I need to manually/artificially tune the score of some of the results, because there are corporations that I want to show up at the top of the result set - think of "sponsored results".

I'm thinking of doing it in my application, maybe by creating an entity/database table that contains an association to the corporation entity and a score boost value. But I don't know how to feed this to Lucene and have it boost the results accordingly at search time. Initially I thought about deriving a Similarity class to do this, but it doesn't look like Similarity can be used to modify result sets at search time. As per this page, it looks like what I need is to mess around with weight or scoring. But the docs are a little superficial in that there are no examples of how to implement custom scoring, let alone integrate it with NH.Search.

So, does anyone know how to do this, or can anyone point me to some documentation or a working example of how to do something similar?

Thanks!

A: 

From what I understand, you just want to be able to set a boost at query time instead of at index time. This can be done easily: when you build your query, you can set the boost then. The Query object has a SetBoost method that lets you boost the documents that match the whole query. This is useful when you are combining two term queries and want one of them to be boosted.

If you are using something like QueryParser to build your queries, the query parser syntax lets you set the boost for individual terms. More about that here: http://lucene.apache.org/java/2_9_0/queryparsersyntax.html#Boosting%20a%20Term

If you are using the query parser, you could possibly use a regex or otherwise adjust the query string to add the extra symbol that boosts a term, or you could look into creating your own query parser, which adds the boost when it decides one is needed. I've created my own query parser, and it isn't that difficult. Here is some information about that: http://openedu.ossreleasefeed.com/tutorials/apache-lucene-extending-the-queryparser/
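
As a rough sketch of the query-parser route: with the default OR operator, a `+` prefix marks a clause as required, while an unprefixed clause is optional and only adds to the score of documents that already match. The field names below are assumptions taken from the asker's index (`SpecDoc.Pages.content`, `SpecDoc.CorpID`), and 64 and 100 are placeholder values:

```text
+SpecDoc.Pages.content:white SpecDoc.CorpID:64^100
```

This should return only results whose page content matches "white", with those belonging to corporation 64 ranked higher. The boost multiplies that clause's contribution to the score rather than setting the score to 100, so the value would likely need some tuning.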

Andrew Smith
Yeah, I thought about tweaking the query to get the results I need, but I couldn't do it that way, or at least I don't know how. Here's the deal: I store/index, along with the textual content of the specdoc pages, the Ids of the specdocs and corporations related to the content. So, if I query something like this: `SpecDoc.Pages.content:white AND SpecDoc.CorpID:32`, it returns corporations with specdoc pages containing "white" in the content, exclusively from the corporation with ID 32.
Fernando Figueiredo
Now, extrapolating from that, this would come close to the behaviour I need: `SpecDoc.Pages.content:white OR SpecDoc.CorpID:64^100`. But that obviously is not quite what I need: it would bring back results from the corporation with ID 64, boosted by 100, even if its pages don't contain "white".
Fernando Figueiredo
What I need is for the score boosting on the CorpID to take place only if the corporation's pages contain "white"; otherwise, it shouldn't show up in the results at all. Now either my understanding of Lucene query syntax is lacking (my reference already was the page you posted), or I need something else. I haven't had time to read your blog post carefully yet, so I'll take a look at it later and see if it's useful. Thanks!
Fernando Figueiredo