Hi,

I was browsing the web looking for an indexing and search framework and stumbled upon Solr. A feature we absolutely need is the ability to boost results based on which field contained the hit.

A small example:

Consider a record like this:

<movie>
  <title>The Dark Knight</title>
  <alternative_title>Batman Begins 2</alternative_title>
  <year>2008</year>
  <director>Christopher Nolan</director>
  <plot>Batman, Gordon and Harvey Dent are forced to deal with the chaos unleashed by an anarchist mastermind known only as the Joker, as it drives each of them to their limits.</plot>
</movie>

For example, I want to combine the title, alternative_title and plot fields into one search field, which isn't too difficult after reading the Solr/Lucene documentation and tutorials. However, I also want movies with a hit in title to score higher than those with a hit in alternative_title, and those in turn should score higher than hits in the plot field. Is there a way to express this kind of scoring in the XML configuration, or do we need to develop a custom scoring algorithm? Please also note that my example is fictional and the real data will probably contain 100+ fields.

Thanks in advance, Tom

A: 

Hi,

If this behavior isn't specific to one search but applies across the whole site, you can boost the title field at indexing time. Boosting gives the field a higher relevance score, which sounds like exactly what you want.

Check out this link:

http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22
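A minimal sketch of what such an index-time boost looks like, in Python with only the standard library: it builds an `<add>` update message in which selected fields carry the `boost` attribute described on the wiki page above. The function name `build_add_message` and the boost values are illustrative assumptions, and posting the message to Solr's update handler is not shown. Note that index-time boosts were stored in the field norms, so they had no effect on fields indexed with omitNorms=true, and Solr 7.0 and later no longer support index-time boosts at all.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc_fields, boosts):
    """Build a Solr <add> update message with optional per-field boosts.

    Hypothetical helper: doc_fields maps field name -> value, boosts maps
    field name -> boost factor. Fields without an entry get no boost
    attribute, so Solr treats them with the default boost of 1.0.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in doc_fields.items():
        field = ET.SubElement(doc, "field", name=name)
        if name in boosts:
            # Index-time boost attribute (older Solr versions only).
            field.set("boost", str(boosts[name]))
        field.text = value
    return ET.tostring(add, encoding="unicode")


movie = {
    "title": "The Dark Knight",
    "alternative_title": "Batman Begins 2",
    "plot": "Batman, Gordon and Harvey Dent are forced to deal with the chaos "
            "unleashed by an anarchist mastermind known only as the Joker.",
}

# Title matches should weigh most, alternative_title next, plot unboosted.
print(build_add_message(movie, {"title": 3.0, "alternative_title": 2.0}))
```

Because the boost is baked into the index, changing these weights later means reindexing the documents.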

CraftyFella
A:

I haven't used Solr, but I have used Lucene. Looking at:

http://wiki.apache.org/solr/SolrQuerySyntax

It states that Solr's query syntax is a superset of Lucene's. In Lucene, you perform per-field boosts with the caret operator (^) followed by an arbitrary boost value, e.g.

title:batman^10 alternative_title:batman

The advantage of doing this at query time is that you can adjust the boost value on the fly to match whatever standard of relevance you have. So if a boost value of 10 is too high, you can lower it accordingly.
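To illustrate, here is a small Python helper (hypothetical, not part of Solr or Lucene) that assembles such a boosted query string from a list of fields and weights. It assumes a single search term with no spaces or Lucene special characters; real user input would need escaping first.

```python
def boosted_query(term, field_boosts):
    """Build a Lucene-syntax query searching `term` in several fields.

    field_boosts is an ordered list of (field, boost) pairs; a ^boost suffix
    is appended only when the boost differs from the default of 1.
    """
    clauses = []
    for field, boost in field_boosts:
        clause = f"{field}:{term}"
        if boost != 1:
            clause += f"^{boost}"
        clauses.append(clause)
    # Space-separated clauses are OR'ed under the default query operator.
    return " ".join(clauses)


print(boosted_query("batman", [("title", 10), ("alternative_title", 1)]))
# prints: title:batman^10 alternative_title:batman
```

Changing the weights only changes the query string, so relevance can be retuned without reindexing.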

bajafresh4life
A:

This is what Solr's DisMax query parser was designed for. See http://wiki.apache.org/solr/DisMaxRequestHandler

There are a lot of parameters, but the main one you need to customize is "qf", which specifies which fields should be searched and the boost for each. So if you want title to dominate, you might specify something like:

title^10 alternative_title^2 director^1 plot^1

as the value of the qf parameter. You can set this up by customizing the example configuration and experiment from there.
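As a sketch, assuming a Solr instance at the example address `http://localhost:8983/solr/select` and a version that lets you select the parser per request with `defType=dismax`, the request parameters could be built like this in Python. The helper name `dismax_params` is hypothetical; the `qf` value is the one suggested above.

```python
from urllib.parse import urlencode

# Per-field boosts from the qf example above: a match in a higher-weighted
# field contributes more to the document's score.
FIELD_BOOSTS = [("title", 10), ("alternative_title", 2), ("director", 1), ("plot", 1)]


def dismax_params(user_query, field_boosts=FIELD_BOOSTS):
    """Return request parameters for a DisMax search (hypothetical helper)."""
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts)
    return {"defType": "dismax", "q": user_query, "qf": qf}


# Assumed local example install; adjust host, port and path for your setup.
base_url = "http://localhost:8983/solr/select"
print(base_url + "?" + urlencode(dismax_params("batman")))
```

Putting the same qf value in the request handler's defaults in solrconfig.xml would keep the weights out of client code, while passing it per request (as here) makes it easy to experiment.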

KenE