I currently have a SOLR query which uses the query (q), query fields (qf) and phrase fields (pf) to retrieve the results I want. An example is:

/solr/select
?q=superbowl
&qf=title^3+headline^2+intro+fulltext
&pf=title^3+headline^2+intro+fulltext
&fl=id,title,ts_modified,score
&debugQuery=true

The idea is that the title and headline of the "main item" give the best indication of what the result is "about", but the intro and fulltext provide some input too. That is, imagine a collection of links, where the collection itself has metadata (what it's a collection of), but each link has its own data (title of the link, synopsis, etc.). If we search for "superbowl", the most relevant results are the ones with "superbowl" in the collection metadata, and the least relevant are those with "superbowl" in just the synopsis of one of the links, but they're all valid results.

What I'm trying to do is add a boost to the relevancy score so that the most recent results float towards the top, while retaining title, headline, intro and fulltext as part of the formula. A recent result with the search string in the collection metadata would be more relevant than one with it only in the links' metadata, but that "links only" recent result might be more relevant than a very old result with the search string in the collection metadata. (I hope that's somewhat clear.)

The problem is that I can't figure out how to combine the boost function documented on the SOLR site with the use of the qf/pf fields. Specifically...

From the SOLR site, something like the following works to boost the results by date:

/solr/select
?q={!boost%20b=$dateboost%20v=$qq}
&dateboost=ord(ts_modified)
&qq=superbowl
&fl=ts_modified,score
&debugQuery=true

However, I can't figure out how to combine that query with the use of qf and pf. Any suggestions would be more than welcome.

Thanks to danben's response, I was able to come up with the following:

/solr/select
?q={!boost%20b=$dateboost%20v=$qq%20defType=dismax}
&dateboost=ord(ts_modified)
&qq=superbowl
&qf=title^3+headline^2+intro^2+fulltext
&pf=title^3+headline^2+intro^2+fulltext
&fl=ts_modified,score
&debugQuery=true

It looks like the actual problems I was having were:

  • I left spaces in the q param instead of escaping them (%20) when copy/pasting
  • I didn't include defType=dismax in my q param, which is what makes the boosted query pay attention to the qf/pf parameters
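Both problems can be avoided by building the URL in code rather than by hand. The following is only a sketch: it reuses the field names and parameters from the query above, assumes the /solr/select path shown there, and relies on Python's standard urllib to do the escaping. Note that it also escapes characters such as {, $ and ^, which is harmless.

```python
from urllib.parse import urlencode, quote

# Parameters copied from the working query above; spaces are written
# literally here and escaped by urlencode.
params = {
    "q": "{!boost b=$dateboost v=$qq defType=dismax}",
    "dateboost": "ord(ts_modified)",
    "qq": "superbowl",
    "qf": "title^3 headline^2 intro^2 fulltext",
    "pf": "title^3 headline^2 intro^2 fulltext",
    "fl": "ts_modified,score",
    "debugQuery": "true",
}

# quote_via=quote encodes spaces as %20 (the default would use '+').
query_string = urlencode(params, quote_via=quote)
url = "/solr/select?" + query_string
print(url)
```

Because defType=dismax sits inside the {!boost ...} local params, it applies to the nested query in $qq, which is why qf and pf are honoured.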
+1  A: 

Check out http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents

This is based on the ms function, which returns the difference in milliseconds between two timestamps / dates, and ReciprocalFloatFunction which increases as the value passed decreases.
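To get a feel for how that combination behaves, here is a small sketch that mimics it outside Solr. It assumes the recip(x,m,a,b) = a / (m*x + b) form and the constants 3.16e-11, 1, 1 commonly used with ms(NOW,datefield); the helper names are made up for illustration and are not Solr APIs.

```python
# Milliseconds in a (non-leap) year; 3.16e-11 is roughly 1 / MS_PER_YEAR.
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Reciprocal function: a / (m*x + b)."""
    return a / (m * x + b)


def date_boost(age_ms):
    """Hypothetical stand-in for recip(ms(NOW,ts_modified),3.16e-11,1,1)."""
    return recip(age_ms, 3.16e-11, 1.0, 1.0)


# The boost starts at 1 for a brand-new document and shrinks with age.
for years in (0, 1, 5, 10):
    print(years, round(date_boost(years * MS_PER_YEAR), 3))
```

With these constants a document that is about a year old gets roughly half the boost of one modified just now, and the curve flattens out for older documents instead of dropping to zero.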

Since you are using the DisMaxRequestHandler, you may need to specify your query using the bq/bf parameters. From http://lucene.apache.org/solr/api/org/apache/solr/handler/DisMaxRequestHandler.html:

bq - (Boost Query) a raw lucene query that will be included in the user's query to influence the score. If this is a BooleanQuery with a default boost (1.0f), then the individual clauses will be added directly to the main query. Otherwise, the query will be included as is. This param can be specified multiple times, and the boosts are additive. NOTE: the behaviour listed above is only in effect if a single bq parameter is specified. Hence you can disable it by specifying an additional, blank, bq parameter.

bf - (Boost Functions) functions (with optional boosts) that will be included in the user's query to influence the score. Format is: "funcA(arg1,arg2)^1.2 funcB(arg3,arg4)^2.2". NOTE: Whitespace is not allowed in the function arguments. This param can be specified multiple times, and the functions are additive.
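As a rough sketch of the bf route, a dismax request with a date boost might be assembled like this. The field names come from the question; the recip/ms expression and its constants are an assumption borrowed from the FAQ-style formula, and ts_modified is assumed to be a date field type that ms can read.

```python
from urllib.parse import urlencode, quote

bf_params = {
    "defType": "dismax",
    "q": "superbowl",
    "qf": "title^3 headline^2 intro fulltext",
    "pf": "title^3 headline^2 intro fulltext",
    # No whitespace inside the function arguments, as the docs require.
    "bf": "recip(ms(NOW,ts_modified),3.16e-11,1,1)",
    "fl": "id,title,ts_modified,score",
    "debugQuery": "true",
}

bf_url = "/solr/select?" + urlencode(bf_params, quote_via=quote)
print(bf_url)
```

One difference worth keeping in mind: bf adds the function value to the score, whereas the {!boost} query parser multiplies the score by it, so the two approaches will rank results differently.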

danben
So, the sad thing is that I read that page multiple times in trying to figure things out. I went back and reread it because you seemed to think the answer was there... and I wound up getting it to work by replacing the spaces in the query (1) with %20, and adding the "defType=dismax" info to it. Thanks much.
RHSeeger
One more thing to note is that if you aren't using TrieFields (introduced in Solr 1.4) for your dates, you can't use the ms function.
Eric Pugh