I have a relatively simple Lucene index, being served by Solr. The index consists of two major fields, title and body, and a few less-important fields.

Most search engines weight matches in the title more heavily than matches in the body, so I'm going to start applying an index-time boost to the title field.

My question is, what values do people typically use for their title fields? 2? 4? 10? 100?

+2  A: 

I suggest dividing the median body length by the median title length. This gives you a rough factor M: for every M occurrences of a word in the body, you would expect about one occurrence in the title. Then use something like M*3 as the title boost. This is, of course, a rationalized heuristic, and it is best to iterate over several values and compare the results. See Grant Ingersoll's "Debugging Relevance Issues in Search" for a much more structured discussion.
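As a rough illustration of the arithmetic, here is a minimal Python sketch. It assumes lengths are measured in whitespace-separated tokens, which will not match what your Solr analyzer actually produces. The function name `suggested_title_boost` and the `multiplier` parameter are made up for this example and are not part of Solr or Lucene.

```python
from statistics import median


def suggested_title_boost(titles, bodies, multiplier=3.0):
    """Return a starting-point title boost of M * multiplier.

    M is the median body length divided by the median title length.
    Lengths are counted in whitespace-separated tokens (an assumption;
    a real analyzer may tokenize differently).
    """
    median_title = median(len(t.split()) for t in titles)
    median_body = median(len(b.split()) for b in bodies)
    if median_title == 0:
        raise ValueError("median title length is zero; cannot compute M")
    m = median_body / median_title
    return m * multiplier


if __name__ == "__main__":
    # Toy data, invented for the example: 3-word titles, 60- and 90-word bodies.
    titles = ["Solr boosting basics", "Index time boosts"]
    bodies = ["word " * 60, "word " * 90]
    print(suggested_title_boost(titles, bodies))
```

Treat the result only as a first guess: try a few values around it and judge them against real queries.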

Yuval F