ansaurus

Question

"boosting" different instances of the same field in a lucene document

Answer 1

A:

If you want to take a page out of Google's book (at least their old book), then you may want to create separate indexes: one for document bodies, another for titles. I'm assuming there is a field stored that points to a true UID for each actual document.

The alternative is to write a custom implementation of [Similarity][1] to get the behavior you want. Unfortunately, I find that Lucene often needs this kind of customization when unusual problems arise.

[1]: http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/Similarity.html#lengthNorm(java.lang.String, int)

Snekse 2010-10-13 15:13:20

Just thought of another reason you may want to keep these data elements in separate fields or separate indexes: if they share the same field name in the same index, the mass of content in Body could wreak havoc on the term frequency for Title. Words like Menu, Table or Home (if you're indexing basic web pages) would start to appear more often, giving those words less weight in the Title.
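To put rough numbers on that, here is a minimal sketch in plain Java (no Lucene dependency). It assumes the classic idf formula used by Lucene's DefaultSimilarity in the 3.x line, 1 + ln(numDocs / (docFreq + 1)); the document counts are made up purely for illustration, and the class name is hypothetical.

```java
// Toy illustration only: how sharing one field between Title and Body
// can lower the weight of a word that is rare in titles but common in bodies.
class IdfDilutionSketch {

    // Assumed formula (classic Lucene 3.x DefaultSimilarity idf):
    // idf = 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        int numDocs = 10_000;          // hypothetical index size

        // Hypothetical counts for the word "home":
        int docsWithHomeInTitle = 50;  // rare in a dedicated title field
        int docsWithHomeAnywhere = 8_000; // common once body text shares the field

        double separateTitleField = idf(docsWithHomeInTitle, numDocs);
        double sharedField = idf(docsWithHomeAnywhere, numDocs);

        System.out.println("idf, separate title field: " + separateTitleField);
        System.out.println("idf, shared field:         " + sharedField);
    }
}
```

With a dedicated title field the word keeps a high weight; once body text is folded into the same field, its document frequency jumps and the weight drops.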

Snekse 2010-10-13 15:20:02

Answer 2

A:

You can index the title and body as separate fields, with the title field boosted by a desired value. You can then use MultiFieldQueryParser to search across both fields.
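As a rough picture of what per-field boosting does to ranking, here is a toy sketch in plain Java. It is not Lucene's API or its real scoring formula: the `score` helper, the boost values, and the sample documents are all hypothetical, and the score is simply the boost times the number of matches, summed over fields.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: each field is searched separately and its matches
// are weighted by that field's boost before being summed.
class FieldBoostSketch {

    // Sum over fields of (field boost * occurrences of the term in that field).
    static double score(Map<String, String> doc, Map<String, Double> boosts, String term) {
        String wanted = term.toLowerCase();
        double total = 0.0;
        for (Map.Entry<String, Double> entry : boosts.entrySet()) {
            String text = doc.getOrDefault(entry.getKey(), "");
            int count = 0;
            // Naive tokenization on non-word characters.
            for (String token : text.toLowerCase().split("\\W+")) {
                if (token.equals(wanted)) {
                    count++;
                }
            }
            total += entry.getValue() * count;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Double> boosts = new HashMap<>();
        boosts.put("title", 3.0); // hypothetical boost: title counts three times as much
        boosts.put("body", 1.0);

        Map<String, String> titleHit = new HashMap<>();
        titleHit.put("title", "Lucene boosting");
        titleHit.put("body", "How to weight fields.");

        Map<String, String> bodyHit = new HashMap<>();
        bodyHit.put("title", "Search basics");
        bodyHit.put("body", "A short note on Lucene.");

        System.out.println("title hit: " + score(titleHit, boosts, "lucene"));
        System.out.println("body hit:  " + score(bodyHit, boosts, "lucene"));
    }
}
```

In real Lucene the boost would be set on the field at index time or supplied per field at query time, and the actual score also involves tf, idf and length normalization, so treat this only as an intuition for why the title match ranks first.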

Technically, searching multiple fields takes longer, but even with this overhead Lucene is typically extremely fast (on the order of a few tens to a few hundreds of milliseconds).

Shashikant Kore 2010-10-15 06:56:14
