I want to use a single field to index the document's title and body, in an effort to improve performance.

The idea was to do something like this:

Field title = new Field("text", "alpha bravo charlie", Field.Store.NO, Field.Index.ANALYZED);
title.setBoost(3);
Field body = new Field("text", "delta echo foxtrot", Field.Store.NO, Field.Index.ANALYZED);
Document doc = new Document();
doc.add(title);
doc.add(body);

And then I could just do a single TermQuery instead of a BooleanQuery for two separate fields.

However, it turns out that a field's boost is the product of the boosts of all fields with the same name in the document. In my case, that means both fields end up with a boost of 3.
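To make the arithmetic concrete, here is a toy model in plain Java. It does not call Lucene; the class `FieldBoostModel` and the method `combinedBoosts` are made up for illustration, and it only assumes the behavior described above, namely that the boosts of same-named fields in one document are multiplied into a single value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: hypothetical names, not part of Lucene.
class FieldBoostModel {

    // Multiplies together the boosts of all fields that share a name,
    // which is the assumption taken from the description above.
    public static Map<String, Float> combinedBoosts(String[] names, float[] boosts) {
        Map<String, Float> combined = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            combined.merge(names[i], boosts[i], (a, b) -> a * b);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Title (boost 3) and body (default boost 1) both use the name "text".
        String[] names = {"text", "text"};
        float[] boosts = {3.0f, 1.0f};
        // Only one combined value exists for "text", so it covers title and body alike.
        System.out.println(combinedBoosts(names, boosts).get("text")); // prints 3.0
    }
}
```

With separate names (say `title` and `body`), the same model keeps 3.0 and 1.0 apart, which is the effect I am trying to get from a single field.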

Is there a way I can get what I want without resorting to using two different fields? One way would be to add the title field several times to the document, which increases the term frequency. This works, but seems incredibly brain-dead.
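As a rough estimate of how many repeats that would take: assuming the default similarity, whose term-frequency factor is the square root of the raw frequency, repeating the title n times scales that factor by sqrt(n), not by n. The sketch below is plain Java with hypothetical names (`RepeatedTitleTf`, `tf`, `repeatsForBoost`), and it ignores length normalization, which also changes as the field gets longer.

```java
// Rough estimate only: hypothetical names, and length normalization is ignored.
class RepeatedTitleTf {

    // Term-frequency factor, assuming the square-root formula of the default similarity.
    public static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Number of repeats needed so the tf factor grows by at least the given multiple.
    public static int repeatsForBoost(double boost) {
        return (int) Math.ceil(boost * boost);
    }

    public static void main(String[] args) {
        System.out.println(repeatsForBoost(3.0)); // prints 9
    }
}
```

So under these assumptions, a 3x effect would need the title added about nine times, which makes the approach look even less attractive.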

I also know about payloads, but that seems like overkill for what I'm after.

Any ideas?

A: 

If you want to take a page out of Google's book (at least their old book), then you may want to create separate indexes: one for document bodies, another for titles. I'm assuming there is a stored field that points to a true UID for each actual document.

The alternative answer is to write a custom implementation of [Similarity][1] to get the behavior you want. Unfortunately, I find that Lucene often needs this kind of customization when unique problems arise.

[1]: http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/Similarity.html#lengthNorm(java.lang.String, int)

Snekse
Just thought of another reason you may want to keep these data elements in separate fields or separate indexes: if they share the same field name in the same index, the large amount of content in Body could wreak havoc on the term frequency for Title. Words like Menu, Table, or Home (if you're indexing basic web pages) would appear far more often, giving those words less weight in the Title.
Snekse
A: 

You can index the title and body as separate fields, with the title field boosted by the desired value. Then you can use MultiFieldQueryParser to search both fields.

Technically, searching multiple fields takes longer, but even with this overhead Lucene is typically very fast (on the order of a few tens to a few hundreds of milliseconds).
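To illustrate the shape of query this produces, here is a small plain-Java sketch that expands one term across several fields using Lucene's classic query syntax (`field:term^boost`). It does not call Lucene itself; the class `BoostedFieldQuery`, the method `expand`, and the field names `title` and `body` are assumptions for illustration, and it assumes a single plain term with no characters that need escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: hypothetical helper, not part of Lucene.
class BoostedFieldQuery {

    // Builds "field:term^boost" clauses, one per field, separated by spaces.
    // A boost of exactly 1.0 is left implicit.
    public static String expand(String term, Map<String, Float> fieldBoosts) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, Float> entry : fieldBoosts.entrySet()) {
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append(entry.getKey()).append(':').append(term);
            if (entry.getValue() != 1.0f) {
                query.append('^').append(entry.getValue());
            }
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("title", 3.0f); // title matches count three times as much
        boosts.put("body", 1.0f);
        System.out.println(expand("alpha", boosts)); // prints title:alpha^3.0 body:alpha
    }
}
```

Applying the boost at query time like this also means the weighting can be tuned without reindexing, unlike an index-time field boost.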

Shashikant Kore