I have a Lucene index which populates from a database. I store/index some fields and then add a FullText field in which I index the contents of all the other fields, so I can do a general search.

Now let's say I have a document with the following two fields:

fld1 - "Samsung releases a new 22'' LCD screen"
fld2 - "Sony Ericsson phone's batteries explode"

If a user searches for "Samsung phone", he probably just wants news about Samsung phones, not a document with info about a Samsung screen and a Sony phone. But when searching the FullText field, I get this document as a valid result. Is there a nice way to handle this?

I've thought of indexing with some separator and then doing a SpanNotQuery, so the FullText field would have these contents: "Samsung releases a new 22'' LCD screen MYLUCENESEPARATOR Sony Ericsson phone's batteries explode", and the SpanNotQuery would use MYLUCENESEPARATOR as the term that a match must not span.

Is this a good solution? Does it scale well with more than two terms? I fear it would be a performance killer. Is there a better way to achieve this?

+2  A: 

If the number of fields is limited, you can put the two description strings in two different fields and use MultiFieldQueryParser to search them. Because the fields are separate, a query that requires both terms (AND operator) within one field matches the document only if both terms appear in that single field.

Let's take your example:

fld1 - "Samsung releases a new 22'' LCD screen"
fld2 - "Sony Ericsson phone's batteries explode"

If these are indexed in separate fields fld1 & fld2, your query becomes

(+fld1:samsung +fld1:phone) (+fld2:samsung +fld2:phone)

MultiFieldQueryParser helps you construct such queries easily, so you don't need to repeat the query for each field.
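If you want to see exactly which query is sent, the per-field string can also be assembled by hand and passed to a regular QueryParser. The following is only a minimal sketch in plain Java with no Lucene dependency; the class name PerFieldQuery and the method build are made up for illustration, and it assumes the terms are already analyzed (lowercased, no special characters to escape) and that the parser's default operator is OR, so that matching either group is enough. It is also worth printing the parsed query when using MultiFieldQueryParser directly, since depending on how it is invoked it may expand each term across all fields rather than grouping the terms per field.

```java
import java.util.List;
import java.util.StringJoiner;

public class PerFieldQuery {

    // Hypothetical helper: builds "(+f1:t1 +f1:t2) (+f2:t1 +f2:t2)".
    // Within each group every term is required (+), so a group only
    // matches when all terms occur in that one field.
    // Assumption: terms are plain, already-analyzed tokens (no escaping done).
    static String build(List<String> fields, List<String> terms) {
        StringJoiner groups = new StringJoiner(" ");
        for (String field : fields) {
            StringJoiner clause = new StringJoiner(" ", "(", ")");
            for (String term : terms) {
                clause.add("+" + field + ":" + term);
            }
            groups.add(clause.toString());
        }
        return groups.toString();
    }

    public static void main(String[] args) {
        // Prints the query from the example above.
        System.out.println(build(List.of("fld1", "fld2"), List.of("samsung", "phone")));
    }
}
```

For the example document, neither group matches: fld1 contains "samsung" but not "phone", and fld2 contains "phone" but not "samsung", so the document is no longer returned for "Samsung phone".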

Shashikant Kore
I accepted your answer but forgot to vote up. Perhaps a bit late, but just did it now :-)
Jaime Pardos
Thank you, Jaime.
Shashikant Kore