I have a Lucene index that is currently case-sensitive. I want to add a case-insensitive search as a fall-back. This means that results that match the case should get more weight and appear first. For example, if the number of results is limited to 10 and there are 10 matches with the correct case, that is enough. If I find only 7, I can add 3 more results from the case-insensitive search.

My case is actually more complex, since my items have different weights. Ideally, a match with the "wrong" case should still add some weight. Needless to say, I do not want duplicate results.

One possible approach is to have two indexes, one case-sensitive and one case-insensitive, and search both. Naturally, there's some redundancy here, since I need to index everything twice.

Is there a better solution? Ideas?
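The fall-back behaviour described in the question (case-matching hits first, then top up from the case-insensitive hits, with no duplicates) can be sketched in plain Java. This is a minimal sketch with no Lucene dependency: it assumes both searches have already been run and each returned a ranked list of document IDs. The class FallbackMerge and its merge method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: merges two ranked hit lists.
class FallbackMerge {

    // Returns at most `limit` document IDs: exact-case hits first, in their
    // original order, then case-insensitive hits that are not already included.
    static List<String> merge(List<String> exactHits, List<String> foldedHits, int limit) {
        // LinkedHashSet keeps insertion order and silently drops duplicates.
        Set<String> merged = new LinkedHashSet<>();
        for (String id : exactHits) {
            if (merged.size() >= limit) break;
            merged.add(id);
        }
        for (String id : foldedHits) {
            if (merged.size() >= limit) break;
            merged.add(id);
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> exact = List.of("d1", "d2", "d3", "d4", "d5", "d6", "d7");
        // The case-insensitive search finds d2 again; it must not be repeated.
        List<String> folded = List.of("d2", "d8", "d9", "d10", "d11");
        System.out.println(merge(exact, folded, 10));
        // prints [d1, d2, d3, d4, d5, d6, d7, d8, d9, d10]
    }
}
```

The duplicate d2 appears only once and d11 falls beyond the limit of 10. This covers the simple top-up case; it does not account for per-item weights, since it just concatenates the two rankings.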

A:

Have you already tried copyField? See http://wiki.apache.org/solr/SchemaXml#Copy_Fields

If not, define a new field B with a different analyzer configuration (for example, one that lower-cases) and copy field A into B via copyField.

Karussell
Well, copyField is a Solr feature, and I'm using bare-bones Lucene. Still, I can simply add an extra field containing the same text, indexed in lower case. That is far better than creating a completely separate index, so +1.
zvikico
Oops, OK. I had exactly the same problem, but I was working with Solr. I posted this answer a bit too fast, though.
Karussell
I'm already up and running with the extra field; your answer gave me the nudge in the right direction that I needed. Thanks again. I'll keep the question open for a while to see whether more efficient solutions come up.
zvikico
I'll mark this as the accepted answer. Again, it's not exactly the solution, but it was a nudge in the right direction.
zvikico
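The extra lower-cased field discussed in the comments can be modelled in memory to show how the per-item weight, the exact-case match and the case-insensitive match might combine into a single score per document. This is a hypothetical model, not the Lucene API: TwoFieldIndex, EXACT_BOOST and FOLDED_BOOST are illustrative names, and the boost values are assumptions. In real Lucene code the same idea would roughly be a BooleanQuery with two SHOULD clauses, one on the case-sensitive field and one on the lower-cased field, with a higher boost on the former. The sketch uses records, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical in-memory model of a two-field index; NOT the Lucene API.
class TwoFieldIndex {

    // Assumed boosts: an exact-case match is worth more than a case-folded one.
    static final double EXACT_BOOST = 2.0;
    static final double FOLDED_BOOST = 1.0;

    record Doc(String id, double weight, Set<String> exactTerms, Set<String> foldedTerms) {}

    record Hit(String id, double score) {}

    private final List<Doc> docs = new ArrayList<>();

    // Index the same text twice: once as-is, once lower-cased.
    void add(String id, double weight, String text) {
        Set<String> exact = new HashSet<>();
        Set<String> folded = new HashSet<>();
        for (String token : text.split("\\s+")) {
            exact.add(token);
            folded.add(token.toLowerCase(Locale.ROOT));
        }
        docs.add(new Doc(id, weight, exact, folded));
    }

    // One score per document, so duplicate results cannot occur.
    List<Hit> search(String term, int limit) {
        String foldedTerm = term.toLowerCase(Locale.ROOT);
        List<Hit> hits = new ArrayList<>();
        for (Doc d : docs) {
            boolean exact = d.exactTerms().contains(term);
            boolean folded = d.foldedTerms().contains(foldedTerm);
            if (!exact && !folded) continue;
            // An exact-case match also matches the lower-cased field,
            // so it collects both boosts; a wrong-case match gets only one.
            double score = d.weight()
                    + (exact ? EXACT_BOOST : 0.0)
                    + (folded ? FOLDED_BOOST : 0.0);
            hits.add(new Hit(d.id(), score));
        }
        hits.sort(Comparator.comparingDouble(Hit::score).reversed());
        return hits.subList(0, Math.min(limit, hits.size()));
    }
}
```

For example, with documents a (weight 0.5, "Apache Lucene index"), b (weight 0.5, "apache lucene index") and c (weight 3.0, "LUCENE rocks"), searching for "Lucene" ranks c (4.0) above a (3.5) and b (1.5). The exact-case match lifts a above b, but a heavy enough item weight can still outrank it, which is the kind of trade-off the question asks for.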
A: 

Lucene search is case-sensitive; it's just that all input is usually lower-cased when it passes through QueryParser, so it feels case-insensitive. In other words, don't lower-case your input before indexing, and don't lower-case your queries: pick an Analyzer that doesn't lower-case, KeywordAnalyzer for example.

QueryParser also has setLowercaseExpandedTerms(boolean lowercaseExpandedTerms), which controls whether wildcard, prefix, fuzzy, and range terms are lower-cased.

You can index the terms using a case-sensitive analyzer and, when you want a case-insensitive query, use a class that does not convert your terms to lowercase.

Look at wildcard, prefix, and fuzzy queries in particular.

Narayan
Naturally, using a case-sensitive analyzer with a lower-case query will not yield the correct results.
zvikico
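The point that case sensitivity is decided by the analyzer, and that index-time and query-time analysis must agree (the objection in the last comment), can be illustrated with a small stdlib model. AnalyzerModel is a hypothetical stand-in, not a Lucene class: it only splits on whitespace and optionally lower-cases tokens, which is far simpler than a real analyzer chain.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer; NOT a Lucene class.
class AnalyzerModel {

    // Splits on whitespace and optionally lower-cases each token,
    // roughly like a lower-casing vs. a non-lower-casing analyzer.
    static Set<String> analyze(String text, boolean lowercase) {
        Set<String> tokens = new HashSet<>();
        for (String token : text.split("\\s+")) {
            tokens.add(lowercase ? token.toLowerCase(Locale.ROOT) : token);
        }
        return tokens;
    }

    // A term matches only if the analyzed query term is literally in the index.
    static boolean matches(String docText, boolean lowercaseAtIndex,
                           String queryTerm, boolean lowercaseAtQuery) {
        Set<String> indexed = analyze(docText, lowercaseAtIndex);
        String term = lowercaseAtQuery ? queryTerm.toLowerCase(Locale.ROOT) : queryTerm;
        return indexed.contains(term);
    }

    public static void main(String[] args) {
        String doc = "Apache Lucene";
        System.out.println(matches(doc, false, "Lucene", false)); // true: case-sensitive on both sides
        System.out.println(matches(doc, false, "lucene", false)); // false: wrong case
        System.out.println(matches(doc, true, "LUCENE", true));   // true: lower-cased on both sides
        System.out.println(matches(doc, false, "Lucene", true));  // false: mismatched analysis
    }
}
```

The last line is the failure mode from the comment: a case-sensitive index queried with a lower-cased term finds nothing, even though the document does contain "Lucene".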