Hi,

I'm having an issue querying Solr using the following field type:

<fieldType name="text_ci" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
   </analyzer>
</fieldType>

As you can see, it applies the "SnowballPorterFilterFactory" at both index time and query time. If I index something like

Mouse stuff and fun

It gets indexed as:

[Screenshot: index breakdown in Solr]

As you can see, the word "Mouse" is turned into "mous" (lowercased, then stemmed by the "SnowballPorterFilterFactory"), which is what we want. However, when we search for

Mouse*

It doesn't seem to apply the "SnowballPorterFilterFactory" in the same way, I guess because of the * at the end.

[Screenshot: query breakdown in Solr]

My question is: is there a way to make the "SnowballPorterFilterFactory" aware of wildcards, so that when I query for

Mouse*

I don't get 0 results.

Interestingly, if I query for

mous*

The record does come back.

Or can someone offer a better way to query/index this type of field?

Thanks Dave

+1  A: 

Last time I checked, when you use wildcards, the query analyzer is not used. So since you are using a LowerCaseFilterFactory, your terms are indexed in lower case, and searching for Mous* won't return anything.

I think the only thing to do when you are using wildcards is to make sure to adapt your query to the way your terms are indexed (in a way similar to what your query analyzer would do).

Pascal Dimassimo
Dang.. you are right about the Mous* part.. let me update the question
CraftyFella
Is the 2nd paragraph the only way to handle wildcards in Solr?
CraftyFella
+1  A: 

From the FAQ:

Unlike other types of Lucene queries, Wildcard, Prefix, and Fuzzy queries are not passed through the Analyzer, which is the component that performs operations such as stemming and lowercasing. The reason for skipping the Analyzer is that if you were searching for "dogs*" you would not want "dogs" first stemmed to "dog", since that would then match "dog*", which is not the intended query. These queries are case-insensitive anyway because QueryParser makes them lowercase. This behavior can be changed using the setLowercaseExpandedTerms(boolean) method

If you're fine with changing your Solr source, SOLR-757 has a patch attached to it which you might find useful. I don't know of a way to change this other than diving into the source though.

What might be a simpler idea: just have a copy field which is not stemmed. The user can search both of these fields, and then mouse* will match in the non-stemmed field.
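A rough sketch of what that copy field setup might look like in schema.xml. The names `text_ci_unstemmed`, `title` and `title_unstemmed` are made up for illustration and are not from the question; the only real difference from `text_ci` is that the stemmer (and here also the stop filter) is left out.

```xml
<!-- Hypothetical field type: same as text_ci, minus stop words and stemming -->
<fieldType name="text_ci_unstemmed" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
</fieldType>

<!-- Hypothetical fields: "title" is stemmed, "title_unstemmed" keeps whole lowercased words -->
<field name="title" type="text_ci" indexed="true" stored="true"/>
<field name="title_unstemmed" type="text_ci_unstemmed" indexed="true" stored="false"/>

<!-- Copy the raw input of "title" into the unstemmed field at index time -->
<copyField source="title" dest="title_unstemmed"/>
```

With something like this in place, "Mouse stuff and fun" would be indexed as "mouse" in `title_unstemmed`, so a query such as `title_unstemmed:mouse*` should match. Since wildcard terms skip the analyzer, the term probably still has to be lowercased on the client side before it is sent.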

(EDIT: actually, looking at that patch, I'm not sure it will do what you want. But basically you just need to change your query handler to stem first.)

Xodarap
Thanks... That answers my query about why it doesn't apply the filters. I like the idea of the copy field.. thanks
CraftyFella