Hello, I am a total beginner with Solr and have a problem with unwanted characters getting into query results. For example, when I search for "foo bar", I get content containing "'foo' bar" and so on. I only want exact matches. As far as I know, this can be configured in the schema.xml file. My content field type:

<fieldtype name="textNoStem" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
   <filter class="solr.LowerCaseFilterFactory"/>
   <tokenizer class="solr.KeywordTokenizerFactory"/>
 </analyzer>
 <analyzer type="query">
   <tokenizer class="solr.StandardTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldtype>

Please let me know if you know the solution. Kind Regards.

A: 

If you just want exact matches, use the KeywordTokenizerFactory instead of the StandardTokenizerFactory at query time.

Raoul Duke
Thank you for the quick answer. However, with KeywordTokenizerFactory I get no results at all for queries like "foo bar". I tried adding <filter class="solr.StandardFilterFactory"/> to the query analyzer, but nothing changed. I'm running out of ideas.
Daniel
A: 

For both analyzers, the first line should be the tokenizer. The tokenizer splits the text into smaller units (words, most of the time). For your needs, the WhitespaceTokenizerFactory is probably the right choice.

If you want an absolutely exact match, you do not need any filter after the tokenizer. But if you do not want searches to be case sensitive, you need to add a LowerCaseFilterFactory.

Notice that you have two analyzers: one of type 'index' and the other of type 'query'. As the names imply, the first one is used when indexing content, while the other is used when you run queries. A rule that is almost always good is to have the same set of tokenizers/filters for both analyzers.
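
As a rough sketch of the advice above, the field type from the question could look something like this. It assumes you keep the name textNoStem and want case-insensitive matching; drop the LowerCaseFilterFactory lines if you need case-sensitive matches. Treat it as a starting point rather than a definitive configuration.

```xml
<fieldtype name="textNoStem" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: the tokenizer comes first, then any filters -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query time: same tokenizer and filters as at index time -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```

With whitespace tokenization, 'foo' (with the quote characters) stays a different token from foo, so it should no longer match a search for foo. Content that was indexed under the old analyzer has to be reindexed before the change shows up in results.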

Pascal Dimassimo
Thank you, that helped a lot!
Daniel
A: 

Hi, I guess you don't get any results because the data that is already indexed was tokenized differently from your queries. As Pascal said, WhitespaceTokenizerFactory is the right choice in your case. Use it at both index and query time, and check the results after indexing some new data, not against the previously indexed data.

I suggest using the analysis page to see the results without actually indexing. It's quite useful. Make changes in the schema, reload the core, go to the analysis page, and look at the verbose output to get the step-by-step analysis.

kaka