I am indexing a table of companies, many of which have names starting with a digit, e.g.:

2partner 3m etc.

But when I run a simple Solr query like "2partner" (in Solr's web interface), the digit "2" is removed by the query parser. Here's the debug output:

<lst name="debug">
<str name="rawquerystring">2partner</str>
<str name="querystring">2partner</str>
<str name="parsedquery">text:partner</str>
<str name="parsedquery_toString">text:partner</str>

How do I avoid that?

Thanks in advance :-)

/Carsten

+1  A: 

You are probably using a WordDelimiterFilterFactory with splitOnNumerics activated. Check the analyzers of the field you are storing this data into.

Pascal Dimassimo
I do indeed have a WordDelimiterFilter defined here: <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100" > <analyzer> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>... but a) splitOnNumerics is not activated, and b) it is only defined on the fieldType "textTight", which AFAIK I am not using.
Carsten Gehling
Oh bloody comment format... :-) Here's a Pastie with my entire schema.xml: http://pastie.org/1200681
Carsten Gehling
The 'text' fieldType uses a LetterTokenizerFactory. According to the docs, that tokenizer discards all non-letter characters, which is why the "2" in "2partner" disappears. http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LetterTokenizerFactory
Pascal Dimassimo
Oh... Well, I still have a lot to learn about Solr. :-) Which tokenizer would you recommend for a "simple" text field?
Carsten Gehling
The StandardTokenizerFactory or WhitespaceTokenizerFactory are usually good choices. Then add whatever filters you need on top; it all depends on your needs. Check the schema.xml provided in the example folder of the latest Solr package. Check also the example here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Specifying_an_Analyzer_in_the_schema
Pascal Dimassimo
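For readers following along, here is a minimal sketch of what a 'text' fieldType built on the StandardTokenizerFactory might look like in schema.xml. It is not taken from the schema in the Pastie: the attribute values and the LowerCaseFilterFactory are assumptions for illustration, and StandardFilterFactory may not be available in newer Solr releases, so check the example schema shipped with your version.

```xml
<!-- Hypothetical replacement for a 'text' fieldType that used LetterTokenizerFactory. -->
<!-- StandardTokenizerFactory keeps digits, so "2partner" and "3m" are not stripped to "partner" and "m". -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- The "sibling" filter of the standard tokenizer; assumed available in your Solr version. -->
    <filter class="solr.StandardFilterFactory"/>
    <!-- Assumption: lowercasing so that queries match regardless of case. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the analyzer also runs at index time, existing documents have to be reindexed after a change like this before queries such as "2partner" can match.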
Thanks a bunch. I've changed to the StandardTokenizerFactory (and its "sibling" StandardFilter) and I am now reindexing my data. I look forward to seeing the result.
Carsten Gehling
Just wanted to let you know that I reindexed my data overnight with the StandardTokenizer, and now everything works as expected. Thank you very much.
Carsten Gehling