Here is the interesting part of the schema:

    <fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
        <filter class="solr.ASCIIFoldingFilterFactory" />
        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
        <filter class="solr.ASCIIFoldingFilterFactory" />
      </analyzer>
    </fieldType>

I have a field named Title that uses this fieldType, and some products whose title contains "Harry Potter".

The query "Title:Harry*" returns 0 results, whereas "Title:Harry" returns many. However, "Title:Potter*" returns the same number of results as "Title:Potter".

So why is "Title:Harry*" not returning any result?

Edit: I found a workaround using the query 'Title:"Harry*"' (note the double quotes).

Here is the debug output for the queries Title:Harry, Title:Harry* and Title:"Harry*":

Title:Harry

  <str name="rawquerystring">Title:Harry</str> 
  <str name="querystring">Title:Harry</str> 
  <str name="parsedquery">Title:harri</str> 
  <str name="parsedquery_toString">Title:harri</str> 

Title:Harry*

  <str name="rawquerystring">Title:Harry*</str> 
  <str name="querystring">Title:Harry*</str> 
  <str name="parsedquery">Title:Harry*</str> 
  <str name="parsedquery_toString">Title:Harry*</str> 

Title:"Harry*"

  <str name="rawquerystring">Title:"Harry*"</str> 
  <str name="querystring">Title:"Harry*"</str> 
  <str name="parsedquery">Title:harri</str> 
  <str name="parsedquery_toString">Title:harri</str> 
A: 

When we query for "Title:Harry*", it's actually a phrase search on the default search field.

This is how it gets processed and assigned to the default search field, which is Text in my case:

    "userName:harry*"
    "userName:harry*"
    PhraseQuery(statusText:"username harry")
    Text:"username harry"

As for "Title:Potter*" returning the same number of results as "Title:Potter": this sounds very weird. My guess would be 0 for both.

I suggest using the parameter debugQuery=on to see exactly how the query is parsed. "Title:Harry" is returning results because there must be text containing that phrase. Hope this helps.
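
As a minimal sketch of how the debug output could be requested, the snippet below builds a select URL with debugQuery=on for each query from the question. The base URL (host, port and handler path) and the helper name `debug_query_url` are assumptions, so adjust them to your own installation.

```python
from urllib.parse import urlencode

# Assumed base URL; host, port and handler path depend on your Solr install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def debug_query_url(query, base_url=SOLR_SELECT_URL):
    """Build a select URL that asks Solr to include query-parsing debug info."""
    params = {"q": query, "debugQuery": "on"}
    # urlencode percent-escapes characters such as ':', '*' and '"'.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    for q in ("Title:Harry", "Title:Harry*", 'Title:"Harry*"'):
        print(debug_query_url(q))
```

Opening each printed URL in a browser should show the parsedquery entries, which tell you what the query parser actually did with the term.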

kaka
I've added the debug info for the queries to the question.
Jean-Philippe Gire
A: 

The cause is the combination of LowerCaseFilterFactory and a wildcard query. At index time, the filter (of course) lowercases all letters in your terms, letting both 'Harry' and 'harry' match.

When you do a wildcard query, like "Harry*", no analysis is done on the query terms -- i.e. it is not lowercased. You could possibly circumvent your problem by lowercasing your query client-side, as long as you don't have any requirements that dictate case sensitivity.
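
A rough sketch of the client-side lowercasing, assuming a simple `field:term` query syntax: it lowercases only the terms that contain a wildcard and leaves field names and operators such as AND untouched. The helper name `lowercase_wildcard_terms` and the regular expression are illustrative, not part of Solr, and the pattern does not treat quoted phrases or escaped characters specially.

```python
import re

# Optional "field:" prefix followed by a bare term containing * or ?.
_WILDCARD_TERM = re.compile(r'(?P<field>\w+:)?(?P<term>[^\s":]*[*?][^\s":]*)')


def lowercase_wildcard_terms(query):
    """Lowercase wildcard terms only, keeping field names and operators as-is."""
    def _lower(match):
        field = match.group("field") or ""
        return field + match.group("term").lower()

    return _WILDCARD_TERM.sub(_lower, query)


if __name__ == "__main__":
    print(lowercase_wildcard_terms("Title:Harry*"))
    print(lowercase_wildcard_terms("Title:Harry* AND Title:Potter"))
```

Terms without a wildcard are left alone on purpose, since they still go through the query analyzer and are lowercased there.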

Karl Johansson
I tried Title:harry*. It returned some results, but not the ones I was looking for: only titles like "Harry/Nilsson" or "Ray Harryhausen set (5DVD)", where "Harry" is not a full word.
Jean-Philippe Gire
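
One plausible reading of that result, going by the debug output in the question where Title:Harry is parsed as Title:harri: the stemmer stores the standalone word "Harry" as "harri", so the unanalyzed prefix "harry" can only match longer terms that kept the "harry" spelling. The toy model below illustrates this with plain prefix matching. The term lists are assumptions about what the index analyzer produces, not real index contents, and the reversed tokens are left out.

```python
# Hypothetical indexed terms per title, approximating the lowercased and
# stemmed output of the index analyzer (an assumption, not read from Solr).
INDEXED_TERMS = {
    "Harry Potter": ["harri", "potter"],
    "Ray Harryhausen set": ["ray", "harryhausen", "set"],
}


def prefix_matches(prefix):
    """Return titles with at least one indexed term starting with the prefix.

    The prefix is used verbatim, mimicking a wildcard term that skips analysis.
    """
    return [
        title
        for title, terms in INDEXED_TERMS.items()
        if any(term.startswith(prefix) for term in terms)
    ]


if __name__ == "__main__":
    for prefix in ("Harry", "harry", "harri", "potter"):
        print(prefix, "->", prefix_matches(prefix))
```

Under these assumptions "potter" survives stemming unchanged, which would explain why Title:Potter* and Title:Potter behave alike once case is not an issue, while "harry" does not.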