I am using Solr 1.4.1 (Lucene 2.9.3) on Windows and am trying to understand ShingleFilter. With the configuration below, I find that if the query contains more words than the phrase indexed in a field, the search on that field fails, i.e. with debugQuery=true that field contributes no score.

Here is an example I created to reproduce, with field names and the document indexed:
Id: 1
title_1: Nina Simone
title_2: I put a spell on you

Issue the following Queries (dismax):
- “Nina Simone I put” <- Fails to have a score from title_1 search (using debugQuery)
- “Nina Simone” <- SUCCESS

To analyze this disparity, I ran “Nina Simone I put” through Solr’s Field Analysis page with the ‘shingle’ field type (given below), and it matches. So it is only at query time that no score is produced. I also checked ‘parsedquery’, which shows a DisjunctionMaxQuery issuing the string “Nina_Simone Simone_I I_put” to the title_1 field.

title_1 and title_2 fields are of type ‘shingle’, defined as:

<fieldType name="shingle" class="solr.TextField" positionIncrementGap="100" indexed="true" stored="true">
  <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
  </analyzer>
  <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
  </analyzer>
</fieldType>
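
To make the token streams concrete, here is a minimal Python sketch of what this analyzer chain roughly produces. It is an approximation, not Solr code: the `shingles` helper is hypothetical, a whitespace split stands in for StandardTokenizer, and the `_` separator is chosen only to mirror the notation in the parsedquery output above.

```python
def shingles(text, size=2, sep="_"):
    """Approximate tokenizer + lowercase + shingle filter
    (maxShingleSize=2, outputUnigrams=false).

    Assumption: a plain whitespace split stands in for StandardTokenizer.
    """
    tokens = text.lower().split()
    # Emit every run of `size` adjacent tokens; no unigrams are kept.
    return [sep.join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


indexed = shingles("Nina Simone")
query = shingles("Nina Simone I put")

print(indexed)  # ['nina_simone']
print(query)    # ['nina_simone', 'simone_i', 'i_put']
```

The indexed title_1 value yields a single shingle, while the longer query yields three, only the first of which exists in the index for that field.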

Note that I also have a catchall field of type text. I have qf set to 'id^2 catchall^0.8' and pf set to 'title_1^1.5 title_2^1.2'.

Is there something I am missing or doing wrong?

A: 

In a dismax query, the score of the query is the max of the subquery scores, not the sum. I don't really know much about how it parses shingle queries, but if it does something like "(title_1:(shingle1 shingle2...)) (title_2:(shingle1 shingle2...))" then you should expect to see only one field contribute to the score.
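
As a rough illustration of the max-versus-sum point, the sketch below mirrors how a disjunction-max score is combined: the best subquery score plus the tie value times the sum of the remaining scores. The function name and the per-field scores are made up for the example, not taken from any debugQuery output.

```python
def dismax_score(subquery_scores, tie=0.0):
    """Combine per-field scores the way a disjunction-max query does:
    the best score, plus `tie` times the sum of all the other scores."""
    if not subquery_scores:
        return 0.0
    best = max(subquery_scores)
    return best + tie * (sum(subquery_scores) - best)


# Hypothetical per-field scores, for illustration only.
scores = [2.0, 1.0]

print(dismax_score(scores, tie=0.0))  # 2.0 (pure max)
print(dismax_score(scores, tie=0.5))  # 2.5
print(dismax_score(scores, tie=1.0))  # 3.0 (plain sum)
```

With tie at 0.0 only the best field counts toward the final score; with tie at 1.0 the result is the plain sum.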

Xodarap
Yes, dismax takes the max of the subquery scores if tie is 0.0. Your point above is correct, but with debugQuery set to true it should show the score from each subquery before the maximum is selected. Note that I have fixed the above issue using PositionFilterFactory (thanks to Steve), and I am trying to understand how exactly it fixed it. Any ideas?
Ethan Collins
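
One plausible reading of why PositionFilterFactory helps, offered as an assumption rather than a confirmed account of this setup: without it, the query shingles sit at consecutive positions, so the pf clause behaves like a phrase query that needs every shingle to be present in order. PositionFilter gives every token after the first a position increment of 0, so the shingles stack at one position and can be matched as alternatives. The sketch below contrasts the two matching modes; all helper names are hypothetical and the matching is deliberately simplified.

```python
def shingles(text, size=2, sep="_"):
    """Whitespace-split stand-in for the 'shingle' analyzer chain."""
    tokens = text.lower().split()
    return [sep.join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def phrase_match(indexed, query):
    """True only if all query tokens occur consecutively and in order."""
    n = len(query)
    return n > 0 and any(
        indexed[i:i + n] == query for i in range(len(indexed) - n + 1)
    )


def any_match(indexed, query):
    """True if at least one query token occurs (tokens act as alternatives)."""
    return any(token in indexed for token in query)


indexed = shingles("Nina Simone")
long_query = shingles("Nina Simone I put")
short_query = shingles("Nina Simone")

print(phrase_match(indexed, short_query))  # True
print(phrase_match(indexed, long_query))   # False
print(any_match(indexed, long_query))      # True
```

Under this reading, the longer query fails as a phrase because "simone_i" and "i_put" were never indexed in title_1, yet it still matches once the shingles are treated as alternatives.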