I use Solr to search my data, and I have just noticed that some features of the Solr query syntax do not work for me. These are the capabilities I am missing:

  • fuzzy search
  • wildcards (* and ?): I have not set up stemming yet, so these would be useful as a temporary substitute when searching
  • field specification: currently I cannot search with title:Blabla

As far as I know, these features should be available in Solr by default, but they obviously do not work for me. I use Solr 1.4. Here you can find my schema. Thanks for your help.
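A note on the third point: a fielded query such as title:Blabla can only match if that field is declared as indexed in schema.xml. Below is a minimal sketch of the relevant schema.xml entries, assuming a field named title that uses the text field type; the catch-all field name text and the copyField line are illustrative assumptions, not taken from the asker's schema.

```xml
<schema name="example" version="1.2">
  <!-- <types> section (including the "text" fieldType) omitted -->
  <fields>
    <!-- indexed="true" is what lets title:Blabla match;
         stored="true" only controls whether the value is returned in results -->
    <field name="title" type="text" indexed="true" stored="true"/>
    <!-- hypothetical catch-all field for terms typed without a field: prefix -->
    <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
  </fields>

  <!-- field searched when a query term has no field: prefix -->
  <defaultSearchField>text</defaultSearchField>

  <!-- hypothetical: also index the title into the catch-all field -->
  <copyField source="title" dest="text"/>
</schema>
```

If title is missing from the fields section, or is declared with indexed="false", a query like title:Blabla returns nothing even though the query syntax itself is valid.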

A:

Your fieldType name="text" is missing a lot of filters. For reference, here's the text fieldType from the default schema.xml:

<!-- A text field that uses WordDelimiterFilter to enable splitting and matching of
    words on case-change, alpha numeric boundaries, and non-alphanumeric chars,
    so that a query of "wifi" or "wi fi" could match a document containing "Wi-Fi".
    Synonyms and stopwords are customized by external files, and stemming is enabled.
    -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

For example, the SnowballPorterFilterFactory is the one that enables stemming.

I recommend building your schema based on the default schema.xml, tweaking and modifying as necessary (as opposed to starting from scratch).

Here's the reference for analyzers, tokenizers and filters.

Mauricio Scheffer
Thanks Mauricio. I use LetterTokenizer instead of WhitespaceTokenizer, because WhitespaceTokenizer leaves punctuation characters attached to the end of words. All the other things you listed are fine and I will use them, but I preferred to begin with a stripped-down set. For instance, I cannot use the Snowball stemmer for now, as it is not available for my language yet. Doesn't the query parsing have something to do with SolrQueryParser? http://lucene.apache.org/solr/api/org/apache/solr/search/SolrQueryParser.html
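The stripped-down set described in this comment could look roughly like the sketch below. The type name text_basic is hypothetical; the tokenizer and filter are the standard Solr 1.4 factories. A single analyzer element without a type attribute is used for both indexing and querying.

```xml
<!-- hypothetical name: a stripped-down alternative to the default "text" type -->
<fieldType name="text_basic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- splits on every non-letter character, so trailing punctuation
         is dropped; note that digits are dropped as well -->
    <tokenizer class="solr.LetterTokenizerFactory"/>
    <!-- case-insensitive matching -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- no stemmer yet: one could be added here once available for the language -->
  </analyzer>
</fieldType>
```

Because LetterTokenizer discards digits, a value like "Solr 1.4" is indexed as just "solr". If numbers matter, a whitespace tokenizer followed by a filter that strips punctuation may be a better fit.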
fifigyuri
It looks like a Hungarian stemmer is available commercially: http://www.lucidimagination.com/search/document/CDRG_ch05_5.6.16. Also, why do you ask about SolrQueryParser? Are you looking to extend Solr? Normally you don't need to change Solr's code, as it is highly extensible and configurable.
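To illustrate the point that no code changes are needed: the query parser that interprets the q parameter is chosen in configuration. Below is a minimal solrconfig.xml sketch. It assumes requests go through the default standard handler and sets defType to lucene, the standard parser that understands fielded, wildcard, and fuzzy syntax. If a handler is configured with a different parser such as dismax, input like title:Blabla is not treated as a field query.

```xml
<!-- solrconfig.xml excerpt: the query parser is selected by configuration, not Java code -->
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!-- "lucene" = standard query parser: supports title:Blabla,
         wildcards (bla*, bl?bla) and fuzzy terms (blabla~) -->
    <str name="defType">lucene</str>
  </lst>
</requestHandler>
```

The same parser can also be requested per query by adding defType=lucene to the request URL. In Solr 1.4, schema.xml additionally accepts a <solrQueryParser defaultOperator="OR"/> element, which sets the default boolean operator for this parser.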
Mauricio Scheffer