ansaurus

Question

How to determine field-type for SOLR indexing?

Answer 1

+3 A:

1. Schema

Your Solr schema is very much determined by your intended search behavior. In your schema.xml file, you'll see a bunch of choices like "text" and "string". They behave differently.

<fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

The string field type is a literal string match: the value is not tokenized or filtered. It operates like = in a SQL WHERE clause.

<fieldtype name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldtype>

The text_ws field type only tokenizes on whitespace. The big difference in the text field type is its filters for stop words, word delimiters, and lower-casing. Notice that in schema.xml these filters are designated for both the Lucene index and the Solr query, so when you search a text field, Solr runs your query terms through the same filters to help find a match.

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter ..... />
    <filter ..... />
    <filter ..... />
  </analyzer>
</fieldtype>
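
To make the string-versus-text difference concrete, here is a rough Python sketch. It is not Solr code: the stop-word set and the function names are made up for illustration, and it only imitates whitespace tokenization, stop-word removal, and lower-casing, applied the same way to the indexed value and to the query.

```python
# Hypothetical stand-in for stopwords.txt; not Solr's actual list.
STOP_WORDS = {"the", "a", "an", "of", "for"}


def analyze(text):
    """Imitate a text-style analyzer chain on one string."""
    tokens = text.split()  # whitespace tokenization
    # Drop stop words, ignoring case (like ignoreCase="true").
    tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    # Lower-case whatever is left.
    return [t.lower() for t in tokens]


def string_match(indexed_value, query):
    """Imitate a string field: the whole value must match exactly."""
    return indexed_value == query


def text_match(indexed_value, query):
    """Imitate a text field: every analyzed query term must be indexed."""
    indexed_terms = set(analyze(indexed_value))
    return all(term in indexed_terms for term in analyze(query))


headline = "The Processor Specifications for Intel"
print(analyze(headline))
print(text_match(headline, "processor specifications"))
print(string_match("Intel", "intel"))
```

The point of the sketch is that the text-style match succeeds on lower-cased, partial input because the query goes through the same steps as the indexed value, while the string-style match fails on anything but the exact value.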

When indexing things like news stories, for example, you probably want to search for company names and headlines differently.

<field name="headline" type="text" />
<field name="coname" type="string" indexed="true" multiValued="false" omitNorms="true" />

The above example would allow you to do a search like q=coname:Intel AND headline:(processor specifications) and retrieve only stories whose company name is exactly Intel.

If you want to search a range, use the field:[x TO *] syntax described under Result Fields below.

2. Result Fields

You can define a standard set of return fields in your RequestHandler:

<requestHandler name="mumble" class="solr.DisMaxRequestHandler">
    <lst name="defaults">
        <str name="fl">category,coname,headline</str>
    </lst>
</requestHandler>

You may also define the desired fields in your query string, using the fl parameter:

/select?indent=on&version=2.2&q=coname%3AIn*&start=0&rows=10&fl=coname%2Cid&qt=standard
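
If you build these URLs from code, it is easier to let a library do the escaping. A minimal Python sketch follows; the base URL (localhost, port 8983, /solr/select) and the function name are assumptions for illustration, so adjust them to your own install.

```python
from urllib.parse import urlencode

# Assumed base URL; change host, port, and path to match your Solr instance.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_select_url(query, fields, start=0, rows=10):
    """Build a select URL with the query and the list of fields to return."""
    params = {
        "q": query,
        "fl": ",".join(fields),  # comma-separated return fields
        "start": start,
        "rows": rows,
    }
    # urlencode percent-escapes characters such as ':' and ','.
    return SOLR_SELECT + "?" + urlencode(params)


url = build_select_url("coname:In*", ["coname", "id"])
print(url)
```

Passing fl on the request like this should take precedence over a default set in the request handler for that one query.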

You can also select ranges in your query terms using the field:[x TO *] syntax. If you wanted to select certain ads by their date, you might build a query with

ad_date:[20100101 TO 20100201]

in your query terms. (There are many ways to search ranges; I'm presenting a method that stores dates as integers instead of using the Date class.)
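
For the integer-date approach, here is a small Python sketch of how the range clause could be generated. The helper names are hypothetical, and it assumes the field (ad_date here) was indexed as a yyyymmdd integer. Since every such value has eight digits, the values order the same way whether they are compared as numbers or as strings.

```python
from datetime import date


def date_to_int(d):
    """Turn a date into a yyyymmdd integer, e.g. 2010-01-01 -> 20100101."""
    return d.year * 10000 + d.month * 100 + d.day


def date_range_query(field, start=None, end=None):
    """Build a field:[x TO y] clause; a missing bound becomes '*'."""
    lower = str(date_to_int(start)) if start else "*"
    upper = str(date_to_int(end)) if end else "*"
    return "{}:[{} TO {}]".format(field, lower, upper)


print(date_range_query("ad_date", date(2010, 1, 1), date(2010, 2, 1)))
print(date_range_query("ad_date", start=date(2010, 1, 1)))
```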

memnoch_proxy 2010-01-22 18:21:30

Do you know where I can find a "reference manual" of all classes and attributes for these field-types?

Camran 2010-01-25 09:50:33

I typically start on the Solr wiki http://wiki.apache.org/solr/ and the Javadocs for the classes are located here: http://lucene.apache.org/solr/api/index.html.

memnoch_proxy 2010-01-25 17:30:33
