Hello all,

I have read a couple of tutorials and browsed the Solr documentation. But one thing isn't clear to me. Let me explain:

Let's assume that the following document is to be indexed:

<doc>
  <field name="id">R12345</field>
  <field name="title">My title</field>
  <field name="content">My Content</field>
</doc>

Unlike this document, the index should contain one extra field called "docType". This extra index field should be filled using a "completion rule". The idea behind this:

If id starts with character "R" then write the String "Resolve" into field docType in the index. If id starts with character "C" then write the String "Contribute" into field docType in the index.

The above document should be available in the index with the following fields:

id=R12345
title=My title
content=My Content
docType=Resolve

My idea is to use an Analyzer for this. The result of the Analyzer will then be written into field "id" in the index as usual (only a copy of the original text) but the result "Resolve" or "Contribute" should be written in another field.

My basic question is: How can this be achieved in the Analyzer (Java snippet)? To make it more complex, the index field "docType" should be searchable and must be available in the search result. What will the schema look like for the fields id and docType?

Thanks in advance, Tobias

+2  A: 

If you only need the indexed value, then the schema approach is sufficient. Create a new fieldtype that performs necessary processing, create a field of your new type, and set up a copy field to copy the value from id:

<fieldType name="doctypeField" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([CR]).*" replacement="$1" replace="all" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="C" replacement="Contribute" replace="all" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="R" replacement="Resolve" replace="all" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="doctype" type="doctypeField" indexed="true" stored="false" required="false" />

<copyField source="id" dest="doctype"/>
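To see which term this index-time chain produces, here is a rough plain-Java simulation. It is only a sketch: it uses java.util.regex via String.replaceAll instead of the Solr classes, assumes the pattern filters behave like a regex replace-all on the single keyword token, and the class and method names (DoctypeChainSketch, indexTerm) are made up for illustration.

```java
import java.util.Locale;

// Hypothetical helper, not part of Solr: mimics the index-time analyzer above.
class DoctypeChainSketch {

    static String indexTerm(String id) {
        // KeywordTokenizer: the whole field value is treated as one token.
        String term = id;
        // Filter 1: reduce the value to its C or R letter.
        term = term.replaceAll("([CR]).*", "$1");
        // Filter 2 and 3: expand the letter to the full word.
        term = term.replaceAll("C", "Contribute");
        term = term.replaceAll("R", "Resolve");
        // LowerCaseFilter: the indexed term ends up lowercase.
        return term.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(indexTerm("R12345")); // resolve
        System.out.println(indexTerm("C777"));   // contribute
    }
}
```

Because the query analyzer also lowercases, a query such as doctype:Resolve should match the indexed term "resolve". The patterns are not anchored, so an id that merely contains a C or R later on (for example X1C) is also rewritten; anchoring the first pattern with ^ would be one way to avoid that.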

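Note that you won't get a stored value from this: the doctype field is searchable, but it will not come back in search results. Even with stored="true", Solr stores the original field value (here the copied id), not the analyzer output. If you need the value in the results, you should work out the docType value before feeding the document to Solr, for instance by creating it in the SQL query if your content source is SQL.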
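If the content is prepared in Java before it is sent to Solr, working out the value up front could look roughly like the sketch below. It is plain Java with no Solr client calls; the class name DocTypeRule, the method names, and the use of a Map as a stand-in for the document are assumptions made for illustration. It also assumes the schema declares docType as a field that is both indexed and stored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: applies the "completion rule" from the question
// before the document is handed to Solr.
class DocTypeRule {

    // Returns the docType for an id, or null if no rule applies.
    static String docTypeFor(String id) {
        if (id == null || id.isEmpty()) {
            return null;
        }
        switch (id.charAt(0)) {
            case 'R': return "Resolve";
            case 'C': return "Contribute";
            default:  return null;
        }
    }

    // Returns a copy of the document with docType added when a rule matches.
    static Map<String, String> withDocType(Map<String, String> doc) {
        Map<String, String> out = new LinkedHashMap<>(doc);
        String docType = docTypeFor(doc.get("id"));
        if (docType != null) {
            out.put("docType", docType);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "R12345");
        doc.put("title", "My title");
        doc.put("content", "My Content");
        System.out.println(withDocType(doc));
    }
}
```

The resulting map carries the same fields as the original document plus docType, which can then be sent to Solr as an ordinary field.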

Karl Johansson
Thank you for the fast answer. If I understand you correctly, <copyField /> does not copy directly to the index, but copies the content of the source to the destination field before the Analyzer runs, and the Analyzer result is then saved in the index? Great idea, great answer, thank you very much. Is it also possible to write values into custom index fields from inside an Analyzer using Java source code?
Tobias Stening
You understood correctly: the field copy is done first, then the analysis. If you want to create custom analyzers, see: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers
Karl Johansson
A: 

Thank you very much! :-)

Tobias Stening