ansaurus

Question

Answer 1

+2 A:

First of all, try to understand why your words are not being indexed by Solr, using the Analysis Tool:

http://localhost:8080/solr/admin/analysis.jsp

Just enter the field and the text you are searching for, and see which analyser is filtering out your short term. I suggest doing this because you said you only have a suspicion, and you need to be certain which analyser is filtering your data.

Then why not simply copy the term into another field that does not use that analyser?

This way your terms will be indexed twice, and will appear both as exact words and as n-grams. You then have to deal with the scores of the two different fields.
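To make the idea concrete, here is a rough sketch in Python of what happens when the same text is indexed into an n-gram field and an exact-word field. It is only a simplified model, not Solr's implementation: the field names `name_ngram` and `name_exact` are made up, and the minimum gram size of 2 is an assumption, since your actual tokenizer settings are not shown.

```python
def edge_ngrams(term, min_gram=2, max_gram=15):
    """Return the leading-edge n-grams of term, from min_gram to max_gram characters.

    min_gram=2 is an assumed setting; a term shorter than min_gram yields nothing.
    """
    upper = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, upper + 1)]


def index_document(text, min_gram=2, max_gram=15):
    """Index the same text into two hypothetical fields, much as a copy would."""
    words = text.lower().split()
    return {
        # Field analysed with edge n-grams: short words can disappear here.
        "name_ngram": {g for w in words for g in edge_ngrams(w, min_gram, max_gram)},
        # Field without the n-gram step: every whole word is kept.
        "name_exact": set(words),
    }


def matches(index, query):
    """Return the names of the fields whose indexed terms contain the query term."""
    q = query.lower()
    return sorted(field for field, terms in index.items() if q in terms)


index = index_document("J Jeudan")
print(matches(index, "j"))       # 1-letter term: found only in the exact field
print(matches(index, "jeu"))     # prefix: found only in the n-gram field
print(matches(index, "jeudan"))  # full word: found in both fields
```

Under these assumptions the 1-letter word survives only in the field without the n-gram step, which is why the second field helps.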

I hope this has helped you in some way.

Some links on aggregation and the copyField attribute:

Indexing data in multiple fields

Using copy field tag

volothamp 2010-06-11 08:24:47

Thanks for your suggestion. I have run the analysis against two words: a normal case, "jeudan", and the 1-letter word "j". Here are the results: http://pastie.org/1000520 As you can see, it IS actually the NGramTokenizer that is filtering out the 1-letter word (or in this case the EdgeNGramTokenizer, but I have tested with both). I could try what you suggest, but I would rather let Solr do all the text-munging. I do a lot of field-specific searches, so your suggestion would mean rewriting those queries to look in two text fields instead of one. Possible, but counter-intuitive.

Carsten Gehling 2010-06-11 09:05:53

Consider that it's typical in Solr to have an aggregation field that you run the query against, plus a series of fields with different types and analysers. Simply use the copyField tag to copy all your source fields to the target. You don't have to change your queries.

volothamp 2010-06-11 09:41:08

Well, your answer actually solved this and other problems that I faced. I didn't know about the analysis tool. I tried a few other filters and tokenizers through the analyser, and ended up using the PhoneticFilter on both the index and query parts. Very neat, thanks a lot!

Carsten Gehling 2010-06-14 04:26:17

Search for short words with SOLR